Gawwad
My way or Haddaway!
+212|6930|Espoo, Finland
I'm trying to make a regular expression to match lines like this:
02:Aura,Eura,Eurajoki,Harjavalta,Honkajoki, (...) ,Uusikaupunki,Vehmaa,Ypaja
03:Akaa,Artjarvi,Asikkala,Forssa,Hartola, (...) ,Vesilahti,Virrat,Ylojarvi


Python lets me compile the regular expression so that it ignores case so I don't have to worry about that.

Beginning of the regex would be: '[0-9]:' but the repeating city names get tricky.
It would be easy if they were all just single words, but I just can't come up with something
that would work with ones that have a hyphen or space in them.

Help.

Last edited by Gawwad (2010-08-11 09:29:09)

jsnipy
...
+3,277|6768|...

^\d{2,2}\:{1,1}((\w+\,{1,1})|(\w+)){0,}$

was the elispes in parens a valid value as well?

You can lock the string values down further by replacing the \w+ with something stricter.

The 'or' is in there to account for there being no comma after the last name.
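
To make the pieces of the pattern above easier to see, here is a rough Python 3 sketch that spells it out with re.VERBOSE. The function name is_simple_line is made up for illustration, and the capturing groups are swapped for non-capturing ones, which should not change what matches. It only accepts single-word names, so lines with a hyphen or space in a name are rejected.

```python
import re

# Same structure as the pattern above, written out piece by piece.
SIMPLE_LINE = re.compile(r"""
    ^\d{2}        # exactly two digits (the \d{2,2} part)
    :             # exactly one colon (the \:{1,1} part)
    (?:
        \w+,      # a name followed by a comma
      | \w+       # or a name with no comma, for the last name on the line
    )*            # zero or more names (the {0,} part)
    $
""", re.VERBOSE | re.IGNORECASE)

def is_simple_line(line):
    """Hypothetical helper: True if the line fits the single-word pattern."""
    return SIMPLE_LINE.match(line) is not None

# Short inputs only: this pattern can get very slow on long lines that fail.
print(is_simple_line("02:Aura,Eura,Ypaja"))
print(is_simple_line("02:Aura,Ypa ja"))
```

Keep the test inputs short when trying this, since a long line that does not match can take a very long time to reject.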
Gawwad
My way or Haddaway!
+212|6930|Espoo, Finland
What's elispes? Or misspelled?
Using your example, my code seems to go into an infinite loop if there is a name with a hyphen in it
(like Lansi-Turunmaa), and the same seems to happen with a space as well.

This is kind of difficult since I'm pretty sure I can't just split the line at commas and work with that, since the
program is reading from a file which can contain invalid lines. (empty lines, wrongly formatted lines or such)

Here is the code I'm using to test the expression:

Code:

import re

def match(line):
    # Compile the pattern case-insensitively and test it from the start of the line.
    p = re.compile(r'^\d{2,2}\:{1,1}((\w+\,{1,1})|(\w+)){0,}$', re.IGNORECASE)
    return p.match(line) is not None

def main():
    # Read one line from the user and report whether it matches.
    line = input("enter a string:\n")
    if match(line):
        print("match")
    else:
        print("no match")

main()
Examples of properly formatted lines in the file (long)
Each line starts with a number and a colon followed by names of cities separated by commas.


The first part was supposed to be "[0-9]+:" in the OP.
jsnipy
...
+3,277|6768|...

Gawwad wrote:

What's elispes? Or misspelled?
Sorry, "ellipsis". Is the highlighted text, the "(...)", to be accepted by your expression?

02:Aura,Eura,Eurajoki,Harjavalta,Honkajoki, (...) ,Uusikaupunki,Vehmaa,Ypaja

Does it stack overflow at "p = re.compile('^\d{2,2}\:{1,1}((\w+\,{1,1})|(\w+)){0,}$', re.IGNORECASE)"?

Also does your expression need to consider the line breaks or is it only validating a single line?
Gawwad
My way or Haddaway!
+212|6930|Espoo, Finland

jsnipy wrote:

Gawwad wrote:

What's elispes? Or misspelled?
Sorry, "ellipsis". Is the highlighted text, the "(...)", to be accepted by your expression?
No, it was just in place of the middle of the list as it's long.

jsnipy wrote:

Does it stack overflow at "p = re.compile('^\d{2,2}\:{1,1}((\w+\,{1,1})|(\w+)){0,}$', re.IGNORECASE)"?
I'm not sure. It just keeps running forever if there is a '-' or a space in the string. I'm guessing it gets stuck at match().

jsnipy wrote:

Also does your expression need to consider the line breaks or is it only validating a single line?
It's always a number followed by a colon and then a list of cities separated by commas.
City names can have a space or hyphen in them.
(I don't think any have more than one, though.)
012:Name,Another,One WithASpace\n
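
A likely explanation for the "keeps running forever" behaviour is backtracking rather than a true infinite loop. Because \w+ sits inside a repeated group, a single name can be split into chunks in many different ways, and when the line cannot match (a '-' or space stops it) the engine may try all of them. The Python 3 sketch below is only an illustration under that assumption; time_failed_match is a made-up helper, and the actual timings depend on the machine.

```python
import re
import time

# The pattern from the test code, unchanged apart from the raw-string prefix.
PATTERN = re.compile(r'^\d{2,2}\:{1,1}((\w+\,{1,1})|(\w+)){0,}$', re.IGNORECASE)

def time_failed_match(name_length):
    """Return (matched, seconds) for one long name followed by a hyphen."""
    line = "02:" + "a" * name_length + "-"
    start = time.perf_counter()
    matched = PATTERN.match(line) is not None
    return matched, time.perf_counter() - start

# Small sizes on purpose: the time tends to climb steeply as the name grows.
for n in (8, 12, 16):
    matched, seconds = time_failed_match(n)
    print(n, matched, round(seconds, 4))
```

With a full line of city names before the first hyphen or space, the number of ways to split the text is far larger, which would fit the hang described above.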
jsnipy
...
+3,277|6768|...

^\d{1,2}\:{1,1}(((\w{1,1}|\s{1,1}|\-){1,}\,{1,1})|((\w{1,1}|\s{1,1}|\-){1,})){0,}$


The (\w{1,1}|\s{1,1}|\-){1,} bits (blue in the original post) are what dictate what a name can look like ...
^\d{1,2}\:{1,1}(((\w{1,1}|\s{1,1}|\-){1,}\,{1,1})|((\w{1,1}|\s{1,1}|\-){1,})){0,}$

tested true with ...
"2:Aura,Eura,Eurajoki,Harjavalta,Honkajoki,Uusikaupunki,Vehmaa,Ypaja";
"2:Aura,Eura,Eurajoki,Har javalta,Honkajoki,Uusikaupunki,Vehmaa,Ypaja";
"2:Aura,Eura,Eurajoki,Har-javalta,Honkajoki,Uusikaupunki,Vehmaa,Ypa ja";

Also changed to allow one or two leading digits
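
For reading the file in Python, one possible variation (not the pattern posted above) is sketched below in Python 3. It assumes a name is one or more words joined by single spaces or hyphens, and that the leading number can have any number of digits, as in "[0-9]+:". The names NAME, LINE and parse_line are made up for this example. Each name is written so that it can only be split one way, which should keep failed matches quick; the pattern above still repeats a group inside another repetition, so a badly formatted line might still backtrack heavily with it.

```python
import re

# Hypothetical alternative: a name is words joined by single spaces or hyphens.
# \w already covers upper and lower case, so re.IGNORECASE is not needed here.
NAME = r"\w+(?:[ -]\w+)*"
LINE = re.compile(r"^(\d+):(" + NAME + r"(?:," + NAME + r")*)$")

def parse_line(line):
    """Return (number, [city, ...]) for a valid line, or None otherwise."""
    m = LINE.match(line)
    if m is None:
        return None
    # Safe to split on commas now, since the whole line has been validated.
    return m.group(1), m.group(2).split(",")

print(parse_line("012:Name,Another,One WithASpace\n"))
print(parse_line("02:Aura,Lansi-Turunmaa"))
print(parse_line("02:Aura,,Eura"))
```

Validating first and splitting afterwards means empty or wrongly formatted lines simply come back as None and can be skipped. Note that \w also allows digits and underscores, so tighten NAME if that is too loose.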
Gawwad
My way or Haddaway!
+212|6930|Espoo, Finland
Thanks a lot, man!
jsnipy
...
+3,277|6768|...

np, good luck

shit will make you go blind
