Regular Expressions and Ruby

George_Lunsford · 21 November 2005 16:38

Hi, I'm new to the list and I hope this is the right place to ask the
question, but I recently started learning ruby and I was writing a short
little program to test if a word was made up of symbols from the periodic
table. I figured I'd use ruby's built-in regex support, so I ended up
writing a regex that looks something like this:
(element1|element2|element3...|elementx)+
Basically it's just got all 115 or so of the elements in there, and it's
testing for one or more, right? Unfortunately this doesn't seem to work
properly as a (nonsense) word like "presenti" should be periodic according
to my list of elements, [p][re][se][n][ti]. Now, I can duplicate this
behavior with the regex coach oddly enough. The problem seems to be that it
parses [p][re][s] instead of [se]. I'm wondering if anyone can explain this
behavior to me, as I'm new to both regexps and ruby and the only regex
experience I've had previous to this is in some theory classes. Thanks. The
full expression is below.

(He|Li|Be|C|O|Ne|Na|Mg|Al|Si|S|Cl|Ar|Ca|Sc|Ti|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|Rb|Sr|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|Xe|Cs|Ba|Hf|Ta|Re|Os|Ir|Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Rf|Db|Sg|Bh|Hs|Mt|Uun|Uuu|Uub|Uuq|Uuh|Uuo|Ds|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Ac|Th|Pa|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|H|B|N|F|P|V|Y|I|W|Uaal)+

Christian_Neukirche1 · 21 November 2005 16:46

Are you matching case insensitive (/i)?

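By the way, an optimized version of that regexp is

/(A[cglmr-u]|B[aehikr]|C[adeflmorsu]|D[bsy]|E[rsu]|F[emr]|G[ade]|H[efgos]|I[nr]|Kr|L[airu]|M[dgnot]|N[abdeiop]|Os|P[abdmortu]|R[abefhnu]|S[bcegimnr]|T[abcehilm]|U(?:aal|u[bhnoqu])|Xe|Yb|Z[nr]|[BCFHINOPSVWY])+/

(Thanks to emacs's regexp-opt)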

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Dirk_Meijer · 21 November 2005 16:48

hi,
i'm not entirely sure, but i'd say it's because S appears before Se (in your
regexp), so you'd have to start with all elements that have two letters.. i
think that will work, not sure though..
greetings, Dirk.

Dave_Burt2 · 21 November 2005 16:57

You'll note if you match against /^(He|...|Uaal)+$/i it matches the full
string "presenti". Read perlretut for more info on how regexps work.
Basically, it is greedy, but it won't backtrack unless it has failed to find
a match. With your example, the S matches, then the following characters
don't match anything, but it has already succeeded, so it doesn't bother
trying Se.
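
A minimal sketch of the difference, assuming a shortened symbol list (only the symbols needed for this word, not the full table) and using \A and \z as the anchors:

```ruby
# Hypothetical subset of the element symbols, in the same relative order
# as the original alternation (Pr and Es come before the single letters).
symbols = %w[Pr Es Re Se S Ti P N]
alternation = symbols.join("|")

unanchored = /(?:#{alternation})+/i
anchored   = /\A(?:#{alternation})+\z/i

# Without anchors the engine stops at the first overall success:
# it takes "pr", then "es", cannot continue with "e", and is satisfied.
unanchored_match = "presenti"[unanchored]

# With anchors that attempt fails at the end of the string, so the engine
# backtracks and tries P, Re, Se, N, Ti instead.
anchored_match = "presenti"[anchored]

puts unanchored_match
puts anchored_match
```

The anchored pattern only answers whether the whole word is periodic; it does not report which symbols were used.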

http://www.cs.rit.edu/~afb/20013/plc/perl5/doc/perlretut.html#grouping things and hierarchical matching

Cheers,
Dave

Jeff_Wood · 21 November 2005 16:59

well, not that it's perfect, but I would do something like:

elements = %w( He Li Be C O Ne Na Mg Al Si S Cl Ar Ca Sc Ti Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te Xe Cs
Ba Hf Ta Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Rf Db Sg Bh Hs Mt Uun Uuu
Uub Uuq Uuh Uuo Ds La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Ac Th Pa Np
Pu Am Cm Bk Cf Es Fm Md No Lr H B N F P V Y I W Uaal )

class Array
  def to_re( regex_modifier )
    eval "/#{ self.join "|" }/#{ regex_modifier }"
  end
end

"present".scan elements.to_re( "i" ) # the "i" means case insensitive
#=> [ "pr", "es", "n" ]

... hope that helps. I'm sure there are more efficient regex patterns for
this, but I couldn't imagine anything that was easier to write.

j.

--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood

gregarican · 21 November 2005 17:03

George Lunsford wrote:

I'm wondering if anyone can explain this
behavior to me, as I'm new to both regexps and ruby and the only regex
experience I've had previous to this is in some theory classes.

Here is a valuable reference for this area -->
http://www.oreilly.com/catalog/regex/. This book is a fixture on my
bookshelf for sure...

Hugh_Sasse · 21 November 2005 17:08

[...]

little program to test if a word was made up of symbols from the periodic
table. [...]
properly as a (nonsense) word like "presenti" should be periodic according
to my list of elements, [p][re][se][n][ti]. Now, I can duplicate this
behavior with the regex coach oddly enough. The problem seems to be that it
parses [p][re][s] instead of [se]. I'm wondering if anyone can explain this

Try putting a $ on the end. The regexp won't succeed the first time
because when it gets to sulphur that's enough to satisfy + as you
found. To get selenium and the rest it must backtrack to get to the
end.

Hugh

Bill_Kelly · 21 November 2005 17:19

Unfortunately this doesn't seem to work
properly as a (nonsense) word like "presenti" should be periodic according
to my list of elements, [p][re][se][n][ti]. Now, I can duplicate this
behavior with the regex coach oddly enough.

Hi, welcome to Ruby!

Not sure if this helps, but here's what I get:

elems = "He|Li|Be|C|O|Ne|Na|Mg|Al|Si|S|Cl|Ar|Ca|Sc|Ti|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|\
As|Se|Br|Kr|Rb|Sr|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|Xe|Cs|Ba|Hf|Ta|Re|Os|Ir|\
Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Rf|Db|Sg|Bh|Hs|Mt|Uun|Uuu|Uub|Uuq|Uuh|Uuo|Ds|La|\
Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Ac|Th|Pa|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|\
Lr|H|B|N|F|P|V|Y|I|W|Uaal"

# using \b to anchor at word boundaries
"presenti" =~ /\b(?:#{elems})+\b/i
0 # matches whole word

# using scan to see which elements are being matched
# (note that this approach would inappropriately skip
# over letters at the beginning of the word that weren't
# elements... but any match that it does find will
# be contiguous elements from that point to the end of
# the word... however one could perform the above check
# first to make sure the whole word matched, and then
# use the scan to obtain the individual elements)

"presenti".scan(/(?:#{elems})(?=(?:#{elems})*\b)/i)
["p", "re", "se", "n", "ti"]

"prese".scan(/(?:#{elems})(?=(?:#{elems})*\b)/i)
["p", "re", "se"]
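
The two steps described in the comments above (check the whole word first, then scan for the individual elements) could be combined roughly like this. It is only a sketch: the method name element_symbols and the shortened symbol list are assumptions for illustration, and \A and \z are used in place of \b so that only the whole string counts.

```ruby
# Hypothetical subset of the symbols; the full "elems" string works the same way.
short_elems = "Pr|Es|Re|Se|S|Ti|P|N"

# Returns the list of symbols that spell the word, or nil if the word
# cannot be written entirely with symbols from the list.
def element_symbols(word, elems)
  # Step 1: make sure the whole word is made of elements.
  return nil unless word =~ /\A(?:#{elems})+\z/i

  # Step 2: pick each element only if the rest of the word can still be
  # completed with elements (the lookahead does that check).
  word.scan(/(?:#{elems})(?=(?:#{elems})*\z)/i)
end

p element_symbols("presenti", short_elems)
p element_symbols("xpresenti", short_elems)
```

Doing the whole-word check first avoids the case mentioned above where scan would silently skip leading letters that are not elements.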

Regards,

Bill

George_Lunsford · 21 November 2005 16:53

I am matching +i, sorry for leaving that out. I copied the expression from
the regex coach, which has that handy checkbox. Thanks for the optimized
info for the future. Ruby/Regex coach doesn't seem to like the optimized
version though either...

On 11/21/05, Christian Neukirchen <chneukirchen@gmail.com> wrote:

Are you matching case insensitive (/i)?

By the way, an optimized version of that regexp is

/(A[cglmr-u]|B[aehikr]|C[adeflmorsu]|D[bsy]|E[rsu]|F[emr]|G[ade]|H[efgos]|I[nr]|Kr|L[airu]|M[dgnot]|N[abdeiop]|Os|P[abdmortu]|R[abefhnu]|S[bcegimnr]|T[abcehilm]|U(?:aal|u[bhnoqu])|Xe|Yb|Z[nr]|[BCFHINOPSVWY])+/

(Thanks to emacs's regexp-opt)
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

George_Lunsford · 21 November 2005 16:56

I've thought of that too, but I thought that in theory, order shouldn't
matter. Now, I know there are differences in theory and practicality, but
even if I match all of the two-letter expressions first, what about a string
like agau, where maybe it will match ga first, and leave A and U out? And
even if I order alphabetically, there may be a case where it matches
something else out-of-order in another word, etc...

On 11/21/05, Dirk Meijer <hawkman.gelooft@gmail.com> wrote:

hi,
i'm not entirely sure, but i'd say it's because S appears before Se (in your
regexp), so you'd have to start with all elements that have two letters.. i
think that will work, not sure though..
greetings, Dirk.

Jon_Lim · 21 November 2005 16:56

It is matching [pr] and [es]

On 21/11/05, Dirk Meijer <hawkman.gelooft@gmail.com> wrote:

hi,
i'm not entirely sure, but i'd say it's because S appears before Se (in your
regexp), so you'd have to start with all elements that have two letters.. i
think that will work, not sure though..
greetings, Dirk.

George_Lunsford · 21 November 2005 17:08

That does seem to do it. Thanks for the help everybody, guess I've got some
reading to do about regexps.

On 11/21/05, Dave Burt <dave@burt.id.au> wrote:

You'll note if you match against /^(He|...|Uaal)+$/i it matches the full
string "presenti". Read perlretut for more info on how regexps work.
Basically, it is greedy, but it won't backtrack unless it has failed to
find
a match. With your example, the S matches, then the following characters
don't match anything, but it has already succeeded, so it doesn't bother
trying Se.

perlretut - Perl regular expressions tutorial (grouping things and hierarchical matching)

Cheers,
Dave

Christian_Neukirche1 · 21 November 2005 17:48

Jeff Wood <jeff.darklight@gmail.com> writes:

well, not that it's perfect, but I would do something like:

elements = %w( He Li Be C O Ne Na Mg Al Si S Cl Ar Ca Sc Ti Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te Xe Cs
Ba Hf Ta Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Rf Db Sg Bh Hs Mt Uun Uuu
Uub Uuq Uuh Uuo Ds La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Ac Th Pa Np
Pu Am Cm Bk Cf Es Fm Md No Lr H B N F P V Y I W Uaal )

class Array
def to_re( regex_modifier )
eval "/#{ self.join "|" }/#{ regex_modifier }"
end
end

"present".scan elements.to_re( "i" ) # the "i" means case insensitive
#=> [ "pr", "es", "n" ]

... hope that helps. I'm sure there are more efficient regex patterns for
this, but I couldn't imagine anything that was easier to write.

Regexp.union exists.
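
For reference, a small sketch of what that could look like without eval. The short symbol list is an assumption for illustration; Regexp.union and Regexp.new with Regexp::IGNORECASE are standard Ruby.

```ruby
# Hypothetical short list; the full element array from the quoted code
# could be passed in exactly the same way.
short_list = %w[Pr Es N P]

# Regexp.union builds an alternation and escapes any special characters.
pattern = Regexp.union(short_list)

# Regexp.union has no flags argument, so rebuild from the source string
# to get a case-insensitive version.
element_re = Regexp.new(pattern.source, Regexp::IGNORECASE)

matches = "present".scan(element_re)
p matches
```

The alternatives are still tried in list order, so this has the same matching behavior as the eval version; it only removes the need to build and evaluate a string of Ruby code.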

George_Lunsford · 21 November 2005 17:00

Right, but technically, say for "prese" that string is periodic, and should
be matched as [p][re][se] but it's not getting that last e.

On 11/21/05, Jon Lim <snowblink@gmail.com> wrote:

It is matching [pr] and [es]

Jeff_Wood · 21 November 2005 19:44

I'll have to go look @ that one ... haven't played with it yet.

j.

On 11/21/05, Christian Neukirchen <chneukirchen@gmail.com> wrote:

Regexp.union exists.

Jeff_Wood · 21 November 2005 17:08

nope, it's already found pr and es, therefore there isn't anything left for
matching the e by itself.

you'd have to have a permutation function to run multiple orders of regex
and see which one matches the most of the input string.

completely do-able, just time consuming...

shouldn't take TOO much time to write ... just gonna take a while to run.
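
As an alternative to trying permutations of the regex, here is a rough sketch of a plain recursive search that tries each possible symbol length at each position and remembers dead ends. This is a swapped-in technique, not what was proposed above; the method name decompose and the short lowercase symbol set are assumptions for illustration.

```ruby
require "set"

# Hypothetical subset of the element symbols, stored in lowercase.
SHORT_SYMBOLS = Set.new(%w[p pr re s se es n ti])

# Returns an array of symbols that spell word from position start onward,
# or nil if no such split exists. memo caches results per start position
# so each position is only solved once.
def decompose(word, symbols, start = 0, memo = {})
  return [] if start == word.length
  return memo[start] if memo.key?(start)

  result = nil
  (1..4).each do |len|                 # symbols in the thread are 1 to 4 letters long
    piece = word[start, len]
    next unless piece.length == len && symbols.include?(piece)

    rest = decompose(word, symbols, start + len, memo)
    if rest                            # an empty array is truthy in Ruby
      result = [piece] + rest
      break
    end
  end
  memo[start] = result
end

p decompose("presenti", SHORT_SYMBOLS)
p decompose("present", SHORT_SYMBOLS)
```

Because of the memo the work grows roughly with the length of the word rather than with the number of orderings, so it should not take long to run.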

j.

On 11/21/05, George Lunsford <george.lunsford@gmail.com> wrote:

Right, but technically, say for "prese" that string is periodic, and
should
be matched as [p][re][se] but it's not getting that last e.

Jon_Lim · 21 November 2005 17:08

By default, matches are greedy. Greedy matching means that it will
try to match the maximum number of characters possible. Perhaps you
want non-greedy matching?

/(...)+?/i
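
A quick way to see what the non-greedy form does here, as a sketch with a shortened symbol list (an assumption, not the full table):

```ruby
# Hypothetical subset of the element symbols.
few_symbols = %w[Pr Es Re Se S P]
alt = few_symbols.join("|")

# Greedy: repeat the group as many times as possible.
greedy_match = "prese"[/(?:#{alt})+/i]

# Non-greedy: repeat the group as few times as possible, so one symbol is enough.
lazy_match = "prese"[/(?:#{alt})+?/i]

puts greedy_match
puts lazy_match
```

With this list the non-greedy version stops even earlier than the greedy one, so on its own it does not seem to get the last e matched either.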

On 21/11/05, George Lunsford <george.lunsford@gmail.com> wrote:

Right, but technically, say for "prese" that string is periodic, and should
be matched as [p][re][se] but it's not getting that last e.
