Regular Expressions

Mmcolli00_Mom · 1 December 2008 19:08

Hi everyone.

Just a question about regular expressions in Ruby. Is there a way to
check that each line of a document begins with "23430000" and also has
another string, 'CodeRed', somewhere on that same line?

line1: 23430000 @#$#$3455000CodeRed 24AAWERE 740000000

This is what I have so far when I import each textdata from another
file.

textdata.should =~ /23430000/ |CodeRed/

...the pipe was supposed to determine if CodeRed exists somewhere after
the identity code 23430000. However, it doesn't work like this. Also for
my code I am not using OR logic with it.

···

--
Posted via http://www.ruby-forum.com/.

Yaser_Sulaiman · 1 December 2008 19:34

I'm not sure if this is what you are asking for, but /23430000.*CodeRed/
will match a line that contains '23430000', followed by zero or more
characters, and then 'CodeRed'.

Try it in IRB:

irb(main):001:0> line1 = '23430000 @#$#$3455000CodeRed 24AAWERE 740000000'
=> "23430000 @\#$\#$3455000CodeRed 24AAWERE 740000000"
irb(main):002:0> line1 =~ /23430000.*CodeRed/
=> 0
irb(main):003:0> line2 = '23430000 @#$#$3455000CodeGreen 24AAWERE 740000000'
=> "23430000 @\#$\#$3455000CodeGreen 24AAWERE 740000000"
irb(main):004:0> line2 =~ /23430000.*CodeRed/
=> nil

Regards,
Yaser Sulaiman


James_Edward_Gray_II · 1 December 2008 19:37

Hi everyone.

Hello.

Just a question about regualar expression in Ruby. Is there a way to
check in each line of a document for always beginning with "23430000"
and somewhere on that same line another string 'CodeRed'?

Sure.

line1: 23430000 @#$#$3455000CodeRed 24AAWERE 740000000

This is what I have so far when I import each textdata from another
file.

textdata.should =~ /23430000/ |CodeRed/

textdata.should =~ /\A23430000.*CodeRed/

My changes are simple:

* \A is a regex atom that only matches at the beginning of the input. I used this to make sure 23430000 is at the beginning of the line and not later.
* .* matches zero or more of pretty much anything. Newlines are the only character excluded. Thus this allows anything to appear between the number and CodeRed.

Now the regex above really just checks one line. If you want to check all lines, you'll want something like:

   textdata.each do |line|
     line.should =~ /\A23430000.*CodeRed/
   end
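
For readers who want to try the pattern outside of a spec, here is a minimal sketch without the `should` matcher. It assumes `textdata` is a single String holding the whole file; the sample lines and the `PATTERN` and `results` names are made up for illustration.

```ruby
# Hypothetical sample data standing in for the imported file contents.
# The single-quoted heredoc keeps "#$" from being treated as interpolation.
textdata = <<~'DATA'
  23430000 @#$#$3455000CodeRed 24AAWERE 740000000
  23430000 @#$#$3455000CodeGreen 24AAWERE 740000000
  99990000 CodeRed 23430000
DATA

# \A anchors at the start of each line string; .* allows anything in between.
PATTERN = /\A23430000.*CodeRed/

# Pair every line with a true/false flag saying whether it matched.
results = textdata.each_line.map { |line| [line.chomp, !!(line =~ PATTERN)] }

results.each do |line, ok|
  puts "#{ok ? 'match   ' : 'no match'}  #{line}"
end
```

Only the first sample line both starts with the number and contains CodeRed later on, so it is the only one flagged as a match.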

Hope that helps.

James Edward Gray II


Kyle_Schmitt · 1 December 2008 19:50

I just wanted to mention another way of combining regexes that may
help you stay sane: union.

# You write each regex nice and simple, like:
startswith = /^23430000/
codered = /CodeRed/

# Then combine them into a complex one
combined_regex = Regexp.union(startswith, codered)

When you've got to build up some large regular expressions, this can
be a godsend, especially when revisiting code you haven't looked at in
a while.

--Kyle


Richard_Conroy1 · 4 December 2008 00:00

I just simply have to pimp http://www.rubular.com/

I don't think that I have closed that browser tab containing it in weeks.
Its killer feature is the ability to supply your own test data.

/^23430000.*CodeRed.*$/
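
One thing worth knowing when comparing this pattern with the \A version earlier in the thread: in Ruby, ^ and $ match at the start and end of every line inside a string, while \A matches only at the very start of the string. A small sketch, using a made-up two-line string:

```ruby
# Hypothetical input: the interesting line is the second one.
doc = "header line\n23430000 xx CodeRed yy\n"

line_anchored   = /^23430000.*CodeRed.*$/   # ^ and $ work per line
string_anchored = /\A23430000.*CodeRed/     # \A works per string

hit_with_caret = !!(doc =~ line_anchored)    # ^ may match after any newline
hit_with_a     = !!(doc =~ string_anchored)  # \A only matches at offset 0

# scan collects every line of the string that satisfies the line-anchored form.
matching_lines = doc.scan(line_anchored)

puts hit_with_caret
puts hit_with_a
puts matching_lines.inspect
```

So the ^ form can be run against a whole multi-line string, while the \A form is the one to use when each line is tested separately.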


Kyle_Schmitt · 1 December 2008 19:52

Scratch that, I wasn't thinking clearly! This matches startswith OR
codered, not necessarily both.

Still, I maintain that this is a way of staying sane with complex regexes.
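
To make the difference concrete, here is a small sketch. It is not from the thread: the variable names are illustrative, and the "both, in order" version relies on the fact that a Regexp can be interpolated into another regex literal.

```ruby
starts_with = /^23430000/
code_red    = /CodeRed/

# Regexp.union builds an alternation: either pattern is enough.
either = Regexp.union(starts_with, code_red)

# Interpolation builds a sequence: the first pattern, anything, then the second.
both_in_order = /#{starts_with}.*#{code_red}/

only_code_red = 'xx CodeRed'
full_line     = '23430000 zz CodeRed'

puts !!(only_code_red =~ either)         # union accepts CodeRed alone
puts !!(only_code_red =~ both_in_order)  # the sequence requires the prefix too
puts !!(full_line =~ both_in_order)
```

This keeps the small, named pieces while still requiring both of them.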


Mmcolli00_Mom · 1 December 2008 20:18

thanks everyone!


Yaser_Sulaiman · 1 December 2008 20:13

Still, I maintain that this is a way of staying sane with complex regexes

It certainly looks helpful. I didn't know about it before. Thanks for
sharing.

Regards,
Yaser


Mmcolli00_Mom · 1 December 2008 20:17

Kyle Schmitt wrote:


Scratch that, not thinking clearly! This is to match startswith OR
codered, not necessarily both.

Still, I maintain that this is a way of staying sane with complex
regexes

Alright, thanks for the tip. I was just thinking... what is the
regular expression for "starts with" anyway? I don't know. I figured
maybe I could use union with the starts-with expression and then just
grab that value.
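
For the "starts with" question, a short sketch of the usual options, assuming the line is already in a String. The variable names are illustrative only.

```ruby
line = '23430000 @#$#$3455000CodeRed 24AAWERE 740000000'

# Regex form: \A pins the match to the very start of the string.
regex_check = !!(line =~ /\A23430000/)

# Plain method form, no regex needed for a fixed prefix.
method_check = line.start_with?('23430000')

# To grab the leading value itself, index the string with a regex.
# Here \d+ takes the run of digits at the start of the line.
prefix = line[/\A\d+/]

puts regex_check
puts method_check
puts prefix
```

`String#[]` with a regex returns the matched text, or nil when there is no match, which is handy for pulling the identity code out of each line.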

Joe_Wolfel · 1 December 2008 21:32

Interesting that there is a union function but no intersection function.


Rob_Biedenharn1 · 1 December 2008 22:08

How would you even define a regexp (re) that matched only when both of two other regexps (re1, re2) matched?

     class Regexp
       def self.intersection(re1,re2)
         union(compile(/(?>#{re1}).*#{re2}/),
               compile(/(?>#{re2}).*#{re1}/))
       end
     end

     re = Regexp.intersection(re1,re2)

What would you expect the value to be? And while Regexp.union is well-behaved for multiple arguments, the expansion for more arguments in the intersection gets ugly fast.
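
One alternative that avoids listing every ordering is to use lookaheads. This is only a sketch and not what the code above does: the `regexp_intersection` helper is a hypothetical name, it works on single-line input because `.` does not cross newlines here, and unlike the ordered version it lets the sub-matches overlap.

```ruby
# Build one regex that requires every given pattern to match somewhere
# in the string, in any order, using one lookahead per pattern.
def regexp_intersection(*patterns)
  lookaheads = patterns.map { |re| "(?=.*#{re})" }.join
  Regexp.new("\\A#{lookaheads}")
end

abc = regexp_intersection(/a/, /b/, /c/)
puts !!('cab' =~ abc)   # all three letters present
puts !!('ab'  =~ abc)   # 'c' is missing

red_line = regexp_intersection(/^23430000/, /CodeRed/)
puts !!('23430000 zz CodeRed' =~ red_line)
```

The match itself is empty (lookaheads consume nothing), so this answers "did everything match?" rather than "what was matched?".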

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com


Sebastian_Hungereck1 · 1 December 2008 22:08

Joe Wölfel wrote:

Interesting that there is a union function but no intersection function [for
regexen].

Well, the union of two regexen /foo/ and /bar/ is simply /foo|bar/, so the
union method is rather easily implemented. An intersection method would be
somewhat more complex. Of course that's not really a reason not to implement
it, but it might be the reason why it's not implemented yet.

HTH,
Sebastian

···

--
Jabber: sepp2k@jabber.org
ICQ: 205544826

Joe_Wolfel · 1 December 2008 22:41

Not sure I understand. Are you arguing that an intersection cannot exist as a regular expression or merely that it is hard?


Ken_Bloom · 2 December 2008 01:34

It's very well defined if you're talking about an underlying
deterministic finite automaton. That's a simple proof that is usually
assigned as an exercise for the student in a computer science theory
course.

If re1 compiles to a DFA dfa1=(S1,s1,A1,f1)
where S1 is the set of all states, s1 is the start state, A1 is the set
of accepting states, and f1(s'1,input) is the transition function
and re2 compiles to a DFA dfa2=(S2,s2,A2,f2)

Then the intersection of these two languages can be recognized by the DFA
dfa3=(S1 x S2, (s1,s2), A1 x A2, f3)
where x means the cartesian product, and
f3((s'1,s'2),input)=(f1(s'1,input),f2(s'2,input))

Now, how you'd turn that back into a regexp is not so easy...
(but still doable)
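
The product construction above can be sketched in a few lines of Ruby. Everything here is an illustrative assumption rather than an existing library: the `DFA` struct, the `product` and `contains` helpers, and the choice to leave out the explicit state set S since it is not needed to run the machine.

```ruby
# A DFA as (start state, accepting states, transition function).
DFA = Struct.new(:start, :accepting, :delta) do
  # Feed the input one character at a time and check the final state.
  def accepts?(input)
    final = input.each_char.reduce(start) { |state, ch| delta.call(state, ch) }
    accepting.include?(final)
  end
end

# Product automaton: states are pairs, both components step in lockstep,
# and a pair accepts only if both of its components accept (A1 x A2).
def product(d1, d2)
  DFA.new(
    [d1.start, d2.start],
    d1.accepting.product(d2.accepting),
    ->((s1, s2), ch) { [d1.delta.call(s1, ch), d2.delta.call(s2, ch)] }
  )
end

# Tiny example machine: accepts strings containing the given character.
def contains(char)
  DFA.new(:no, [:yes], ->(s, ch) { (s == :yes || ch == char) ? :yes : :no })
end

both = product(contains('a'), contains('b'))
puts both.accepts?('xbya')
puts both.accepts?('aaa')
```

The product machine recognizes exactly the strings accepted by both inputs, which is the intersection of the two languages.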

--Ken

--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/

Rob_Biedenharn1 · 1 December 2008 22:58

That it becomes combinatorially hard to construct such a regexp in general. If I want a regexp that matches the intersection of /a/ and /b/ and /c/ (i.e., contains each of 'a', 'b', and 'c'), I have to account for all the permutations (manually):
/a.*b.*c/
/a.*c.*b/
/b.*a.*c/
/b.*c.*a/
/c.*a.*b/
/c.*b.*a/

Or combined as: /a.*(?:b.*c|c.*b)|b.*(?:a.*c|c.*a)|c.*(?:b.*a|a.*b)/

That's nasty and so much worse than the union /[abc]/ or /a|b|c/ even for this relatively simple case. It would be better to do this at the application level if you can't guarantee order:

[/a/, /b/, /c/].all? {|re| mystring =~ re }

And then the value of the match can be whatever the application wants to track.
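
A runnable version of that application-level check might look like the sketch below. The `matches_all?` helper and the sample strings are made up for illustration, and `Regexp#match?` assumes Ruby 2.4 or later.

```ruby
# True only when every pattern matches the string, in any order.
def matches_all?(string, patterns)
  patterns.all? { |re| re.match?(string) }
end

patterns = [/a/, /b/, /c/]

puts matches_all?('cab', patterns)   # all three present
puts matches_all?('ab', patterns)    # 'c' is missing

# If the application wants the matched values, collect them per pattern.
found = patterns.map { |re| 'cab'[re] }
puts found.inspect
```

Adding a fourth or fifth pattern is just another array element, with no growth in the regex itself.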

-Rob

