Regular expression to find a break in a pattern

7stud2 · 23 May 2013 16:13

I have a large file with lots of gibberish in it, and I'm trying to find
the meaningful sections.

Essentially I'll have something like this:

________________

To: "1313131"
From: "1313131"
random data lines

To: "1313132"
From: "1313132"
random data lines

To: "1313133"
From: "1313132"
random data lines

To: "1313134"
From: "1313134"
random data lines

________________

What I need to do is locate the line(s) where From is different from To.
In this case, the one From "1313132" To "1313133".

I don't know how to do this kind of match, but I assume that Ruby has a
way?

--
Posted via http://www.ruby-forum.com/.

7stud2 · 23 May 2013 17:33

Use regex capturing with assignment. Capture your string or number in the
From field using parentheses to capture and assign: /(\d+)/
...then compare the assigned value, which is $1, with the next string.

Sorry to do this to you all, but this is a quick Perl example I just
whipped up, and it is easy to convert to Ruby:

perl -le '$x = "12345"; $y = "123465";
if ( $x =~ /(\d+)/ ) {
     if ( $y == $1 ) {
         print "yep"
     }
     else {
         print "nope!"
     }
}
'
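
For readers following along in Ruby, a rough translation of the Perl snippet above might look like the following. It is only a sketch, not the poster's code: the method name compare_captured is made up for illustration, and the comparison is done on strings rather than numerically.

```ruby
x = "12345"
y = "123465"

# Returns "yep" when y equals the digits captured from x, "nope!" when it
# does not, and nil when x contains no digits at all.
# compare_captured is a hypothetical helper name, not part of any library.
def compare_captured(x, y)
  match = x.match(/(\d+)/)   # MatchData, or nil if there are no digits
  return nil unless match
  y == match[1] ? "yep" : "nope!"
end

puts compare_captured(x, y)   # prints "nope!"
```

Using match[1] instead of the global $1 keeps the captured value local to the method.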

7stud2 · 23 May 2013 18:58

I use Rubular a lot, it's great!

Thanks for the ideas. I've decided to loop through the file using 2
variables, similarly to Derrick's suggestion.

Nice trick with "\1" Chris, I haven't tried using that inside the same
expression before. I'll see whether I can use that in this instance.

I was wondering whether Ruby's Regexp had this kind of option built in,
but I guess this scenario is more on the conditional side of
programming.
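
The two-variable loop mentioned above was not posted in the thread. A minimal sketch of what it could look like, assuming every From line directly follows its To line as in the sample data (the method name mismatched_pairs and the sample string are illustrative only):

```ruby
# Collect [to, from] pairs where a From value differs from the value of
# the most recent To line. mismatched_pairs is a hypothetical name.
def mismatched_pairs(lines)
  to = nil
  mismatches = []
  lines.each do |line|
    if (m = line.match(/\ATo: "(.*)"/))
      to = m[1]                                  # remember the latest To
    elsif (m = line.match(/\AFrom: "(.*)"/))
      from = m[1]
      mismatches << [to, from] if to && to != from
      to = nil                                   # wait for the next To line
    end
  end
  mismatches
end

sample = <<~DATA
  To: "1313131"
  From: "1313131"
  random data lines

  To: "1313133"
  From: "1313132"
  random data lines
DATA

p mismatched_pairs(sample.each_line)   # => [["1313133", "1313132"]]
```

For a real file, passing File.foreach(path) instead of sample.each_line would read one line at a time rather than loading the whole file.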

Josh_Cheek · 24 May 2013 04:38

Here is a regex that works for your example data.

text = '
To: "1313131"
From: "1313131"
random data lines

To: "1313132"
From: "1313132"
random data lines

To: "1313133"
From: "1313132"
random data lines

To: "1313134"
From: "1313134"
random data lines

To: "abc"
From: "def"
'

regex = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/

text.scan(regex) # => [["1313133", "1313132"], ["abc", "def"]]
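
One edge case may be worth checking with the (?!\1) negative lookahead: it rejects any From value that merely starts with the To value, so a pair such as To "131" / From "1313" would not be reported. A possible tightening, offered as a sketch rather than as a correction of the regex above, is to include the closing quote in the lookahead. The sample data and the variable names original and stricter are made up for illustration.

```ruby
text = <<~DATA
  To: "1313133"
  From: "1313132"
  random data lines

  To: "1313134"
  From: "1313134"
  random data lines

  To: "131"
  From: "1313"
  random data lines
DATA

# Lookahead without the closing quote: a From value that begins with the
# To value is treated as equal.
original = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/

# Lookahead including the closing quote: only an exact match is skipped.
stricter = /To: "(.*?)"\nFrom: "(?!\1")(.*?)"$/

p text.scan(original)   # => [["1313133", "1313132"]]
p text.scan(stricter)   # => [["1313133", "1313132"], ["131", "1313"]]
```

If all identifiers in the file have the same length, the two patterns behave identically.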

7stud2 · 24 May 2013 10:43

Excellent! I tried negatives using (?!\1) before but I couldn't get them
to work. Thanks for the help.

7stud2 · 26 May 2013 18:06

A group within a group, and scan with a block? I had no idea!
Ruby, you continually delight me

Chris6 · 23 May 2013 18:36

Here's a regex that captures all the cases where To matches From:

I can't find an easy switch to find the mismatch as you need, but maybe
it'll provide a starting point.
Plus, Rubular is a great resource for exploring regexes.

cheers

On Thu, May 23, 2013 at 1:33 PM, Derrick B. <lists@ruby-forum.com> wrote:

Stu1 · 23 May 2013 22:53

You can run your regex test against both contexts, ^To: and ^From:, and
use post_match to reveal the contents after each match:

http://ruby-doc.org/core-2.0/MatchData.html#method-i-post_match
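
A small sketch of how that suggestion could be applied to one To/From pair. The two sample lines and the variable names are assumptions for illustration; MatchData#post_match itself is the documented method linked above.

```ruby
to_line   = %(To: "1313133"\n)
from_line = %(From: "1313132"\n)

# post_match returns everything after the matched prefix; strip removes
# the trailing newline. The surrounding quotes are kept as part of the value.
# Note: String#match returns nil for a line without the prefix, so real
# code would need to check for that before calling post_match.
to_value   = to_line.match(/^To: /).post_match.strip
from_value = from_line.match(/^From: /).post_match.strip

puts "mismatch: #{to_value} vs #{from_value}" if to_value != from_value
```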

~Stu

Robert_K1 · 26 May 2013 17:18

Excellent! I tried negatives using (?!\1) before but I couldn't get them
to work. Thanks for the help.

You can even get the whole line content if you like

irb(main):053:0> s.scan %r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m
=> [["To: \"1313133\"\nFrom: \"1313132\"\nrandom data lines\n\n", "\"1313133\""]]
irb(main):054:0> s.scan(%r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m).map(&:first)
=> ["To: \"1313133\"\nFrom: \"1313132\"\nrandom data lines\n\n"]
irb(main):055:0> puts s.scan(%r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m).map(&:first)
To: "1313133"
From: "1313132"
random data lines

Or with a block:

irb(main):057:0> s.scan %r{(To:\s+("\d+")\s*$\s*From:\s+(?!\2).*?(?=To))}m do puts $1 end;nil
To: "1313133"
From: "1313132"
random data lines

Kind regards

robert

On Fri, May 24, 2013 at 12:43 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert_K1 · 26 May 2013 18:10

A group within a group,

This is standard regular expression functionality: I don't know of a
single regexp engine with support for groups that can't do that.

and scan with a block? I had no idea!

That is a fairly old feature of the standard lib - even in 1.8.6 - and so
important when scanning large volumes of text.
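
As a rough illustration of the block form: String#scan yields each match to the block instead of collecting all matches into a result array. The sketch below reuses the lookahead regex from earlier in the thread; the sample text and the variable names are assumptions for illustration.

```ruby
text = <<~DATA
  To: "1313131"
  From: "1313131"
  random data lines

  To: "1313133"
  From: "1313132"
  random data lines
DATA

regex = /To: "(.*?)"\nFrom: "(?!\1)(.*?)"$/

mismatch_count = 0
# With capture groups, each match is yielded as an array of the groups,
# which the block parameters destructure into to and from.
text.scan(regex) do |to, from|
  mismatch_count += 1
  puts "To #{to} but From #{from}"
end

puts mismatch_count   # prints 1
```

The input string itself still has to be in memory; the block form only avoids building the array of results.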

Ruby, you continually delight me

Good!

For a spec of the regexp language, I find this site pretty useful:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Kind regards

robert

On Sun, May 26, 2013 at 8:06 PM, Joel Pearson <lists@ruby-forum.com> wrote:
