Knocking Lines Out Of A Multiline String

Andrew_Stewart · 22 March 2007 14:43

Hello,

What's a (good!) way to remove lines matching a pattern from a multiline string?

For example, I would like to remove lines matching /usr/local/lib from the multiline string:

     /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'
     /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'
     test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'

...to give:

     test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'

I tried matching the pattern-to-remove with gsub and substituting an empty string, but that leaves me with lots of blank lines and not really any nearer to the answer.

Thanks and regards,
Andy Stewart

Robert_K1 · 22 March 2007 15:10

Convert it to an array and select like

>> "foo\nbar\n".to_a.select {|l| /^f/ =~ l}
=> ["foo\n"]

Kind regards

robert

···

On 22.03.2007 15:43, Andrew Stewart wrote:

Leslie_Viljoen1 · 22 March 2007 15:15

You could change your lines into an array of lines and then remove the
lines that match:

lines = []
File.new("text.txt").read.each_line {|line| lines << line }
lines.delete_if {|line| line =~ /\/usr\/local\/lib/}

···

On 3/22/07, Andrew Stewart <boss@airbladesoftware.com> wrote:

--
If you could create a machine that copies hamburgers — you put one
hamburger in and two equally good hamburgers come out the other side —
it would be unethical not to do so and make it freely available.

Rob_Biedenharn1 · 22 March 2007 15:22

>> input = " /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'
 /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'
 test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'
"
=> " /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'\n /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'\n test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'\n"

>> input.gsub(%r{^.*/usr/local/lib/.*\n?},'')
=> " test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'\n"

If you showed your code, an explanation could be added as to your regexp, but the concept certainly works as I've shown.

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Mar 22, 2007, at 10:43 AM, Andrew Stewart wrote:

Robert_K1 · 22 March 2007 15:15

Convert it to an array and select like

>> "foo\nbar\n".to_a.select {|l| /^f/ =~ l}
=> ["foo\n"]

Bullshit: just use select:

>> "foo\nbar\n".select {|l| /^f/ =~ l}
=> ["foo\n"]

Sorry for the noise.
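
A note for readers on a later Ruby: calling select directly on a string relies on Ruby 1.8, where String mixed in Enumerable and iterated line by line. From Ruby 1.9 onwards that no longer holds, so a rough equivalent goes through each_line. This is only a sketch, not something posted in the thread, and the helper name lines_matching is made up for illustration:

```ruby
# Sketch for Ruby 1.9+, where String is no longer Enumerable.
# `lines_matching` is a hypothetical helper name, not part of Ruby.
def lines_matching(text, pattern)
  # each_line without a block returns an Enumerator over the lines,
  # each one still carrying its trailing "\n".
  text.each_line.select { |line| pattern =~ line }
end

p lines_matching("foo\nbar\n", /^f/)  # prints ["foo\n"]
```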

robert

···

On 22.03.2007 16:09, Robert Klemme wrote:

On 22.03.2007 15:43, Andrew Stewart wrote:

Gavin_Kistner2 · 22 March 2007 15:45

Leslie, as a public service announcement, you should be aware of
IO.readlines:

C:\>qri IO.readlines

----------------------------------------------------------
IO::readlines
     IO.readlines(name, sep_string=$/) => array
------------------------------------------------------------------------
     Reads the entire file specified by _name_ as individual lines, and
     returns those lines in an array. Lines are separated by
     _sep_string_.

        a = IO.readlines("testfile")
        a[0] #=> "This is line one\n"

For that matter, you should also be aware of IO.read:
---------------------------------------------------------------
IO::read
     IO.read(name, [length [, offset]] ) => string
------------------------------------------------------------------------
     Opens the file, optionally seeks to the given offset, then returns
     _length_ bytes (defaulting to the rest of the file). +read+ ensures
     the file is closed before returning.

        IO.read("testfile") #=> "This is line one\nThis is line two\nThis is line three\nAnd so on...\n"
        IO.read("testfile", 20) #=> "This is line one\nThi"
        IO.read("testfile", 20, 10) #=> "ne one\nThis is line "

You should also be aware of the block form of #open, which opens the
IO object and then closes it when done.

What you wrote creates a new File object and opens it, but never
closes it. I'm not really sure what badness can result from this, but
I gather it's not a good idea.
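
To make the block form concrete, here is a minimal sketch. The temporary file, its contents and the method name demo_reads are all assumptions for illustration; only File.open with a block and File.readlines correspond to what is described above.

```ruby
require 'tempfile'

# `demo_reads` is a hypothetical name; the file contents are made up.
def demo_reads
  result = nil
  # Write a small throwaway file so the sketch is self-contained.
  Tempfile.create('lines') do |tmp|
    tmp.write("/usr/local/lib/a.rb\ntest/b.rb\n")
    tmp.flush

    # Block form: the file is closed when the block exits,
    # even if an exception is raised inside it.
    kept = File.open(tmp.path) do |f|
      f.readlines.reject { |line| line =~ %r{/usr/local/lib} }
    end

    # The class-method shortcut opens and closes the file for you.
    all = File.readlines(tmp.path)
    result = [kept, all]
  end
  result
end

p demo_reads
```

With either form there is no File object left open afterwards, which is the difference from File.new(...).read.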

Andrew_Stewart · 22 March 2007 16:48

Leslie, thanks for that. That works for me (with a join chained on the end).
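
For anyone reading later, the "join chained on the end" version might look roughly like this when the input is already a string rather than a file. It is a sketch only: the helper name strip_lines and the sample trace are invented here, and String#lines returning an Array assumes Ruby 2.0 or later.

```ruby
# `strip_lines` is a hypothetical helper name.
def strip_lines(text, pattern)
  lines = text.lines                      # Array of lines, each keeping its "\n"
  lines.delete_if { |line| line =~ pattern }
  lines.join                              # glue the survivors back into one string
end

# Made-up sample input for illustration.
trace = "/usr/local/lib/a.rb:1:in `process'\n" \
        "test/functional/orders_controller_test.rb:241:in `test_x'\n"
puts strip_lines(trace, /\/usr\/local\/lib/)
```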

Regards,
Andy Stewart

···

On 22 Mar 2007, at 15:15, Leslie Viljoen wrote:

lines.delete_if {|line| line =~ /\/usr\/local\/lib/}

Andrew_Stewart · 22 March 2007 16:52

Aha! You have proved that I chose my regexp poorly. Here's what I tried:

input.gsub(%r{^.*/usr/local/lib/.*$}i, '')

The difference is that yours consumes the new line character but mine doesn't. I should have just matched it explicitly like you rather than using an anchor.
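
A small side-by-side of the two patterns shows where the blank lines come from. The sample input and constant names are made up for illustration:

```ruby
# Made-up input: one library frame between two lines we want to keep.
INPUT = "keep 1\n  /usr/local/lib/x.rb\nkeep 2\n"

# `$` matches just before the newline, so the "\n" survives
# and an empty line is left behind.
WITH_ANCHOR = INPUT.gsub(%r{^.*/usr/local/lib/.*$}, '')

# Matching "\n?" explicitly consumes the line terminator as well.
WITH_NEWLINE = INPUT.gsub(%r{^.*/usr/local/lib/.*\n?}, '')

p WITH_ANCHOR   # prints "keep 1\n\nkeep 2\n"
p WITH_NEWLINE  # prints "keep 1\nkeep 2\n"
```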

Thanks and regards,
Andy Stewart

···

On 22 Mar 2007, at 15:22, Rob Biedenharn wrote:

>> input.gsub(%r{^.*/usr/local/lib/.*\n?},'')

If you showed your code, an explanation could be added as to your regexp, but the concept certainly works as I've shown.

Andrew_Stewart · 22 March 2007 16:47

Thanks for that. So simple once you've seen it.

Regards,
Andy Stewart

···

On 22 Mar 2007, at 15:15, Robert Klemme wrote:

>> "foo\nbar\n".select {|l| /^f/ =~ l}
=> ["foo\n"]

Leslie_Viljoen1 · 22 March 2007 20:56

This does sound rather frightening! What *is* the effect of opening a
file and not closing it?

Also, doesn't the above say that IO.read closes the file afterwards?

···

On 3/22/07, Phrogz <gavin@refinery.com> wrote:

On Mar 22, 9:15 am, "Leslie Viljoen" <leslievilj...@gmail.com> wrote:
> You could change your lines into an array of lines and then remove the
> lines that match:
>
> lines = []
> File.new("text.txt").read.each_line {|line| lines << line }
> lines.delete_if {|line| line =~ /\/usr\/local\/lib/}

Leslie, as a public service announcement, you should be aware of
IO.readlines:

C:\>qri IO.readlines
----------------------------------------------------------
IO::readlines
     IO.readlines(name, sep_string=$/) => array
------------------------------------------------------------------------
     Reads the entire file specified by _name_ as individual lines, and
     returns those lines in an array. Lines are separated by
     _sep_string_.

        a = IO.readlines("testfile")
        a[0] #=> "This is line one\n"

For that matter, you should also be aware of IO.read:
---------------------------------------------------------------
IO::read
     IO.read(name, [length [, offset]] ) => string
------------------------------------------------------------------------
     Opens the file, optionally seeks to the given offset, then returns
     _length_ bytes (defaulting to the rest of the file). +read+ ensures
     the file is closed before returning.

        IO.read("testfile") #=> "This is line one\nThis is line two\nThis is line three\nAnd so on...\n"
        IO.read("testfile", 20) #=> "This is line one\nThi"
        IO.read("testfile", 20, 10) #=> "ne one\nThis is line "

You should also be aware of the block form of #open, which opens the
IO object and then closes it when done.

What you wrote creates a new File object and opens it, but never
closes it. I'm not really sure what badness can result from this, but
I gather it's not a good idea.

Rick_DeNatale1 · 23 March 2007 18:39

Or, since he's really trying to exclude lines which include /usr/local/lib:

str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

···

On 3/22/07, Robert Klemme <shortcutter@googlemail.com> wrote:

Bullshit: just use select:

>> "foo\nbar\n".select {|l| /^f/ =~ l}
=> ["foo\n"]

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Gavin_Kistner2 · 22 March 2007 21:45

> > lines = []
> > File.new("text.txt").read.each_line {|line| lines << line }
> > lines.delete_if {|line| line =~ /\/usr\/local\/lib/}

[snip]

> What you wrote creates a new File object and opens it, but never
> closes it. I'm not really sure what badness can result from this, but
> I gather it's not a good idea.

This does sound rather frightening! What *is* the effect of opening a
file and not closing it?

Also, doesn't the above say that IO.read closes the file afterwards?

IO.read (the class method) does open/close the file, but IO#read (the
instance method) does not. Manually managing an IO object, you need
something like:
  f = File.new('foo.txt')
  f.read
  f.close

···

On Mar 22, 2:56 pm, "Leslie Viljoen" <leslievilj...@gmail.com> wrote:

On 3/22/07, Phrogz <g...@refinery.com> wrote:
> On Mar 22, 9:15 am, "Leslie Viljoen" <leslievilj...@gmail.com> wrote:

Brian_Candler · 23 March 2007 20:34

Why does nobody seem to anchor regexps? I have seen so many documentation
examples now which suggest that something like

raise "Invalid data" unless /[A-Za-z0-9]+/ =~ data

is a good example of data validation

It would be a drift away from Perl, but I wonder if regexps should be
anchored by default, and you'd have to add a flag to them to make them
unanchored...
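
To illustrate the complaint, here is a sketch of the unanchored check next to an anchored one. The method names and sample strings are made up; note that in Ruby ^ and $ anchor to lines, so whole-string validation wants \A and \z.

```ruby
# Hypothetical validators; the names are invented for this example.
def loosely_valid?(data)
  # Unanchored: succeeds if ANY run of alphanumerics appears anywhere.
  !!(/[A-Za-z0-9]+/ =~ data)
end

def strictly_valid?(data)
  # \A and \z pin the match to the start and end of the whole string.
  !!(/\A[A-Za-z0-9]+\z/ =~ data)
end

p loosely_valid?("rm -rf /; abc")   # prints true
p strictly_valid?("rm -rf /; abc")  # prints false
p strictly_valid?("abc123")         # prints true
```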

Sorry, just my gripe of the day.

Regards,

Brian.

···

On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:

On 3/22/07, Robert Klemme <shortcutter@googlemail.com> wrote:

>Bullshit: just use select:
>
> >> "foo\nbar\n".select {|l| /^f/ =~ l}
>=> ["foo\n"]

Or, since he's really trying to exclude lines which include /usr/local/lib:

str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

Gavin_Kistner2 · 26 March 2007 15:30

Note that the above results in Regexp.new being called for each line
in the string; not very efficient:
require 'benchmark'

s = "foobar"
N = 1000000

  Benchmark.bmbm{ |x|
    x.report( 'Regexp.new' ){
      N.times{ Regexp.new( "foobar" ) =~ s }
    }
    x.report( 'inline literal' ){
      N.times{ /foobar/ =~ s }
    }
    x.report( 'as variable' ){
      r = /foobar/
      N.times{ r =~ s }
    }
  }

  #=> Rehearsal --------------------------------------------------
  #=> Regexp.new 20.844000 1.875000 22.719000 ( 22.750000)
  #=> inline literal 0.906000 0.000000 0.906000 ( 0.906000)
  #=> as variable 1.094000 0.000000 1.094000 ( 1.094000)
  #=> ---------------------------------------- total: 24.719000sec
  #=>
  #=> user system total real
  #=> Regexp.new 21.234000 1.671000 22.905000 ( 22.922000)
  #=> inline literal 0.891000 0.000000 0.891000 ( 0.891000)
  #=> as variable 1.047000 0.000000 1.047000 ( 1.047000)

If you just want to avoid having to escape the forward slashes in the
literal, you can use the %r notation for a regexp literal. For
example:
str.reject{ |s| %r{/usr/local/lib} =~ s }
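
Put together for the original problem, and written so it also runs on Ruby 1.9 and later (where a String has to go through each_line before reject), that might look like the sketch below. The constant and method names are made up for illustration.

```ruby
# The pattern is a literal built once and reused for every line.
USR_LOCAL_LIB = %r{/usr/local/lib}

# `without_library_frames` is a hypothetical helper name.
def without_library_frames(trace)
  trace.each_line.reject { |line| USR_LOCAL_LIB =~ line }.join
end

# Made-up sample input.
print without_library_frames("  /usr/local/lib/a.rb\ntest/b.rb\n")  # prints test/b.rb
```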

···

On Mar 23, 12:39 pm, "Rick DeNatale" <rick.denat...@gmail.com> wrote:

str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

Vince_H_K · 23 March 2007 20:48

Brian Candler wrote:

Bullshit: just use select:

"foo\nbar\n".select {|l| /^f/ =~ l}

=> ["foo\n"]

Or, since he's really trying to exclude lines which include /usr/local/lib:

str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

Why does nobody seem to anchor regexps? I have seen so many documentation
examples now which suggest that something like

raise "Invalid data" unless /[A-Za-z0-9]+/ =~ data

is a good example of data validation

I agree that it isn't... However, most of my uses of regexps are to
extract some pieces of data from a bigger text - such as

`identify biniou.jpg` =~ /(\d+)x(\d+)/

It would be a pain to have to unanchor those... Moreover, it is way
better (in my opinion), to write something like

raise "Invalid data" if /[^A-Za-z0-9]/ =~ data

if you really require the full data to be alnum. Or, rather, if you
want to be somehow more flexible (stripping whitespace and other):

raise "Invalid data" unless /([A-Za-z0-9]+)/ =~ data

data = $1 # Cleaning up data...
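
The two alternatives above can be sketched as small methods. The method names and the choice of ArgumentError are assumptions for illustration, not anything from the posts:

```ruby
# Strict check: complain if any character outside the allowed set appears.
# Note that an empty string passes this check, since nothing in it is disallowed.
def check_alnum(data)
  raise ArgumentError, "Invalid data" if /[^A-Za-z0-9]/ =~ data
  data
end

# Flexible check: pull out the first alphanumeric run and discard the rest.
def first_alnum_run(data)
  m = /([A-Za-z0-9]+)/.match(data)
  raise ArgumentError, "Invalid data" unless m
  m[1]
end

p check_alnum("abc42")          # prints "abc42"
p first_alnum_run("  abc42 \n") # prints "abc42"
```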

Cheers,

Vincent

···

On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:

On 3/22/07, Robert Klemme <shortcutter@googlemail.com> wrote:

--
Vincent Fourmond, PhD student (not for long anymore)
http://vincent.fourmond.neuf.fr/

Rick_DeNatale1 · 23 March 2007 21:29

Well, the OP said "I would like to remove lines matching
/usr/local/lib from the multiline string:", no mention of at the start
of a line.

I did actually consider pointing out that he might have meant at the
beginning of each line, but he didn't so I didn't.

···

On 3/23/07, Brian Candler <B.Candler@pobox.com> wrote:

On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:
> On 3/22/07, Robert Klemme <shortcutter@googlemail.com> wrote:
>
> >Bullshit: just use select:
> >
> > >> "foo\nbar\n".select {|l| /^f/ =~ l}
> >=> ["foo\n"]
>
> Or, since he's really trying to exclude lines which include /usr/local/lib:
>
> str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

Why does nobody seem to anchor regexps?

Rick_DeNatale1 · 27 March 2007 16:12

Quoting my old friend Kent Beck:

"Make it run before you make it fast."

···

On 3/26/07, Phrogz <gavin@refinery.com> wrote:

On Mar 23, 12:39 pm, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

Note that the above results in Regexp.new being called for each line
in the string; not very efficient:

Andrew_Stewart · 26 March 2007 11:40

>
> >Bullshit: just use select:
> >
> > >> "foo\nbar\n".select {|l| /^f/ =~ l}
> >=> ["foo\n"]
>
> Or, since he's really trying to exclude lines which include /usr/local/lib:
>
> str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

That's neat. I should have realised that Enumerable's methods come into play with a multiline string. No need to mess about with \n characters -- let Ruby handle those.

Why does nobody seem to anchor regexps?

Well, the OP said "I would like to remove lines matching
/usr/local/lib from the multiline string:", no mention of at the start
of a line.

I did actually consider pointing out that he might have meant at the
beginning of each line, but he didn't so I didn't.

And you were right -- the '/usr/local/lib' sequence isn't at the start of the line. (Though the part between the start of the line and this sequence is predictable and could be worked into the regexp.)

For those interested, the context of the problem was filtering stack traces produced under autotest:

http://blog.airbladesoftware.com/2007/3/22/filtering-autotest-s-output

Thanks for all the help. I greatly appreciate it.

Regards,
Andy Stewart

···

On 23 Mar 2007, at 21:29, Rick DeNatale wrote:

On 3/23/07, Brian Candler <B.Candler@pobox.com> wrote:

On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:
> On 3/22/07, Robert Klemme <shortcutter@googlemail.com> wrote:

Gavin_Kistner2 · 27 March 2007 23:10

On the one hand:
"Premature optimization is the root of all evil"[1]

On the other hand, knowing that calling a constructor on each iteration
of a block is much slower than using a literal that doesn't change (and
that the literal is fewer characters to type as well) might be considered
not so much premature optimization as...well...reasonable code planning.

[1] Premature optimization is the root of all evil

···

On Mar 27, 10:12 am, "Rick DeNatale" <rick.denat...@gmail.com> wrote:

On 3/26/07, Phrogz <g...@refinery.com> wrote:

> On Mar 23, 12:39 pm, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

> Note that the above results in Regexp.new being called for each line
> in the string; not very efficient:

Quoting my old friend Kent Beck:

"Make it run before you make it fast."
