Reading file after a particular line in file

Vandana · 12 May 2010 17:20

Hello All,

I would like to read a file in Ruby. It is a 2 GB file, but it
contains useless data at the beginning.

There is a particular pattern towards the middle of the file, after
which the useful data begins. Is there a way to grep for this pattern and
then read every line from there on, ignoring all lines before the line
on which the pattern was found?

Thanks,
Vandana

Thomas_Volkmar_Worm · 12 May 2010 23:20

File.open("myfile", "r") do |f|

  # Skip the garbage before the pattern (stops at EOF if it never matches):
  while (l = f.gets) && l !~ /pattern/ do; end

  # Read your data:
  while (l = f.gets)
    puts l
  end

end
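
The same skip-then-read idea can be wrapped in a small helper, shown here as a minimal sketch. The name each_line_after and the sample data are assumptions made for illustration; a StringIO stands in for the real file so the snippet runs on its own. As in the loop above, the line containing the pattern is itself skipped.

```ruby
require "stringio"

# Hypothetical helper: yields every line that follows the first line
# matching `pattern`. The matching line itself is not yielded.
def each_line_after(io, pattern)
  found = false
  io.each_line do |line|
    if found
      yield line
    elsif line =~ pattern
      found = true
    end
  end
end

# In-memory stand-in for the real 2 GB file (assumed sample data).
sample = StringIO.new("junk 1\njunk 2\nSTART OF DATA\nrow a\nrow b\n")
each_line_after(sample, /START OF DATA/) { |line| puts line }
```

Because each_line reads one line at a time, this should keep memory use small as long as the individual lines are short. With a real file, File.open("myfile") { |f| each_line_after(f, /pattern/) { |l| puts l } } would be the equivalent call.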

Vandana · 13 May 2010 00:45

Thank you very much.

Robert_K1 · 13 May 2010 09:10

There's also the flip flop operator:

File.foreach "myfile" do |line|
   if /pattern/ =~ line .. false
     puts line
   end
end

The trick I am using is that the flip-flop operator starts returning true once the first expression returns true and stays true until the second expression returns true. In this case that never happens, since you want to read until the end of the file.
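
To see the behaviour without a file at hand, here is a small sketch that runs the same flip-flop over an in-memory array. The sample lines and the /START/ pattern are assumptions for illustration only.

```ruby
# Assumed sample data standing in for the lines of the real file.
lines = ["junk 1", "junk 2", "START OF DATA", "row a", "row b"]

selected = []
lines.each do |line|
  # The flip-flop turns on at the first match. Its end condition is the
  # literal false, so it never turns off again.
  selected << line if (line =~ /START/)..false
end

puts selected
```

Note that the flip-flop includes the line on which the pattern matches, so "START OF DATA" is part of the result here. A loop that consumes the matching line with gets before printing would not include it.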

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Unbewusst_Sein1 · 13 May 2010 14:35

Could that trick be used for start and stop tags? Like:

File.foreach "myfile" do |line|
   if /<body/ =~ line .. /<\/body/ =~ line
     puts line
   end
end

If so, that's clever!

--
« La vie ne se comprend que par un retour en arrière,
mais on ne la vit qu'en avant. »
(Sören Kierkegaard)

botp1 · 13 May 2010 14:44

Yes.
But as in every case, you should test it.

kind regards -botp

Robert_K1 · 13 May 2010 16:35

Yes, that could be done. However, I would not use this for languages from the SGML family (XML, HTML) because there are no guarantees as to how many tags you'll find on a single line of text. There are better tools to deal with that (REXML, Nokogiri...).
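
A short sketch of the problem, using made-up sample markup in which the opening and closing body tags share a line. The line-based flip-flop can only select whole lines, so everything else on that line comes along too.

```ruby
# Assumed sample input: <body> and </body> sit on the same line,
# followed by unrelated markup.
lines = [
  "<html><head><title>t</title></head>",
  "<body><p>hello</p></body><!-- trailing -->",
  "</html>"
]

picked = []
lines.each do |line|
  # Start and end conditions both match on the second line, so the
  # flip-flop turns on and off within that single iteration.
  picked << line if (line =~ /<body/)..(line =~ /<\/body/)
end

puts picked
```

The selected line still carries the trailing comment that lies outside the body element, which is the kind of result a real parser avoids.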

Kind regards

robert

Xavier_Noria · 13 May 2010 14:51

Line-oriented solutions assume small lines, and that the pattern has
no beeline. Perhaps that is true, but it is unknown.

Unbewusst_Sein1 · 13 May 2010 18:25

Right, however REXML doesn't work with badly balanced tags.
I did some tests of Nokogiri today; it works even better than tidy for
the first step of cleaning up unbalanced tags.

The only question I have about Nokogiri is how to avoid the DOCTYPE,
because it outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">

even if I'm using #to_xhtml:

then, the DOCTYPE is wrong...

Xavier_Noria · 13 May 2010 14:53

newline (beeline is damn phone autocorrection)
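
If lines may be very long, or the marker itself may contain a newline, one option is to scan the file in fixed-size chunks instead of lines. The following is only a sketch under those assumptions: the helper name each_chunk_after, the literal marker string and the chunk size are all made up for illustration, and a StringIO stands in for the real file.

```ruby
require "stringio"

# Hypothetical helper: scans `io` in fixed-size chunks for the literal
# string `marker` (which may contain newlines) and yields everything
# after it, chunk by chunk. Returns true if the marker was found.
def each_chunk_after(io, marker, chunk_size = 64 * 1024)
  marker = marker.b    # compare raw bytes; io.read(n) returns binary strings
  buffer = String.new  # empty binary buffer
  while (chunk = io.read(chunk_size))
    buffer << chunk
    if (index = buffer.index(marker))
      rest = buffer[(index + marker.bytesize)..-1]
      yield rest unless rest.empty?
      while (chunk = io.read(chunk_size))
        yield chunk
      end
      return true
    end
    # Keep only a tail long enough to catch a marker that straddles
    # two chunks; everything before it can be discarded.
    keep = marker.bytesize - 1
    buffer = keep.zero? ? String.new : (buffer[-keep..-1] || buffer)
  end
  false
end

data = StringIO.new("junk junk\njunk BEGIN\nDATA\nrow a\nrow b\n")
# Tiny chunk size so the marker straddles chunk boundaries in this demo.
each_chunk_after(data, "BEGIN\nDATA\n", 4) { |chunk| print chunk }
```

Memory use is bounded by roughly the chunk size plus the marker length, regardless of how long the lines are. The sketch matches a fixed byte string only; matching a regular expression across chunk boundaries would need more care.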
