I would like to read a file in Ruby. It is a 2 GB file, but it
contains useless data in the beginning portion of the file.
There is a particular pattern towards the middle of the file after
which the useful data begins. Is there a way to grep for this pattern and
then read every line from there on, ignoring all lines before the line
on which the pattern is found?
File.foreach "myfile" do |line|
  if /pattern/ =~ line .. false
    puts line
  end
end
The trick I am using is that the flip-flop operator (the .. in the condition) starts to return true once the first expression returns true and stays true until the second expression returns true. In this case that never happens, since you want to read until the end of the file.
Kind regards
robert
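To see the flip-flop behaviour without a 2 GB file, here is a minimal sketch on an in-memory array. The helper name lines_from and the sample data are made up for illustration; only the `..` condition comes from the post above.

```ruby
# Hypothetical helper: returns the first matching line and every line after it.
def lines_from(lines, pattern)
  selected = []
  lines.each do |line|
    # In a condition, `a .. b` is a flip-flop: it is false until `a` is truthy,
    # then stays true until `b` is truthy. `false` as `b` never switches it off.
    selected << line if (pattern =~ line)..false
  end
  selected
end

# Made-up sample data standing in for the real file.
sample = ["junk 1\n", "junk 2\n", "START here\n", "data 1\n", "data 2\n"]
result = lines_from(sample, /START/)
puts result
```

Note that the line containing the pattern is itself included in the output; if it should be dropped, it has to be skipped separately.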
On 05/13/2010 02:40 AM, Vandana wrote:
On May 12, 4:18 pm, Thomas Volkmar Worm <t...@s4r.de> wrote:
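File.open("myfile", "r") do |f|
  # Skip the garbage before pattern (stop at EOF, where gets returns nil):
  while (line = f.gets) && line !~ /pattern/ do; end
  # From here on, f yields only the lines after the pattern line.
end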
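One thing to watch with a gets-based skip loop: at end of file gets returns nil, and nil !~ /pattern/ is true, so a loop without an EOF check never terminates when the pattern is missing. A minimal sketch with that guard, using StringIO in place of the real file (the helper name skip_past and the sample text are assumptions for illustration):

```ruby
require "stringio"

# Hypothetical helper: consumes lines up to and including the first match.
# Returns true if the pattern was found, false if EOF was reached first.
def skip_past(io, pattern)
  while (line = io.gets)          # gets returns nil at EOF, which ends the loop
    return true if line =~ pattern
  end
  false
end

# Made-up sample data standing in for the real file.
io = StringIO.new("junk\nmore junk\nSTART\ndata 1\ndata 2\n")
rest = skip_past(io, /START/) ? io.readlines : []
puts rest
```

Unlike the flip-flop variant, this one consumes the pattern line, so only the lines after it remain.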
Could that trick be used for start and stop tags? Like:
File.foreach "myfile" do |line|
  if /<body/ =~ line .. /<\/body/ =~ line
    puts line
  end
end
If so, that's clever!
--
« La vie ne se comprend que par un retour en arrière,
mais on ne la vit qu'en avant. »
(Sören Kierkegaard)
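The start/stop form can be tried on a small in-memory example. This is only a sketch, assuming each tag sits on its own line; the helper name lines_between and the sample markup are made up for illustration.

```ruby
# Hypothetical helper: returns the lines from the first start match
# through the next stop match, both inclusive.
def lines_between(lines, start_pattern, stop_pattern)
  selected = []
  lines.each do |line|
    # Flip-flop: switches on when start_pattern matches,
    # switches off after the line where stop_pattern matches.
    selected << line if (start_pattern =~ line)..(stop_pattern =~ line)
  end
  selected
end

# Made-up sample with one tag per line.
html = ["<html>\n", "<head></head>\n", "<body>\n", "<p>hi</p>\n", "</body>\n", "</html>\n"]
body = lines_between(html, /<body/, /<\/body/)
puts body
```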
Yes, that could be done. However, I would not use this for languages from the SGML family (XML, HTML) because there are no guarantees as to how many tags you'll find on a single line of text. There are better tools to deal with that (REXML, Nokogiri...).
Kind regards
robert
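The single-line caveat is easy to reproduce. In this sketch (made-up markup, hypothetical helper name lines_between) several tags share one physical line, and the line-based flip-flop returns content from outside the body as well:

```ruby
# Hypothetical helper: the same line-based flip-flop as discussed in the thread.
def lines_between(lines, start_pattern, stop_pattern)
  picked = []
  lines.each do |line|
    picked << line if (start_pattern =~ line)..(stop_pattern =~ line)
  end
  picked
end

# Made-up input: header and footer markup share lines with the body tags.
packed = [
  "<html><p>header</p><body><p>hi</p>\n",
  "</body><p>footer</p></html>\n"
]
picked = lines_between(packed, /<body/, /<\/body/)
puts picked
```

Both lines come back whole, header and footer included, because the selection works on lines rather than on elements. That is the case where a real parser is the safer choice.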
Right, however REXML doesn't work with badly balanced tags.
I did some tests of Nokogiri today; it works even better than tidy for
the first step of cleaning up unbalanced tags.
The only question I have about Nokogiri is how to avoid the DOCTYPE,
because it outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
even when I'm using #to_xhtml, in which case the DOCTYPE is wrong...