It’s fair enough to enquire about performance, especially relative to
similar languages. However, you simply shouldn’t expect to read 4MB and
build such a large data structure (i.e. an array with thousands of
elements) quickly.
If you expect the file to be large, you really should try to read it a line
at a time:
  while line = file.gets
    # process line here
  end
(or similar)
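A minimal sketch of the line-at-a-time approach, assuming a Ruby where File.foreach is available. The method name count_long_lines and the 80-character threshold are made up purely for illustration; the point is only that no more than one line is held in memory at a time.

```ruby
# Hypothetical helper: counts lines longer than `limit` characters
# without ever loading the whole file into an array.
def count_long_lines(path, limit = 80)
  count = 0
  # File.foreach yields one line at a time and closes the file afterwards.
  File.foreach(path) do |line|
    count += 1 if line.chomp.length > limit
  end
  count
end
```

Calling count_long_lines("test1.txt") would then scan the file from the original question one line at a time.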
Cheers,
Gavin
----- Original Message -----
From: "Shashank Date" <ADATE@kc.rr.com>
Newsgroups: comp.lang.ruby
To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
Sent: Friday, August 02, 2002 11:33 AM
Subject: IO.readlines is slow ?
I really like the convenience of doing:
arr = IO.readlines("test1.txt")
and then using [arr] to massage my data.
But, when “test1.txt” is a big file (say 4MB) it takes for ever to read
the file.
Is there any way to make it faster without sacrificing the terseness?
(Similar constructs are much faster in the Python and Perl world.)
But, when “test1.txt” is a big file (say 4MB) it takes for ever to read the
file.
Is there any way to make it faster without sacrificing the terseness.
On this file:
oct@carafon:~$ ls -la sample.txt
-rw-rw-rw- 1 oct oct 29445164 2002-07-09 11:05 sample.txt
your version takes:
oct@carafon:~$ time ruby read2.rb
content is 233373 lines long
real 0m5.434s
user 0m3.770s
sys 0m0.460s
Using sysread is much faster than any other way of reading I have tried
in Ruby. You can then split the contents into lines with split (although
I’m not sure it is the fastest way to do that). Here are my results:
oct@carafon:~$ cat read.rb
input=File.open("sample.txt")
all=input.sysread(File::size("sample.txt"))
content=all.split("\n")
print "content is "+content.length.to_s+" lines long\n"
oct@carafon:~$ time ruby read.rb
content is 233373 lines long
real 0m1.938s
user 0m1.130s
sys 0m0.710s
You can then easily wrap this inside a module and get back your one-line
replacement for readlines.
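One way such a wrapper might look, as a sketch only: the module name FastLines and its readlines method are hypothetical, not anything shipped with Ruby. It loops over sysread in case a single call returns fewer bytes than requested, and opens the file in binary mode so no newline translation takes place.

```ruby
# Hypothetical wrapper around the sysread + split idea shown above.
module FastLines
  # Returns the lines of the file at `path`, without their "\n" terminators.
  def self.readlines(path)
    File.open(path, "rb") do |f|
      remaining = f.stat.size
      data = String.new
      # sysread may return fewer bytes than asked for, so keep reading
      # until the whole file has been collected.
      while remaining > 0
        chunk = f.sysread(remaining)
        data << chunk
        remaining -= chunk.bytesize
      end
      data.split("\n")
    end
  end
end
```

Note one difference from IO.readlines: split drops the line terminators, whereas IO.readlines keeps them on each element.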
hth,
On Fri, Aug 02, 2002, Shashank Date wrote:
Pierre Baillet
One must pump to live, and therefore live to pump.
Shadok motto
Just to report, I have a similar problem: on my system (Ruby 1.6.6,
Pragmatic Programmers Ruby install for Windows), opening a 4MB file takes
between half a minute and 40 seconds using File.read (not File.readline).
But, when “test1.txt” is a big file (say 4MB) it takes for ever to read the
file.
I hope this isn’t a big step-out on my part, but ISTR that there is a
known problem with file I/O in Ruby 1.6.6 for Windows. You should check
the mailing list or newsgroup archives for more information. It is also
my understanding that this problem has been fixed in Ruby 1.7.2. So if
you’re running under Windows this would explain the significant time
differences that others are seeing when running under Ruby 1.6 on Unix.
$ time ruby -e 'a=File.open("bigfile").sysread(4194304).split("\n")'
real 0m0.328s
user 0m0.230s
sys 0m0.080s
$ time ruby -e "a=IO.readlines('bigfile')"
real 0m0.156s
user 0m0.130s
sys 0m0.010s
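To repeat this comparison from inside Ruby rather than with the shell’s time, something along these lines should work. It is a sketch using the standard Benchmark module; the helper name compare_readers is hypothetical, and the numbers will vary a lot with platform, Ruby version and whether the file is already in the OS cache.

```ruby
require "benchmark"

# Hypothetical helper: times IO.readlines against sysread + split on the
# same file and returns the elapsed wall-clock seconds for each.
def compare_readers(path)
  size = File.size(path)
  results = {}
  results[:readlines] = Benchmark.realtime { IO.readlines(path) }
  results[:sysread_split] = Benchmark.realtime do
    File.open(path, "rb") { |f| f.sysread(size).split("\n") }
  end
  results
end
```

For example, compare_readers("bigfile").each { |name, secs| puts "#{name}: #{secs}" } prints one timing per approach.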
On Friday 02 August 2002 02:16 am, Pierre Baillet wrote:
Using sysread is much faster than any other way of reading I have
tried in Ruby. You can then split the contents into lines with split
(although I’m not sure it is the fastest way to do that). Here are my
results:
On Friday 02 August 2002 11:51 am, Maurício wrote:
Just to report, I have a similar problem: on my system (Ruby
1.6.6, Pragmatic Programmers Ruby install for Windows), opening a
4MB file takes between half a minute and 40 seconds using File.read
(not File.readline).
Just started out with PageTemplate 0.3.2 with mod_ruby 0.9.9 and ruby
1.7.2 and have some questions regarding PT…
Does PT support nested templates? i.e. templates within a template?
I was expecting PT to have an “include” kind of directive??
Hi
No, the package doesn’t have built-in support for ‘include’ type directives.
However, you can add this capability into the package by first creating a
custom glossary for your parser that will recognise ‘include’ tags. There’s
a class called Syntax::Glossary in the package that shows how these work.
Then you’ll need to create a subclass of Syntax::Parser that reads in a
file where it finds an ‘include’ tag. The ‘compile’ method of the parser
handles the meaning of each individual type of tag, and you’ll want to alter
it so that it can read in extra template files when it finds an ‘include’.
Finally, you’ll need to create a subclass of PageTemplate that uses the
altered Syntax::Parser.
The code I’ve got for this at the moment is slightly fiddly and long-winded,
but I’ve sent it over off-list just in case it gives you a few ideas.
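For anyone following along without that code, here is a rough stand-alone sketch of the general idea. It deliberately does not use PageTemplate’s own classes: the expand_includes method, the INCLUDE_TAG pattern and the nesting limit are all assumptions made for illustration, and the [%include file%] syntax is just one plausible spelling of the tag. It expands the tags in the template text before the text is handed to whatever parser you use, and it does no path validation at all.

```ruby
# Hypothetical preprocessor, independent of PageTemplate's API.
# Matches tags of the assumed form [%include some/file.txt%].
INCLUDE_TAG = /\[%\s*include\s+(\S+?)\s*%\]/

# Replaces each include tag in `text` with the contents of the named file
# (looked up relative to `base_dir`), expanding nested includes as well.
def expand_includes(text, base_dir, depth = 0)
  # Guard against templates that include each other forever.
  raise ArgumentError, "includes nested too deeply" if depth > 10
  text.gsub(INCLUDE_TAG) do
    path = File.join(base_dir, Regexp.last_match(1))
    expand_includes(File.read(path), base_dir, depth + 1)
  end
end
```

The expanded string can then be passed to the template engine as if it had been written in one file.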
Hello,
Thanks to Alex Fenton, who has given me a hint on how to add an INCLUDE tag
for PageTemplate. Here’s a patch for the original PageTemplate.rb.
But there’s a kludge; I can’t seem to do a File.new as Ruby says it’s a
tainted operation; so I just do a .untaint on the argument…not very
nice, unless someone can try to unkludge it…
Perhaps Brian Wisti can add an [%include %] in the next PT release… hint, hint
Perhaps you should not simply untaint it, but also do some testing against
“…/”-attacks or the like.
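A sketch of what such a test might look like, assuming all templates live under one known directory. The method name safe_template_path is hypothetical. It resolves the requested name against the base directory and refuses anything that ends up outside it, which covers both relative escapes and absolute paths; it does not resolve symlinks.

```ruby
# Hypothetical check: returns the absolute path for `name` only if it
# stays inside `base_dir`, and raises SecurityError otherwise.
def safe_template_path(base_dir, name)
  base = File.expand_path(base_dir)
  # expand_path collapses any ".." components, so the prefix check below
  # is made against the real target rather than the raw string.
  full = File.expand_path(name, base)
  unless full.start_with?(base + File::SEPARATOR)
    raise SecurityError, "include path escapes #{base}: #{name}"
  end
  full
end
```

Only a path that has passed this check would then be handed to File.new.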
-billy.
On Wed, Aug 07, 2002 at 12:51:43PM +0900, Wai-Sun Chia wrote:
But there’s a kludge; I can’t seem to do a File.new as Ruby says it’s a
tainted operation; so I just do a .untaint on the argument…not very
nice, unless someone can try to unkludge it…