IO.readlines is slow?

I really like the convenience of doing:

arr = IO.readlines("test1.txt")

and then using [arr] to massage my data.

But when “test1.txt” is a big file (say 4MB), it takes forever to read the
file.

Is there any way to make it faster without sacrificing the terseness?

(Similar constructs are much faster in the Python and Perl worlds.)

Please help.

– Shanko

On my machine it takes about 0.1 seconds to read a 4MB file made
up of 128-byte lines.

In fact, it’s faster than Perl:

$ time perl -e "open(F,'bigfile');@a=<F>"

real 0m0.239s
user 0m0.170s
sys 0m0.060s

$ time ruby -e "a=IO.readlines('bigfile')"

real 0m0.152s
user 0m0.130s
sys 0m0.010s

On Thursday 01 August 2002 06:33 pm, Shashank Date wrote:

Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

Fair enough to enquire about performance, especially relative to similar
languages. However, you simply shouldn’t expect to be able to read 4MB and
create such a large data structure (i.e. an array with thousands of
elements) quickly.

If you expect the file to be large, you really should try to read it a line
at a time:

while line = file.gets
  # Process "line"
end

(or similar)
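A minimal runnable version of that loop, as a sketch: the method name count_matching_lines and the idea of counting lines that contain a word are hypothetical stand-ins for whatever processing you actually need. Only one line is held in memory at a time.

```ruby
# Hypothetical example: count the lines of a file that contain a word,
# reading one line at a time instead of building an array of all lines.
def count_matching_lines(path, word)
  count = 0
  File.open(path) do |file|
    # IO#gets returns nil at end of file, which ends the loop.
    while (line = file.gets)
      count += 1 if line.include?(word)
    end
  end # the block form of File.open closes the file for us
  count
end
```

File.foreach(path) { |line| ... } is a terser way to write the same line-by-line loop.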

Cheers,
Gavin

----- Original Message -----
From: “Shashank Date” ADATE@kc.rr.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Friday, August 02, 2002 11:33 AM
Subject: IO.readlines is slow ?

I really like the convenience of doing:

arr = IO.readlines("test1.txt")

and then using [arr] to massage my data.

But when “test1.txt” is a big file (say 4MB), it takes forever to read
the file.

Is there any way to make it faster without sacrificing the terseness?

(Similar constructs are much faster in the Python and Perl worlds.)

Please help.

– Shanko

Hello,
Just started out with PageTemplate 0.3.2 with mod_ruby 0.9.9 and ruby
1.7.2 and have some questions regarding PT…

Does PT support nested templates, i.e. templates within a template?
I was expecting PT to have an “include” kind of directive.

If not, how does one decompose a page from (say):

  1. header
  2. navbar
  3. sidebar
  4. footer
  5. body

Items 1-4 are typically very static and are to be included in every page
(simplifying matters here); it’s the body that changes.

So what is the best way of doing this?


Wai-Sun “Squidster” Chia
Consulting & Integration
Linux/Unix/Web Developer Dude

Hello,

I really like the convenience of doing:

arr = IO.readlines("test1.txt")

and then using [arr] to massage my data.

But when “test1.txt” is a big file (say 4MB), it takes forever to read the
file.

Is there any way to make it faster without sacrificing the terseness?

On this file:
oct@carafon:~$ ls -la sample.txt
-rw-rw-rw- 1 oct oct 29445164 2002-07-09 11:05 sample.txt

Your version takes:
oct@carafon:~$ time ruby read2.rb
content is 233373 lines long

real 0m5.434s
user 0m3.770s
sys 0m0.460s

Using sysread is much faster than any other read method I have tried in
Ruby. You can then split the contents into lines with String#split (although
I’m not sure it is the fastest way to do that). Here are my results:

oct@carafon:~$ cat read.rb
input=File.open("sample.txt")
all=input.sysread(File::size("sample.txt"))
content=all.split("\n")
print "content is "+content.length.to_s+" lines long\n"
oct@carafon:~$ time ruby read.rb
content is 233373 lines long

real 0m1.938s
user 0m1.130s
sys 0m0.710s

You can then easily wrap this in a module method and get back your
one-line, readlines-style call.
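A sketch of such a wrapper, assuming a reasonably recent Ruby; the module name FastRead is hypothetical. Unlike IO.readlines, the lines it returns have no trailing newline (that is what split("\n") does), so it is not a drop-in replacement. It also loops on sysread, since a single call is not guaranteed to return the whole file.

```ruby
# Hypothetical wrapper around the sysread + split approach.
module FastRead
  # Returns the lines of the file at +path+, without trailing "\n"
  # (on CRLF files a trailing "\r" remains). The strings come back in
  # binary (ASCII-8BIT) encoding because the file is opened in "rb" mode.
  def self.readlines(path)
    File.open(path, "rb") do |f|
      size = f.size
      data = String.new(capacity: size)
      # sysread may return fewer bytes than requested, so keep reading
      # until the whole file is in memory.
      data << f.sysread(size - data.bytesize) while data.bytesize < size
      data.split("\n")
    end
  end
end
```

Usage stays a one-liner: arr = FastRead.readlines("test1.txt").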

hth,

On Fri, Aug 02, 2002, Shashank Date wrote:

Pierre Baillet
You have to pump to live, and therefore live to pump.
Shadok motto

Just to report, I have a similar problem: on my system (Ruby 1.6.6, the
Pragmatic Programmers’ Ruby installer for Windows), opening a 4MB file takes
between 30 and 40 seconds using File.read (not File.readline).

Maurício

“Shashank Date” ADATE@kc.rr.com wrote in message
news:3jl29.3939$ua1.140046@twister.rdc-kc.rr.com

Shashank Date wrote:

But when “test1.txt” is a big file (say 4MB), it takes forever to read the
file.

I hope I’m not out of line here, but ISTR (I seem to recall) that there is a
known problem with file I/O in Ruby 1.6.6 for Windows. You should check
the mailing list or newsgroup archives for more information. It is also
my understanding that this problem has been fixed in Ruby 1.7.2. So if
you’re running under Windows, this would explain the large difference
between your timings and those others are seeing with Ruby 1.6 on Unix.

Thank you all for pointing me in the right direction.

I have downloaded Ruby 1.7.2 and tried out reading very big (> 200 MB)
files, and it works like a champ.

(By way of comparison, Python 2.2 ran for a while and then aborted with a
memory error, and Perl 5.6 did not even load!)

Thanks again.

– Shanko

Hi,

I really like the convenience of doing:

arr = IO.readlines("test1.txt")

and then using [arr] to massage my data.

But when “test1.txt” is a big file (say 4MB), it takes forever to
read the file.

On my machine it takes about 0.1 seconds to read a 4MB file made
up of 128-byte lines.

It would depend on the version.

In fact, it’s faster than Perl:

Hmmm, Perl has got slower in 5.x?

At Fri, 2 Aug 2002 11:24:33 +0900, Ned Konz wrote:


Nobu Nakada

Under 1.7.2, it looks like readlines() is faster:

$ time ruby -e 'a=File.open("bigfile").sysread(4194304).split("\n")'

real 0m0.328s
user 0m0.230s
sys 0m0.080s

$ time ruby -e "a=IO.readlines('bigfile')"

real 0m0.156s
user 0m0.130s
sys 0m0.010s
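To repeat this comparison on your own Ruby version and platform, something like the following sketch could be used. It relies on the standard benchmark library; the method name compare_reads is a placeholder. The sysread variant mirrors the one-shot call from the thread, which assumes a single sysread returns the whole file.

```ruby
require "benchmark"

# Hypothetical helper: time IO.readlines against sysread + split on one file.
# Returns the elapsed wall-clock seconds for each approach.
def compare_reads(path)
  readlines_time = Benchmark.realtime { IO.readlines(path) }
  sysread_time = Benchmark.realtime do
    # One-shot sysread, as in the thread; assumes the whole file
    # comes back from a single call.
    File.open(path, "rb") { |f| f.sysread(f.size).split("\n") }
  end
  { readlines: readlines_time, sysread_split: sysread_time }
end
```

For example, p compare_reads("bigfile") prints both timings. Timings vary between runs, so it is worth repeating the measurement a few times before drawing conclusions.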

On Friday 02 August 2002 02:16 am, Pierre Baillet wrote:

Using sysread is much faster than any other read method I have tried
in Ruby. You can then split the contents into lines with String#split
(although I’m not sure it is the fastest way to do that). Here are my results:


Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

This is a file on your local disk?

On Friday 02 August 2002 11:51 am, Maurício wrote:

Just to report, I have a similar problem: on my system (Ruby
1.6.6, the Pragmatic Programmers’ Ruby installer for Windows), opening a
4MB file takes between 30 and 40 seconds using File.read
(not File.readline).


Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

“Maurício” briqueabraque@yahoo.com writes:

Just to report, I have a similar problem: on my system (Ruby 1.6.6,
the Pragmatic Programmers’ Ruby installer for Windows), opening a 4MB file takes
between 30 and 40 seconds using File.read (not File.readline).

I believe this is a known problem in 1.6.6; the installer for the
next version should be coming along shortly.

Dave

Just started out with PageTemplate 0.3.2 with mod_ruby 0.9.9 and ruby
1.7.2 and have some questions regarding PT…

Does PT support nested templates, i.e. templates within a template?
I was expecting PT to have an “include” kind of directive.

Hi

No, the package doesn’t have built-in support for ‘include’ type directives.
However, you can add this capability into the package by first creating a
custom glossary for your parser that will recognise ‘include’ tags. There’s
a class called Syntax::Glossary in the package that shows how these work.
Then you’ll need to create a subclass of the Syntax::Parser that reads in a
file wherever it finds an ‘include’ tag. The ‘compile’ method of the parser
handles the meaning of each individual type of tag, and you’ll want to alter
it so that it can read in extra template files when it finds an ‘include’.
Finally, you’ll need to create a subclass of PageTemplate that uses the
altered Syntax::Parser.

The code I’ve got for this at the moment is slightly fiddly and long-winded,
but I’ve sent it to you off-list just in case it gives you a few ideas.
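Alex’s approach works inside PageTemplate’s own parser classes. A different and simpler technique, sketched here only as an illustration, is to expand includes textually before the template is handed to PageTemplate at all. The [%include "file" %] syntax and the expand_includes name are assumptions, not part of PageTemplate 0.3.2. The sketch joins include names onto the template directory without any validation, so names that come from untrusted input would need checking first.

```ruby
# Hypothetical include syntax, e.g. [%include "header.tmpl" %].
# Not a PageTemplate directive: it is expanded before PageTemplate parses
# the text, so the result can be passed to PageTemplate as usual.
INCLUDE_TAG = /\[%\s*include\s+"([^"]+)"\s*%\]/

# Recursively replaces every include tag in +text+ with the contents of the
# named file, looked up relative to +dir+. +depth+ guards against include
# cycles (a.tmpl including b.tmpl including a.tmpl, and so on).
def expand_includes(text, dir, depth = 0)
  raise ArgumentError, "includes nested too deeply" if depth > 10
  text.gsub(INCLUDE_TAG) do
    included = File.read(File.join(dir, Regexp.last_match(1)))
    expand_includes(included, dir, depth + 1)
  end
end
```

With this, a page can consist of include tags for the static header, navbar, sidebar and footer around its own body text.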

best
alex

__
alex fenton
http://www.pressure.to/cccp/ --still anxious–

It would depend on the version.

Hmm… quite right… Ruby 1.6.7 is three or four times as slow as the
CVS tip (1.7.2 something).

$ ruby16 -v
ruby 1.6.7 (2002-03-01) [i686-linux]
$ time ruby16 -e "a=IO.readlines('bigfile')"

real 0m0.542s
user 0m0.430s
sys 0m0.000s

In fact, it’s faster than Perl:

Hmmm, Perl has got slower in 5.x?

I don’t even remember Perl 4.

Perl 5.8.0 seems to be a bit faster than 5.6.1:

$ time perl561 -e "open(F,'bigfile');@a=<F>"

real 0m0.246s
user 0m0.150s
sys 0m0.070s

$ perl -v

This is perl, v5.8.0 built for i686-linux

$ time perl -e "open(F,'bigfile');@a=<F>"

real 0m0.239s
user 0m0.170s
sys 0m0.060s

$ ruby -v
ruby 1.7.2 (2002-07-30) [i686-linux]
$ time ruby -e "a=IO.readlines('bigfile')"

real 0m0.152s
user 0m0.130s
sys 0m0.010s

On Friday 02 August 2002 12:45 am, nobu.nokada@softhome.net wrote:


Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

(...) opening a 4MB file takes between 30
and 40 seconds using File.read
(not File.readline).

This is a file on your local disk?
(…)

Yes. EMACS takes less than 2 seconds to open it.

Thanks for the hint. I’ll try it out right away…

f.lex wrote:

Wai-Sun “Squidster” Chia
Consulting & Integration
Linux/Unix/Web Developer Dude

Hello,
Thanks to Alex Fenton, who gave me a hint on how to add an INCLUDE tag
to PageTemplate. Here’s a patch against the original PageTemplate.rb.

But there’s a kludge: I can’t seem to do a File.new, because Ruby complains
that the argument is tainted, so I just call .untaint on it. Not very
nice; perhaps someone can un-kludge it…

Perhaps Brian Wisti can add an [%include %] in the next PT release…
;-) hint hint

Wai-Sun Chia wrote:

pt-include.diff (1.13 KB)

Wai-Sun “Squidster” Chia
Consulting & Integration
Linux/Unix/Web Developer Dude

Perhaps you should not simply untaint it, but first check it against
“…/”-style (directory traversal) attacks or the like.
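One way to do that check, as a sketch: resolve the requested name against the template directory with File.expand_path and reject anything that lands outside it. The method name safe_template_path is hypothetical. File.expand_path does not resolve symbolic links, and taint checking (untaint, $SAFE) has since been removed from Ruby, so the sketch covers only the path validation itself.

```ruby
# Hypothetical helper: map an include name to a file inside +template_dir+,
# refusing names such as "../../etc/passwd" or "/etc/passwd" that would
# escape it.
def safe_template_path(template_dir, name)
  root = File.expand_path(template_dir)
  path = File.expand_path(name, root)
  unless path.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "include path escapes #{root}: #{name.inspect}"
  end
  path
end
```

Names like "sub/../footer.tmpl" that normalize to a file inside the directory are still accepted; only paths that end up outside it are refused.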

-billy.

On Wed, Aug 07, 2002 at 12:51:43PM +0900, Wai-Sun Chia wrote:

But there’s a kludge; I can’t seem to do a File.new as Ruby says it’s a
tainted operation; so I just do a .untaint on the argument…not very
nice, unless someone can try to unkludge it…


Meisterbohne Söflinger Straße 100 Tel: +49-731-399 499-0
eLösungen 89077 Ulm Fax: +49-731-399 499-9