Parsing large file into matrix

Qubert · 24 April 2003 05:26

I work on a lot of flat files containing data with many columns
and thousands of rows. I have recently started using Ruby, hoping
to avoid overworking sed and awk or tediously coding in C.
So far I like it.

Question: It would be ideal if I could open a file, read in the data
and then start working on the resulting array by a [m,n] indexing. Thus
far using Ruby I can do this fairly quickly:
x = File.open("file.txt").readlines
j = 0
ffsize = Array.new
name = Array.new
while j < x.length
  y = x[j].split('\t')
  ffsize[j] = y.shift.to_f/1024.0
  name[j] = y.shift
  j += 1
end

Currently I am just using a small dummy file as a test, but this will become
huge once I start using files with many columns of data and I will likely
join these field arrays together again.

Is there a Ruby-way of creating an m by n array from the beginning?
I know the following doesn’t work, but something like this would be nice:
x = File.open("file.txt").readlines.split('\t')

Then x would be an m by n array of m different data types.

Thanks,
Qubert

Joel_VanderWerf1 · 24 April 2003 05:58

Qubert wrote:

Is there a Ruby-way of creating an m by n array from the beginning?
I know the following doesn’t work, but something like this would be nice:
x = File.open("file.txt").readlines.split('\t')

How about this?

x = File.open("file.txt").readlines.
  map {|line| line.chomp.split("\t").
    map {|s| s.to_f}}

Note that "\t" in double quotes is necessary so that you get an actual tab char; '\t' in single quotes is a backslash followed by the letter t.
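A quick way to see the quoting difference, as a minimal sketch (the sample line is made up for illustration and is not from the original post):

```ruby
# Single quotes keep the backslash: '\t' is two characters, "\t" is one tab.
single = '\t'
double = "\t"

line = "1024\tfoo\n"   # hypothetical sample line with a real tab in it

no_split  = line.chomp.split(single)   # the line has no backslash-t pair, so one field
tab_split = line.chomp.split(double)   # splits on the real tab character

p single.length   # 2
p double.length   # 1
p no_split        # ["1024\tfoo"]
p tab_split       # ["1024", "foo"]
```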

Brian_Candler · 24 April 2003 08:55

x = File.open("file.txt").readlines.collect! do |line|
  line.chomp.split("\t")
end

Although that leaves the file open until garbage collection time. The
following does not have that problem; it also processes the lines as they
are read.

x = File.open("file.txt") do |f|
  f.collect do |line|
    line.chomp.split("\t")
  end
end
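The result is an array of row arrays, so the [m,n] access asked about becomes x[m][n]. A minimal sketch, assuming a small two-column file (the temporary file and its contents are hypothetical stand-ins for file.txt):

```ruby
require "tempfile"

# Hypothetical sample data standing in for file.txt
sample = Tempfile.new("matrix")
sample.write("2048\talpha\n4096\tbeta\n")
sample.close

x = File.open(sample.path) do |f|
  f.collect { |line| line.chomp.split("\t") }
end

first_name = x[0][1]                    # row 0, column 1
sizes = x.transpose[0]                  # first column across all rows
kb = sizes.map { |s| s.to_f / 1024.0 }  # same conversion as in the question

p first_name   # "alpha"
p kb           # [2.0, 4.0]
```

Note that Array#transpose raises IndexError if the rows do not all have the same number of fields, so it only suits strictly rectangular data.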

Regards,

Brian.

On Thu, Apr 24, 2003 at 02:26:08PM +0900, Qubert wrote:

Is there a Ruby-way of creating an m by n array from the beginning?
I know the following doesn’t work, but something like this would be nice:
x = File.open("file.txt").readlines.split('\t')

Robert · 24 April 2003 08:47

“Joel VanderWerf” vjoel@PATH.Berkeley.EDU wrote in message
news:3EA77CED.8010201@path.berkeley.edu…

Qubert wrote:

Is there a Ruby-way of creating an m by n array from the beginning?
I know the following doesn’t work, but something like this would be nice:
x = File.open("file.txt").readlines.split('\t')

How about this?

x = File.open("file.txt").readlines.
  map {|line| line.chomp.split("\t").
    map {|s| s.to_f}}

I don’t think this closes the file properly. Another disadvantage is that it
keeps the whole file in memory. How about:

# read
matrix = []
File.open("file.txt").each {|line|
  matrix << ( line.chomp.split("\t").map {|s| s.to_f} )
  # alternative: use split( /\s+/ ) to use all white space as separator
}

# write again
matrix.each {|x| puts x.join("\t")}
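The two separators behave differently on empty fields and leading whitespace, which matters when column positions must line up. A small sketch with made-up rows (not taken from any real data file):

```ruby
row = "1.5\t\t3"                 # hypothetical row with an empty middle field

by_tab   = row.split("\t")       # keeps the empty field, so columns stay aligned
by_space = row.split(/\s+/)      # the run of two tabs counts as one separator

padded = "  1 2"                 # hypothetical row with leading blanks
with_regex   = padded.split(/\s+/)   # leading whitespace yields an empty first field
with_default = padded.split          # split without arguments ignores leading whitespace

p by_tab        # ["1.5", "", "3"]
p by_space      # ["1.5", "3"]
p with_regex    # ["", "1", "2"]
p with_default  # ["1", "2"]
```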

robert

Robert · 24 April 2003 10:08

“Brian Candler” B.Candler@pobox.com wrote in message
news:20030424085535.GA33201@uk.tiscali.com…

Is there a Ruby-way of creating an m by n array from the beginning?
I know the following doesn’t work, but something like this would be nice:
x = File.open("file.txt").readlines.split('\t')

x = File.open("file.txt").readlines.collect! do |line|
  line.chomp.split("\t")
end

Although that leaves the file open until garbage collection time. The
following does not have that problem; it also processes the lines as
they are read.

x = File.open("file.txt") do |f|
  f.collect do |line|
    line.chomp.split("\t")
  end
end

Thanks for correcting me. That was what I intended. What bugs me is that
documentation says that File.open() with a block returns nil. But
apparently it works. So with the additional conversion to float we get

matrix = File.open("file.txt") do |f|
  f.collect do |line|
    line.chomp.split("\t").map {|s| s.to_f}
  end
end
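For what it is worth, current Ruby documents the block form of File.open as returning the block's value and closing the file when the block exits; what the 2003 documentation said is not checked here. A minimal sketch to confirm both points, using a hypothetical temporary file in place of file.txt:

```ruby
require "tempfile"

# Hypothetical stand-in for file.txt
sample = Tempfile.new("matrix")
sample.write("1\t2\n3\t4\n")
sample.close

handle = nil
matrix = File.open(sample.path) do |f|
  handle = f   # keep a reference only so we can inspect it afterwards
  f.collect { |line| line.chomp.split("\t").map { |s| s.to_f } }
end

p matrix          # [[1.0, 2.0], [3.0, 4.0]]
p handle.closed?  # true
```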

Cheers

robert


Jim_Freeze2 · 25 April 2003 04:14

matrix = File.open("file.txt") do |f|
  f.collect do |line|
    line.chomp.split("\t").map {|s|s.to_f}
---------------------^^^^
                     /\t/ is faster
  end
end

Here’s a common scenario that I use.
This file allows comments that start with ‘#’.
All other lines that are not empty are data.

def to_number(str)
  Integer(str) rescue Float(str) rescue str
end

matrix = []
File.foreach(file) { |line|
  next if /^\s*$/ =~ line
  next if /^\s*#/ =~ line
  matrix << line.strip.split(/\t/).collect { |i| to_number(i) }
}
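A self-contained run of the same idea, as a sketch: the temporary file and its contents are hypothetical, and `file` above is replaced by that file's path.

```ruby
require "tempfile"

# Try Integer first, then Float, and fall back to the original string.
def to_number(str)
  Integer(str) rescue Float(str) rescue str
end

# Hypothetical input: a comment line, a blank line, and two data rows
sample = Tempfile.new("data")
sample.write("# size\tname\n\n1024\tfoo\n1.5\tbar\n")
sample.close

matrix = []
File.foreach(sample.path) do |line|
  next if /^\s*$/ =~ line   # skip blank lines
  next if /^\s*#/ =~ line   # skip comment lines
  matrix << line.strip.split(/\t/).collect { |i| to_number(i) }
end

p matrix   # [[1024, "foo"], [1.5, "bar"]]
```

One caveat with Integer(): it honours radix prefixes, so a field such as "010" is read as octal and becomes 8. If fields may carry leading zeros, Integer(str, 10) is a safer first step.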


–
Jim Freeze

“It’s Fabulous! We haven’t seen anything like it in the last half an
hour!”
– Macy’s

Robert · 25 April 2003 10:18

“Jim Freeze” jim@freeze.org wrote in message
news:20030425002319.A47026@freeze.org…

On Thursday, 24 April 2003 at 19:08:17 +0900, Robert Klemme wrote:

matrix = File.open("file.txt") do |f|
  f.collect do |line|
    line.chomp.split("\t").map {|s|s.to_f}
---------------------^^^^
                     /\t/ is faster
  end
end

Here’s a common scenario that I use.
This file allows comments that start with ‘#’.
All other lines that are not empty are data.

def to_number(str)
  Integer(str) rescue Float(str) rescue str
end

matrix = []
File.foreach(file) { |line|
  next if /^\s*$/ =~ line
  next if /^\s*#/ =~ line
  matrix << line.strip.split(/\t/).collect { |i| to_number(i) }
}

next if /^\s*(#.*)?$/ =~ line

sorry, couldn’t resist.

robert

Jim_Freeze2 · 25 April 2003 10:28

Sorry, I don’t understand. Are you trying to capture the comment
and then throw it away?

On Friday, 25 April 2003 at 19:18:40 +0900, Robert Klemme wrote:

“Jim Freeze” jim@freeze.org wrote in message
news:20030425002319.A47026@freeze.org…

On Thursday, 24 April 2003 at 19:08:17 +0900, Robert Klemme wrote:

Here’s a common scenario that I use.
This file allows comments that start with ‘#’.
All other lines that are not empty are data.

def to_number(str)
  Integer(str) rescue Float(str) rescue str
end

matrix = []
File.foreach(file) { |line|
  next if /^\s*$/ =~ line
  next if /^\s*#/ =~ line
  matrix << line.strip.split(/\t/).collect { |i| to_number(i) }
}

next if /^\s*(#.*)?$/ =~ line

sorry, couldn’t resist.

–
Jim Freeze

If I had a plantation in Georgia and a home in Hell, I’d sell the
plantation and go home.
– Eugene P. Gallagher

Robert · 25 April 2003 11:39

“Jim Freeze” jim@freeze.org wrote in message
news:20030425063643.A47876@freeze.org…

On Friday, 25 April 2003 at 19:18:40 +0900, Robert Klemme wrote:

“Jim Freeze” jim@freeze.org wrote in message
news:20030425002319.A47026@freeze.org…

On Thursday, 24 April 2003 at 19:08:17 +0900, Robert Klemme wrote:

Here’s a common scenario that I use.
This file allows comments that start with ‘#’.
All other lines that are not empty are data.

def to_number(str)
  Integer(str) rescue Float(str) rescue str
end

matrix = []
File.foreach(file) { |line|
  next if /^\s*$/ =~ line
  next if /^\s*#/ =~ line
  matrix << line.strip.split(/\t/).collect { |i| to_number(i) }
}

next if /^\s*(#.*)?$/ =~ line

sorry, couldn’t resist.

Sorry, I don’t understand. Are you trying to capture the comment
and then throw it away?

Yes, that’s what he did. I just combined the two “next” lines into one.
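To check that the combined pattern skips the same lines as the two separate tests, here is a small sketch; the method names and sample lines are made up for illustration. It also shows a non-capturing variant, (?:...), for anyone who would rather not capture the comment text at all.

```ruby
# Two separate tests, as in the original loop
def skip_separate?(line)
  !!(/^\s*$/ =~ line) || !!(/^\s*#/ =~ line)
end

# Single combined test: optional comment after optional whitespace
def skip_combined?(line)
  !!(/^\s*(#.*)?$/ =~ line)
end

# Same pattern with a non-capturing group
def skip_non_capturing?(line)
  !!(/^\s*(?:#.*)?$/ =~ line)
end

# Hypothetical single-line inputs, as File.foreach would yield them
samples = ["\n", "   \n", "# comment\n", "  # indented\n", "1\t2\n", "1\t2 # trailing\n"]

separate      = samples.map { |l| skip_separate?(l) }
combined      = samples.map { |l| skip_combined?(l) }
non_capturing = samples.map { |l| skip_non_capturing?(l) }

p separate   # [true, true, true, true, false, false]
p combined   # [true, true, true, true, false, false]
```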

robert

Jim_Freeze2 · 25 April 2003 12:27

Very clever. I did not catch that at first.

On Friday, 25 April 2003 at 20:39:22 +0900, Robert Klemme wrote:

“Jim Freeze” jim@freeze.org wrote in message
news:20030425063643.A47876@freeze.org…

On Friday, 25 April 2003 at 19:18:40 +0900, Robert Klemme wrote:

next if /^\s*(#.*)?$/ =~ line

sorry, couldn’t resist.

Sorry, I don’t understand. Are you trying to capture the comment
and then throw it away?

Yes, that’s what he did. I just combined the two “next” lines into one.

–
Jim Freeze

What the hell, go ahead and put all your eggs in one basket.

Bermejo_Rodrigo · 25 April 2003 18:24

Dear ruby philosophers :
Where is the limit of the finite world ? =)

def buzz_light_year
  require 'matrix'
  a=[]
  b=[]

  400.times {|x|
    400.times {|e| b << rand(2).to_f }
    a << b
    b = [ ]
  }
  m=Matrix::rows(a)
  det=m.det
  print det #-> -Infinity
end
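One plausible reading, offered as an assumption rather than a diagnosis of the code above, is that the -Infinity comes from floating-point overflow: the determinant of a large random matrix can exceed the largest Float. The sketch below shows only the overflow itself and does not compute any determinant; the values are arbitrary.

```ruby
big = 1.0e200                 # arbitrary large Float

product  = big * big          # true value 1e400 exceeds Float::MAX, so the result is Infinity
negative = -big * big         # same overflow with a negative sign

exact = 10**200 * 10**200     # Integer arithmetic is arbitrary precision and stays exact

p Float::MAX          # about 1.8e308
p product             # Infinity
p negative            # -Infinity
p exact == 10**400    # true
```

If that assumption holds, keeping the entries as Integers (rand(2) without .to_f) would avoid the Float limit, at the cost of slower big-number arithmetic.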

Novody is perfect

Robert · 25 April 2003 14:00

“Jim Freeze” jim@freeze.org wrote in message
news:20030425083558.A48085@freeze.org…

“Jim Freeze” jim@freeze.org wrote in message
news:20030425063643.A47876@freeze.org…

next if /^\s*(#.*)?$/ =~ line

sorry, couldn’t resist.

Sorry, I don’t understand. Are you trying to capture the comment
and then throw it away?

Yes, that’s what he did. I just combined the two “next” lines into
one.

Very clever.

Thanks!

I did not catch that at first.

Nevermind, I could’ve been more verbose.

robert
