I'm trying to emulate the new feature in 1.9 that allows you to specify
the maximum length of a line read in Ruby 1.8.6. Can anyone help?
--
Posted via http://www.ruby-forum.com/.
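For reference, the 1.9 feature in question is the optional length argument to IO#gets and friends; a minimal sketch of what is being emulated (the file name and contents here are made up for illustration):

```ruby
# Sketch of the Ruby 1.9+ behavior being emulated: IO#gets accepts an
# optional byte limit, so at most that many bytes of a line are buffered.
File.write('demo.txt', "short\n" + "x" * 50 + "\n")
File.open('demo.txt') do |f|
  p f.gets("\n", 10)   # a whole line, since it fits within the limit
  p f.gets("\n", 10)   # only the first 10 bytes of the long line
end
```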
Tristin Davis wrote:
I'm trying to emulate the new feature in 1.9 that allows you to specify
the maximum length of a line read in Ruby 1.8.6. Can anyone help?
max = 3
count = 0
IO.foreach('data.txt') do |line|
  if count == max
    break
  else
    count += 1
  end
  puts line
end
But by the time you actually get count, isn't the line already read into
memory? So if the line is 7 gigabytes, it'll probably crash the system.
Hi,
On Fri, Mar 7, 2008 at 9:37 AM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:
max = 3
count = 0
IO.foreach('data.txt') do |line|
  if count == max
    break
  else
    count += 1
  end
  puts line
end
Not quite the solution: this limits the number of lines read, as opposed
to the length of a single line.
Arlen
Tristin Davis wrote:
But by the time you actually get count, isn't the line already read into
memory? So if the line is 7 gigabytes, it'll probably crash the system.
Is this what you are looking for:
max_bytes = 30
text = IO.read('data.txt', max_bytes)
puts text
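IO.read's optional second argument caps the number of bytes read in a single call; a quick check of the behavior (the file contents are made up for illustration):

```ruby
# IO.read(path, length) returns at most `length` bytes from the
# start of the file, so a huge first line is never fully buffered.
File.write('data.txt', "the quick brown fox jumps over the lazy dog\n")
max_bytes = 10
text = IO.read('data.txt', max_bytes)
p text   # => "the quick "
```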
On Behalf Of Tristin Davis:
# But by the time you actually get count, isn't the line
# already read in
# memory. So if the line is 7 gigabytes, it'll probably crash
# the system.
read accepts an argument for how many bytes to read.
so how about,
irb(main):040:0> File.open "test.rb" do |f| f.read end
=> "a=(1..2)\n\na\nputs a\n\nputs a.each{|x| puts x}"
irb(main):041:0> File.open "test.rb" do |f| f.read 2 end
=> "a="
irb(main):042:0> File.open "test.rb" do |f| f.read 2; f.read 2 end
=> "(1"
irb(main):043:0> File.open "test.rb" do |f| while x=f.read(2); p x; end; end
"a="
"(1"
".."
"2)"
"\n\n"
"a\n"
"pu"
"ts"
" a"
"\n\n"
"pu"
"ts"
" a"
".e"
"ac"
"h{"
"|x"
"| "
"pu"
"ts"
" x"
"}"
=> nil
kind regards -botp
On 3/6/08, Peña, Botp <botp@delmonte-phil.com> wrote:
# read accepts an argument for how many bytes to read.
# so how about,
# ...
# irb(main):043:0> File.open "test.rb" do |f| while x=f.read(2); p x; end; end
That solution essentially ignores linebreaks.
If you want to read up to a linebreak or N characters, whichever comes
first, you could use one of these:
------
class IO
  # read by characters
  def for_eachA(linelen)
    c = 0
    while c
      buf = ''
      linelen.times {
        break unless c = getc
        buf << c
        break if c.chr == $/
      }
      yield buf
    end
  end

  # read by lines
  def for_eachB(linelen)
    re = Regexp.new(".*?#{Regexp.escape($/)}")
    buf = ''
    while (line = read(linelen - buf.length))
      buf = (buf + line).gsub(re) { |l| yield l; '' }
      if buf.length == linelen
        yield buf
        buf = ''
      end
    end
    yield buf
  end
end

File.open("foreach.rb") do |f|
  f.for_eachA(10) { |l| p l }
end

File.open("foreach.rb") do |f|
  f.for_eachB(10) { |l| p l }
end
------
I'd guess the second version would be faster, but I didn't time it.
-Adam
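Adam's guess about relative speed is easy to check with the standard Benchmark library. A rough harness for the underlying difference (char-at-a-time vs. chunked reads, which is the core distinction between for_eachA and for_eachB); 'bench.txt' is a throwaway file created just for the timing:

```ruby
require 'benchmark'

# Compare a getc loop against chunked read()s over the same data.
File.write('bench.txt', "0123456789\n" * 20_000)
Benchmark.bm(12) do |b|
  b.report('getc loop')   { File.open('bench.txt') { |f| nil while f.getc } }
  b.report('read chunks') { File.open('bench.txt') { |f| nil while f.read(4096) } }
end
```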
Thanks for the ideas, Adam. I thought someone might be able to use it, so
I figured I'd post it. It processed about 675,000 1100+-byte records in
an hour. Not fantastic performance, but it works. If someone can tell
me how to improve the performance, then have at it.
module Util
  def too_large?(buffer, max=10)
    return true if buffer.length >= max
    false
  end
end

include Util

file = ARGV.shift #"C:/Documents and Settings/trdavi/Desktop/a1-1k.aa"
buf = ''
record = 1
frequency = 100
f = File.open(file, 'r')
while c = f.getc
  buf << c
  if too_large?(buf, 102400)
    p "record #{record} is too long, skipping to end"
    while x = f.getc
      if x.chr == $/
        buf = ''
        record += 1
        p "At record #{record}" if (record % frequency) == 0
        break
      end
    end
  end
  if c.chr == $/
    record += 1
    print "At record #{record}" if (record % frequency) == 0
    buf = ''
  end
end

# If we still have something in the buffer, then it is probably the last record.
unless buf.empty?
  #record += 1
  p "Last record is: " + buf
end
f.close
p record
Tristin Davis wrote:
Thanks for the ideas, Adam. I thought someone might be able to use it, so
I figured I'd post it. ...

while c = f.getc
  if buf.length < max   # (but what if you find a '\n' before max?)
    buf << c
  else
    buf = ''
    f.gets
  end
end
That's what the 2nd if statement is for: catching the delimiter if the
buffer isn't too large. I can't use gets because I may exhaust all the
memory before the actual line is read. I'm reading variable-length
records, but some of them are bad data and exceed a max length of 100k.
That's what the script is scanning for.
Tristin Davis wrote:
That's what the 2nd if statement is; for catching the delimiter if the
buffer isn't too large. I can't use gets b/c I may expend all the
memory before the actual line is read.
Look, a string and a file are really no different, except that reading
from a file is slow. Therefore, to speed things up, read the maximum
amount every time you read from the file and store it in a string.
Process the string just like you would the file. Then read from the
file again.
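A minimal sketch of the buffered approach 7stud describes, reading fixed-size chunks and then processing the in-memory string (the 1 MB chunk size, file name, and contents are illustrative):

```ruby
# Read the file in large chunks and scan each chunk in memory,
# instead of paying an IO method call per character.
CHUNK = 1_048_576  # 1 MB per read
File.write('data.txt', "alpha\nbeta\ngamma\n")
newlines = 0
File.open('data.txt') do |f|
  while chunk = f.read(CHUNK)
    newlines += chunk.count("\n")   # process the string like the file
  end
end
p newlines   # each newline marks the end of one record
```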
Gotcha. I'll post the code once I revamp it.
Here are the benchmarks for the old and new code:
Old: 5.484000 0.031000 5.515000 ( 5.782000)
New: 5.094000 0.047000 5.141000 ( 5.407000)
module DataVerifier
  require 'strscan'

  def too_large?(buffer, max=1024)
    return true if buffer.length >= max
    false
  end

  def verify_vbl(file, frequency, max, delimiter, out, cache_size)
    $/ = delimiter
    buffer = ''
    buf = ''
    record = 1
    o = File.new(out, "w")
    f = File.open(file, 'r')
    while buffer = f.read(cache_size)
      cache = StringScanner.new(buffer)
      while c = cache.getch
        buf << c
        if too_large?(buf, max)
          o.print "record #{record} is too long, skipping to end\n"
          while x = cache.getch
            if x == $/
              buf = ''
              record += 1
              print "At record #{record}\n" if ((record % frequency) == 0) unless frequency.nil?
              break
            end
          end
        end
        if c == $/
          record += 1
          print "At record #{record}\n" if ((record % frequency) == 0) unless frequency.nil?
          buf = ''
        end
      end
    end
    f.close
    o.close
    record
  end
end
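For comparison, with the native limit argument that started this thread (Ruby 1.9+), the whole over-long-record scan shrinks to a few lines. A sketch with a made-up file and a 20-byte cap:

```ruby
# gets with a limit returns at most `max` bytes, so an over-long
# record can be detected and discarded without ever buffering it all.
File.write('records.txt', "good record\n" + "x" * 30 + "\n" + "also good\n")
max = 20
kept = []
File.open('records.txt') do |f|
  while chunk = f.gets("\n", max)
    if chunk.length == max && !chunk.end_with?("\n")
      f.gets                   # skip the remainder of the long record
    else
      kept << chunk.chomp
    end
  end
end
p kept   # the 31-byte record was skipped
```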