[Newbie Alert] Probably missing something simple here ... help {wimper}

(Daniel Sheppard) #1

Sorting won't reduce the number of times files get opened with the code
that Wilson posted, since it opens and closes an output file for every
line regardless of order, and if your file is hooooj, sorting it could be
extremely time consuming in itself.
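
To make that concrete, here is a rough sketch that only counts how many File.open calls each strategy would make, without touching the disk. The helper names (opens_per_line, opens_on_key_change) and the sample keys are made up for illustration; they are not from Wilson's code.

```ruby
# Count how many File.open calls each strategy would make for a list of
# 4-character keys. Nothing is opened; this only counts.

# Open-per-line approach: one open (and close) per input line,
# whatever order the lines arrive in.
def opens_per_line(keys)
  keys.length
end

# Reopen only when the key differs from the previous line's key.
def opens_on_key_change(keys)
  opens = 0
  previous = nil
  keys.each do |key|
    opens += 1 unless key == previous
    previous = key
  end
  opens
end

keys = %w[A003 B001 A003 C002 B001 A003]  # hypothetical sample prefixes

puts opens_per_line(keys)            # unsorted input
puts opens_per_line(keys.sort)       # sorted input, same count
puts opens_on_key_change(keys)       # unsorted input
puts opens_on_key_change(keys.sort)  # sorted input, one open per distinct key
```

Sorting only pays off once the code stops reopening the file on every line.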

If you want to minimise opening and closing files with a sorted (or at
least, mostly sorted) input file, you'd need something like:

out_key = nil
out_file = nil
begin
  # File.foreach closes the input file once every line has been read
  File.foreach("meshdata.txt") do |line|
    thingy = line[0,4]
    # only switch output files when the key changes
    unless thingy == out_key
      out_file.close if out_file
      out_file = File.open(thingy, "a+")
      out_key = thingy
    end
    out_file.puts line
  end
ensure
  out_file.close if out_file
end

If you want to minimise opening and closing of files of an unsorted
file, you'd need something like:

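# the block runs only the first time a key is looked up, so each
# output file is opened once and then reused
out_files = Hash.new do |h,k|
  h[k] = File.open(k,"a+")
end
begin
  # File.foreach closes the input file once every line has been read
  File.foreach("meshdata.txt") do |line|
    out_files[line[0,4]].puts line
  end
ensure
  out_files.each_value {|v| v.close}
end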
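
If the Hash.new block is unfamiliar, this small sketch shows the behaviour the code above relies on: the block fires once per new key and the stored value is reused afterwards. StringIO stands in for File.open here so nothing is written to disk, and the sample lines and the opens counter are invented for the demonstration.

```ruby
require "stringio"

opens = 0
# The default block runs only on the first lookup of a missing key.
outputs = Hash.new do |hash, key|
  opens += 1
  hash[key] = StringIO.new   # stand-in for File.open(key, "a+")
end

["A003 first\n", "B001 other\n", "A003 second\n"].each do |line|
  outputs[line[0, 4]].puts line
end

puts opens                    # number of times the block ran
puts outputs["A003"].string   # both A003 lines, in input order
```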

The above technique will probably cause you to run out of file handles,
so unless you know there's only a few output files, you'd need to
account for that:

MAX_FILES=100
open_file_keys = []   # keys of the open files, oldest first
out_files = Hash.new do |h,k|
  # close the oldest files until there is room for one more
  while h.length >= MAX_FILES
    file = h.delete(open_file_keys.shift)
    file.close
  end
  open_file_keys << k
  # the assignment comes last so the block returns the file
  h[k] = File.open(k,"a+")
end
begin
  File.foreach("meshdata.txt") do |line|
    out_files[line[0,4]].puts line
  end
ensure
  out_files.each_value {|v| v.close}
end

But if the keys were fairly randomly distributed in the file, you'd
probably end up opening and closing files as often as using Wilson's
original technique, so all that extra code is pointless.
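
A quick way to check this for your own data is to simulate the bounded cache and count the opens it would need. The sketch below uses the same oldest-first eviction as the MAX_FILES code; count_opens, the 1000 made-up prefixes and the 20,000 random lines are all assumptions for illustration, not measurements of the real file.

```ruby
# Simulate a bounded, oldest-first cache of open files and count how many
# opens it would need. No real files are opened.
def count_opens(keys, max_files)
  open_keys = []   # keys currently "open", oldest first
  opens = 0
  keys.each do |key|
    next if open_keys.include?(key)          # already open, nothing to do
    open_keys.shift while open_keys.length >= max_files
    open_keys << key
    opens += 1
  end
  opens
end

srand(42)   # fixed seed so repeated runs use the same sample
distinct = (0...1000).map { |i| format("K%03d", i) }   # hypothetical prefixes
keys = Array.new(20_000) { distinct.sample }

puts "lines:          #{keys.length}"
puts "opens, random:  #{count_opens(keys, 100)}"
puts "opens, sorted:  #{count_opens(keys.sort, 100)}"
```

With far more distinct keys than MAX_FILES, most lookups on shuffled input should miss the cache, while sorted input needs only one open per distinct key.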

-----Original Message-----
From: Wilson Bilkovich [mailto:wilsonb@gmail.com]
Sent: Wednesday, 17 August 2005 7:18 AM
To: ruby-talk ML
Subject: Re: [Newbie Alert] Probably missing something simple here ...
help {wimper}

On 8/16/05, B. Angell <lists@activepipes.com> wrote:

I am trying to read from a file and write to a number of files based on
the first 4 letters of each data line. So I want to be able to write the
line A0037775830|lkajsdlkfjsaljf;lsakjfdsa;jf to file A003 *and* append
all of the other A003 lines as well. Here is the hack I have below;
however, it produces errors, and I know I am missing something
simple/easy:

#!/usr/bin/ruby -w
File.open("meshdata.txt") do |file|
  while line = file.gets
    a = line
    b = a[0,4]
    File.open(b,"w") do |afile|
      puts a.afile
    end
  end
end

I'm at the end of a long day here, and I haven't bothered to try to
understand your goals/requirements/etc, so forgive me if this is just a
senseless babble. However, here's what comes to mind:

#!/usr/bin/ruby -w
File.open("meshdata.txt").each_line do |line|
  thingy = line[0,4]
  File.open(thingy,"a+") do |new_file|
    new_file.puts line
  end
end

If your input file is hooooj, you would probably want to sort it first,
to minimize the number of times you open and close files.


(Wilson Bilkovich) #2

Yeah, sorry.. I meant to write a couple more sentences there about how
the code was opening and closing the file, etc. I was headed out the
door when I saw the e-mail. Heh.
Unless the file is more than 100MB or so in size, it probably doesn't
need any optimization, though.

--Wilson.

On 8/16/05, Daniel Sheppard <daniels@pronto.com.au> wrote:

Sorting won't help minimise the number of files open with the code that
Wilson posted, and if your file is hooooj, sorting it could be extremely
time consuming in itself.

(B. Angell) #3

I want to thank all of you who helped with this! The file was approximately 140MB and it chugged through in about 1.5 hours (IBM T41 laptop w/ 2GB RAM, 1.6GHz processor). Anyway, I am grateful to have the resource(s) here, especially as I am learning a new language and have not quite got the nuances.

On Aug 16, 2005, at 19:47, Wilson Bilkovich wrote:

On 8/16/05, Daniel Sheppard <daniels@pronto.com.au> wrote:

Sorting won't help minimise the number of files open with the code that
Wilson posted, and if your file is hooooj, sorting it could be extremely
time consuming in itself.

Yeah, sorry.. I meant to write a couple more sentences there about how
the code was opening and closing the file, etc. I was headed out the
door when I saw the e-mail. Heh.
Unless the file is more than 100MB or so in size, it probably doesn't
need any optimization, though.

--Wilson.