Input file, change data, write to file

I'm a ruby newbie trying to read data from a file, make a few changes,
and write the output to a file so it can be imported into a MySQL
database.

I found a partial solution on page 138 in Maik Schmidt's “Enterprise
Integration with Ruby” book but it lacks a means to write the output to
a file.

How can I write the output to a file using the code below?

For what it's worth, I'll be working with files that contain between
20,000 and 60,000 rows.

Below is a data sample:

01234567890123456789012345678901234567890123456789012

00123 random text 3.0010/20/200610/21/2006 -3.45
00253 more text 275.0007/01/200606/12/2006 12.45

Here's what I want the file to look like with tabs between each section:

01234567890123456789012345678901234567890123456789012
123 random text 3.00 2006-10-20 2006-10-21 -3.45
253 more text 275.00 2006-07-01 2006-06-12 12.45

Filename: fixtest1.rb

class FixedLengthRecordFile
  def FixedLengthRecordFile.open(filename, field_sizes)

    if field_sizes.nil? or field_sizes.empty?
      raise ArgumentError, "Empty field sizes not allowed!"
    end

    field_pattern = 'a' + field_sizes.join('a')
    IO.foreach(filename) do |line|
      record = line.chomp.unpack(field_pattern)
      record.map { |f| f.strip! }
      yield record
    end
  end
end

Filename: rw1.rb

require 'fixtest1'

FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
1, 2, 1, 4, 10]) do |row|
  puts "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}"
end

Any feedback is greatly appreciated!

--
Posted via http://www.ruby-forum.com/.

It is very important in a case like this to define the problem clearly.

For example, it would greatly improve the code if you were to clearly say
what the field sizes are. A list of field sizes would be a first step
toward a much more elegant and understandable program. In my solution
below, I guess about some of the field sizes.

Also, you do not want fixed width fields in your output file, as in your
diagram. The first step in the project is to understand that modern
database files use variable width fields, separated by delimiters like
tabs. In your example, you refer to tabs as delimiters, but you still show
the output format with a column scale as though fixed widths were in force.
It's not clear from your diagram that you understand that the output
record's fields won't fall on specific columns, and don't need to.
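
To illustrate the point about stating field sizes up front, here is a minimal sketch that names each field and its width in one table and derives the unpack pattern from it. The field names and the widths (5, 13, 6, 10, 10, 8) are assumptions that mirror the guesses mentioned above, not the real layout of the file, and `FIELDS`, `PATTERN` and `parse_record` are hypothetical names.

```ruby
# Hypothetical layout: field names and widths are guesses, adjust to the real file.
FIELDS = [
  [:id,          5],
  [:description, 13],
  [:amount,      6],
  [:start_date,  10],
  [:end_date,    10],
  [:balance,     8]
]

# Builds "a5a13a6a10a10a8": one fixed-width string per field.
PATTERN = FIELDS.map { |_name, width| "a#{width}" }.join

# Splits one fixed-width line into a hash of field name => stripped value.
def parse_record(line)
  values = line.chomp.unpack(PATTERN).map { |v| v.strip }
  Hash[FIELDS.map { |name, _width| name }.zip(values)]
end

# Build a sample line with explicit padding so the widths are visible.
sample = "00123" + "random text".ljust(13) + "3.12".rjust(6) +
         "10/20/2006" + "10/21/2006" + "-3.45".rjust(8)
record = parse_record(sample)
puts record[:description]
```

With the layout in one place, changing a width or adding a field means editing the table only, not the slicing code.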

Sample code:

------------------------------------------

#!/usr/bin/ruby -w

data = [
   "00123 random text 3.1210/20/200610/21/2006 -3.45",
   "00253 more text 275.8707/01/200606/12/2006 13.46",
   "00254 more text 777.3407/01/200606/12/2006 14.47",
   "00255 more text 555.2107/01/200606/12/2006 15.48"
]

out_file = File.open("outfile.txt","w")

data.each do |record|
   fields = [ record[0 .. 4],record[5 .. 17],record[18 .. 23],
   record[24 .. 33],record[34 .. 43],record[44 .. 51] ]
   fields[3 .. 4].each do |field|
      field.gsub!(%r{/},"-")
   end
   out_record = fields.join("\t") + "\n"
   out_file.write out_record
end

out_file.close

----------------------------------------

Output (may wrap when posted):

00123 random text 3.12 10-20-2006 10-21-2006 -3.45
00253 more text 275.87 07-01-2006 06-12-2006 13.46
00254 more text 777.34 07-01-2006 06-12-2006 14.47
00255 more text 555.21 07-01-2006 06-12-2006 15.48

--
Paul Lutus
http://www.arachnoid.com

I think the most newbie-appealing approach to files is with

open() do |f|
end

because when the block finishes the file closes automatically. The docs
on the modes for opening files, and on the methods you need after that,
are here:

<http://www.ruby-doc.org/core/classes/IO.html>

With big data that comes in lines, where each line is to be processed
independently, you presumably want two files, reading and writing a line
at a time, so the whole operation could be structured like this:

def munge(s)
  return s.gsub(/[aeiou]/, '') # but do your own task here instead
end
open("path1", "r") do |f1|
  open("path2", "w") do |f2|
    f1.each { |line| f2.puts munge(line) }
  end
end
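
Applied to the original question, the same two-file structure might look like the sketch below. The file names, the column ranges and the helper name `to_tab_delimited` are assumptions for illustration, and it uses File.open rather than the bare open. A small input file is written first so the sketch runs on its own.

```ruby
require 'tmpdir'

# Hypothetical stand-in for munge: cut a fixed-width line into fields
# (the column ranges are guesses) and join the stripped fields with tabs.
def to_tab_delimited(line)
  ranges = [0..4, 5..17, 18..23, 24..33, 34..43, 44..51]
  ranges.map { |r| line[r].to_s.strip }.join("\t")
end

dir      = Dir.mktmpdir
in_path  = File.join(dir, "input.txt")    # assumed file names
out_path = File.join(dir, "output.txt")

# Create a one-line input file so the example is self-contained.
File.open(in_path, "w") do |f|
  f.puts "00123" + "random text".ljust(13) + "3.12".rjust(6) +
         "10/20/2006" + "10/21/2006" + "-3.45".rjust(8)
end

# Read and write one line at a time; both files close when their block ends.
File.open(in_path, "r") do |f1|
  File.open(out_path, "w") do |f2|
    f1.each_line { |line| f2.puts to_tab_delimited(line.chomp) }
  end
end

result = File.read(out_path)
puts result
```

Because only one line is held in memory at a time, the size of the input file should not matter much for this structure.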

m.

--
matt neuburg, phd = matt@tidbits.com, Matt Neuburg’s Home Page
Tiger - http://www.takecontrolbooks.com/tiger-customizing.html
AppleScript - http://www.amazon.com/gp/product/0596102119
Read TidBITS! It's free and smart. http://www.tidbits.com

Hi --

You might find scanf helpful. Here's a little example. Note that the
lines of data come from DATA, a File object that Ruby sets up to read
whatever follows __END__ in the script. Also, I'm using values_at to
manipulate the order in which the values get inserted into the printf
string, so that I can put the years first.

   require 'scanf'
   DATA.each do |line|
     values = line.scanf("%5d %11c %4f %d/%d/%4d%d/%d/%d %f")
     printf("%3d%12s %6.2f %04d-%02d-%02d %04d-%02d-%02d %3.2f\n",
       *values.values_at(0,1,2,5,3,4,8,6,7,9))
   end

__END__
00123 random text 3.0010/20/200610/21/2006 -3.45
00253 more text 275.0007/01/200606/12/2006 12.45

Output:

123 random text 3.00 2006-10-20 2006-10-21 -3.45
253 more text 275.00 2006-07-01 2006-06-12 12.45
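
The reordering step can be seen in isolation in the sketch below, which skips the scanf parsing and starts from an already-parsed array. The literal values are an assumption standing in for what the parse would return for the first sample line, and the output is joined with tabs rather than spaces.

```ruby
# Assumed result of parsing the first sample line:
# id, text, amount, month, day, year, month, day, year, balance
values = [123, "random text", 3.0, 10, 20, 2006, 10, 21, 2006, -3.45]

# values_at picks elements by index, so each year (5, 8) can be placed
# before its month (3, 6) and day (4, 7).
reordered = values.values_at(0, 1, 2, 5, 3, 4, 8, 6, 7, 9)

# One tab-delimited output record with ISO-style dates.
line = format("%d\t%s\t%.2f\t%04d-%02d-%02d\t%04d-%02d-%02d\t%.2f", *reordered)
puts line
```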

David

On Thu, 2 Nov 2006, Paul Br wrote:

--
                   David A. Black | dblack@wobblini.net
Author of "Ruby for Rails" [1] | Ruby/Rails training & consultancy [3]
DABlog (DAB's Weblog) [2] | Co-director, Ruby Central, Inc. [4]
[1] Ruby for Rails | [3] http://www.rubypowerandlight.com
[2] http://dablog.rubypal.com | [4] http://www.rubycentral.org

I'm not sure I understood what you want to do. Do you want to write the
modified data to another file or to the same file?

In the first case, all you need to do is the following (in file rw1.rb):

File.open('output_file','w'){|f|
  FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
  1, 2, 1, 4, 10]) do |row|
    f.write "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}\n"
  end
}

Instead, if you want to write the data back to the same file, you could
write your FixedLengthRecordFile.open method as

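def FixedLengthRecordFile.open(filename, field_sizes)
    if field_sizes.nil? or field_sizes.empty?
      raise ArgumentError, "Empty field sizes not allowed!"
    end

    field_pattern = 'a' + field_sizes.join('a')
    # Write to a temporary file and rename it at the end. Writing through
    # an 'r+' handle while IO.foreach is still reading the same file can
    # leave stale bytes behind, because the new lines differ in length
    # from the old ones. The '.tmp' suffix is an arbitrary choice.
    temp_name = filename + '.tmp'
    File.open(temp_name, 'w'){|file|
      IO.foreach(filename) do |line|
        record = line.chomp.unpack(field_pattern)
        record.map { |f| f.strip! }
        file.write(yield(record))
      end
    }
    File.rename(temp_name, filename)
end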

or you could write

def FixedLengthRecordFile.open(filename, field_sizes)

    if field_sizes.nil? or field_sizes.empty?
      raise ArgumentError, "Empty field sizes not allowed!"
    end

    field_pattern = 'a' + field_sizes.join('a')
    lines=File.readlines(filename)
    File.open(filename, 'w'){|file|
      lines.each do |line|
        record = line.chomp.unpack(field_pattern)
        record.map { |f| f.strip! }
        file.write(yield(record))
      end
    }
end

I don't know whether this second approach, which reads the whole file
into memory first, would lead to worse performance given the length of
your files.

In both cases, the block you pass to the open method should return the
string to write:
FixedLengthRecordFile.open('test1.abc', [2, 3, 12, 7, 2, 1, 2, 1, 4, 2,
1, 2, 1, 4, 10]) do |row|
  "#{row[1]}\t#{row[2]}\t#{row[3]}\t#{row[8]}-#{row[4]}-#{row[6]}\t#{row[13]}-#{row[9]}-#{row[11]}\t#{row[14]}\n"
end

A couple of notes:
* you need to add the "\n" at the end of your string in the rw1 file,
otherwise all the rows in the original file will be written as one line
* this method will only work when all the lines of the data file have
the same structure (for example, it won't work with the first line of
your data file example above)
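
One way to cope with lines that do not follow the record layout, such as the ruler line, is to test each line before unpacking it. The sketch below rests on an assumption about what a valid record looks like (it must contain two MM/DD/YYYY dates back to back); `RECORD_SHAPE` and `record_line?` are hypothetical names.

```ruby
# Assumed shape of a data line: two MM/DD/YYYY dates with nothing between them.
RECORD_SHAPE = %r{\d{2}/\d{2}/\d{4}\d{2}/\d{2}/\d{4}}

# True only for lines that look like data records.
def record_line?(line)
  !!(line =~ RECORD_SHAPE)
end

lines = [
  "01234567890123456789012345678901234567890123456789012",
  "",
  "00123 random text 3.0010/20/200610/21/2006 -3.45",
  "00253 more text 275.0007/01/200606/12/2006 12.45"
]

# Keep only the lines that pass the check; the ruler and blank line drop out.
records = lines.select { |line| record_line?(line) }
puts records.length
```

A check like this could sit at the top of the foreach block, skipping to the next line when it fails.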


Paul Lutus wrote:

A correction. I just noticed that you mentioned MySQL, and your output has
the date format 2006-07-01, typical of MySQL, something I managed to
overlook on the first read. So (note the single changed line):

---------------------------------------------------

#!/usr/bin/ruby -w

data = [
   #01234567890123456789012345678901234567890123456789012
   "00123 random text 3.1210/20/200610/21/2006 -3.45",
   "00253 more text 275.8707/01/200606/12/2006 13.46",
   "00254 more text 777.3407/01/200606/12/2006 14.47",
   "00255 more text 555.2107/01/200606/12/2006 15.48"
]

out_file = File.open("outfile.txt","w")

data.each do |record|
   fields = [ record[0 .. 4],record[5 .. 17],record[18 .. 23],
   record[24 .. 33],record[34 .. 43],record[44 .. 51] ]
   fields[3 .. 4].each do |field|
      field.gsub!(%r{(\d+)/(\d+)/(\d+)},"\\3-\\1-\\2")
   end
   out_record = fields.join("\t") + "\n"
   out_file.write out_record
end

out_file.close

-------------------------------------------

Output:

00123 random text 3.12 2006-10-20 2006-10-21 -3.45
00253 more text 275.87 2006-07-01 2006-06-12 13.46
00254 more text 777.34 2006-07-01 2006-06-12 14.47
00255 more text 555.21 2006-07-01 2006-06-12 15.48
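
As an alternative sketch to the regular-expression substitution above, the standard date library can do the same conversion and also reject impossible dates. This is a swapped-in technique, not the method used in the script, and `to_mysql_date` is a hypothetical helper name.

```ruby
require 'date'

# Converts "MM/DD/YYYY" to "YYYY-MM-DD". Raises ArgumentError (or a
# subclass of it) for strings that are not valid dates, e.g. "13/45/2006".
def to_mysql_date(text)
  Date.strptime(text, "%m/%d/%Y").strftime("%Y-%m-%d")
end

puts to_mysql_date("10/20/2006")
puts to_mysql_date("07/01/2006")
```

The trade-off is that bad input stops the run with an exception instead of passing through unchanged, which may or may not be what you want before a database import.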

--
Paul Lutus
http://www.arachnoid.com

Stefano,

Thanks for your reply!

I want to write the modified data to another file. Your solution was
terrific!

I should have been clearer in the initial post about the long row of
numbers. That shouldn't have been part of the data sample, as its
purpose was to document character spacing.

Thanks for the alternate solutions too. You've provided this ruby
newbie with lots of valuable tidbits!

Paul
