Parsing challenge

This script fails if any of the cells is blank (has no value), e.g.:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

“Ara.T.Howard” ahoward@fsl.noaa.gov wrote in message
news:Pine.LNX.4.53.0310072218560.32521@eli.fsl.noaa.gov

I thought I'd ask the scripting gurus about the following.

I have a file containing records of data with the following format (first column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse it so I can insert the records into a database, e.g. MySQL/Access?

Perhaps there is an advanced scripting language that can do this easily.

ruby is one of the more advanced :-)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt

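# four caret-separated fields of one or more non-caret chars (newlines allowed), then a newline
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.each{|tuple| p tuple}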

~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
The difference between art and science is that science is what we understand
well enough to explain to a computer. Art is everything else.
– Donald Knuth, “Discover”
~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
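To see why the one-or-more pattern chokes on blank cells, here is a small self-contained sketch that runs it against the first lines of the blank-cell sample. The literal carets are written as `\^`, since a bare `^` in a Ruby regex is a start-of-line anchor; `plus_pat` and `blank_txt` are made-up names for the illustration.

```ruby
# The one-or-more pattern from the script above, with the literal carets
# escaped (\^); a bare ^ would be a start-of-line anchor in a Ruby regex.
plus_pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox

# Start of the blank-cell sample: NN-110 has empty DESCRIPTION and CODE#2.
blank_txt = <<~TXT
  CODE#1^DESCRIPTION^CODE#2^NOTES
  NN-110^^^some notes
  NN-111^1st line data
  2nd line data
  3rd line data^BRT345^another notes
TXT

# [^^]+ needs at least one character per field, so no match can start
# at NN-110; scanning resumes further on and the records get merged.
tuples = blank_txt.scan(plus_pat)
tuples.each { |tuple| p tuple }
```

Because no match can begin at NN-110, the next match starts at "some notes", so the second tuple comes out as ["some notes\nNN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]: NN-110 disappears without any error and its notes are glued onto the NN-111 record.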


On Tue, 7 Oct 2003, Artco News wrote:

Useko Netsumi wrote:

This script fails if any of the cells is blank (has no value), e.g.:

You may be able to simply change each “+” (one or more) in the regex to “*” (zero or more). I gave it a quick test and it seems to work OK, but I didn’t test very hard.

pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
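A minimal sketch of the zero-or-more version wrapped as a reusable helper, assuming the same four-field layout; `parse_records` and `STAR_PAT` are hypothetical names. It also pads a missing trailing newline, since the pattern only matches a record that ends in `\n` and would otherwise skip the last record of a file saved without one.

```ruby
# Zero-or-more version: [^^]* lets a field be empty.
STAR_PAT = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox

# Hypothetical helper: parse caret-separated records into 4-element arrays.
def parse_records(txt)
  # Every record must end in a newline for the pattern to match it,
  # so add one if the input lacks it (otherwise the last record is lost).
  txt += "\n" unless txt.end_with?("\n")
  txt.scan(STAR_PAT)
end

sample = <<~TXT
  CODE#1^DESCRIPTION^CODE#2^NOTES
  NN-110^^^some notes
  NN-112^description of NN-112^BBC23^multiline
  notes blah
  blah
TXT

parse_records(sample).each { |tuple| p tuple }
```

Greedy matching is what keeps multi-line fields working: the last `[^^]*` first runs past newlines up to the next caret (or the end of the input), then backs off to the last newline before it, which is the end of the record.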

Good luck.

Harry O.

Got it! I just had to replace each + with * so that a field can be blank or any string.

Next, how do I insert those values into a MySQL database, assuming I already have the table defined? Thanks.

“Useko Netsumi” usenets@nyc.rr.com wrote in message
news:bm08ec$gd9eb$1@ID-159205.news.uni-berlin.de

this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

“Ara.T.Howard” ahoward@fsl.noaa.gov wrote in message
news:Pine.LNX.4.53.0310072218560.32521@eli.fsl.noaa.gov

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following
format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g.
MySQL/Access?

···

On Tue, 7 Oct 2003, Artco News wrote:

Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced :slight_smile:

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt

pat = %r{([^^]+)^([^^]+)^([^^]+)^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}

~/eg/ruby > ./parse.rb

[" CODE#1", “DESCRIPTION”, “CODE#2”, “NOTES”]
[" NN-110", “an info of NN-001”, “BRY234”, “some notes”]
[" NN-111", “1st line data\n 2nd line data\n 3rd line data”,
“BRT345”, “another notes”]
[" NN-112", “description of NN-112”, “BBC23”, “multiline\n notes
blah\n blah\n blah”]
[" NN-113", “info info”, “MNO12”, “some notes here”]

-a

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
The difference between art and science is that science is what we
understand
well enough to explain to a computer. Art is everything else.
– Donald Knuth, “Discover”
~ > /bin/sh -c ‘for lang in ruby perl; do $lang -e “print
"\x3a\x2d\x29\x0a"”; done’
====================================

Got it! I just had to replace each + with * so that a field can be blank or any string.

Next, how do I insert those values into a MySQL database, assuming I already have the table defined? Thanks.

file: parse.rb
----CUT----
#!/usr/bin/env ruby
require 'mysql'

# command line args
host, user, passwd, db, relation = ARGV
db ||= 'test'
relation ||= 'test'

# connect to db
mysql = Mysql.connect host, user, passwd
mysql.select_db db

# parse: [^^]* (zero or more) lets a field be blank
txt = DATA.read
pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
tuples = txt.scan pat

# insert tuples, escaping each field so a quote in the data can't break the sql
sql = "insert into %s values('%s','%s','%s','%s')"
tuples.each do |tuple|
  begin
    fields = tuple.map{|field| mysql.escape_string field}
    insert = sql % [relation, *fields]
    mysql.query insert
  rescue Mysql::Error => e
    p e
  end
end

# show results
res = mysql.query('select * from %s' % [relation])
while((row = res.fetch_row))
  p row
end

# sample input is embedded below - can be read via DATA object
__END__
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
----CUT----

running it looks like:

~/eg/ruby > ./parse.rb
["CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
["NN-110", "an info of NN-001", "BRY234", "some notes"]
["NN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]
["NN-112", "description of NN-112", "BBC23", "multiline\nnotes blah\nblah\nblah"]
["NN-113", "info info", "MNO12", "some notes here"]

I created a database named 'test' and a table named 'test' using 'create table test(f0 text,f1 text,f2 text,f3 text)'.
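In the run above the header line (CODE#1^DESCRIPTION^...) lands in the table as an ordinary row. If that isn't wanted, one option is to split it off before the insert loop. This sketch uses literal tuples so it runs without MySQL, and the `zip` step that keys each record by its header name is an optional extra, not something the script above does.

```ruby
# Tuples shaped like the ones txt.scan(pat) returns in the script above
# (shortened to two records here).
tuples = [
  ["CODE#1", "DESCRIPTION", "CODE#2", "NOTES"],
  ["NN-110", "an info of NN-001", "BRY234", "some notes"],
  ["NN-113", "info info", "MNO12", "some notes here"],
]

# The first tuple is the header line; everything after it is data.
header, *records = tuples

# Only the data records would go through the insert loop.
records.each { |tuple| p tuple }

# Optional extra: pair each field with its header name.
rows = records.map { |tuple| header.zip(tuple).to_h }
p rows.first
```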

hope that gets you going.

-a


On Wed, 8 Oct 2003, Useko Netsumi wrote:


====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ara.t.howard@noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
The difference between art and science is that science is what we understand
well enough to explain to a computer. Art is everything else.
– Donald Knuth, “Discover”
~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================