File data extraction

Rolf_Pedersen · 5 July 2010 08:48

Hi

I have a file with the following format (example):

Save Format v3.0(19990112)
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;

The data in the two begin/end blocks are lists, which may be longer than
shown.

I'd like to extract an array of the filenames (first quote) in the @begin
Objects ... @end; block.
For the example above this should return ["n_cst_xml_utils.sru",
"n_melding.sru"]

My initial idea was to treat the whole thing as one long string, extract
the part within the begin-end block using a regexp, convert the result
back to individual lines (split '\n'), and then use array.map with a regexp
to single out the name in the first quote on each line.
But I keep hitting the wall, especially with the first step in this
approach... :o(

I know this should be easily done in a couple of lines of code, but I can't
get it right.

Appreciate any help!

Best regards,
Rolf

Brian_Candler · 5 July 2010 10:12

Rolf Pedersen wrote:

My initial idea was to treat the whole thing as one long string, extract
the part within the begin-end block using a regexp, convert the result
back to individual lines (split '\n'), and then use array.map with a regexp
to single out the name in the first quote on each line.
But I keep hitting the wall, especially with the first step in this
approach... :o(

How about this for starters:

p src.scan(/^@begin(.*?)^@end;/m)
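
To make the hint concrete, here is a minimal sketch of what that scan returns on the sample data. It assumes the file contents are already in a string; the names `src`, `blocks` and `names` are only for illustration.

```ruby
# Sample input from the original post, inlined as a heredoc.
src = <<EOS
Save Format v3.0(19990112)
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;
EOS

# With a single capture group, scan returns an array of one-element arrays,
# one per @begin ... @end; block. The /m flag lets . match newlines, and
# the lazy .*? stops at the first @end; that follows.
blocks = src.scan(/^@begin(.*?)^@end;/m)

# The first line of each captured chunk is the block name.
names = blocks.map { |b| b.first.lines.first.strip }

p blocks.length  # prints 2
p names          # prints ["Libraries", "Objects"]
```

Narrowing the pattern to `@begin Objects` then leaves only the block of interest.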

--
Posted via http://www.ruby-forum.com/.

W_James · 5 July 2010 18:25

# String#to_a was removed in Ruby 1.9, so split each block on newlines instead.
puts DATA.read.scan(/^@begin Objects(.*?)^@end;/m).flatten.
  map{|s| s.strip.split("\n")}.flatten.map{|s| s.split(/"/)[1]}

__END__
Save Format v3.0(19990112)
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;

Rolf_Pedersen · 5 July 2010 11:52

Thanks Brian, that helped me a lot ! :o)

The code now looks like this:

filenames = File.open(filename).readlines.join.
  scan(/^@begin Objects\n(.*?)^@end;/m)[0][0].
  split("\n").map{|l| l.scan(/"(.*?)"/)[0][0]}

Probably far from optimal, but it seems to do the trick.

Best regards,
Rolf

Rolf_Pedersen · 7 July 2010 09:35

Robert:
The use of the flip-flop operator opened a new door for me. I didn't know
about it before...
And new knowledge is the best knowledge! :o)

w_a_x_man:
I can't believe I didn't think of the possibility to use a simple split
instead of a scan to extract the filenames between the first two quotation
marks!
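
A small sketch comparing the two ways of pulling out the first quoted name, using one line from the sample data. The variable names are made up for the example.

```ruby
line = '"n_cst_xml_utils.sru" "felles.pbl";'

# Splitting on the quote character yields
# ["", "n_cst_xml_utils.sru", " ", "felles.pbl", ";"],
# so index 1 is the text between the first pair of quotes.
by_split = line.split(/"/)[1]

# The scan version captures every quoted string on the line;
# the first capture of the first match is the name.
by_scan = line.scan(/"(.*?)"/)[0][0]

p by_split             # prints "n_cst_xml_utils.sru"
p by_split == by_scan  # prints true
```

One difference worth keeping in mind: on a line without any quotes, the split version gives nil, while `scan(...)[0][0]` raises NoMethodError because `scan` returns an empty array.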

Thanks to all for the great input I've gotten on this issue!
:o)

Best regards,
Rolf

Brian_Candler · 5 July 2010 12:07

Rolf Pedersen wrote:

The code now looks like this:

filenames = File.open(filename).readlines.join.
  scan(/^@begin Objects\n(.*?)^@end;/m)[0][0].
  split("\n").map{|l| l.scan(/"(.*?)"/)[0][0]}
Probably far from optimal, but it seems to do the trick.

That's the most important thing

I actually misread your example. If there's only one @begin Objects
section, then 'scan' is overkill; a simple regexp match will do.

res = if File.read(filename) =~ /^@begin Objects$(.*?)^@end;$/m
  $1.scan(/^\s*"(.*?)"/).map { |r| r.first }
end
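
For anyone who wants to try this without a file on disk, here is a self-contained sketch of the same match-then-scan idea. The method name `object_filenames` is hypothetical, and returning an empty array when there is no Objects block is my own choice (the snippet above yields nil in that case).

```ruby
# Returns the first quoted name on each line of the @begin Objects block.
def object_filenames(text)
  # =~ finds only the first Objects block; $1 holds its body.
  return [] unless text =~ /^@begin Objects$(.*?)^@end;$/m
  # Each scan result is a one-element array holding the captured name.
  $1.scan(/^\s*"(.*?)"/).map { |r| r.first }
end

sample = <<EOS
Save Format v3.0(19990112)
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;
EOS

p object_filenames(sample)
# prints ["n_cst_xml_utils.sru", "n_melding.sru"]
```

With a real file, `object_filenames(File.read(filename))` should behave the same way.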

Robert_K1 · 5 July 2010 15:51

If files are large, then a line-based approach is usually more
feasible. In this case you can use the flip-flop operator in an if
condition to select the lines we want:

17:31:49 Temp$ ./lextr.rb
["n_cst_xml_utils.sru", "n_melding.sru"]
17:48:32 Temp$ cat lextr.rb
#!/bin/env ruby19

ar = []

DATA.each_line do |line|
  if /^@begin Objects/ =~ line .. /^@end;/ =~ line
    name = line[/^\s*"([^"]*)"/, 1] and ar << name
  end
end

p ar

__END__
Save Format v3.0(19990112)
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;
17:49:30 Temp$
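
For readers who find the flip-flop hard to read, the same line-by-line selection can be sketched with an explicit boolean flag. This is an equivalent rewrite for illustration, not the code from the post above; the method name `object_filenames_by_line` and the use of StringIO in place of DATA are assumptions.

```ruby
require 'stringio'

# Collects the first quoted name from each line between
# "@begin Objects" and the next "@end;" (both lines inclusive).
def object_filenames_by_line(io)
  names = []
  inside = false
  io.each_line do |line|
    inside = true if line =~ /^@begin Objects/
    if inside
      # String#[] with a regexp and a group index returns the capture or nil.
      name = line[/^\s*"([^"]*)"/, 1]
      names << name if name
    end
    # Checked after processing, so the closing line is still "inside",
    # just as with the two-dot flip-flop.
    inside = false if line =~ /^@end;/
  end
  names
end

sample = StringIO.new(<<EOS)
Save Format v3.0(19990112)
@begin Libraries
"felles.pbl" "";
@end;
@begin Objects
"n_cst_xml_utils.sru" "felles.pbl";
"n_melding.sru" "felles.pbl";
@end;
EOS

result = object_filenames_by_line(sample)
p result
# prints ["n_cst_xml_utils.sru", "n_melding.sru"]
```

Because only one line is held in memory at a time, passing an open File instead of a StringIO keeps the memory footprint small for large inputs.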

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
