Regexp issue on parsing from file

Alpha_Blue · 15 August 2009 02:53

Hi guys,

I've always had trouble with regexp and matching. I'm trying to figure
out how to parse a very lengthy file that contains information similar
to below:

["Name"] = {
80, -- [1]
"Company", -- [2]
"2009-08-14", -- [3]
},
["NameTwo"] = {
24, -- [1]
"Company Two", -- [2]
"2009-08-14", -- [3]
},

The only information I am interested in is the following:

["Name"]
["NameTwo"]

And to break it down further, all I want out of that is:

Name
NameTwo

The file contains roughly 10,000 - 20,000 lines of data similar to the
sample above.

I just want to open the file, parse out the names as they are found,
and place them into an array for writing to another file which
contains the proper syntax.

Any help would be appreciated on how to match the specific line to get
that piece of data, and how to ignore the rest of the lines, which
contain no matches.

Many thanks.

--
Posted via http://www.ruby-forum.com/.

Alpha_Blue · 15 August 2009 03:45

Okay I'm getting closer..

counter = 1
name_re = /(.*)\[(.*)\{/
File.open("census.lua", "r") do |infile|
  while (line = infile.gets)
    if (line =~ name_re)
      puts "#{counter}: #{line}"
      counter = counter + 1
    end
  end
end

This gets me the data from the one line:

["Name"] = {
["NameTwo"] = {


Ben_Giddings1 · 15 August 2009 03:54

Try something like this:

File.foreach("filename") do |line|
   if md = /^\["(\w+)"\]/.match(line)
     puts md[1]
   end
end
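
To get from printing to the original goal (names collected in an array, then written to another file), a minimal sketch along the same lines might look like this. The helper name `extract_names` and the output file name `names.txt` are made up for illustration, and the sample text is a shortened copy of the data from the first post.

```ruby
# Hypothetical helper: returns the quoted key from every line that looks
# like ["Name"] = { and skips everything else.
def extract_names(lines)
  lines.map { |line| line[/^\["(\w+)"\]/, 1] }  # captured group 1, or nil
       .compact                                  # drop the non-matching lines
end

sample = <<~LUA
  ["Name"] = {
  80, -- [1]
  "Company", -- [2]
  "2009-08-14", -- [3]
  },
  ["NameTwo"] = {
  24, -- [1]
  },
LUA

names = extract_names(sample.lines)
p names  # => ["Name", "NameTwo"]

# With real files (both file names are placeholders):
#   names = extract_names(File.foreach("census.lua"))
#   File.write("names.txt", names.join("\n") + "\n")
```

`extract_names` accepts anything that responds to `map`, so the enumerator returned by `File.foreach` without a block can be passed in directly.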


Alpha_Blue · 15 August 2009 04:04

Hi Ben,

this actually did the trick:

counter = 1
name_re = /(.*)\[(.*)\{/
File.open("census.lua", "r") do |infile|
  while (line = infile.gets)
    if (line =~ name_re)
      s_line = line.strip
      c_line = s_line.size
      left = c_line
      right = c_line - 7
      puts "#{counter}: #{s_line.slice!(2..right)}"
      counter = counter + 1
    end
  end
end


Alpha_Blue · 15 August 2009 04:05

Correction

counter = 1
name_re = /(.*)\[(.*)\{/
File.open("census.lua", "r") do |infile|
  while (line = infile.gets)
    if (line =~ name_re)
      s_line = line.strip
      c_line = s_line.size
      right = c_line - 7
      puts "#{counter}: #{s_line.slice!(2..right)}"
      counter = counter + 1
    end
  end
end


7stud · 15 August 2009 07:03

Ben's regex is better, and once you've done the matching, if your regex
has a group around the desired information, you can retrieve the desired
information--there is no need to slice and dice the string to extract
the information:

count = 1

IO.foreach("data.txt") do |line|
   if md = /^\["(\w+?)"\]/.match(line)
     puts "#{count}: #{md[1]}"
     count += 1
   end
end

--output:--
1: Name
2: NameTwo

The expression:

/^\["(\w+)"\]/.match(line)

returns a MatchData object if there is a match; otherwise it returns
nil. That result is assigned to the variable md. A MatchData object is
truthy and nil is falsy, so when there is a match the if condition
succeeds and the body of the if statement executes. Inside the if
statement, you can ask the MatchData object for the text matched by each
group in your regex: md[0] is the whole match, md[1] is the match for
group 1, md[2] is the match for group 2, etc. (The code above writes the
group as \w+?, which is non-greedy; with the closing quote right after
the group, both forms match the same text.)

The regexp says to look for the start of a line (^), then an opening
bracket, then a double quote mark, then one or more (+) word characters
(\w, i.e. [a-zA-Z0-9_]), then a double quote mark, then a closing
bracket. Because \w does not match a space, a key such as ["Name Two"]
would not match.
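
A small sketch to check that behaviour for yourself. The helper `match_parts` and all of the sample strings are made up for illustration; only the regexp comes from the thread.

```ruby
NAME_RE = /^\["(\w+)"\]/

# Hypothetical helper: returns [whole_match, group_1], or nil when the
# line does not match.
def match_parts(line, re = NAME_RE)
  md = re.match(line)    # MatchData on success, nil otherwise
  md && [md[0], md[1]]   # md[0] is the whole match, md[1] is group 1
end

p match_parts(%(["Name"] = {\n))        # whole match plus the captured name
p match_parts(%("Company", -- [2]\n))   # nil, the line does not start with [

# \w covers letters, digits and the underscore, but not spaces.
p match_parts(%(["Name_Two"] = {\n))    # matches
p match_parts(%(["Name Two"] = {\n))    # nil

# ^ matches at the start of any line inside the string,
# \A only at the very start of the string.
two_lines = %(junk\n["NameTwo"] = {\n)
p match_parts(two_lines)                      # matches on the second line
p match_parts(two_lines, /\A\["(\w+)"\]/)     # nil
```

When the file is read with IO.foreach, each line arrives as its own string, so the difference between ^ and \A does not matter there.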


Ben_Giddings1 · 15 August 2009 17:19

BTW, you might want to look at the more rubyish ways of iterating over the lines of a file. The way you do it works, but is more complex than it needs to be.
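
As one possible illustration of that, here is a sketch that replaces the File.open / while / gets loop and the hand-kept counter with File.foreach and an index. The helper name `numbered_names` is hypothetical, and the temporary file only stands in for census.lua.

```ruby
require "tempfile"

# Hypothetical helper: streams the file line by line and numbers the keys.
def numbered_names(path)
  File.foreach(path)                            # enumerator over the lines
      .map { |line| line[/^\["(\w+)"\]/, 1] }   # captured key, or nil
      .compact                                  # keep only the matches
      .each_with_index
      .map { |name, i| "#{i + 1}: #{name}" }    # 1-based numbering
end

# Throwaway file standing in for census.lua.
Tempfile.create("census") do |f|
  f.write(%(["Name"] = {\n80, -- [1]\n},\n["NameTwo"] = {\n24, -- [1]\n},\n))
  f.flush
  puts numbered_names(f.path)
end
```

File.foreach opens the file, yields each line and closes the file again, so no explicit open block is needed.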

Ben

On Aug 15, 2009, at 00:05, Alpha Blue wrote:

File.open("census.lua", "r") do |infile|
while (line = infile.gets)

7stud · 15 August 2009 07:16

7stud -- wrote:
> Ben's regex is better, and once you've done the matching, if your regex
> has a group around the desired information, you can retrieve the desired
> information--there is no need to slice and dice the string to extract
> the information:

On the other hand, if you would rather avoid regex's altogether and
stick to string slicing and dicing, you could do this:

count = 1

IO.foreach("data.txt") do |line|
   if line[0, 1] == "["
     first_piece = line.split(" ", 2)[0]
     puts "#{count}: #{first_piece[2...-2]}"
     count += 1
   end
end

--output:--
1: Name
2: NameTwo


Glenn_Jackman · 15 August 2009 13:45

At 2009-08-15 03:16AM, "7stud --" wrote:
> Ben's regex is better, and once you've done the matching, if your regex
> has a group around the desired information, you can retrieve the desired
> information--there is no need to slice and dice the string to extract
> the information:
>
> On the other hand, if you would rather avoid regex's altogether and
> stick to string slicing and dicing, you could do this:
>
> count = 1
>
> IO.foreach("data.txt") do |line|
>     if line[0, 1] == "["
>       first_piece = line.split(" ", 2)[0]
>       puts "#{count}: #{first_piece[2...-2]}"

Or replace those 2 lines with:

      puts %Q{#{count}: #{ line.split('"')[1] }}

>       count += 1
>     end
> end
>
> --output:--
> 1: Name
> 2: NameTwo

--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous

7stud · 15 August 2009 14:35

Glenn Jackman wrote:
> Or replace those 2 lines with:
>         puts %Q{#{count}: #{ line.split('"')[1] }}

I'm not a big fan of cramming stuff into one-liners, but I like this:

line.split('"')[1]
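
For anyone wondering what the split actually returns, a quick sketch (the helper name `quoted_name` and the sample strings are made up for illustration):

```ruby
# Hypothetical helper combining the "[" guard with the split.
def quoted_name(line)
  return nil unless line.start_with?("[")  # same test as line[0, 1] == "["
  line.split('"')[1]                       # text between the first two double quotes
end

key_line  = %(["Name"] = {\n)
data_line = %("Company", -- [2]\n)

p key_line.split('"')      # => ["[", "Name", "] = {\n"]
p quoted_name(key_line)    # => "Name"
p quoted_name(%(["Company Two"] = {\n))  # => "Company Two"

# The guard still matters: data lines contain quotes too.
p data_line.split('"')[1]  # => "Company"
p quoted_name(data_line)   # => nil
```

Unlike the \w+ group in the regexp version, the split also copes with keys that contain spaces.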

Alpha_Blue · 15 August 2009 16:14

Thanks for the information, fellas. It's good to know multiple ways
of accomplishing the same task, and which one is better.

