Suggestion for string parsing

Hi all,
I would like to know if there's a better way to parse a string and
assign values to variables.

Ex:

Client=MPEG-4,390000,700000,24000

I can do

line =~ /(\w*)=([0-9A-Za-z -.:]*),([0-9]*),([0-9]*),([0-9]*)/

and

var1 = $1
var2 = $2
var3 = $3
var4 = $4
var5 = $5

But I'm sure there's a better way, even considering that the number of
parameters can increase and I don't want to write a long regular
expression rule, that is hard to read.

Thanks a lot for any tips

···

--
Posted via http://www.ruby-forum.com/.

# var1 = $1
# var2 = $2
# var3 = $3
# var4 = $4
# var5 = $5

hint: array

eg,

line

=> "Client=MPEG-4,390000,700000,24000"

re

=> /(\w*?)=([0-9A-Za-z -.:]*?),(\d*?),(\d*?),(\d*)/

line.match(re).captures

=> ["Client", "MPEG-4", "390000", "700000", "24000"]

also,

x,y,z=[1,2,3]

=> [1, 2, 3]

x

=> 1

z

=> 3

···

From: Me Me [mailto:emanuelef@tiscali.it]

But I'm sure there's a better way, even considering that the number of
parameters can increase and I don't want to write a long regular
expression rule, that is hard to read.

Are the parameters always delimited by commas? If so, you could
simplify the regular expression:

  line =~ /(\w*)=(.*)/

Then

  $2 #=> "MPEG-4,390000,700000,24000"
  $2.split(",") #=> ["MPEG-4", "390000", "700000", "24000"]

This gives you the values after the '=' sign as an array. For more
power you could pass this substring to a CSV parsing library such as
FasterCSV.
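
A minimal sketch of the same two-step idea without the $1/$2 globals, assuming the key never contains '=' and the values never contain embedded commas. The variable names (key, codec, numbers) are made up for illustration; the splat collects however many numeric fields follow, so the code does not change when more parameters are added.

```ruby
line = "Client=MPEG-4,390000,700000,24000"

# Split on the first "=" only; the limit of 2 keeps any later "=" in the value part.
key, rest = line.split("=", 2)

# The first comma-separated field is text; the splat gathers all remaining fields.
codec, *numbers = rest.split(",")

# Integer() raises ArgumentError on non-numeric input instead of silently returning 0.
numbers = numbers.map { |n| Integer(n, 10) }

p key      # "Client"
p codec    # "MPEG-4"
p numbers  # [390000, 700000, 24000]
```

Note that this sketch does no validation up front: a line without '=' leaves rest as nil and the second split raises NoMethodError.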

Chris

···


s = "Client=MPEG-4,390000,700000,24000"
    ==>"Client=MPEG-4,390000,700000,24000"
if s =~ /^\w+=\S+(,\d+)+$/
  vars = s.split( /[=,]/ )
end
    ==>["Client", "MPEG-4", "390000", "700000", "24000"]

···

On Sep 18, 3:48 am, Me Me <emanue...@tiscali.it> wrote:

Thanks for answering.
I was wondering whether there is something like C's sscanf,
so that I could parse and assign to variables at the same time

so if I have

line="Client=MPEG-4,390000,700000,24000"

something like:
sscanf(line, "%s=%s,%d,%d,%d", val1, val2, val3, val4, val5)

I don't know if there's a similar string function for this in Ruby

thanks

···


William James wrote:

s = "Client=MPEG-4,390000,700000,24000"
    ==>"Client=MPEG-4,390000,700000,24000"
if s =~ /^\w+=\S+(,\d+)+$/
  vars = s.split( /[=,]/ )
end
    ==>["Client", "MPEG-4", "390000", "700000", "24000"]

You are right, William. That is cleaner. nice!

···


# I was wondering whether there is something like C's sscanf,
# so that I could parse and assign to variables at the same time
# so if I have
# line="Client=MPEG-4,390000,700000,24000"
# something like:
# sscanf(line, "%s=%s,%d,%d,%d", val1, val2, val3, val4, val5)
# I don't know if there's a similar string function for this in Ruby

you are right on scanf.
there is one in ruby, and it's a lot simpler than you think

you'll have to require it though before using,

eg,

require 'scanf'

=> false

line.scanf("%6s=%6s,%d,%d,%d,%d")

=> ["Client", "MPEG-4", 390000, 700000, 24000]

···

From: Me Me [mailto:emanuelef@tiscali.it]

line.scanf("%6s=%6s,%d,%d,%d,%d")

=> ["Client", "MPEG-4", 390000, 700000, 24000]

Thanks
the problem I have now is that the size of the string is not fixed to 6
chars.
And if I try to parse like:
line.scanf("%s=%s,%d,%d,%d,%d")
It doesn't parse the string.

Is there a way to parse any string?
thanks again

···


is there a way to use the scanf to parse a string not knowing how many
chars?
thanks

···


# >> line.scanf("%6s=%6s,%d,%d,%d,%d")
# > => ["Client", "MPEG-4", 390000, 700000, 24000]
# the problem I have now is that the size of the string is not
# fixed to 6 chars.
# And if I try to parse like:
# line.scanf("%s=%s,%d,%d,%d,%d")
# It doesn't parse the string.
# Is there a way to parse any string?
# thanks again

oops, sorry, i thought it was good enough.

in that case, you'll have to use char classes,

line.scanf("%[A-Za-z]=%[A-Z0-9-],%d,%d,%d,%d")

=> ["Client", "MPEG-4", 390000, 700000, 24000]

is that ok?
kind regards -botp

···

From: Me Me [mailto:emanuelef@tiscali.it]

is there a way to use the scanf to parse a string not knowing how many
chars?

I'd still use Regexp.

line="Client=MPEG-4,390000,700000,24000"
val1,val2,val3,val4,val5 =
/^(\w*)=([^,]*),(\d*),(\d*),(\d*)/.match(line).captures

Another way:

def handle_line(v1,v2,v3,v4,v5)
  puts "I got it! #{v1} etc"
end
...
if /^(\w*)=([^,]*),(\d*),(\d*),(\d*)/ =~ line
  handle_line(*$~.captures)
end
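
If the worry is readability rather than regexps as such, named captures may help. The following is only a sketch: the field names (codec, min_rate, max_rate, sample_rate) are illustrative guesses, since the thread never says what the three numbers mean, and LINE_RE is a made-up constant name. The /x flag lets the pattern be laid out one field per line.

```ruby
line = "Client=MPEG-4,390000,700000,24000"

# One named group per field; whitespace in the pattern is ignored because of /x.
LINE_RE = /
  \A
  (?<key>\w+)         =
  (?<codec>[^,]+)     ,
  (?<min_rate>\d+)    ,
  (?<max_rate>\d+)    ,
  (?<sample_rate>\d+)
  \z
/x

if (m = LINE_RE.match(line))
  # Fields are looked up by name, so adding a field does not renumber the others.
  key         = m[:key]
  codec       = m[:codec]
  min_rate    = m[:min_rate].to_i
  max_rate    = m[:max_rate].to_i
  sample_rate = m[:sample_rate].to_i
  puts "#{key}: #{codec} #{min_rate}/#{max_rate}/#{sample_rate}"
else
  warn "Invalid line: #{line.inspect}"
end
```

Checking the result of match before using it also avoids the NoMethodError that .captures raises on nil when a line does not match.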

···


thanks,
but I would like to avoid regexps; it seems strange to me that
there's no way to parse a string by providing its structure.
scanf would be great, but if I put %s it doesn't get the string unless I
put the number of chars.

···


%s is terminated by whitespace. You have no way of telling scanf that
you want to treat "=" (after the first field) and "," (after the second
field) as separators, rather than characters to be consumed by %s.

Well, as long as your data doesn't contain spaces, you could do

  line="Client=MPEG-4,390000,700000,24000"
  line.gsub(/[=,]/,' ').scanf("%s %s %d %d %d")
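
The same separator-normalising trick can be sketched without scanf at all, which may matter on newer Rubies where scanf is no longer in the standard library (it was removed in 2.7 and lives on as a gem). This assumes, as above, that the data contains no spaces, and it treats any all-digit field as an integer, which is my own simplification rather than anything scanf does.

```ruby
line = "Client=MPEG-4,390000,700000,24000"

# Turn both separators into spaces, then split on whitespace
# (String#split with no argument splits on runs of whitespace).
fields = line.tr("=,", " ").split

# Mimic %d for fields made only of digits; leave everything else as a string (%s).
values = fields.map { |f| f.match?(/\A\d+\z/) ? f.to_i : f }

p values  # ["Client", "MPEG-4", 390000, 700000, 24000]
```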

···


Me Me wrote:

Brian Candler wrote:

is there a way to use the scanf to parse a string not knowing how many
chars?

I'd still use Regexp.

thanks,
but I would like to avoid regexps; it seems strange to me that
there's no way to parse a string by providing its structure.

Well, you can always write a BreakApart() algorithm but I must agree
with Brian that RegEx is the way to go. After all, that is what RegEx
does. I was tempted to add BreakApart() code here but I am neither sure
that it is what you really want nor that it is the best solution for the
problem at hand.

What is the *actual* problem? If it is what you said ("I would like to
know if there's a better way to parse a string and assign values to
variables;") then RegEx is a fine solution. If you reject a good
solution and seek something else, then it can only be that you are
actually seeking a solution to a different problem. So, what are you
*really* looking for?

···


I'm quite new to Ruby and I understand that there are better ways to
do things. What I would like to avoid is writing something like this
(which works):

line =~ /(\w*)=([0-9A-Za-z -.:]*),([0-9A-Za-z -.:]*),([0-9A-Za-z -.:]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*)/

I wanted to do this in one line but in a more maintainable way; otherwise I
could always parse the string char by char.

···


I'm quite new to Ruby and I understand that there are better ways to
do things. What I would like to avoid is writing something like this
(which works):

line =~ /(\w*)=([0-9A-Za-z -.:]*),([0-9A-Za-z -.:]*),([0-9A-Za-z -.:]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*)/

I believe there's a bug in your regex. I assume you don't really mean all characters between space and period in the second character class, especially since that range includes a comma.

I wanted to do this in one line but in a more maintainable way; otherwise I
could always parse the string char by char.

I would probably do it in two steps. Match the bit before and after the equal sign in one, then split() the after bit on commas:

#!/usr/bin/env ruby -wKU

if "Client=MPEG-4,390000,700000,24000" =~ /\A([^=]+)=([^=]+)\z/
   p [$1, *$2.split(",")]
end

__END__

Here's another idea using StringScanner:

#!/usr/bin/env ruby -wKU

require "strscan"

class SimpleParser
   def initialize(data)
     s = StringScanner.new(data)
     @values = []

     @values << s.matched if s.scan(/\w+/)
     @values << s.matched[1..-1] if s.scan(/=[0-9A-Za-z \-.:]+/)
     @values << s.matched[1..-1] while s.scan(/,[0-9]+/)
   end

   attr_reader :values
end

p SimpleParser.new("Client=MPEG-4,390000,700000,24000").values

__END__
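
As a variation on the StringScanner idea, here is a sketch that also rejects malformed lines and converts numeric fields. parse_line is a hypothetical helper, not part of strscan, and treating every all-digit field as an integer is an assumption on my part.

```ruby
require "strscan"

# Hypothetical helper: returns [key, fields] for a well-formed line, nil otherwise.
def parse_line(line)
  s = StringScanner.new(line)
  key = s.scan(/\w+/) or return nil   # the part before "="
  s.scan(/=/) or return nil

  fields = []
  loop do
    field = s.scan(/[^,]+/) or return nil   # one non-empty field, up to the next comma
    fields << (field.match?(/\A\d+\z/) ? field.to_i : field)
    break unless s.scan(/,/)                # no comma means this was the last field
  end

  # Anything left unscanned means the line was not fully consumed.
  s.eos? ? [key, fields] : nil
end

p parse_line("Client=MPEG-4,390000,700000,24000")
p parse_line("Client=MPEG-4,")  # trailing comma, so this prints nil
```

Because the loop runs until the commas run out, the number of fields can grow without touching the parser.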

Hope that gives you some fresh ideas.

James Edward Gray II

···

On Sep 18, 2008, at 8:46 AM, Me Me wrote:

what I would like to avoid is writing something like this
(which works):

line =~ /(\w*)=([0-9A-Za-z -.:]*),([0-9A-Za-z -.:]*),([0-9A-Za-z -.:]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*),([0-9]*)/

I wanted to do this in one line but in a more maintainable way; otherwise I
could always parse the string char by char.

If you don't actually need to match the data against a pattern, then
just use

  line.split(',')

If you only want to proceed if the line is "valid", then write a
suitable regexp pattern to validate it. There are plenty of shortcuts.
For example, \d is the same as [0-9]. {n} means repeat the preceding
element exactly n times. So:

  case line
  when /^(\w*)=([^,]*),(\d+(,\d+){9})$/
    key1 = $1
    key2 = $2
    numbers = $3.split(/,/).collect { |n| n.to_i }
    # or: numbers = $3.scanf("%d,%d,%d,%d,%d,%d,%d,%d,%d,%d") if you prefer
  else
    puts "Invalid line!"
  end

That matches word=string,n,n,n,n,n,n,n,n,n,n

Furthermore you can substitute patterns you use repeatedly:

  WORD = "[0-9A-Za-z .:-]*"  # hyphen last, so it is a literal and not a range
  ...
  when /^(#{WORD})=(#{WORD}),(#{WORD}),(\d+(,\d+){9})$/o

(//o means that the regexp is built only once, the substitutions aren't
done every time round)

You can also use extended syntax to make the RE more maintainable:

  VALID_LINE = %r{ ^
    (\w*) = # key ($1)
    (#{WORD}), # format ($2)
    (\d+), # size ($3)
    (\d+) # sample rate ($4)
  $ }x

  if VALID_LINE =~ line
    # ...
  end

You can also do groupings which *don't* capture data using (?: .. )
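
To illustrate those last points together, here is a sketch that combines a non-capturing group with a repeat count, so the pattern stays short however many numeric fields there are. NUM_FIELDS, LINE_PATTERN and parse are hypothetical names, and the count of 3 is only what the sample line happens to have.

```ruby
NUM_FIELDS = 3  # assumed count for the sample line; adjust to the real format

# (?: .. ) groups without capturing, so only the key, the name and the whole
# comma-separated number list end up in the captures.
LINE_PATTERN = /\A(\w+)=([^,]+),(\d+(?:,\d+){#{NUM_FIELDS - 1}})\z/

def parse(line)
  m = LINE_PATTERN.match(line) or return nil   # nil for lines that do not validate
  key, name, numbers = m.captures
  [key, name, numbers.split(",").map(&:to_i)]
end

p parse("Client=MPEG-4,390000,700000,24000")
p parse("Client=MPEG-4,390000")  # too few numbers, so this prints nil
```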

Compact enough?

···


AHA! I understand, or at least I flatter myself that I do. How about
this:

require 'scanf'

s = "Client=MPEG-4,390000,700000,24000,9452349,234583475,2452345"
val = s.scanf("%6s=%s")
vals = val[1].split(",")
p vals

=> ["MPEG-4", "390000", "700000", "24000", "9452349", "234583475",
"2452345"]

···
