New to regexp and Ruby: need help parsing a string

I'm building a little test console for a Ruby project. When calling a
function I might get something like this:

input_string ="and stuff and nice things not bad girls not greasy boys
and girlsandboys"

As you may already have guessed, I want to end up with something like
the following:

smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
"not" => ["bad girls","greasy boys"]}

Thus, a regexp that splits a string on code words like "and" and "not"
is what I need.

Please help me

···

--
Posted via http://www.ruby-forum.com/.

Try this:

str = 'and stuff and nice things not bad girls not greasy boys
and girlsandboys'

smoking_table = {'and'=>[], 'not'=>[]}

pieces = str.split(/(and |not )/)
len = pieces.length

index = 0
while index < len

  case pieces[index]
  when 'and '
    smoking_table['and'] << pieces[index+1].strip
    index += 2
  when 'not '
    smoking_table['not'] << pieces[index+1].strip
    index += 2
  else
    index += 1
  end

end

p smoking_table

···

# One possible implementation is:

smoking_table = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

=> {:and => ["stuff", "nice things", "girlsandboys"],
    :not => ["bad girls", "greasy boys"]}
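
For readers new to regexps, here is a minimal sketch of the same scan with each part of the pattern commented. The /x flag lets the pattern span several lines and carry comments. The names KEYWORD_PATTERN and build_table are made up for this illustration, and string keys are used only to match the format asked for in the original question.

```ruby
# Extended (/x) mode ignores whitespace in the pattern and allows comments.
KEYWORD_PATTERN = /
  (and|not)              # group 1: the code word
  (.*?)                  # group 2: as little text as possible (lazy match)
  (?= \band | \bnot | $) # lookahead: stop before the next code word or at end of line
/x

# Hypothetical helper: collects the text that follows each code word.
def build_table(str)
  table = { 'and' => [], 'not' => [] }
  str.scan(KEYWORD_PATTERN) do |keyword, text|
    table[keyword] << text.strip
  end
  table
end

input = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"
p build_table(input)
```

The lookahead does not consume any text, so the next code word is still available for the following match. That is why scan can walk the whole string in one pass.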

I hope that this works for you,

Raul

···

Normally when you split() a string, you do something like this:

str = 'aXbXc'
pieces = str.split('X')
p pieces
-->["a", "b", "c"]

Notice that the pattern you use to split the string is not part of the
results: it's chopped out of the string, and the pieces are what's left
over. However, there is a little-known feature: if your split pattern
contains a group, which is formed by putting parentheses around part of
the pattern, then the text matched by the group is also returned in the
results. I used parentheses around the whole split pattern to get a
result array like this:

["", "and ", "stuff ", "and ", "nice things ", "not ", "bad girls ",
"not ", "greasy boys\n", "and ", "girlsandboys"]

By including the split pattern in the results, you can see that each
piece of the string is preceded by either 'and ' or 'not '. The 'and '
or 'not ' then serves as an identifier for each piece of the string.
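
A small sketch of the difference, plus one possible way to consume the result without a manual index: drop the leading empty string and walk the array in pairs with each_slice. The method name table_from_split is invented for this example, and it assumes the string starts with a code word and contains no other text before it.

```ruby
# Without a group, the separators are discarded.
p 'aXbXc'.split('X')      # => ["a", "b", "c"]
# With a group, the text matched by the group is kept in the result.
p 'aXbXc'.split(/(X)/)    # => ["a", "X", "b", "X", "c"]

# Hypothetical helper: pairs each code word with the piece that follows it.
def table_from_split(str)
  table = { 'and' => [], 'not' => [] }
  pieces = str.split(/(and |not )/)
  pieces.shift if pieces.first == ''   # nothing precedes the first code word
  pieces.each_slice(2) do |keyword, text|
    # text is nil if the string ends with a code word, hence to_s
    table[keyword.strip] << text.to_s.strip
  end
  table
end

p table_from_split("and stuff and nice things not bad girls not greasy boys\nand girlsandboys")
```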

···

Raul Parolari wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.

···

Raul,
Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.
Thanks,
PV

···

Gabra Kadabra wrote:

Raul Parolari wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Don't be fooled by one-liners. Ruby syntax allows you to string
multiple method calls together in a compact way, yet the result can be
very inefficient. Whenever I see a one-liner with multiple method calls
strung together and regexes sprinkled in for good measure, I immediately
assume there is a more efficient solution. The solution I posted is a
case in point: even though it has five times the number of lines, it is
70% faster on my system than the one-liner you find so alluring.

In addition, I find one-liners hard to decipher, and since I don't
aspire to write hard-to-read code that is also inefficient, I rarely try
to cram a whole program into a single line.
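
Speed claims like this are easy to check on your own machine. Below is a rough sketch using the Benchmark module from Ruby's standard library. The method names, the condensed form of the split loop, and the iteration count are arbitrary choices for this example. The numbers will vary with Ruby version and input, so treat the output as a local measurement rather than a general result.

```ruby
require 'benchmark'

STR = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

# The split-based approach (index loop, condensed from the version posted earlier).
def with_split(str)
  table = { 'and' => [], 'not' => [] }
  pieces = str.split(/(and |not )/)
  index = 0
  while index < pieces.length
    case pieces[index]
    when 'and ', 'not '
      table[pieces[index].strip] << pieces[index + 1].to_s.strip
      index += 2
    else
      index += 1
    end
  end
  table
end

# The scan-based approach.
def with_scan(str)
  table = { 'and' => [], 'not' => [] }
  str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
    table[k] << v.strip
  end
  table
end

n = 20_000  # arbitrary iteration count
Benchmark.bm(6) do |bm|
  bm.report('split') { n.times { with_split(STR) } }
  bm.report('scan')  { n.times { with_scan(STR) } }
end
```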

Peter Vanderhaden wrote:

I used puts smoking_table. I'm assuming that's not the correct
way to do it.

Use the p method instead of puts to get the nicely formatted hash output.

···

Gabra Kadabra wrote:

Raul Parolari wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.

I totally agree with you; this is not a subject you can learn by 'just
trying', or even by reading the forum. Start calmly from the basics with
a good book, and soon those funny hieroglyphics will become your
friends.

By the way, the code above did not deal with 'notorious bad girls' (I
mean words beginning with 'not'); the lookahead only checked for a word
boundary before the code word, not after it. So, here it is (the '\b'
before and after a word makes sure that it is indeed a 'word'):

str = "and stuff and nice things not notorious bad girls not greasy boys
and girslsandboys"

h = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip) }
end

p h # => {:and=>["stuff", "nice things", "girslsandboys"],
    # :not=>["notorious bad girls", "greasy boys"]}
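
To see what the extra '\b' changes, here is a small sketch that runs both lookaheads on a string containing 'notorious'. The constant names LOOSE and STRICT and the helper collect are invented for this illustration. The second pattern uses a non-capturing group (?:...) inside the lookahead; that is a substitution made for this sketch so that scan yields only two captures, not what was posted.

```ruby
# Lookahead checks a word boundary only before the code word.
LOOSE  = / (and|not) (.*?) (?= \band | \bnot | $) /x
# Lookahead checks a word boundary before and after the code word.
STRICT = / (and|not) (.*?) (?= \b(?:and|not)\b | $) /x

# Hypothetical helper: runs one pattern and returns the resulting table.
def collect(str, pattern)
  table = { :and => [], :not => [] }
  str.scan(pattern) { |k, v| table[k.to_sym].push(v.strip) }
  table
end

str = "and stuff not notorious bad girls not greasy boys"

loose  = collect(str, LOOSE)
strict = collect(str, STRICT)

p loose[:not]   # "notorious" is cut apart here
p strict[:not]  # => ["notorious bad girls", "greasy boys"]
```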

Peter Vanderhaden wrote:

Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out ..

By default, the puts/print methods for hashes concatenate keys and
values; you can use 'p' (or 'puts h.inspect') to see the hash. If you are
in irb, just typing the name of the hash will show it to you.
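
A short sketch of the options mentioned above. What plain puts prints for a hash depends on the Ruby version (from Ruby 1.9 on, Hash#to_s returns the same text as inspect), so no output is shown here. The helper table_lines is a made-up name for this example.

```ruby
h = { :and => ["stuff", "nice things"], :not => ["bad girls"] }

p h               # prints the inspect form of the hash
puts h.inspect    # same output as the line above
puts h            # version-dependent: older Rubies run keys and values together

# Hypothetical helper: one line per key, independent of the Ruby version.
def table_lines(table)
  table.map { |key, values| "#{key}: #{values.join(', ')}" }
end

puts table_lines(h)
```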

Regards
Raul

···

p smoking_table

(Same as 7stud's example.)

HTH,
Richard

···

On Nov 23, 10:29 am, Peter Vanderhaden <bostonanti...@yahoo.com> wrote:

Raul,
Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.
Thanks,
PV

···

When I typed the final solution, an unwanted '}' got in. Here is the
code again:

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip)
end

Regards
Raul
