New to regexp and Ruby, need help parsing a string

I'm building a little test console for a ruby project. When using a
function I might get something like this:

input_string ="and stuff and nice things not bad girls not greasy boys
and girlsandboys"

As you may have guessed, I want the following in some kind of format:

smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
"not" => ["bad girls","greasy boys"]}

Thus, a regexp that splits a string on code words like "and" and "not"
is what I need.

Please help me


Posted via

Gabra Kadabra wrote:
> Thus, a regexp that splits a string on code words like "and" and "not"
> is what I need.

Try this:

str = 'and stuff and nice things not bad girls not greasy boys
and girlsandboys'

smoking_table = {'and'=>[], 'not'=>[]}

# The parentheses keep the matched keywords in the result array.
pieces = str.split(/(and |not )/)
len = pieces.length

index = 0
while index < len

  case pieces[index]
  when 'and '
    smoking_table['and'] << pieces[index+1].strip
    index += 2
  when 'not '
    smoking_table['not'] << pieces[index+1].strip
    index += 2
  else
    # Not a keyword (e.g. the leading empty string): skip it.
    index += 1
  end

end

p smoking_table



Gabra Kadabra wrote:
> Thus, a regexp that splits a string on code words like "and" and "not"
> is what I need.

# One possible implementation is:

smoking_table = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

=> {:and => ["stuff", "nice things", "girlsandboys"],
    :not => ["bad girls", "greasy boys"]}

I hope that this works for you,

Raul




7stud -- wrote:
> pieces = str.split(/(and |not )/)

Normally when you split() a string, you do something like this:

str = 'aXbXc'
pieces = str.split('X')
p pieces
-->["a", "b", "c"]

Notice that the pattern you use to split the string is not part of the
results: it's chopped out of the string and the pieces are what's left
over. However, there is a little-known feature: if your split pattern
contains a group, which is formed by putting parentheses around part of
the pattern, then the text matched by the group is also returned in the
results. I used parentheses around the whole split pattern to get a
result array like this:

["", "and ", "stuff ", "and ", "nice things ", "not ", "bad girls ",
"not ", "greasy boys\n", "and ", "girlsandboys"]

By including the split pattern in the results, you can see that each
piece of the string is preceded by either 'and ' or 'not '. The 'and '
or 'not ' then serves as an identifier for each piece of the string.
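
Here is a minimal sketch of the two behaviors side by side, using a shortened version of the input from this thread. The variable names are only for illustration.

```ruby
# Plain split: the delimiter is dropped from the result.
plain = 'aXbXc'.split('X')
p plain        # => ["a", "b", "c"]

# Split on a pattern with a capturing group: the delimiters are kept.
with_group = 'aXbXc'.split(/(X)/)
p with_group   # => ["a", "X", "b", "X", "c"]

# The same idea with the keywords from this thread.
pieces = 'and stuff not bad girls'.split(/(and |not )/)
p pieces       # => ["", "and ", "stuff ", "not ", "bad girls"]

# Skipping the leading "" leaves keyword/text pairs.
pairs = pieces.drop(1).each_slice(2).to_a
p pairs        # => [["and ", "stuff "], ["not ", "bad girls"]]
```

The leading empty string appears because the input starts with a keyword, so there is nothing in front of the first match.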



Raul Parolari wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.




Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.

Raul Parolari wrote:
> str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
>   smoking_table[k.to_sym].push(v.strip)
> end



Gabra Kadabra wrote:

Raul Parolari wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Don't be fooled by one-liners. Ruby syntax lets you chain multiple
method calls together compactly, yet the result can be very
inefficient. Whenever I see a one-liner with multiple method calls
strung together and regexes sprinkled in for good measure, I immediately
assume there is a more efficient solution. The solution I posted is a
case in point: even though it has five times the number of lines, it is
70% faster on my system than the one-liner you find so alluring.

In addition, I find one-liners hard to decipher, and since I don't
aspire to write hard-to-read code that is also inefficient, I rarely try
to cram a whole program into a single line.
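
For anyone who wants to check the speed claim on their own machine, here is a rough sketch of how the two approaches could be timed. The method names with_split, with_scan and time_it are made up for this example, the split loop is condensed from the one posted earlier, and the scan version uses string keys so the two results can be compared directly. Timings depend on the Ruby version and the machine, so no numbers are claimed here.

```ruby
STR = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

# Condensed form of the split-based loop from earlier in the thread.
def with_split(str)
  table = { 'and' => [], 'not' => [] }
  pieces = str.split(/(and |not )/)
  index = 0
  while index < pieces.length
    case pieces[index]
    when 'and ', 'not '
      table[pieces[index].strip] << pieces[index + 1].strip
      index += 2
    else
      index += 1
    end
  end
  table
end

# The scan-based version, with string keys instead of symbols.
def with_scan(str)
  table = { 'and' => [], 'not' => [] }
  str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
    table[k].push(v.strip)
  end
  table
end

# Hypothetical helper: wall-clock seconds for n calls of the block.
def time_it(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

n = 10_000
split_secs = time_it(n) { with_split(STR) }
scan_secs  = time_it(n) { with_scan(STR) }
puts format('split: %.4fs  scan: %.4fs', split_secs, scan_secs)
```

Both methods return the same hash for this input, so the comparison is at least apples to apples.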

Peter Vanderhaden wrote:

I used puts smoking_table. I'm assuming that's not the correct
way to do it.

Use the p method instead of puts to get the nice hash format.



Gabra Kadabra wrote:

Raul Parolari wrote:

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.


I totally agree with you; this is not a subject that you learn by 'just
trying' or even by reading the forum. Start calmly from the basics with
a good book, and soon those funny hieroglyphics will become your
By the way, the code above did not deal with 'notorious bad girls' (I
mean words beginning with 'not'); I had only checked for an absence of
prefix, not of suffix. So, here it is (the '\b' before and after a word
makes sure that it is indeed a 'word'):

str = "and stuff and nice things not notorious bad girls not greasy boys
and girslsandboys"

h = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip) }

p h # => {:and=>["stuff", "nice things", "girslsandboys"],
    # :not=>["notorious bad girls", "greasy boys"]}

Peter Vanderhaden wrote:

Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out ..

By default, the puts/print methods for hashes concatenate keys and
values; you can use 'p' (or 'puts h.inspect') to see the hash. If you
are in irb, just typing the name of the hash will show it to you.
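
A small sketch of the two spellings, in case it helps. The hash contents are sample data only. Note that from Ruby 1.9 on, Hash#to_s returns the same string as inspect, so a plain puts no longer runs keys and values together, and the exact inspect formatting differs between Ruby versions, which is why no output is shown here.

```ruby
smoking_table = { 'and' => ['stuff', 'nice things'], 'not' => ['bad girls'] }

# p writes smoking_table.inspect followed by a newline.
p smoking_table

# The same thing spelled out with puts.
puts smoking_table.inspect

# inspect returns a String, so it can also be stored or logged.
inspected = smoking_table.inspect
```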




p smoking_table

(Same as 7stud's example).



On Nov 23, 10:29 am, Peter Vanderhaden <> wrote:

Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.

Raul Parolari wrote:
> Gabra Kadabra wrote:
>> I'm building a little test console for a ruby project. When using a
>> function I might get something like this:

>> input_string ="and stuff and nice things not bad girls not greasy boys
>> and girlsandboys"

>> As you already have guessed, I want the following in some kind of
>> format:

>> smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
>> "not" => ["bad girls","greasy boys"]}

>> Thus, a regexp that splits a string on code words like "and" and "not"
>> is what I need.
>> Please help me

> # One possible implementation is:

> smoking_table = { :and => [], :not => [] }

> str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
> smoking_table[k.to_sym].push(v.strip)
> end

> => {:and => ["stuff", "nice things", "girlsandboys"],
> :not => ["bad girls", "greasy boys"]}

> I hope that this works for you,

> Raul


When I typed the final solution, an unwanted '}' got in. Here is the
code again:

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip)
end
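
To see what the trailing '\b' changes, here is a small self-contained comparison of the two lookaheads discussed in this thread. The helper name collect and the shortened input string are my own, and the second pattern uses a non-capturing group (?:...) so that scan still yields exactly two captures. Treat it as a sketch rather than the definitive form.

```ruby
str = "and stuff not notorious bad girls and girlsandboys"

# Hypothetical helper: run a pattern and collect the matches per keyword.
def collect(str, pattern)
  table = { :and => [], :not => [] }
  str.scan(pattern) { |k, v| table[k.to_sym].push(v.strip) }
  table
end

# Lookahead checks only for a word boundary BEFORE the keyword.
loose  = / (and|not) (.*?) (?= \band|\bnot|$) /x

# Lookahead requires a word boundary on BOTH sides of the keyword.
strict = / (and|not) (.*?) (?= \b(?:and|not)\b|$) /x

loose_result  = collect(str, loose)
strict_result = collect(str, strict)

p loose_result[:not]   # => ["", "orious bad girls"]
p strict_result[:not]  # => ["notorious bad girls"]
```

With the loose pattern, the "not" at the start of "notorious" is taken for a keyword, which splits the phrase in the wrong place. The strict pattern leaves the word alone.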


