Tough Ruby Homework

Rory_Pascua · 14 September 2011 14:05

I'm trying to take a long piece of text, find a word, and get that word
and the 3 words on either side of it and put that new "string" into
another variable.

Example:

I have a sentence like "Robert likes green beans, girls with moustaches,
and teddy bears. John thinks Robert is strange". I am searching for
the word "Robert", so I want to return the following:

["Robert likes green", "bears. John thinks Robert is strange."]
(doesn't have to be in an array, but you get the idea)

I obviously use index to get the places where "Robert" can be found, but
any suggestion on how to do the rest?

Bonus points: if you can do the same thing for multiple words...back to
the example, but search for "green AND teddy"...you'd get:

["Robert likes green beans, girls", "with moustaches, and teddy bears.
John thinks"] as a result.

I'm posting this because I couldn't seem to find an easy way to do it..

--
Posted via http://www.ruby-forum.com/.

serialhex · 14 September 2011 14:20

...if it's homework, why are you simply asking us?

it would be better to write it and ask "how can this be improved?"
than not and ask "how can this be done?"

but that's just my opinion...
hex

--
* my blog is cooler than yours: http://serialhex.github.com
* The wise man said: "Never argue with an idiot. They bring you down
to their level and beat you with experience."
* As a programmer, it is your job to put yourself out of business.
What you do today can be automated tomorrow. ~Doug McIlroy

Josh_Cheek · 14 September 2011 14:23

Check out String#split (
http://rdoc.info/stdlib/core/1.9.2/String#split-instance_method) that should
help you get it into an array, which should be a lot easier to work with.


Harry3 · 14 September 2011 15:23

I'm trying to take a long piece of text, find a word, and get that word
and the 3 words on either side of it and put that new "string" into
another variable.

I don't know what work homework means.
But, I learn something from these things and maybe someone else will, too.
So, here is a step towards the first part.
If it is wrong, you can fix it.

str = "If Robert likes green beans, girls with mustaches, and teddy " \
      "bears, John thinks Robert is strange."

# f: whitespace-separated tokens, g: the word to look for
f,g = str.split(/\s+/), "Robert"
# pick each index whose token equals g, then take up to 3 tokens on either side
p (0...f.size).select{|x| f[x]==g}.map{|y| (y-[3,y].min..y+3)}.map{|z| f[z]}

#> [["If", "Robert", "likes", "green", "beans,"], ["bears,", "John",
#> "thinks", "Robert", "is", "strange."]]

Harry

Matt9 · 15 September 2011 17:40

It might help to note that an array is an enumerable and that enumerable
gives you each_slice. So if you really mean a word and the 3 words on
either side of it, that's 7 words - and so if you take slices of 7
elements, you can examine each one to see if its middle item is your
word. m.
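
One caveat on the idea above: each_slice(7) cuts the array into non-overlapping chunks, so a match is only seen when it happens to sit in the middle of a chunk. A sliding window via Enumerable#each_cons is probably closer to the intent. Below is a minimal sketch of that variant, assuming whitespace tokenization and exact token matching. The name windows_around and the nil padding at both ends are illustrative choices, not something from the thread.

```ruby
# Hypothetical helper: returns each occurrence of `word` together with up to
# `radius` tokens on either side, using overlapping windows (each_cons).
def windows_around(text, word, radius = 3)
  tokens = text.split(/\s+/)
  # Pad both ends with nil so words near the start or end still get a window.
  padded = [nil] * radius + tokens + [nil] * radius
  padded.each_cons(2 * radius + 1)
        .select { |win| win[radius] == word }   # middle item is the word sought
        .map { |win| win.compact.join(" ") }    # drop the padding again
end

text = "Robert likes green beans, girls with moustaches, and teddy bears. " \
       "John thinks Robert is strange"
p windows_around(text, "Robert")
# => ["Robert likes green beans,", "bears. John thinks Robert is strange"]
```

Because punctuation stays attached to the tokens, searching for "bears" would not match the token "bears." here; stripping punctuation before the comparison would be needed for that.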

--
matt neuburg, phd = matt@tidbits.com <http://www.tidbits.com/matt/>
A fool + a tool + an autorelease pool = cool!
AppleScript: the Definitive Guide - Second Edition!

Harry3 · 16 September 2011 07:10

Bonus points: if you can do the same thing for multiple words...back to
the example, but search for "green AND teddy"...you'd get:

["Robert likes green beans, girls", "with moustaches, and teddy bears.
John thinks"] as a result.

Is this what you want with multiple words?

astring = "Robert likes green beans, girls with mustaches, and teddy
bears. John thinks Robert is strange."

def my_get(str, num, substrings)
  f,g,n = str.delete(",.").split(/\s+/), substrings, num
  s = f.size
  (0...s).select{|x| g.include?(f[x])}.map{|y|
    ([0,y-n].max..[y+n,s-1].min)}.map{|z| f[z]}
end

p my_get(astring,3,["Robert","and","teddy","bears","strange"])

# Output
#> [["Robert", "likes", "green", "beans"], ["girls", "with",
#> "mustaches", "and", "teddy", "bears", "John"], ["with", "mustaches",
#> "and", "teddy", "bears", "John", "thinks"], ["mustaches", "and",
#> "teddy", "bears", "John", "thinks", "Robert"], ["bears", "John",
#> "thinks", "Robert", "is", "strange"], ["thinks", "Robert", "is",
#> "strange"]]

Harry

Rory_Pascua · 14 September 2011 14:27

.serialhex .. wrote in post #1021921:

...if it's homework, why are you simply asking us?

it would be better to write it and ask "how can this be improved?"
than not and ask "how can this be done?"

but that's just my opinion...
hex

Not a school homework. It's a work homework. Thanks


Rory_Pascua · 14 September 2011 14:29

Check out String#split (
http://rdoc.info/stdlib/core/1.9.2/String#split-instance_method) that should
help you get it into an array, which should be a lot easier to work with.

Thanks Josh


Pascua_9804 · 14 September 2011 15:56

Thanks Harry, I'll try that.


Pascua_9804 · 15 September 2011 18:56

Thanks to all who tried to help. Here's the final answer.

#!/usr/bin/env ruby
string="The quick brown fox jumped over the lazy dog"

# Return each occurrence of word with up to 3 words on either side of it.
# The word is interpolated into the pattern (escaped) instead of hard-coded.
def get_subsection(word, sentence)
  sentence.scan(/(?:\W{0,1}\w+\W){0,3}#{Regexp.escape(word)}(?:\W{1}\w+){0,3}/)
end

puts get_subsection("quick", string)
puts get_subsection("lazy", string)
puts get_subsection("fox", string)
puts get_subsection("dog", string)

The regex in the middle of the syntax is where I struggled, but with a
little bit of help from the guru I was able to solve the problem.


Harry3 · 16 September 2011 15:31

Actually, I guess this makes a little more sense and is a little faster.

def my_get(str, num, substrings)
  f,g,n = str.delete(",.").split(/\s+/), substrings, num
  s = f.size
  (0...s).select{|x| g.include?(f[x])}.map{|y| f[([0,y-n].max..[y+n,s-1].min)]}
end

astring = "Robert likes green beans, girls with mustaches, and teddy
bears. John thinks Robert is strange."

p my_get(astring,3,["Robert","and","teddy","bears","strange"])

Harry

Mark_H_Nichols · 14 September 2011 14:29

.serialhex .. wrote in post #1021921:

...if it's homework, why are you simply asking us?

it would be better to write it and ask "how can this be improved?"
than not and ask "how can this be done?"

but that's just my opinion...
hex

Not a school homework. It's a work homework. Thanks

Still. Better to try your hand at something than to just copy what someone else says.

--Mark


serialhex · 14 September 2011 14:31

.serialhex .. wrote in post #1021921:

...if it's homework, why are you simply asking us?

...

Not a school homework. It's a work homework. Thanks

ahh i see, my bad, sorry about that
hex


Darryl_L_Pierce · 14 September 2011 15:35

I think that makes no difference. What have _you_ done first to attempt
to solve this?

···

On Wed, Sep 14, 2011 at 11:27:08PM +0900, Rory Pascua wrote:

Not a school homework. It's a work homework. Thanks

--
Darryl L. Pierce <mcpierce@gmail.com>
http://mcpierce.multiply.com/
"What do you care what people think, Mr. Feynman?"

Ian_Hobson · 15 September 2011 20:32

I think that is a maintenance nightmare!

As Jamie Zawinski said - Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

For large source texts it will be horribly slow, and memory hungry, and for large search lists it will slow down even more. Huge, slow and hard to maintain = not good.

What the OP wanted was a sequence of 7 words, where the 4th is the word sought, and the string can be missing words "before" or "after" the source string.

So you need two parallel lists of strings. The first is a list of tokens from the source, where each token is separated from the next by white-space. The second is a list of words, created from the tokens by removing punctuation.

Slide through the source, a token at a time, and if the fourth word of the word list is one of the ones you want,
use the token list to reconstruct the fragment of the source (without newlines) and emit the result.

In order to handle the start-up and close-down properly, I would consider preloading the token list with null strings, and arrange the "get next token" function to return three null strings after end of file, before signalling the end.
However there are other methods.

This is one pass, so you don't need the source all in memory. It will be order source size in time, and order the number of words sought in space. Fast, compact and easy to alter the rule or length of the lists.
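
A rough sketch of how this one-pass sliding window could look in Ruby. It is an illustration under stated assumptions, not Ian's actual code: the name each_fragment is made up, tokens are assumed to arrive already split on whitespace, and instead of keeping a second parallel word list it derives the word from the middle token on the fly by stripping non-word characters.

```ruby
require "set"

# Hypothetical helper: yields every fragment whose middle word is in `wanted`.
# `tokens` is any object with #each that yields whitespace-separated tokens.
def each_fragment(tokens, wanted, radius = 3)
  return to_enum(:each_fragment, tokens, wanted, radius) unless block_given?

  wanted = Set.new(wanted)
  size   = 2 * radius + 1
  window = Array.new(radius, "")   # preload the window with null strings
  tail   = Array.new(radius, "")   # null strings fed in after the end of input

  [tokens, tail].each do |source|
    source.each do |token|
      window << token
      window.shift if window.size > size
      next if window.size < size

      word = window[radius].gsub(/\W/, "")   # middle token minus punctuation
      yield window.reject(&:empty?).join(" ") if wanted.include?(word)
    end
  end
end

text = "Robert likes green beans, girls with moustaches, and teddy bears. " \
       "John thinks Robert is strange"
p each_fragment(text.split, %w[green teddy]).to_a
# => ["Robert likes green beans, girls with", "with moustaches, and teddy bears. John thinks"]
```

Only the current window and the set of sought words are held at any time, so the token source could just as well be a lazy reader over a large file.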

Regards

Ian

Rory_Pascua · 14 September 2011 14:38

Mark H. Nichols wrote in post #1021929:

···

On Sep 14, 2011, at 9:27 AM, Rory Pascua wrote:

.serialhex .. wrote in post #1021921:

...if it's homework, why are you simply asking us?

it would be better to write it and ask "how can this be improved?"
than not and ask "how can this be done?"

but that's just my opinion...
hex

Not a school homework. It's a work homework. Thanks

Still. Better to try your hand at something than to just copy what
someone else says.

--Mark

Dude, if you're not going to help, why respond to this post? Simply
ignore and move on rather than be an @hole


Pascua_9804 · 14 September 2011 15:55

Darryl Pierce wrote in post #1021946:

Not a school homework. It's a work homework. Thanks

I think that makes no difference. What have _you_ done first to attempt
to solve this?

I know.. I know.. My post sounded like I didn't even try because I went
straight to the question without even explaining what I did. Well
here's what I did.

#!/usr/bin/env ruby
string="The quick brown fox jumped over the lazy dog"
def get_subsection(word, sentence)
sentence.scan(Regexp.new(word))
end

puts get_subsection("quick", string)
puts get_subsection("lazy", string)
puts get_subsection("fox", string)
puts get_subsection("dog", string)

I just need the REGEX portion to select the three words on both sides
and right now I'm struggling with it. I tried looking at "lookaround"
but it got me even more confused.


Robert_K1 · 16 September 2011 05:28

Thanks to all who tried to help. Here's the final answer.

#!/usr/bin/env ruby
string="The quick brown fox jumped over the lazy dog"
def get_subsection(word, sentence)
sentence.scan(/(?:\W{0,1}\w+\W){0,3}#{Regexp.escape(word)}(?:\W{1}\w+){0,3}/)
end

puts get_subsection("quick", string)
puts get_subsection("lazy", string)
puts get_subsection("fox", string)
puts get_subsection("dog", string)

The regex in the middle of the syntax is where I struggled, but with a
little bit of help from the guru I was able to solve the problem.

I think that is a maintenance nightmare!

As Jamie Zawinski said - Some people, when confronted with a problem, think
"I know, I'll use regular expressions." Now they have two problems.

Nah.

For large source texts it will be horribly slow, and memory hungry, and for
large search lists it will slow down even more. Huge, slow and hard to
maintain = not good.

That entirely depends on the problem to solve and the approach with
regexp chosen.

What the OP wanted was a sequence of 7 words, where the 4th is the word
sought, and the string can be missing words "before" or "after" the source
string.

So you need two parallel lists of strings. The first is a list of tokens
from the source, where each token is separated from the next by white-space.
The second is a list of words, created from the tokens by removing punctuation.

I'd work with a single list of alternating words and non-words. That
should make generation of the combined matching sequence easier.
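
To illustrate the single-list idea, here is a minimal sketch, assuming that "word" means a run of \w characters and that everything else counts as a separator. The helper name fragments is hypothetical. Because runs of word and non-word characters strictly alternate in the scanned list, three words on either side correspond to six list entries on either side.

```ruby
# Hypothetical helper: one list in which words and non-words alternate.
def fragments(text, wanted, radius = 3)
  parts = text.scan(/\w+|\W+/)   # e.g. ["Robert", " ", "likes", " ", ...]
  parts.each_index.select { |i| wanted.include?(parts[i]) }.map do |i|
    lo = [i - 2 * radius, 0].max                # clamp at the start of the list
    hi = [i + 2 * radius, parts.size - 1].min   # clamp at the end of the list
    parts[lo..hi].join.strip                    # separators are kept verbatim
  end
end

text = "Robert likes green beans, girls with moustaches, and teddy bears. " \
       "John thinks Robert is strange"
p fragments(text, ["Robert"])
# => ["Robert likes green beans", "bears. John thinks Robert is strange"]
```

Joining the slice reproduces the original punctuation and spacing between the words, which is the convenience this layout buys. Note that \w treats an apostrophe as a separator, so a word like "don't" would be split into two words.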

Slide through the source, a token at a time, and if the fourth word of the word
list is one of the ones you want,
use the token list to reconstruct the fragment of the source, (without
newlines) and emit the result.

In order to handle the start-up and close-down properly, I would consider
preloading the token list with null strings, and arrange the "get next
token" function to return three null strings after end of file, before
signalling the end.
However there are other methods.

This is one pass, so you don't need the source all in memory. It will be
order source size in time, and order the number of words sought in space.
Fast, compact and easy to alter the rule or length of the lists.

I find this simpler:

def word_scan(s, *words)
  return to_enum(:word_scan, s, *words) unless block_given?
  return if words.empty?

  s.scan(/\b#{Regexp.union(words)}\b/) do |wd|
    pre = $`
    post = $'
    yield pre[/(?:\w+\W+){0,3}\z/] + wd + post[/\A(?:\W+\w+){0,3}/]
  end
end

s = "Robert likes green beans, girls with moustaches, and teddy bears. " \
    "John thinks Robert is strange"

puts 1
word_scan(s, "Robert") {|m| p m}

puts 2
word_scan(s, "green", "teddy") {|m| p m}
p word_scan(s, "green", "teddy").to_a

You can use it with and without block following the idiom to get an
Enumerable if there is no block. However, for really large inputs
your approach is likely better.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
