Please explain in English

I'm learning Ruby and I'm reading an expression that I saw on the
forum. I'm coming from JavaScript. This is really hard for me. Please
help explain it to me in plain English. I understand that it's a function
that takes a string and counts words, returning a Hash.

def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map{|word| res[word] =
string.downcase.scan(/\b#{word}\b/).size}
  return res
end

···

--
Posted via http://www.ruby-forum.com/.

For those more versed than myself, I have a follow-on question (thanks for
posting this, Jooma).

In this example can't you get rid of return res?

Wayne

···

----- Original Message ----
From: jooma lavata <lists@ruby-forum.com>
To: ruby-talk ML <ruby-talk@ruby-lang.org>
Sent: Mon, January 28, 2013 11:40:24 AM
Subject: Please explain in English

I'm learning Ruby and I'm reading some expression that I saw on the
forum. I'm coming from Javascript. This is really hard for me. Please
help explain to me in plain English. I understand that it's a Function
that takes string and count words to return a Hash.

def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map{|word| res[word] =
string.downcase.scan(/\b#{word}\b/).size}
  return res
end

That's not a very idiomatic way, because the result of the map
function, which returns an array, is ignored. This signals that map is
not the correct method to use. Now, with that said:

string.downcase #=> returns a new string with all the characters downcased
.scan(/\w+/) #=> returns an array of strings, one for each match of the
regular expression. \w+ means one or more word characters, so this
returns an array of words.
.map #=> returns a new array where each position is filled with the
result of invoking the block with each element of the array. Example:

[1,2,3].map {|x| "x is #{x}"} #=> ["x is 1", "x is 2", "x is 3"]

res[word] = string.downcase.scan(/\b#{word}\b/).size

What this means is, take the string, downcase it again, scan it for
the current word surrounded by word boundaries (so, whole word), take
the size of that array and place it in the hash under the key for this
word.
This is extremely inefficient, since, first of all, for each word it's
downcasing the string again, and then scanning for each word through
the full string again (which you are already doing). So this seems to
be O(N^2), where a single pass through the string should suffice.
Also, the block-less form of scan and using map like that is creating
many intermediate objects that are not used.
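
To see what the inner expression does on its own, here is a small sketch
with a made-up sentence. The \b anchors are what keep "the" from matching
inside "then"; the variable names are only for illustration.

```ruby
sentence = "the cat then saw the hat"
word = "the"

# Block-less scan builds and returns an array of every match.
whole_word = sentence.scan(/\b#{word}\b/)  # only the standalone "the"
substring  = sentence.scan(/#{word}/)      # also the "the" inside "then"

puts whole_word.size  # 2
puts substring.size   # 3
```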

I'd do something like:

res = Hash.new(0)
string.downcase.scan(/\w+/) {|word| res[word] += 1}
return res

This uses the block form of scan, which instead of building an array,
just yields each match to the block. Since we are not doing anything
with that array, this is more efficient. We take advantage of the
default value of hash, which is set to 0, to just increment the count
for each word.
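
Wrapped up as a complete method, the single-pass version might look like
the following minimal sketch. The sample sentence is invented, and the
explicit return is dropped because the last expression is the return value.

```ruby
# Single-pass word count: scan yields each match to the block, and
# Hash.new(0) supplies 0 for any word that has not been seen yet.
def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  res
end

counts = count_words("The cat and the hat")
puts counts["the"]  # 2
puts counts["dog"]  # 0 (default value, the key is not added)
```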

Hope this helps,

Jesus.

···

On Mon, Jan 28, 2013 at 6:39 PM, jooma lavata <lists@ruby-forum.com> wrote:

I'm learning Ruby and I'm reading some expression that I saw on the
forum. I'm coming from Javascript. This is really hard for me. Please
help explain to me in plain English. I understand that it's a Function
that takes string and count words to return a Hash.

def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map{|word| res[word] =
string.downcase.scan(/\b#{word}\b/).size}
  return res
end

Regex is critical to this one. \w matches a word character (\b is the word
boundary). Scan returns an array of everything that matches that regex.

Down case isn't necessary. The word count would be the same either way.

Now if you just want to count words you don't even need that hash. If
you're trying to count instances of words that's a different story.
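
To make that distinction concrete, here is a rough sketch with an invented
sample text. The total number of words needs only the size of the scan
result; counting instances of each word needs the hash. The total is the
same with or without downcase, but the per-word keys are not.

```ruby
text = "The cat saw the other cat"

# Total number of words: no hash required.
total = text.scan(/\w+/).size

# Instances of each word, ignoring case.
per_word = Hash.new(0)
text.downcase.scan(/\w+/) { |w| per_word[w] += 1 }

# Without downcase, "The" and "the" become separate keys.
case_sensitive = Hash.new(0)
text.scan(/\w+/) { |w| case_sensitive[w] += 1 }

puts total                  # 6
puts per_word["the"]        # 2
puts case_sensitive["the"]  # 1
```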

Suggested reading: Enumerables, Blocks, Scan, Inject, and Reduce.

Enumerable covers most of those. Read the Ruby docs.

Seeing as I'm on my phone at the moment, could someone else rewrite that
code a bit? It'd look all types of funky if I did right now.

Cheers.

···

On Jan 28, 2013 11:40 AM, "jooma lavata" <lists@ruby-forum.com> wrote:

I'm learning Ruby and I'm reading some expression that I saw on the
forum. I'm coming from Javascript. This is really hard for me. Please
help explain to me in plain English. I understand that it's a Function
that takes string and count words to return a Hash.

def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map{|word| res[word] =
string.downcase.scan(/\b#{word}\b/).size}
  return res
end

--
Posted via http://www.ruby-forum.com/.

I'll try to break it down, let us know if there's anything further that
needs clarifying.

# Declare a method with one argument
  def count_words(string)

# Create an empty Hash (aka Dictionary) whose default value is 0, to be
# filled in later
  res = Hash.new(0)

# Convert the whole string to lowercase (returns a new object, doesn't
# modify in place)
  string.downcase

# Use Regex to return each word ("+" means one or more word characters,
# so a match runs until a non-word character) as an array
  .scan(/\w+/)

# Iterate through each of the words and return (map) a new array (which
# isn't used in this case)
  .map{|word|

# Populate the hash on each iteration (overwriting existing values)
res[word] =

# Get the "size" of the array returned by searching the string for all
# instances of the current word
  string.downcase.scan(/\b#{word}\b/).size}

# Explicitly return the hash ("return" isn't strictly required as this is
# the last line)
  return res
end

I can't help feeling that there is a more efficient way to do this,
given that the loop iterates needlessly multiple times over the
duplicates.

This does the same thing (not sure whether it's faster):

def count_words(string)
  res = {}
  string.downcase!
  string.scan( /\w+/ ).uniq.each{ |word| res[word] =
    string.scan(/\b#{word}\b/).size }
  res
end

···

--
Posted via http://www.ruby-forum.com/.

I have a project in NetBeans 6.8. I created a global module like so:

module SharedVariables
  @prueba = 1

  def variable
    @prueba ||= 1
  end

  def variable= (var)
    @prueba = var
  end

end

This module is in the file global_var.rb and I want to call it from
another Ruby file.

what to do ?

thanks

···

--
Posted via http://www.ruby-forum.com/.

nevermind... Now I see what's going on. (just had to run it in irb and look at
the results with and without the return res).

···

----- Original Message ----
From: Wayne Brisette <wbrisett@att.net>
To: ruby-talk ML <ruby-talk@ruby-lang.org>
Sent: Mon, January 28, 2013 11:45:45 AM
Subject: Re: Please explain in English

For those more versed than myself, I have a follow-on question (thanks for
posting this, Jooma).

In this example can't you get rid of return res?

Wayne

You could, using inject, but some people might say this is less
readable, and also creates some intermediate object that is not really
needed:

string.downcase.scan(/\w+/).inject(Hash.new(0)) {|h, word| h[word] += 1; h}
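
A close relative of the inject version, offered as a sketch rather than a
recommendation: each_with_object passes the same hash to every iteration,
so the block does not have to end with "; h". The method name
count_words_ewo is made up for this example.

```ruby
# Same single scan as the inject version; each_with_object returns the
# hash it was given, so the method needs no explicit return either.
def count_words_ewo(string)
  string.downcase.scan(/\w+/).each_with_object(Hash.new(0)) do |word, h|
    h[word] += 1
  end
end

result = count_words_ewo("a b A c b a")
puts result["a"]  # 3
```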

Jesus.

···

On Mon, Jan 28, 2013 at 6:45 PM, Wayne Brisette <wbrisett@att.net> wrote:

For those more versed than myself, I have a follow-on question (thanks for
posting this, Jooma).

In this example can't you get rid of return res?

"Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

···

--
Posted via http://www.ruby-forum.com/.

This modifies the argument coming in. Don't ever call downcase! or other mutating methods on an argument or you'll wind up in debugging hell. Make a copy instead:

string = string.downcase

···

On Jan 28, 2013, at 10:01 , Joel Pearson <lists@ruby-forum.com> wrote:

def count_words(string)
res = {}
string.downcase!
string.scan( /\w+/ ).uniq.each{ |word| res[word] =
   string.scan(/\b#{word}\b/).size }
res
end

And to answer Wayne's question how to get rid of the "return":

Hash.new(0).tap do |res|
  string.downcase.scan(/\w+/) {|word| res[word] += 1}
end
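
Placed inside a method, the tap version might look like the sketch below.
It needs neither a local variable nor a return, because tap yields the hash
to the block and then returns that same hash. The sample input is invented.

```ruby
def count_words(string)
  # tap returns its receiver (the hash), not the block's value.
  Hash.new(0).tap do |res|
    string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  end
end

tapped = count_words("one two Two three three THREE")
puts tapped["three"]  # 3
```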

Kind regards

robert

···

On Mon, Jan 28, 2013 at 6:55 PM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

res = Hash.new(0)
string.downcase.scan(/\w+/) {|word| res[word] += 1}
return res

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

First of all, please do not hijack other threads. Then, please
explain what your goal is, i.e. what you want to achieve.

Kind regards

robert

···

On Tue, Jan 29, 2013 at 6:38 PM, sasan sasgho <lists@ruby-forum.com> wrote:

what to do ?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I guess the reason is that you avoid the intermediate arrays.

Jesus.

···

On Mon, Jan 28, 2013 at 7:16 PM, Joel Pearson <lists@ruby-forum.com> wrote:

"Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

Ryan Davis wrote in post #1094171:

···

On Jan 28, 2013, at 10:01 , Joel Pearson <lists@ruby-forum.com> wrote:

def count_words(string)
res = {}
string.downcase!
string.scan( /\w+/ ).uniq.each{ |word| res[word] =
   string.scan(/\b#{word}\b/).size }
res
end

This modifies the argument coming in. Don't ever call downcase! or other
mutating methods on an argument or you'll wind up in debugging hell.
Make a copy instead:

string = string.downcase

Thanks, I thought that those two things were equivalent.
Doesn't string = string.downcase overwrite the argument string anyway?

--
Posted via http://www.ruby-forum.com/.

Ryan Davis wrote in post #1094171:

This modifies the argument coming in. Don't ever call downcase! or other
mutating methods on an argument or you'll wind up in debugging hell.
Make a copy instead:

string = string.downcase

Ah, I didn't know that a bang method would also change the argument
outside of the current scope as well! Dangerous.

irb(main):001:0> a = 'a'
=> "a"
irb(main):002:0> def t1(b)
irb(main):003:1> b.upcase
irb(main):004:1> end
=> nil
irb(main):005:0> def t2(b)
irb(main):006:1> b.upcase!
irb(main):007:1> end
=> nil
irb(main):008:0> t1 a
=> "A"
irb(main):009:0> a
=> "a"
irb(main):010:0> t2 a
=> "A"
irb(main):011:0> a
=> "A"

···

--
Posted via http://www.ruby-forum.com/.

As usual Robert, you've shown me a very elegant way to handle this! Thanks!

Wayne

···

----- Original Message ----
From: Robert Klemme <shortcutter@googlemail.com>

And to answer Wayne's question how to get rid of the "return":

Hash.new(0).tap do |res|
  string.downcase.scan(/\w+/) {|word| res[word] += 1}
end

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I suspect only scanning once is much more important than the extra arrays.
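
One way to check this is a rough Benchmark sketch like the one below. The
method names and the test text are invented for illustration, and the
timings will vary by machine and Ruby version, so none are quoted here.

```ruby
require 'benchmark'

# Original approach: rescans the whole string once for every word found.
def count_rescan(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/).map do |word|
    res[word] = string.downcase.scan(/\b#{word}\b/).size
  end
  res
end

# Single pass: one scan, incrementing as each match is yielded.
def count_single_pass(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/) { |word| res[word] += 1 }
  res
end

text = "the quick brown fox jumps over the lazy dog " * 100

Benchmark.bm(12) do |x|
  x.report("rescan:")      { count_rescan(text) }
  x.report("single pass:") { count_single_pass(text) }
end
```

Both methods return the same counts for this input, so any difference in
the report comes from how the work is done rather than from the result.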

···

On Jan 28, 2013, at 12:13 , Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

On Mon, Jan 28, 2013 at 7:16 PM, Joel Pearson <lists@ruby-forum.com> wrote:

"Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

I guess the reason is that you avoid the intermediate arrays.

Ryan Davis wrote in post #1094171:

This modifies the argument coming in. Don't ever call downcase! or other
mutating methods on an argument or you'll wind up in debugging hell.
Make a copy instead:

string = string.downcase

Ah, I didn't know that a bang method would also change the argument
outside of the current scope as well! Dangerous.

That's why there is the exclamation mark in the first place. It means
"potentially dangerous method" (defined by Matz).

Btw, this does not have that much to do with scope but it's rather
which object gets changed. All places in code which reference that
particular instance will notice the change once they use the object.

irb(main):001:0> a = 'a'
=> "a"
irb(main):002:0> def t1(b)
irb(main):003:1> b.upcase
irb(main):004:1> end
=> nil
irb(main):005:0> def t2(b)
irb(main):006:1> b.upcase!
irb(main):007:1> end
=> nil
irb(main):008:0> t1 a
=> "A"
irb(main):009:0> a
=> "a"
irb(main):010:0> t2 a
=> "A"
irb(main):011:0> a
=> "A"

Yeah, String methods with exclamation mark typically change the
instance itself whereas the "less dangerous" brothers typically return
a modified copy and leave the original untouched.
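
The difference between reassigning the parameter and mutating the object
can be sketched like this. The method names shout_copy and shout_in_place
are hypothetical; the point is that "s = s.upcase" only rebinds the local
name, while "s.upcase!" changes the one object the caller also holds.

```ruby
def shout_copy(s)
  s = s.upcase  # rebinds the local name s to a new String
  s
end

def shout_in_place(s)
  s.upcase!     # mutates the object the caller also references
  s
end

original = "hello".dup  # dup so the sketch also works with frozen literals

copy = shout_copy(original)
puts original              # hello (caller's object untouched)
puts copy.equal?(original) # false (a different object)

same = shout_in_place(original)
puts original              # HELLO (caller sees the change)
puts same.equal?(original) # true (the very same object)
```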

Kind regards

robert

···

On Tue, Jan 29, 2013 at 1:19 PM, Joel Pearson <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

You're welcome! But I think the elegance is rather due to language
and library design than me. Thank Matz!

Kind regards

robert

···

On Tue, Jan 29, 2013 at 3:16 PM, Wayne Brisette <wbrisett@att.net> wrote:

As usual Robert, you've shown me a very elegant way to handle this! Thanks!

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Sure, you are right. I didn't really read Joel's proposal, and assumed
he had removed the double scan.

Jesus.

···

On Tue, Jan 29, 2013 at 10:25 AM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

On Jan 28, 2013, at 12:13 , Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

On Mon, Jan 28, 2013 at 7:16 PM, Joel Pearson <lists@ruby-forum.com> wrote:

"Jesús Gabriel y Galán" <jgabrielygalan@gmail.com> wrote in post
#1094106:

string.downcase.scan(/\w+/) {|word| res[word] += 1}

I tried benchmarking out of curiosity and that is a lot faster! Nicely
done.

I guess the reason is that you avoid the intermediate arrays.

I suspect only scanning once is much more important than the extra arrays.