Splitting strings

Hi all,

I have a text file with phrases that I'm looking to split into chunks.

Given the following keyword list:

the brown fox jumped,
over the fence,

I'd like to produce the following output:

the,
the brown,
the brown fox,
the brown fox jumped,
brown fox,
brown fox jumped,
fox,
fox jumped,
jumped,
over,
over the,
over the fence,
the,
the fence,
fence

I'm currently using the following code which splits after each space:

  def count_frequency
    the_file = 'D:/Ruby/projects/data.txt'
    h = Hash.new
    f = File.open(the_file, "r")
    f.each_line { |line|
      words = line.split
      words.each { |w|
        if h.has_key?(w)
          h[w] = h[w] + 1
        else
          h[w] = 1
        end
      }
    }

    # sort the hash by value, and then print it in this sorted order
    h.sort{|a,b| a[1]<=>b[1]}.each { |elem|
      puts "\"#{elem[0]}\" has #{elem[1]} occurrences"
    }
  end

By the look of this, I just need to append more words to the words
array using a different slice?

Many thanks in advance,

Ryan

···

--
Posted via http://www.ruby-forum.com/.

lines = [
'the brown fox jumped,',
'over the fence,'
]

results = []

lines.each do |line|
  line.chomp!(',')
  words = line.split(' ')
  print "words: "
  p words

  words.inject('') do |accumulator, word|
    accumulator << "#{word} "
    results << "#{accumulator.chomp(' ')},"
    accumulator

  end
end

puts results

--output:--
the,
the brown,
the brown fox,
the brown fox jumped,
over,
over the,
over the fence,

You're right. I'm still a beginner to Ruby, however I have still tried
researching what I'm looking for and come up with no results. I tried
manipulating the starting code but it did not return any related results,
so I asked a question... surely that's what these forums are for! (I'm
sure you would have done something like this - learning by example when
you first started too!)

Although I don't appreciate your tone and communication skills (perhaps
you need a lesson) thank you for your technical help.

Ryan

Ok, taking on board all that has been said so far... what I'm
hoping to achieve (short term help needed as I am a beginner) is to take a
list of strings from a text file and run through each string and split
it in as many combinations as possible, then count all the occurrences of
each new string that is split and provide them in the console as an
output.

Any help would be appreciated.

Regards,

Ryan

Hi Robert,

Your example works perfectly, thank you!

To incorporate the occurrence count for each keyword, do we need to put it
into a hash similar to the first example I gave, or is it possible to
directly link that up with the output?

The previous example I had was:

  words.each { |w|
    w.lstrip
    if h.has_key?(w)
      h[w] = h[w] + 1
    else
      h[w] = 1
    end
  }
}

# sort the hash by value, and then print it in this sorted order
h.sort{|a,b| a[1]<=>b[1]}.each { |elem|
  puts "\"#{elem[0]}\" has #{elem[1]} occurrences"
}

Many thanks again for your help.

Regards,

Ryan

Ok, so far I have included the counter hash, so I now have
the following:

  def count_frequency
    the_file='D:/Rails/projects/data.txt'
    h = Hash.new
    words = Hash.new
    f = File.open(the_file, "r")
    counts = Hash.new 0

    if.each do |line|
      phrase = line.scan /\w+/
      limit = phrase.length - 1

      0.upto limit do |start|
        start.upto limit do |stop|
          puts phrase[start..stop].join(' ')
        end
      end

       counts.sort_by {|w,c| -c}.each do |w,c|
         printf "%6d %s\n", c, w
       end
    end
  end

Would the counter hash with the key go underneath the 'puts' in the loop
so that it records each step? At the minute it still just outputs the
new strings without the ordering.

Many thanks,

Ryan

Ryan Mckenzie wrote in post #1011820:

You're right. I'm still a beginner to Ruby, however I have still tried
researching what I'm looking for and come up with no results. I tried
manipulating the starting code but it did not return any related results,
so I asked a question... surely that's what these forums are for! (I'm sure
you would have done something like this - learning by example when you
first started too!)

Although I don't appreciate your tone and communication skills (perhaps
you need a lesson) thank you for your technical help.

Ryan, since you brought up communication skills: from your original
posting it is not entirely clear to me what you want to do. Do you want
to count word occurrences? Do you want to generate permutations of all
subsets of words found in a document? Or do you want to generate all
sub sequences of each phrase (line) in the document?

A few remarks: the usual counting idiom is this

counters = Hash.new 0
...
counters[key] += 1
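
For what it's worth, here is a minimal sketch of that counting idiom on a small in-memory word list. The sample words are made up for illustration and are not taken from the data file discussed in this thread.

```ruby
# Hash.new(0) returns 0 for any key that has not been stored yet,
# so "+= 1" works on the first occurrence without a has_key? check.
words = %w[the brown fox the fence the]   # invented sample data

counters = Hash.new(0)
words.each { |word| counters[word] += 1 }

p counters
```

The default value only applies to lookups; a key is stored in the hash the first time `+=` assigns to it.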

If you need to append to an Array per key, you can do

lists = Hash.new {|h, k| h[k] = []}
...
lists[key] << item

You open the file but do not close it (better use block form of
File.open or use File.foreach for even simpler code).
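
A minimal sketch of both suggestions, assuming a throwaway Tempfile as a stand-in for the real data file (the file content and the variable names `sample`, `handle` and `lines` are invented for this example):

```ruby
require 'tempfile'

# Stand-in for the real data file (hypothetical content).
sample = Tempfile.new('phrases')
sample.write("the brown fox jumped,\nover the fence,\n")
sample.close

# Block form: the file is closed when the block ends, even if an
# exception is raised inside it.
handle = nil
File.open(sample.path, 'r') do |f|
  handle = f
  f.each_line { |line| puts line }
end
puts handle.closed?

# File.foreach opens the file, yields each line and closes it again.
lines = []
File.foreach(sample.path) { |line| lines << line.chomp }
p lines

sample.unlink
```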

Maybe this does what you want - maybe not

ARGF.each do |line|
  phrase = line.scan /\w+/
  limit = phrase.length - 1

  0.upto limit do |start|
    start.upto limit do |stop|
      puts phrase[start..stop].join ' '
    end
  end
end
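
The same loops can also be wrapped in a method that returns the sub-phrases instead of printing them, which makes them easier to count later. This is only a sketch; the method name `sub_phrases` is hypothetical and not part of the code above.

```ruby
# Returns every run of consecutive words in +line+, ordered by start
# position and then by length, like the nested upto loops.
def sub_phrases(line)
  words = line.scan(/\w+/)
  result = []
  0.upto(words.length - 1) do |start|
    start.upto(words.length - 1) do |stop|
      result << words[start..stop].join(' ')
    end
  end
  result
end

puts sub_phrases('the brown fox jumped,')
```

For 'the brown fox jumped,' this yields ten phrases, including "brown" on its own, which the list in the original post leaves out (possibly an oversight).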

Kind regards

robert

Ryan Mckenzie wrote in post #1011820:

You're right. I'm still a beginner to Ruby, however I have still tried
researching what I'm looking for and come up with no results.

As a beginner to ruby programming, you should be writing all programs
from scratch--not trying to alter some program you found on the
internets.

Ok, taking on board all what has been said so far... this is what I'm
hoping to achieve (short term help needed as I am a beginner) is take a
list of strings from a text file

Check File#read (either "ri File#read" on the command line, or on
ruby-doc.org). The gist:

data = File.read "myfile"

and run through each string and split it in as many combinations as possible

There's a lot of splitting possible!

Though, I guess you want to split a sentence into its words, correct?

Either way, String#split is what you want (probably "A string".split(" "),
which splits the string at spaces).

, then count all the occurrences of each new string that is split and provide them in the console as an
output.

Well, once you split your string, you get an Array of chunks (or
tokens, if you prefer): ["A", "string"]. So the question is: Do you
want to get every possible combination, or a subset of these
combinations (as in the example provided in your OP)?

···

On Wed, Jul 20, 2011 at 8:16 PM, Ryan Mckenzie <ryan@souliss.com> wrote:

--
Phillip Gawlowski

phgaw.posterous.com | twitter.com/phgaw | gplus.to/phgaw

A method of solution is perfect if we can foresee from the start,
and even prove, that following that method we shall attain our aim.
-- Leibniz

Ok, taking on board all what has been said so far... this is what I'm
hoping to achieve (short term help needed as I am a beginner) is take a

Unfortunately there is still a lot of room for interpretation left...

list of strings from a text file and run through each string and split

How do you obtain the list of strings? Is a string a line from the
text file? And, as Phillip asked, how do you want your strings to be
split?

it in as many combinations as possible,

Does order matter or not? Example: do you consider "a b" and "b a" to
be the same combination or two separate combinations?

then count all the occurrences of
each new string that is split and provide them in the console as an
output.

Do you want to count the parts of the original string (line) or the
combination, i.e. do you want to count "a" and "b" or "a b"?

Btw, did you try out the code I sent earlier?

Kind regards

robert

···

On Wed, Jul 20, 2011 at 8:16 PM, Ryan Mckenzie <ryan@souliss.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Ryan Mckenzie wrote in post #1011936:

Hi Robert,

Your example works perfectly, thank you!

You're welcome!

To incorporate the occurrence count for each keyword do we need to put it
into a hash similar to the first example I gave or is it possible to
directly link that up with the output?

Please see what I called "counting idiom" above.

# sort the hash by value, and then print it in this sorted order
h.sort{|a,b| a[1]<=>b[1]}.each { |elem|
  puts "\"#{elem[0]}\" has #{elem[1]} occurrences"
}

To print in descending order you can as well do

counts.sort_by {|w,c| -c}.each do |w,c|
  printf "%6d %s\n", c, w
end

Kind regards

robert

Ryan Mckenzie wrote in post #1012277:

Ok, so far at the minute then I have included the counter hash so I have
the following:

  def count_frequency
    the_file='D:/Rails/projects/data.txt'
    h = Hash.new
    words = Hash.new
    f = File.open(the_file, "r")
    counts = Hash.new 0

    if.each do |line|
      phrase = line.scan /\w+/
      limit = phrase.length - 1

      0.upto limit do |start|
        start.upto limit do |stop|
          puts phrase[start..stop].join(' ')
        end
      end

       counts.sort_by {|w,c| -c}.each do |w,c|
         printf "%6d %s\n", c, w
       end
    end
  end

Would the counter hash with the key go underneath the 'puts' in the loop
so that it records each step? At the minute it still just outputs the
new strings without the ordering.

At the moment I would be surprised to see any output from counts because
you never update it. You also do not close the file properly (you could
make your life easier by using File.foreach) and I believe there is also
a spelling error ("if.each"). Did this program actually run and work?

Btw, the_file should rather be a method argument IMHO.
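
One way the points above could fit together, as a sketch rather than a definitive fix: the path becomes a method argument, File.foreach takes care of closing the file, and the counter is updated where the `puts` used to be. The Tempfile and its two lines are assumptions standing in for data.txt.

```ruby
require 'tempfile'

# Counts every run of consecutive words per line of the file at +path+.
def count_frequency(path)
  counts = Hash.new(0)
  File.foreach(path) do |line|
    phrase = line.scan(/\w+/)
    limit = phrase.length - 1
    0.upto(limit) do |start|
      start.upto(limit) do |stop|
        # This update is what the posted version never did.
        counts[phrase[start..stop].join(' ')] += 1
      end
    end
  end
  counts
end

# Hypothetical sample data standing in for data.txt.
sample = Tempfile.new('data')
sample.write("the brown fox jumped,\nover the fence,\n")
sample.close

counts = count_frequency(sample.path)

# Print once, after all lines have been read, highest count first.
counts.sort_by { |_phrase, count| -count }.each do |phrase, count|
  printf "%6d %s\n", count, phrase
end

sample.unlink
```

Returning the hash and printing outside the method keeps the report out of the per-line loop, so each phrase is listed once with its final count.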

Kind regards

robert

Did you really post this through the forum? Interestingly there I
cannot see your sentence "As a beginner...". How weird is that? Does
the forum -> mailing list gateway add content? :-)

Kind regards

robert

···

On Wed, Jul 20, 2011 at 12:18 PM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:

Ryan Mckenzie wrote in post #1011820:

You're right. I'm still a beginner to Ruby, however I have still tried
researching what I'm looking for and come up with no results.

As a beginner to ruby programming, you should be writing all programs
from scratch--not trying to alter some program you found on the
internets.

Good artists create. Great artists steal.

···

On Wed, Jul 20, 2011 at 12:18 PM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:

As a beginner to ruby programming, you should be writing all programs
from scratch--not trying to alter some program you found on the
internets.

--
Phillip Gawlowski

Hi Robert,

Sorry that was a typo with the 'if'. I changed that to 'f'.

Unfortunately no, I could not get it working. I will update the file so
it closes as you mentioned. How would I go about integrating the count
with the phrase[start..stop] to insert those into the hash?

Sorry, it's probably an extremely basic question...

Thanks again,

Ryan

I laughed out loud when I read this.

Altering programs found online is a fantastic way to learn. Thank god for open source/free software. Of course it depends on your goals, but I would not limit learning to just writing everything from scratch.

Lake

···

On Jul 20, 2011, at 10:52 AM, Phillip Gawlowski <cmdjackryan@gmail.com> wrote:

On Wed, Jul 20, 2011 at 12:18 PM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:

As a beginner to ruby programming, you should be writing all programs
from scratch--not trying to alter some program you found on the
internets.

Good artists create. Great artists steal.

Ryan Mckenzie wrote in post #1012878:

Unfortunately no I could not get it working. I will update the file so
it closes as you mentioned. How would I go about integrating the count
with the phrase[start..stop] to insert those into the hash?

I think you got that information above (see
http://www.ruby-forum.com/topic/2176493?reply_to=1012878#1011844).

Kind regards

robert

Ah yes, I see what you mean now. Ok so now I'm left with trying to use
the hash key to assign the phrases. Do I need to increment the h and k
each time or just one of them?

    lists = Hash.new {|h, k| h[k] = []}

    f.each do |line|
      phrase = line.scan /\w+/
      limit = phrase.length - 1

      0.upto limit do |start|
        start.upto limit do |stop|
          lists[h,k] << [start..stop].join(' ')
        end
      end

    end

    lists.sort_by {|w,c| -c}.each do |w,c|
       printf "%6d %s\n", c, w
    end

Thanks again,

Ryan
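
A hedged sketch of how the snippet above might be repaired, assuming the aim is to keep a list per phrase. Neither h nor k needs incrementing: they are only the block's parameters, and the key is the joined phrase itself. Storing line numbers in each list is an invented use for illustration, and `lines` is sample input rather than the real file.

```ruby
lines = ['the brown fox jumped,', 'over the fence,']  # sample input

# h and k only exist inside this block: Ruby passes in the hash and the
# missing key on first access, so nothing is incremented by hand.
lists = Hash.new { |h, k| h[k] = [] }

lines.each_with_index do |line, index|
  phrase = line.scan(/\w+/)
  limit = phrase.length - 1
  0.upto(limit) do |start|
    start.upto(limit) do |stop|
      key = phrase[start..stop].join(' ')
      lists[key] << (index + 1)   # remember the line number of this hit
    end
  end
end

# Each value is an Array, so sort by its size; "-c" is not defined for it.
lists.sort_by { |_key, hits| -hits.size }.each do |key, hits|
  printf "%6d %s (lines %s)\n", hits.size, key, hits.join(', ')
end
```

If only the totals matter, a `Hash.new 0` counter with `counts[key] += 1` is the simpler choice; the list form is useful when the individual hits need to be kept.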
