Writing a UNIX 'wc' program

Hey all,

I've been spending the last week learning Ruby. Prior to that, I had spent some time learning Python. For various reasons, it looks like I'm gravitating more to Ruby.

That being said, I decided to write a small program similar to the UNIX 'wc' program.

Right now, it's very stripped down: it only accepts input data from STDIN (and only when STDIN is not a tty), and I haven't yet implemented the command-line arguments.

For the context of this post, I only included the logic that gets/prints the word count and gets/prints the length of the longest line since these are the ones I have questions about. Here it is:

----- Beginning of Program -----

#!/usr/bin/env ruby

# Read input from stdin only if not a tty. The only reason I gave such a
# constraint here was just to see that I could do it. It's one of the
# first things I do in learning a new language
if not STDIN.tty?
    data = STDIN.read
end

exit if not data

# PRINT THE WORD COUNT

# I'm wondering if there's an easier way to do this. It would
# be nice if the String::count method accepted regex patterns
# and not just strings.

# As it stands, this method creates a separate array of words
# whose count I then take. I would've rather done this
# without the extra overhead but I guess it's no big deal: it works!
printf("Word Count: %d\n", data.split(/\s/).length)

# GET THE LENGTH OF THE LONGEST LINE

# If there's a more elegant solution than what I have below, I'm all
# ears
line_length = 0
data.split(/\n/).each do |line|
    line_length = line.length if line_length < line.length
end

printf("Longest Line Length: %d\n", line_length)

----- End of Program -----

My question here isn't correctness as much as elegance. I'm fairly sure the solutions I've provided are correct (maybe); I'm just wondering if anyone has a better solution.

Thanks,
Keith P. Boruff

"@*(&SPAM&)*optonline.net" <" kboruff\""@*.*optonline.net> (&SPAM&)> writes:

[...]

# I'm wondering if there's an easier way to do this. It would
# be nice if the String::count method accepted regex patterns
# and not just strings.

# As it stands, this method creates a separate array of words
# whose count I then take. I would've rather done this
# without the extra overhead but I guess it's no big deal: it works!
printf("Word Count: %d\n", data.split(/\s/).length)

How about something like this?

class String
  # Count the non-overlapping matches of pattern in this string.
  def match_count(pattern)
    count = 0
    scan(pattern) { count += 1 }
    count
  end

  # Count runs of word characters.
  def word_count
    match_count(/\w+/)
  end
end

puts "Word Count: #{data.word_count}"

Or put the methods outside of String if you prefer.
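
A minimal sketch of the "outside of String" variant, set next to the split-based count for comparison. The helper name count_matches and the sample string are made up for illustration, and /\S+/ is used on the assumption that, as in wc, a word means a run of non-whitespace characters.

```ruby
# Hypothetical helper: count the non-overlapping matches of pattern in text
# without building an array of the matches.
def count_matches(text, pattern)
  count = 0
  text.scan(pattern) { count += 1 }
  count
end

sample = "two  spaces\nand a newline\n"   # note the double space

puts count_matches(sample, /\S+/)   # => 5
puts sample.split.length            # => 5
puts sample.split(/\s/).length      # => 6
```

The last line is one higher because split(/\s/) returns an empty string between two consecutive whitespace characters; split with no argument and scan(/\S+/) both treat a run of whitespace as a single separator.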

# GET THE LENGTH OF THE LONGEST LINE

# If there's a more elegant solution than what I have below, I'm all
# ears
line_length = 0
data.split(/\n/).each do |line|
    line_length = line.length if line_length < line.length
end

I'd write something like

maximum_length = 0
data.each_line do |line|
  if line.length > maximum_length then
    maximum_length = line.length
  end
end

which doesn't need to keep all lines in an array.

  mikael

A couple of tips.

* ARGF is your friend when it comes to input; it is a virtual file
  that reads from the files named on the command line, or from STDIN
  if no files are given.

* 'unless' is a substitute for 'if not'.

* 'puts' is worth knowing about (I use 'printf' 1% of the time):

    puts "Word count: #{data.split(/\s/).length}"
    puts "Longest Line Length: #{line_length}"

* As far as elegance goes, see how you like this:

    longest_length = data.split(/\n/).map { |l| l.length }.max

  The way you've done it will perform better though, I expect.
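
The tips above could be combined along these lines. This is only a sketch: wc_stats is a hypothetical helper name, and a StringIO stands in for real input so that the example runs on its own. Called with no argument it would read from ARGF, that is, from the files named on the command line or from STDIN.

```ruby
require "stringio"

# Hypothetical helper: returns [word_count, longest_line_length] for any
# object that responds to each_line (ARGF, STDIN, a File, a StringIO).
def wc_stats(io = ARGF)
  words = 0
  longest = 0
  io.each_line do |line|
    words += line.split.length
    length = line.chomp.length        # ignore the trailing newline
    longest = length if length > longest
  end
  [words, longest]
end

words, longest = wc_stats(StringIO.new("one two\nthree four five\n"))
puts "Word count: #{words}"               # => Word count: 5
puts "Longest Line Length: #{longest}"    # => Longest Line Length: 15
```

Reading line by line also means the whole input never has to sit in memory at once.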

Cheers,
Gavin

On Sunday, June 27, 2004, 10:03:02 AM, wrote:

[...]

Hi,

I like the following because:
1. It doesn't store the input in an array
2. As a toy program, it's easier to provide the data in-line rather than
through $stdin
3. It's quite succinct, IMHO

max_len = wd_cnt = 0
DATA.each_line do |line|
   line.chomp!
   max_len = line.length > max_len ? line.length : max_len
   # Note: pattern recognizes contractions (embedded apostrophe)
   wd_cnt += line.scan(/\w+'?(\w+)?/).size
end
puts "Max. len. = #{max_len}"
puts "Wd. count = #{wd_cnt}"

# Yogi'isms
__END__
When asked about his philosophy of life, he replied: "When you reach a fork
in the road, take it!"
When Yogi was told that Dublin, Ireland elected a Jewish mayor, he
exclaimed: "only in America!"
When asked "What time is it?", Yogi inquired: "You mean now?"
"No one goes to that restaurant any more: it's too crowded!"
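
The contraction pattern can be tried in isolation. A small sketch with a made-up sample string; the third pattern is a variation that is not in the post above (a non-capturing group), shown only to make the matched words visible.

```ruby
line = "it's too crowded"

p line.scan(/\w+/)            # => ["it", "s", "too", "crowded"]
p line.scan(/\w+'?(\w+)?/)    # => [["s"], [nil], [nil]]
p line.scan(/\w+(?:'\w+)?/)   # => ["it's", "too", "crowded"]
```

With a capturing group, scan returns the captures rather than the whole matches, but there is still one entry per match, so .size gives the same word count either way.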

HTH,
Richard

Hi,

If you want you could get rid of the loop using inject:

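line_length = data.each_line.inject(0) { |m, l|
  l.chomp.length > m ? l.chomp.length : m }

# chomp drops the trailing newline so it isn't counted. In Ruby 1.9 and
# later, String is no longer Enumerable, hence the explicit each_line.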

I think however that for large files it may not be so
efficient (since the whole file has to be loaded in memory).
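
The memory cost comes from reading everything into data first, not from inject itself. A sketch, assuming io is any IO-like object (a StringIO stands in for STDIN here so the example is self-contained): each_line without a block returns an enumerator, so inject receives the lines one at a time.

```ruby
require "stringio"

io = StringIO.new("short\na much longer line\n")   # stand-in for STDIN

longest = io.each_line.inject(0) do |max, line|
  [max, line.chomp.length].max
end

puts "Longest Line Length: #{longest}"   # => Longest Line Length: 18
```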

You could put everything in a loop for STDIN.each:

# your initialization code ...
line_length = 0
wc = 0
STDIN.each do |l|
  wc += l.split.length
  len = l.chomp.length   # don't count the trailing newline
  line_length = len if len > line_length
end

#show the result

Regards,
Kristof Bastiaensen

On Sat, 26 Jun 2004 23:58:44 +0000, @*(&SPAM&)*optonline.net wrote:

[...]

"@*(&SPAM&)*optonline.net" <""kboruff\"@*(&SPAM&)*optonline.net"> wrote in message news:<U4oDc.9527$OT6.7027622@news4.srv.hcvlny.cv.net>...

[...]

For File.wc see "ptools", available on the RAA.

Regards,

Dan

Maybe even:

longest = data.split(/\n/).sort_by { |l| l.length }.last

to get the line itself instead of its length (I believe you're
keeping the array in memory to handle the lines more than one time).

On Sun, 27 Jun 2004 09:37:24 +0900, Gavin Sinclair <gsinclair@soyabean.com.au> wrote:

* As far as elegance goes, see how you like this:

   longest_length = data.split(/\n/).map { |l| l.length }.max

The way you've done it will perform better though, I expect.

Mikael Brockman wrote:

How about something like this?

> class String
>   def match_count (pattern)
>     count = 0
>     scan (pattern) { count = count + 1 }
>     return count
>   end
>
>   def word_count
>     match_count /\w+/
>   end
> end
>
> puts "Word Count: #{data.word_count}"

Or put the methods outside of String if you prefer.

This is good. I'll give it a shot.

# GET THE LENGTH OF THE LONGEST LINE

# If there's a more elegant solution than what I have below, I'm all
# ears
line_length = 0
data.split(/\n/).each do |line|
   line_length = line.length if line_length < line.length
end

I'd write something like

> maximum_length = 0
> data.each_line do |line|
>   if line.length > maximum_length then
>     maximum_length = line.length
>   end
> end

which doesn't need to keep all lines in an array.

This is good too. However, each_line keeps the newline at the end of each line, so the longest line length for my test data comes out one higher. The actual wc program doesn't seem to count the newline, so to match it I added a chomp to your solution:

maximum_length = 0
data.each_line do |line|

   line.chomp!

   if line.length > maximum_length then
     maximum_length = line.length
   end
end

Keith Boruff

gabriele renzi wrote:

  longest_length = data.split(/\n/).map { |l| l.length }.max

longest = data.split(/\n/).sort_by { |l| l.length }.last

to get the line itself instead of its length (I believe you're
keeping the array in memory to handle the lines more than one time)

For something like this it would be great to have a .max_by built-in.

Are there any good reasons for not having it? I might write an RCR for it soon.
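
For readers on later Ruby versions: Enumerable#max_by did become a built-in (as far as I know it is available from Ruby 1.8.7 and 1.9 onward). A short sketch with made-up sample data:

```ruby
data = "short\na much longer line\nmid-sized\n"   # sample input

# max_by returns the element whose block value is largest,
# so this yields the longest line itself.
longest = data.each_line.map(&:chomp).max_by(&:length)

puts longest          # => a much longer line
puts longest.length   # => 18
```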

Regards,
Florian Gross

Florian Gross <flgr@ccan.de> writes:

[...]

For something like this it would be great to have a .max_by built-in.

Are there any good reasons for not having it? I might write an RCR for
it soon.

#max, like #sort, takes a block. That's good enough, isn't it?
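
To make that concrete, this is roughly what the block form of max looks like for this problem; the sample data is invented. The block has to compare two elements, so the key (the length) is spelled out once for each side of the comparison.

```ruby
data = "short\na much longer line\nmid-sized\n"   # sample input
lines = data.split(/\n/)

# max with a block uses the block as the comparison, like sort does.
longest = lines.max { |a, b| a.length <=> b.length }

puts longest          # => a much longer line
puts longest.length   # => 18
```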

George Ogata wrote:

#max, like #sort, takes a block. That's good enough, isn't it?

It's not as comfortable as a #max_by that would use a Schwartzian transform IMHO.

It's irrelevant - max only traverses the list once anyway, so each list
element is preprocessed only once. The Schwartzian transform wouldn't
make any difference.

martin

Oops - ignore my other reply. This would indeed be syntactically neater.

martin
