Beautiful Code : Pity he didn't ask here

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

   http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
    puts $1
end

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
  counts = Hash.new(0)
   or
  counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
   10 keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
   def sort_keys_by_value
     keys.sort_by{|key| fetch(key)}
   end
end

Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}
.

9 if @hash[s[0]]
10 @hash[s[0]] << [ s[1], article ]
11 else
12 @hash[s[0]] = [ s[1], article ]
13 end

Replace that with...

4 @hash = Hash.new{|hash,key| hash[key] = []}

10 @hash[s[0]] << [ s[1], article ]

That makes it real pretty code.

In fact, I can't find any pretty reason to have the class BigHash at
all....looks like his Java roots are showing through.

Personally I'd just drop it.

Partly the point of the exercise here is it is slow because it is so
big, a couple of things I wonder about...

For example I would time how long it takes to just read the file and do nothing..

He mentions the program grew to 1.56gb

Well, he is creating an Array per log entry.

Maybe if he created two hashes instead of an array per entry.

Incidently, I wouldn't have chosen that chunk of Java for illustrating
the binary search....

glibc's implementation is just so clean....

/* Perform a binary search for KEY in BASE which has NMEMB elements
    of SIZE bytes each. The comparisons are done by (*COMPAR)(). */
void *
bsearch (const void *key, const void *base, size_t nmemb, size_t size,
    int (*compar) (const void *, const void *))
{
   size_t l, u, idx;
   const void *p;
   int comparison;

   l = 0;
   u = nmemb;
   while (l < u)
     {
       idx = (l + u) / 2;
       p = (void *) (((const char *) base) + (idx * size));
       comparison = (*compar) (key, p);
       if (comparison < 0)
   u = idx;
       else if (comparison > 0)
   l = idx + 1;
       else
   return (void *) p;
     }

   return NULL;
}

On the other hand both have the classic bug that (high + low) or (l+u) overflows on large arrays...

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

Hmm, I don't think we should slurp when we don't need too. There's no reason to be wasteful.

I really liked the rest of your fixes though.

James Edward Gray II

···

On Jul 9, 2007, at 7:15 PM, John Carter wrote:

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
   puts $1
end

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|
    puts $1
end

martin

···

On 7/10/07, John Carter <john.carter@tait.co.nz> wrote:

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
    puts $1
end

Just got the O'Reilly announcement of the book "Beautiful

<snip>

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
  counts = Hash.new(0)
   or
  counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
   10 keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
   def sort_keys_by_value
     keys.sort_by{|key| fetch(key)}
   end
end

or

tops = counts.sort_by{|*kv| kv.last}.reverse[0..9]
puts tops.map{ |k,v| "#{v}: #{k}"}

#each is just a primitive for Enumerables, right?

Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}
.

9 if @hash[s[0]]
10 @hash[s[0]] << [ s[1], article ]
11 else
12 @hash[s[0]] = [ s[1], article ]
13 end

Replace that with...

4 @hash = Hash.new{|hash,key| hash[key] = }

10 @hash[s[0]] << [ s[1], article ]

That makes it real pretty code.

Save for

   File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

<snip>
Robert

···

On 7/10/07, John Carter <john.carter@tait.co.nz> wrote:

--
I always knew that one day Smalltalk would replace Java.
I just didn't know it would be called Ruby
-- Kent Beck

John Carter <john.carter@tait.co.nz> writes:

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

...

For example scanning a log file.....

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
   puts $1
end

As James said, I'm going to have to disagree on this, though I'd clean
it up to:

article_pattern = \
  %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
  line.scan(article_pattern) do |article,|
    puts article
  end
end

Look, no global variables!

I agree with your use of Hash.new(0), but disagree that it's mere
personal preference - it's much more common to use even the single
argument form in the constructor than to use default=.

It is odd that he seems unaware of Array.sort_by - anyone know when
this was added to Ruby core? The book could be rather out of date.

Example 4.5 looks like a poster child for the Hash.new block approach...

Agreed. I wonder if perhaps he's spent more time in python or some
other dynamic language than Ruby.

In fact, I can't find any pretty reason to have the class BigHash at
all....looks like his Java roots are showing through.

Yeah; in Java, you can't have code existing outside of a class, and I
suspect that's a big part of it.

On the other hand both have the classic bug that (high + low) or
(l+u) overflows on large arrays...

Actually, his code doesn't: java arrays can only have 2**31 entries
(since their index is a java int), and java ints are always signed.
Therefore, any overflow in the addition in java yields a negative
integer, and >>> in java does right shift without sign extension.
(That is, with 0s shifted in at the left).

The glibc code does have that problem, but only on arrays that have
more than (maximum size_t)/2 elements.

···

--
s=%q( Daniel Martin -- martin@snowplow.org
       puts "s=%q(#{s})",s.to_a.last )
       puts "s=%q(#{s})",s.to_a.last

Hi --

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Yes, but be careful: your two lines don't do the same thing as each
other:

irb(main):007:0> counts = Hash.new(0)
=> {}
irb(main):008:0> counts["hi"]
=> 0
irb(main):009:0> counts
=> {}
irb(main):010:0> counts = Hash.new{|hash,key| hash[key]=0}
=> {}
irb(main):011:0> counts["hi"]
=> 0
irb(main):012:0> counts
=> {"hi"=>0}

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
   keys.sort_by{|key| fetch(key)}
end
end

It's best not to get into changing the core language, though, if the
book isn't going to discuss the benefits and pitfalls of doing so in
depth.

David

···

On Tue, 10 Jul 2007, John Carter wrote:

--
* Books:
   RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242\)
   RUBY FOR RAILS (http://www.manning.com/black\)
* Ruby/Rails training
     & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

Hello.

···

On 7/10/07, John Carter <john.carter@tait.co.nz> wrote:

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k = key and v = value.

Cheers,

--
Christoffer Sawicki
http://vemod.net/

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

  Beautiful Code [Book]

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Yep, mostly because for some reason I didn't know about sort_by. I wonder how I missed that? Obviously, the main point of my article was to encourage people to use the idiom of reading lines, picking 'em apart with a regex, and accumulating in a hash, invented in awk, brought to the world in perl, perfected in Ruby.

sort_by aside, I'm not convinced that any of the alternatives to my main loop are more beautiful or readable. I personally think that the reason Ruby will enjoy an ever increasing market share in the programming-languages space is because it's more readable.

I'm looking at the following...

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
   puts $1
end

Really? Considering most of the readership is new to Ruby?

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
  or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

  -Tim

···

On Jul 9, 2007, at 5:15 PM, John Carter wrote:

Well i for one am feeling pretty good that i learned Ruby here and not
from a textbook.

Anyway, nice work. Editors are probably wishing they would have gone
with "Cute Code" or "Decent-Looking Code."

--alex

···

On Jul 10, 2:15 am, John Carter <john.car...@tait.co.nz> wrote:

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

   Beautiful Code [Book]

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
    puts $1
end

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
  counts = Hash.new(0)
   or
  counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
   10 keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
   def sort_keys_by_value
     keys.sort_by{|key| fetch(key)}
   end
end

Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}

John Carter <john.carter@tait.co.nz> writes:

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
   puts $1
end

Kinda late to the party, but I'm amazed noone came up with:

ARGF.grep(%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) {
  puts $1
}

···

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Actually there is plenty of reason to be wasteful:

1) We can.
2) It leads to code that reads like the above.
3) We can!
4) It is only a short hop to:

    puts ARGF.read.scan( %r%GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) %).join("\n")

5) We can!!
6) It is our moral obligation to use our oft-idle and underused computers to their fullest.

There was a 7th... but I forgot what it was.

···

On Jul 9, 2007, at 17:53 , James Edward Gray II wrote:

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
   puts $1
end

Hmm, I don't think we should slurp when we don't need too. There's no reason to be wasteful.

> Since most Linux distro's roll the log files to a reasonable size I would have just gone with...
>
> ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
> puts $1
> end

I am with James do not slurp in the whole string it just does not make
sense to me. I would feel very uneasy using regexps on large strings
even if right now there seems no backtracking issues to be present,
but code evolves sometimes :wink:

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|

Nope Martin read the original code again!! All the / and \ ;).
I was about to fall into the same trap.
As a matter of fact I miss all beauty in that code :frowning: The regexp is
just ugly, some ideas for cosmetic surgery :wink:

def dig_dirs *args
    Regexp.new args.map{|n| "\d" * n}.join("/")
end

puts ARGF.map { |line|
    line =~ %r{ GET /ongoing/When/\d{3}x/(#{dig_dirs 4, 2, 2}/[^ .])}
    $1
}.compact

But beauty lies to everybody (or was that "in the eyes of the beholder"? ).

Cheers
Robert

···

On 7/10/07, Martin DeMello <martindemello@gmail.com> wrote:

On 7/10/07, John Carter <john.carter@tait.co.nz> wrote:

--
I always knew that one day Smalltalk would replace Java.
I just didn't know it would be called Ruby
-- Kent Beck

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify "where k = key and v = value" then these should have been used in your code.

Always favor readability at the expense of verbosity, both your future self and whoever else maintains your code will thank you.

···

On Jul 10, 2007, at 08:27 , Christoffer Sawicki wrote:

Hello.

On 7/10/07, John Carter <john.carter@tait.co.nz> wrote:

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k = key and v = value.

Cheers,

--
Wayne E. Seguin
Sr. Systems Architect & Systems Admin
wayneseguin@gmail.com

"Robert Dober" <robert.dober@gmail.com> writes:

Just got the O'Reilly announcement of the book "Beautiful

<snip>

That makes it real pretty code.

Save for

  File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Good point. In many ways, it feels that the author is not as
comfortable with Ruby blocks as one should be. A better version,
since we also don't need a separate class:

def get_big_hash(filename)
  hash = Hash.new {|h,k| h[k]=}
  lines = 0
  File.open(filename) do |file|
    file.each_line do |line|
      s = line.split
      article = s[2].intern
      hash[s[0]] << [ s[1], article ]
      lines += 1
      STDERR.puts "Line: #{lines}" if (lines % 100_000) == 0
    end
  end
  return hash
end

I really like the block-using form of File.open. It makes me feel all
nice and safe like being wrapped in with-open-file or a carefully
constructed dynamic-wind.

···

On 7/10/07, John Carter <john.carter@tait.co.nz> wrote:

--
s=%q( Daniel Martin -- martin@snowplow.org
       puts "s=%q(#{s})",s.to_a.last )
       puts "s=%q(#{s})",s.to_a.last

As James said, I'm going to have to disagree on this, though I'd clean
it up to:

article_pattern = \
  %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
  line.scan(article_pattern) do |article,|
    puts article
  end
end

Look, no global variables!

Yep, much better, I wish I'd done it that way.

I agree with your use of Hash.new(0), but disagree that it's mere
personal preference - it's much more common to use even the single
argument form in the constructor than to use default=.

Why? default= makes it more obvious what you're doing. I think it's better practice.

Example 4.5 looks like a poster child for the Hash.new block approach...

Agreed. I wonder if perhaps he's spent more time in python or some
other dynamic language than Ruby.

Perl :slight_smile:

  -Tim

···

On Jul 10, 2007, at 5:11 AM, Daniel Martin wrote:

Actually, I think this:

  counts = Hash.new(0)

. . . is perfectly readable, and a bit more elegant than the two-line
version from "Example 4.3". On the other hand, this bit looks pretty
ugly to me:

  counts = Hash.new{|hash,key| hash[key]=0]}

. . . especially since it looks like someone is afraid he has a limited
number of space characters to use, and doesn't want to run out. Even
with some spaces added to it, though, it's not as pretty and readable.

···

On Tue, Jul 10, 2007 at 11:47:26PM +0900, Tim Bray wrote:

On Jul 9, 2007, at 5:15 PM, John Carter wrote:

>In Example 4.3 he has...
>1 counts = {}
>2 counts.default = 0
>
>Personally I prefer...
> counts = Hash.new(0)
> or
> counts = Hash.new{|hash,key| hash[key]=0}
>buts that's personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Larry Wall: "A script is what you give the actors. A program is what you
give the audience."

Perhaps because sort_by is defined in Enumerable, not in the Array
class. I wish that the Ruby documentation inlined methods like these
from Mixin classes (similar to how Javadoc displays a parent class'
methods). Is this possible when generating using rdoc?

···

On 7/10/07, Tim Bray <Tim.Bray@sun.com> wrote:

Yep, mostly because for some reason I didn't know about sort_by. I
wonder how I missed that? O

--
Caleb

Doubt is not a pleasant condition, but certainty is absurd - Voltaire

let's step back for one moment and think about why we all left perl:

   - wtf is $1 ?
   - wtf is ARGF ?
   - regexes are, of all things, the epitome of snoopy swearing

these are the kinds of magic things that used to send us all perdoc'ing to read our own code.

while i use both of those thing in dirty script i keep in ~/bin/ i'd never publish them as beautiful. names are the most important thing in programing and terminals are pretty dang wide these days

given that, why not

   pattern =
     %r|GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)|x

   STDIN.each_line do |line|
     match, date, *ignored = pattern.match(line).to_a
     next unless date
     puts date
   end

i always consider using the magic $vars a sign of temporary perl madness.

kind regards.

-a

···

On Jul 10, 2007, at 8:47 AM, Tim Bray wrote:

Gosh, I think I totally disagree on readability grounds.

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Well i for one am feeling pretty good that i learned Ruby here and not
from a textbook.

That is something very nice to say :slight_smile:

Anyway, nice work. Editors are probably wishing they would have gone
with "Cute Code" or "Decent-Looking Code."

It is indeed a very ambitious title, the Ruby chapter is very
disappointing but the author pointed out it was for beginners (sic).
For me that makes no sense :frowning:

Oh BTW please do not top post, a huge majority of us prefer bottom posting :slight_smile:

--alex

Robert

···

On 7/11/07, alex_land <doug.ybarbo@gmail.com> wrote:
--
I always knew that one day Smalltalk would replace Java.
I just didn't know it would be called Ruby
-- Kent Beck

I perfectly agree with you.

Yet, I find my use of Ruby has changed on that front.

At first I wrote Ruby that was, I thought, in a most readable style.

What I realized later was, it was most readable to someone who knew
C++ well, but not Ruby.

But now I have found I'm headed more to Hal Fulton's attitude as expressed
in his book "The Ruby Way"
    http://rubyhacker.com/coralbook/

I try to write the most readable code that is most readable by someone
who knows Ruby perfectly well.

If, on the rare occasions that requires deep knowledge of Ruby... I
will explain that in the comment, not the code.

ie. My code assumes you know everything about Ruby. My comments assume
that you are a newbie.

Now that places a conundrum before someone writing a book "Beautiful
Code" for non-Ruby readers. Talk down to them? Speak a C++/Java/Ruby
pidgin?

Personally I would speak up to them. Scan the libraries for the best
code Rubiest have to offer in the Ruby idiom.... and spend my time
explaining the elegance, power and succinctness of it.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

···

On Tue, 10 Jul 2007, Tim Bray wrote:

I personally think that the reason Ruby will enjoy an ever
increasing market share in the programming-languages space is
because it's more readable.

   >snip<

Gosh, I think I totally disagree on readability grounds.