Adventures in Optimization... or why CONST frozen is Good

...or when a language design level optimization is a pessimization.

Ruby allows destructive string operations: String instance methods with
a "Bang!" at the end, such as sub!, modify the receiver in place.

Consider this code.

a = ['froot']
b=a.first
c = {"d"=>b}

Now a[0], b and c["d"] all refer to _exactly_ the same string instance:

a[0].object_id
=> -605300798
b.object_id
=> -605300798
c["d"].object_id
=> -605300798

So if I do a destructive operation on any of them, all are clobbered.

a.last.sub!(/oo/,"ui")
=> "fruit"
irb(main):009:0> b
=> "fruit"
irb(main):010:0> c
=> {"d"=>"fruit"}
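A small runnable version of the session above, for anyone who wants to try it. It uses String.new (an assumption, not in the original) so the string stays mutable even where frozen string literals are switched on, and it adds a #dup at the end to show how a private copy escapes the aliasing.

```ruby
# String.new keeps the string mutable even under a frozen_string_literal setting.
a = [String.new('froot')]
b = a.first
c = { 'd' => b }

# One object, three references.
puts(a[0].equal?(b) && c['d'].equal?(b))   # true

a.last.sub!(/oo/, 'ui')   # destructive: edits the shared object in place
puts b                    # fruit
puts c['d']               # fruit

# A private copy made with #dup no longer follows the original.
safe = b.dup
b.sub!(/ui/, 'oo')
puts safe                 # fruit
puts b                    # froot
```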

Traditionally, destructive ops have been allowed in languages such as
Lisp as an optimization: you don't have to allocate a new object
instance if you don't want to.
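The trade-off is easy to see by comparing a method with its bang twin. This is only an illustration using core String#sub and String#sub!: the plain version allocates and returns a new String, while the bang version rewrites the receiver and returns it (or nil when nothing changed).

```ruby
s = String.new('froot')

t = s.sub(/oo/, 'ui')    # non-destructive: returns a brand new String
puts t.equal?(s)         # false
puts s                   # froot (unchanged)

u = s.sub!(/oo/, 'ui')   # destructive: rewrites s in place, no new String
puts u.equal?(s)         # true
puts s                   # fruit

p s.sub!(/zz/, 'x')      # nil: bang methods return nil when nothing changed
```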

The other day I was optimizing my code and decided to hunt down
unnecessary object allocations.

I used my MemoryProfiler snippet and found that Strings were by far the
most common objects I was generating:

http://rubyforge.org/snippet/detail.php?type=snippet&id=70

So I extended that to find _which_ was the most common string I was
generating.

    def MemoryProfile.string_duplicates
       Dir.chdir "/tmp"   # note: changes the working directory of the whole process
       ObjectSpace.garbage_collect
       sleep 10 # Give the GC thread a chance

       # Count live String objects by contents (exact String instances only).
       tally = Hash.new(0)
       ObjectSpace.each_object(String) do |obj|
          next unless obj.instance_of?(String)
          tally[obj] += 1
       end

       open(LOG_FILE, 'a') do |outf|
          outf.puts '=' * 70
          outf.puts "\nString Duplicates report for #{$0}\n\n"
          # Strings seen more than once, least common first.
          tally.select { |_, n| n > 1 }.sort_by { |_, n| n }.each do |s, n|
             outf.puts "#{s}\t#{n}"
          end
       end
    end
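If you just want the numbers back rather than appended to LOG_FILE, a standalone variation along these lines might do. The module and method name DuplicateStrings.tally are hypothetical (not part of the MemoryProfiler snippet), it relies on MRI's ObjectSpace behaviour, and it needs Ruby 2.1 or later for Array#to_h. It snapshots the heap before counting, since Hash#[]= may store a frozen copy of each new unfrozen String key.

```ruby
# Hypothetical helper, not part of the MemoryProfiler snippet: returns the
# duplicated String values instead of appending them to a log file.
module DuplicateStrings
  # Returns { contents => number_of_live_String_objects }, most common
  # first, keeping only contents seen at least `min` times.
  def self.tally(min = 2)
    GC.start   # let unreachable strings be collected before counting
    # Snapshot first: Hash#[]= may store a frozen copy of each new unfrozen
    # String key, and those copies should not be counted as well.
    strings = ObjectSpace.each_object(String).to_a
    counts = Hash.new(0)
    strings.each { |s| counts[s] += 1 }
    counts.select { |_, n| n >= min }.sort_by { |_, n| -n }.to_h
  end
end

keep = Array.new(1_000) { String.new('U') }   # 1,000 separate "U" objects
DuplicateStrings.tally.first(5).each { |s, n| puts "#{s.inspect}\t#{n}" }
```

The top of the list will vary from program to program; the point is that a heavily duplicated value like "U" floats straight to the top.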

The answer, by a long shot, was "U".

Somewhere in my code I had the line:
   symbols_needed[symbol_name] = 'U'

I could replace that with the symbol :U, but other places that had
Good Reasons for using strings would break.

Now I have a class CONSTANT...
   UNDEFINED = 'U'.freeze

and
   symbols_needed[symbol_name] = UNDEFINED
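A minimal sketch of the effect, assuming a table shaped like symbols_needed; the symbol names below are made-up sample data. Every slot ends up pointing at the one frozen object created when the constant is defined.

```ruby
UNDEFINED = 'U'.freeze   # one frozen String, created once

symbols_needed = {}
%w[main printf malloc].each do |symbol_name|   # hypothetical symbol names
  symbols_needed[symbol_name] = UNDEFINED      # stores a reference, allocates no String
end

# Every value is the very same object.
puts symbols_needed.values.map(&:object_id).uniq.size   # 1
puts UNDEFINED.frozen?                                   # true
```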

Of course, if anywhere I apply a destructive op to one of those
thousands of references, my code will die.

But at least the "freeze" will cause a loud and messy death, not a
subtle and hidden bug.
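Both outcomes side by side, as a rough illustration with made-up table entries: the frozen constant refuses the destructive op with an exception (FrozenError on Ruby 2.5+; older versions raised a different error class, hence the plain rescue), while an unfrozen shared string quietly changes under every reference.

```ruby
UNDEFINED = 'U'.freeze
symbols_needed = { 'printf' => UNDEFINED, 'malloc' => UNDEFINED }

error = nil
begin
  symbols_needed['printf'] << '?'   # destructive op on the shared constant
rescue => e                         # FrozenError on Ruby 2.5+
  error = e
  puts "refused: #{e.class}"
end
puts UNDEFINED                      # U

# The same slip with an unfrozen shared string goes unnoticed.
shared = String.new('U')
table = { 'printf' => shared, 'malloc' => shared }
table['printf'] << '?'
puts table['malloc']                # U?
```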

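So, as I said at the start, the optimization that allows the occasional
destructive op on a string... can be a pessimization. Because any
string might be modified later, every evaluation of a string literal
has to allocate a fresh String object, as the differing object_ids
below show: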

a= "froot"
=> "froot"
irb(main):002:0> a.object_id
=> -605331808
irb(main):003:0> a= "froot"
=> "froot"
irb(main):004:0> a.object_id
=> -605352198
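To put a number on that cost, one could count allocations directly. This is a sketch that assumes MRI 2.2 or later, where GC.stat(:total_allocated_objects) is available; the allocations helper is a hypothetical name. The magic comment on the first line keeps plain literals mutable, matching the behaviour described in this thread, and only takes effect if it stays at the top of the file.

```ruby
# frozen_string_literal: false
# (keeps plain literals mutable, i.e. freshly allocated on each evaluation)

UNDEFINED = 'U'.freeze

# Hypothetical helper: objects allocated while the block runs (MRI, Ruby 2.2+).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

N = 10_000
slot = nil
from_literal  = allocations { N.times { slot = 'U' } }        # a new String per pass
from_constant = allocations { N.times { slot = UNDEFINED } }  # just a reference copy

puts "literal:  #{from_literal} objects"
puts "constant: #{from_constant} objects"
```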

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

thanks john. interesting.

a @ http://codeforpeople.com/

On Dec 1, 2008, at 4:53 PM, John Carter wrote:

--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

Now I have a class CONSTANT...
   UNDEFINED = 'U'.freeze

and
   symbols_needed[symbol_name] = UNDEFINED

Yes, that's the "right" solution with Ruby today, and you'll see this
done in a lot of Ruby libraries. (Perhaps it would be nice if there were
some syntax to define an inline frozen string literal.)

I don't consider this any sort of "optimisation" though. It's
fundamental to the nature of Ruby that there is only one kind of value,
which is a reference to an object. An assignment always copies only the
reference.
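A short illustration of that point in plain Ruby: assignment copies the reference, so a destructive op is visible through both names, while a non-destructive op such as += builds a new object and rebinds only one of them.

```ruby
a = String.new('hello')
b = a                 # copies the reference; a and b name one object
puts b.equal?(a)      # true

a << ' world'         # mutates the shared object, so b sees it
puts b                # hello world

a += '!'              # builds a new String and rebinds a; b is untouched
puts b.equal?(a)      # false
puts b                # hello world
```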

This is a breath of fresh air when compared to, say, Perl. Is this value
a scalar? Is it a scalar number or string, or a reference to an Array or
a Hash, or a typeglob, or a filehandle, or ...?

However, you could argue that string literals should have been immutable
(like Symbol). The language would end up being somewhat different to
use:

    a = "hello" # maybe Symbol or StringLiteral
    b = String.new(a) # mutable String
    b << " world"

You'd also have to have a load of rules to work out. Should a.dup return
the same Symbol, or a new mutable String? Should (a + "world") return a
new Symbol, or a new mutable String?
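Current Ruby has no separate immutable-literal type, but frozen Strings already give concrete answers to those questions. This sketch uses a frozen String as a stand-in for the hypothetical StringLiteral: dup and + hand back mutable Strings, clone keeps the frozen state, and String.new gives an explicit mutable copy as in Brian's example.

```ruby
a = 'hello'.freeze           # closest thing today to an immutable literal

copy   = a.dup               # dup does not carry the frozen state over
cloned = a.clone             # clone does
joined = a + ' world'        # String#+ returns a new, unfrozen String

puts copy.frozen?            # false
puts cloned.frozen?          # true
puts joined.frozen?          # false

b = String.new(a)            # explicit mutable copy
b << ' world'
puts b                       # hello world
```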

From this point of view, just having String keeps things simple, even if
it does end up creating a load of garbage objects. In those cases where
this matters, your approach (of profiling and zapping) is a good one.

--
Posted via http://www.ruby-forum.com/.

Completely agree, this comes as no surprise. Actually, this is an
obvious design decision: if you want to use the same value to denote a
particular state, just use one object.

Another, probably more subtle, issue is this:

irb(main):001:0> s="foo"
=> "foo"
irb(main):002:0> h={s=>1}
=> {"foo"=>1}
irb(main):003:0> s.equal? h.keys.first
=> false
irb(main):004:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539280]
irb(main):005:0> s.freeze
=> "foo"
irb(main):006:0> h={s=>1}
=> {"foo"=>1}
irb(main):007:0> s.equal? h.keys.first
=> true
irb(main):008:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539250]

In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.
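The same behaviour through Hash#[]= rather than a hash literal, as a quick check; the key values are arbitrary. An unfrozen String key is stored as a frozen copy, while an already-frozen key is stored as-is.

```ruby
h = {}
s = String.new('foo')
h[s] = 1
k = h.keys.first
puts k == s                  # true: same contents
puts k.equal?(s)             # false: Hash#[]= stored a frozen copy
puts k.frozen?               # true
puts s.frozen?               # false: the caller's string is untouched

t = 'bar'.freeze
h[t] = 2
puts h.keys.last.equal?(t)   # true: an already-frozen key is stored as-is
```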

Kind regards

robert

2008/12/2 Brian Candler <b.candler@pobox.com>:

--
remember.guy do |as, often| as.you_can - without end

On 2 December at 16:33, Robert Klemme wrote:

In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.

For some values of "hidden":

16:41 grappa:~> qri 'Hash#[]='
--------------------------------------------------------------- Hash#[]=
     hsh[key] = value => value
     hsh.store(key, value) => value
------------------------------------------------------------------------
     Element Assignment---Associates the value given by value with the
     key given by key. key should not have its value changed while it
     is in use as a key (a String passed as a key will be duplicated
     and frozen).

Fred
--
Assignments: telling a variable what it stands for, and/or what value(s)
it should have is coercive and paternalistic: variables should be free
to choose their own names and value-sets from a range of non-sexist,
non-racist options. (Tanuki in the SDM, on politically-correct coding)