[QUIZ] Short But Unique (#83)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Ryan Williams

I use Eclipse (with RadRails!). I have a bunch of files open in tabs. Once enough
files are open, Eclipse starts to truncate the names so that everything fits.
It truncates them from the right, which means that pretty soon I'm left unable
to tell which tab is "users_controller.rb" and which is
"users_controller_test.rb", because they're both truncated to
"users_control...".

The quiz would be to develop an abbrev-like module that shortens a set of
strings so that they are all within a specified length, and all unique. You
shorten the strings by replacing a sequence of characters with an ellipsis
character [U+2026]. If you want it to be ascii-only, use three periods instead,
but keep in mind that then you can only replace blocks of four or more
characters.

It might look like this in operation:

  ['users_controller', 'users_controller_test',
   'account_controller', 'account_controller_test',
   'bacon'].compress(10)
  => ['users_c...', 'use...test', 'account...', 'acc...test', 'bacon']

There's a lot of leeway to vary the algorithm for selecting which characters to
crop, so extra points go to schemes that yield more readable results.
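
As a rough illustration of the shortening operation described above (not part of the original quiz text), the sketch below replaces the middle of a string with an ellipsis. The helper name `elide` and its parameters are hypothetical. It assumes a Ruby version in which String#length counts characters, so that U+2026 counts as one.

```ruby
# Hypothetical helper, for illustration only: keep `head` characters from the
# start and `tail` characters from the end, replacing the rest with `ellipsis`.
def elide(str, head, tail, ellipsis = "\u2026")
  removed = str.length - head - tail
  # Replacing no more characters than the ellipsis itself occupies would not
  # make the string any shorter, so such strings are returned unchanged.
  return str if removed <= ellipsis.length
  str[0, head] + ellipsis + str[str.length - tail, tail]
end

puts elide("users_controller_test", 5, 4)        # users…test
puts elide("users_controller_test", 3, 4, "...") # use...test
puts elide("bacon", 2, 2, "...")                 # bacon (only one character to replace)
```

With the three-period form the guard only lets through replacements of four or more characters, which is the restriction the quiz mentions for ASCII-only output.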

Two things:

Are the entries in the array always unique? Or do we have to be able to handle
an array such as:

['users_controller', 'users_controller_test', 'account_controller', 'account_controller_test', 'bacon', 'users_controller_test']

Also, is the Unicode ellipsis counted as one or three characters?

-Gautam Dey

···

On Jun 16, 2006, at 5:33 AM, Ruby Quiz wrote:

[snip]

My entry is simple. At first I was thinking of making it much more complicated, using Abbrev to get a human-readable abbreviated version of each title, but that seemed to complicate things more than help, so I went for a simple algorithm. My solution takes a simple truncation of the file name; if that is already taken, it goes to the end and shifts the ellipsis to the left, revealing more of the last word, until the title no longer matches one already in use. There is a very real possibility of getting an infinite loop, and I have not tested it on many strings. Another flaw is that if two strings are identical but no longer than the length passed to the function, it returns both untouched, since it does not touch any string whose length is less than or equal to that value.

Gautam.

···

--------------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/env ruby -w
# This code is released under the GPL.

module GDCompress
  ELLIPSIS = "…" # U+2026, counted as a single character

  # Returns a new array in which every title longer than +size+ has been
  # shortened to +size+ characters.
  def compress(size)
    used_names = {}
    map do |title|
      new_title = title
      if title.length > size
        tail = 0 # number of characters kept from the end of the title
        loop do
          new_title = title[0, size - 1 - tail] + ELLIPSIS + title[-tail, tail]
          # Reveal three more characters of the ending on each retry.
          # Caveat: nothing stops tail from growing past size - 1.
          tail += 3
          break unless used_names[new_title]
        end
      end
      used_names[new_title] = title
      new_title
    end
  end
end

class Array
  include GDCompress
end

p ['users_controller', 'users_controller_test',
   'account_controller', 'account_controller_test',
   'bacon'].compress(10)

---------------------------------------------------------------------------------------------------------------------------
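
To make the search order of the loop above easier to see, here is a small sketch that lists the candidates for one title in the order they would be tried. The helper `candidates` and its `step` parameter are hypothetical and not part of the posted solution; the sketch assumes the title is longer than `size` and a Ruby version in which U+2026 counts as one character.

```ruby
# Illustration only: list the abbreviations in the order the loop tries them,
# keeping 0, 3, 6, ... characters from the end of the title.
# Assumes title.length > size.
def candidates(title, size, step = 3)
  (0...size).step(step).map do |tail|
    title[0, size - 1 - tail] + "\u2026" + title[title.length - tail, tail]
  end
end

p candidates("users_controller_test", 10)
# => ["users_con…", "users_…est", "use…r_test", "…ller_test"]
```

Every candidate is exactly `size` characters long, and once the list is exhausted there is nothing left to try, which is where the termination concern mentioned in the message comes from.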

On Jun 16, 2006, at 5:33 AM, Ruby Quiz wrote:

[snip]

Here is my solution.
It tries to generate unambiguous abbreviations; if none exist, it uses the
least ambiguous one, and it always avoids using the same abbreviation twice.
There's also a readability heuristic built in: abbreviations that keep most
of their characters at the beginning, or that split them equally between the
beginning and the end, are considered the most readable.

class String
  # Keep the first (total_length - end_length) and the last end_length
  # characters, joined by a three-period ellipsis.
  def compress(total_length, end_length)
    self[0...total_length-end_length] + '...' + self[length-end_length..-1]
  end
end

class Array
  def compress!(max_length)
    max_length = 4 if max_length < 4
    score = Hash.new(0) # how many strings could produce each abbreviation
    usable_length = max_length - 3
    # End lengths to try, most readable first.
    order = (0..usable_length).sort_by { |len|
      [(len - usable_length.to_f / 2).abs, len].min
    }
    # Strings that already fit within max_length are left alone.
    to_compress = select { |s| s.length > max_length }
    to_compress.each { |s| order.each { |l| score[s.compress(usable_length, l)] += 1 } }
    to_compress.each { |s|
      s.replace order.map { |l| s.compress(usable_length, l) }.min { |a, b| score[a] <=> score[b] }
      score[s] += 100 # heavily penalise an abbreviation that is already in use
    }
    self
  end
end

if __FILE__ == $0
  p ['users_controller', 'users_controller_test', 'account_controller',
     'account_controller_test', 'bacon'].compress!(10)
  p Array.new(10) { 'abcdefghijklmnopqrstuvwxyz' }.compress!(12)
  p ['aaaaaazbbbbb', 'aaaaaaybbbbb'].compress!(9)
end
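
The readability ordering computed as `order` in the solution above can be looked at in isolation. The sketch below is only an illustration: the method name `end_length_order` is hypothetical, and the extra `len` tiebreaker is an addition so that the result is deterministic, since sort_by is not guaranteed to be stable.

```ruby
# Illustration only: rank the possible end lengths using the same key as the
# solution above, min(distance from an even split, end length), with the end
# length itself as an added tiebreaker.
def end_length_order(usable_length)
  (0..usable_length).sort_by do |len|
    [[(len - usable_length / 2.0).abs, len].min, len]
  end
end

p end_length_order(7) # => [0, 3, 4, 1, 2, 5, 6, 7]
```

For a maximum length of 10 there are 7 usable characters, so the first choice keeps everything at the beginning (end length 0), followed by the nearly even splits (end lengths 3 and 4).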

Two things:

   Are the entries in the array always unique?

Let's assume they are, sure.

Also is the unicode ellipsis counted as one or three characters?

One, in my opinion.

James Edward Gray II
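
For readers checking the "one character" answer, a quick sketch: it assumes Ruby 1.9 or later with UTF-8 strings, where String#length counts characters. On Ruby 1.8, which was current when this thread was written, String#length counted bytes instead.

```ruby
ellipsis = "\u2026" # HORIZONTAL ELLIPSIS
p ellipsis.length   # => 1 (characters)
p ellipsis.bytesize # => 3 (bytes in UTF-8)
p "...".length      # => 3
```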

···

On Jun 17, 2006, at 3:15 PM, Gautam Dey wrote: