[QUIZ] Text Munger (#76)

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion.

···

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Matthew Moss

  Now terhe is a fnial rseaon I thnik that Jsues syas, "Lvoe your
  emneies." It is tihs: taht love has wtiihn it a remvpidtee pewor. And
  three is a pwoer three taht eellvtanuy tfranrmsos idvlinaidus. Taht's
  why Juess syas, "Love your emeeins." Bsceaue if you hate your
  enmeies, you have no way to reedem and to tarfnrsom your eenmeis. But
  if you love yuor emienes, you wlil decsiovr that at the vrey root of
  love is the pwoer of rdoemptein. You just keep loinvg pepole and keep
  lnivog tehm, even tgouhh they're mteitnsiarg you. Hree's the porsen
  who is a nhoeigbr, and tihs psoren is dnoig simhoetng wrong to you and
  all of that. Just keep being fnrdliey to that preosn. Keep liovng
  them. Don't do atnynhig to earsmrbas tehm. Just keep lvonig them, and
  they can't stand it too long. Oh, they raect in mnay ways in the
  bineningg. They react wtih brnetitess beucase they're mad bauesce you
  lvoe them like that. Tehy raect wtih gluit flegines, and setioemms
  they'll hate you a lltite more at that tinoiasrtn piroed, but just
  keep lvniog them. And by the poewr of your love tehy will beark down
  uendr the load. That's lvoe, you see. It is retpmevide, and tihs is
  why Juess says love. Trehe's shimeotng aubot love that blidus up and
  is cavrtiee. Trehe is stmeonihg aubot hate that tares dwon and is
  disettvrcue. So lvoe your eenmeis.

On first glance, the above may appear to be gibberish, but you may find that you
can actually read this portion of a speech from Dr Martin Luther King Jr. The
brain has an amazing capacity to compensate for things that aren't quite right,
and one study has shown that when the first and last letters of words are left
alone but those in the middle are scrambled, the text is often still quite
comprehensible.

Your task for this quiz, then, is to take a text as input and output the text in
this fashion. Scramble each word's center (leaving the first and last letters of
each word intact). Whitespace, punctuation, numbers -- anything that isn't a
word -- should also remain unchanged.
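
The rules above can be restated as a checker that says whether one text is an acceptable munging of another. This is only a sketch: the name valid_munge? is made up for illustration, and it assumes a "word" is a run of ASCII letters, which is one reading of the task rather than something the quiz spells out.

```ruby
# Assumption: a "word" is a run of ASCII letters.
WORD = /[A-Za-z]+/

# True when +munged+ could have been produced from +original+ under the rules.
def valid_munge?(original, munged)
  # Whitespace, punctuation and digits between the words must match exactly.
  return false unless original.split(WORD, -1) == munged.split(WORD, -1)

  # Each word keeps its length, first letter, last letter and set of letters.
  original.scan(WORD).zip(munged.scan(WORD)).all? do |before, after|
    before.length == after.length &&
      before[0] == after[0] &&
      before[-1] == after[-1] &&
      before.chars.sort == after.chars.sort
  end
end

puts valid_munge?("Now there is a final reason", "Now terhe is a fnial rseaon") # true
puts valid_munge?("love, 42", "lvoe, 24")                                       # false
```

The second call fails because the digits moved, which the task says must not happen.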

Hi --

Question:

Given a word like "there's" or "that's", does the letter before the
apostrophe count as a "last" letter? In other words, could "that's"
become "ttha's"?

In the example above, there's no case where that letter gets
scrambled. It's possible that that's coincidence, but it doesn't look
like it.

David

···

On Fri, 21 Apr 2006, Ruby Quiz wrote:

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" PDF now on sale! Ruby for Rails
Paper version coming in early May!

Ruby Quiz wrote:

Your task for this quiz, then, is to take a text as input and output the text in
this fashion. Scramble each word's center (leaving the first and last letters of
each word intact). Whitespace, punctuation, numbers -- anything that isn't a
word -- should also remain unchanged.

What about writing an unscrambler? Could that also be done for this quiz or might that be next week's task? :slight_smile:
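
One way an unscrambler might work, sketched under the assumption that a word list is available: scrambling never changes a word's first letter, last letter, or the set of letters in between, so those three things can serve as a lookup key. The helper names and the five-word dictionary below are hypothetical.

```ruby
# Key that scrambling cannot change: first letter, sorted middle, last letter.
def signature(word)
  return word if word.length < 4
  word[0] + word[1..-2].chars.sort.join + word[-1]
end

# Map each signature to every dictionary word that shares it.
def build_index(words)
  words.group_by { |w| signature(w) }
end

# Replace a munged word only when exactly one dictionary word fits.
def unscramble(text, index)
  text.gsub(/[A-Za-z]+/) do |w|
    candidates = index[signature(w.downcase)]
    next w unless candidates && candidates.size == 1
    restored = candidates.first
    w[0] == w[0].upcase ? restored.capitalize : restored
  end
end

# Hypothetical dictionary, just enough for the demonstration.
index = build_index(%w[love your enemies there three])
puts unscramble("Lvoe yuor emneies", index)  # Love your enemies
p index[signature("trehe")]                  # ["there", "three"]
```

The last line shows the catch: "there" and "three" share a signature, so a dictionary alone cannot always pick the right word and some context would be needed.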

···

--
http://flgr.0x42.net/

My first participation in Ruby Quiz, and it has to be easy. That said, I must really be missing something because some of you guys are mentioning one-liners, and mine is 26 lines. Maybe it's because I made mine highly abstracted, but I still don't really see how to do this in one line.

- Jake McArthur

I know everyone here is Nice(tm), so I'm sure this is not the
intent... but between this quiz and the Markov chain one, it seems we
are building a set of utilities perfect for generating those 'Re:
PHARmudMACY'
spam emails selling 'vigara' and such that have been sneaking through
my spam filter at work recently...

-Adam

···

On 4/21/06, Ruby Quiz <james@grayproductions.net> wrote:

Your task for this quiz, then, is to take a text as input and output the text in
this fashion. Scramble each word's center (leaving the first and last letters of
each word intact). Whitespace, punctuation, numbers -- anything that isn't a
word -- should also remain unchanged.

Here's my solution.

Usage : scramble.rb <text_file>

I made 3 attempts.

1)
print ARGF.read.gsub!(/\B[a-z]+\B/) {|x| x.split('').sort_by{rand}.join}

Here I use gsub to find all the words. Use split to convert strings into
arrays. And then use the sort_by{rand} to scramble the arrays. And finally
use join to convert the array back to a string.
I'm assuming that words don't have upper case letters in the middle, so that
I can get away with [a-z].

2)
print ARGF.read.gsub!(/\B[a-z]+\B/) {|x| x.unpack('c*').sort_by{rand}.pack('c*')}

I found this method of converting strings to and from arrays to be faster.
I'm not sure what the standard idiom for doing this is. But, I'm sure I'll
learn after seeing other people's solutions :wink:

3)
If sort_by{rand} does what I think it does, it probably has a bias when
the rand function returns the same value. So, this is my third
implementation:

print ARGF.read.gsub!(/\B[a-z]+\B/) {|x|
    x.length.times {|i|
        j = rand(i+1)
        x[j], x[i] = x[i] , x[j]
    }
    x
}

Basically, this is an implementation of scrambling that uses swaps. I
remember this method for scrambling from way back, but I can't seem to find
a good reference for it at the moment.
I also figured that this method would be faster since it is linear, while
the sorts are n log(n) (n = length of the word)

To my surprise, I found this method to actually be slower for any normal
text. One possible explanation is that when words are relatively short you
don't gain much from the n vs. n log(n) difference, and you lose because while
this method always has n swaps, sorting may have fewer.

In order to see any performance benefit from the 3rd method I had to make up
some horrifically long words which aren't terribly likely in the English
language (maybe I should have tried German :)).
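
A rough way to repeat that comparison, as a sketch only: the method names, word lengths and repetition counts below are arbitrary choices, and the timings will differ between machines and Ruby versions, so no particular result is implied.

```ruby
# Scramble by sorting on random keys (attempts 1 and 2).
def sort_scramble(s)
  s.chars.sort_by { rand }.join
end

# Scramble with one swap per position (attempt 3), on a copy of the input.
def swap_scramble(s)
  s = s.dup
  s.length.times do |i|
    j = rand(i + 1)
    s[i], s[j] = s[j], s[i]
  end
  s
end

# Wall-clock seconds taken by the block.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

short = 'scrambled'
long  = ('a'..'z').to_a.join * 2_000   # an artificial 52,000-letter "word"

[[short, 20_000], [long, 5]].each do |word, reps|
  t_sort = seconds { reps.times { sort_scramble(word) } }
  t_swap = seconds { reps.times { swap_scramble(word) } }
  printf("%6d letters: sort_by %.4fs, swaps %.4fs\n", word.length, t_sort, t_swap)
end
```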

Himadri

···

Seems like forty-eight hours are up now, so here are my solutions for
this quiz, it was good to get a quick one :slight_smile: I wrote a simple random
munging solution, and a slightly longer one that munges only part of the
words. I went for a different way on the latter one, just to play with
regexps a bit, but I expect its performance isn't great...

Both support unicode properly, as long as the -Ku stays on the ruby
command line :wink: I could have used the u modifier instead but wanted to
save on the repetition.

# ========= random munging
#!/usr/local/bin/ruby -Ku
$stdout << ARGF.read.gsub(/\B((?![\d_])\w){2,}\B/) do |w|
  $&.split(//).sort_by { rand }
end

# (easily compresses to:)

#!/usr/local/bin/ruby -npKu
gsub(/\B((?![\d_])\w){2,}\B/){$&.split(//).sort_by{rand}}

# ========= slightly-less-random munging
#!/usr/local/bin/ruby -Ku
RX = Hash.new{|h,k|h[k]=/(.{#{(k/4.0).round}})#{'(.)'*(k/2.0).round}(.*)/}
$stdout << ARGF.read.gsub(/((?![\d_])\w){4,}/) do |w|
  (caps = RX[w.split(//u).length].match(w).captures).first +
      caps[1..-2].sort_by { rand }.to_s + caps.last
end
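
For readers on a later Ruby, here is a sketch of the same random munging without -Ku. It assumes Ruby 1.9 or newer and UTF-8 input, where strings carry their own encoding and \p{L} matches any Unicode letter. It also calls join explicitly, because from Ruby 1.9 on Array#to_s no longer concatenates the elements. The method name is made up, and characters built from combining marks are not handled.

```ruby
# Assumes Ruby 1.9+ and UTF-8 text; \p{L} matches any Unicode letter,
# so digits and underscores are never part of a word here.
def munge_unicode(text, rng = Random.new)
  text.gsub(/\p{L}{4,}/) do |word|
    chars = word.chars
    # join is explicit: Array#to_s stopped concatenating in Ruby 1.9.
    chars.first + chars[1..-2].shuffle(random: rng).join + chars.last
  end
end

puts munge_unicode("Rindfleischetikettierungsüberwachung, 2006: naïve café")
```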

···

--
Ross Bamford - rosco@roscopeco.REMOVE.co.uk

Your task for this quiz, then, is to take a text as input and output the
text in this fashion. Scramble each word's center (leaving the first and
last letters of each word intact). Whitespace, punctuation, numbers --
anything that isn't a word -- should also remain unchanged.

···

_________________________________________
solution one
_________________________________________

     harp:~ > cat a.rb
     class String
       def scramble on = ''
         re = %r/( (?:\b \w \w{2,} \w \b) | \s+ | . )/iox
         scan(re){|words| on << words.first.scrambled}
         on
       end
       def scrambled
         self[1..-2] = self[1..-2].split(%r//).sort_by{rand}.to_s if size >= 4
         self
       end
     end
     ARGF.read.scramble STDOUT

     harp:~ > ruby a.rb < a.rb
     cslas Srntig
       def srbcamle on = ''
         re = %r/( (?:\b \w \w{2,} \w \b) | \s+ | . )/iox
         sacn(re){|wrods| on << wodrs.fisrt.salbercmd}
         on
       end
       def sclmaebrd
         slef[1..-2] = slef[1..-2].split(%r//).s_botry{rnad}.t_os if size >= 4
         slef
       end
     end
     ARGF.read.srcalbme SUDOTT

_________________________________________
solution two (golfing)
_________________________________________

     harp:~ > ruby -npae 'gsub!(/\b(\w)(\w{2,})(\w)\b/){_=$3;[$1,$2.split(//).sort_by{rand},_]}' a.rb
     calss Snrtig
       def sbcarlme on = ''
         re = %r/( (?:\b \w \w{2,} \w \b) | \s+ | . )/iox
         sacn(re){|wdros| on << wdros.first.slramcebd}
         on
       end
       def smlcbaerd
         self[1..-2] = slef[1..-2].siplt(%r//).srbt_oy{rand}.t_os if size >= 4
         self
       end
     end
     ARGF.read.smclarbe SUTODT

     harp:~ > wc -c
     gsub!(/\b(\w)(\w{2,})(\w)\b/){_=$3;[$1,$2.split(//).sort_by{rand},_]}
          70

thanks for the fun quiz!

-a
--
be kind whenever possible... it is always possible.
- h.h. the 14th dalai lama

Well, the simple regex based one-liner seems to have gotten
plenty of airplay, so I decided to expand mine in an attempt
to improve the readability of the munged text. For example:

A naive munging:

  Noumeurs idavilundis have dneoatrstmed the ieneascrd
  dfifclutiy oinrcrucg wehn leihngter wdors are slipmy
  reiondmazd. Raionizdnmg wiihtn hntyahoeipn buadoreins
  offers smoe irnmoeemvpt.

A slightly more readable munging:

  Nuemruos inididvuals hvae dnometrtsaed the insecraed
  dfiifulcty ocucrinrg when lghetnier wrods are smiply
  rnamodized. Randomzinig wihtin hyphenatoin bonduiares
  offres some imvoepremnt.

Original text:

  Numerous individuals have demonstrated the increased
  difficulty occurring when lengthier words are simply
  randomized. Randomizing within hyphenation boundaries
  offers some improvement.

The hyphen-boundary randomizer:

  require 'text/hyphen'
  hyp = Text::Hyphen.new :left => 1, :right => 1
  text = ARGF.read
  text.gsub!(/[^\W\d_]+/) do |m|
    hyp.visualize(m).split(/(^\w|\w$)|-/).map{|t|
      t.split(//).sort_by{rand}.join
    }.join
  end
  puts text
  __END__
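
Without the text-hyphen gem, the idea can still be sketched by taking the syllable breaks as given. The method below is hypothetical: it expects a word that is already hyphenated (for example "in-di-vid-u-als"), shuffles letters only inside each piece, and keeps the word's first and last letters in place.

```ruby
# +hyphenated+ is a word with its break points marked by '-'.
# Letters are shuffled only within a piece; the first and last
# letters of the whole word never move.
def scramble_syllables(hyphenated, rng = Random.new)
  pieces = hyphenated.split('-')
  last_index = pieces.join.length - 1
  offset = 0
  pieces.map { |piece|
    chars = piece.chars
    # Lock the first letter of the word.
    head = offset.zero? ? chars.shift.to_s : ''
    # Lock the last letter of the word.
    ends_word = offset + piece.length - 1 == last_index
    tail = ends_word && !chars.empty? ? chars.pop : ''
    offset += piece.length
    head + chars.shuffle(random: rng).join + tail
  }.join
end

puts scramble_syllables("in-di-vid-u-als")
puts scramble_syllables("ran-dom-ized")
```

Because no letter can leave its own piece, long words stay closer to their original shape than with a shuffle of the whole middle.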

cheers,
andrew

···

--
Posted via http://www.ruby-forum.com/.

Can you also suggest that people reply to the original thread instead of making new ones when they send their solutions? Right now there is:

- Original ruby quiz thread
- [QUIZ][SOLUTION] ...
- [QUIZ] ... A solution
- [QUIZ] ... A simplistic solution
- [SOLUTION] ...

-- Daniel

···

On Apr 21, 2006, at 2:34 PM, Ruby Quiz wrote:

The three rules of Ruby Quiz:
[...]
Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion.

Ruby Quiz wrote:

Your task for this quiz, then, is to take a text as input and output the text in
this fashion. Scramble each word's center (leaving the first and last letters of
each word intact). Whitespace, punctuation, numbers -- anything that isn't a
word -- should also remain unchanged.

My solution:

=== snip ===

# Ruby Quiz 76
# Ruby Quiz - Text Munger (#76)

···

#
# Solution of Tom Moertel
# http://blog.moertel.com/
# 2006-04-21
#
# Usage: munge.rb [inputs...]

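class String
  # Fisher-Yates shuffle over the inner characters (indices 1..length-2);
  # the first and last characters are never touched.
  def munge!
    (length - 2).downto(2) do |i|
      j = rand(i) + 1
      self[i], self[j] = self[j], self[i]
    end
    self
  end
end

while line = gets
  puts line.gsub(/\w+/) { |s| s.munge! }
end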

=== end ===

A few notes:

I took the term "scramble" in the task definition to mean randomly permute because some occurrences of words in the example text were apparently unchanged by the scrambling transformation (e.g., "keep" and "being" in the tenth line) and some words that had multiple occurrences were scrambled differently for each occurrence (e.g., "remvpidtee" in the second line vs. "retpmevide" in the fourth from the last line).

str.munge! (fairly) permutes the inner characters of +str+ and has no effect on strings of three or fewer characters.

I used +gets+ in the main I/O loop in order to get sensible command-line input handling for free.
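
The first note can be checked with a little counting. Assuming a fair shuffle, the chance that a word comes out unchanged is the share of inner-letter orderings that spell the original. The helper name below is made up for illustration.

```ruby
# Fraction of inner-letter orderings that reproduce the word unchanged,
# assuming every ordering is equally likely (a fair shuffle).
def unchanged_probability(word)
  inner = word[1..-2].to_s.chars
  orderings = inner.permutation.to_a
  Rational(orderings.count(inner), orderings.size)
end

%w[keep being love].each do |w|
  puts "#{w}: #{unchanged_probability(w)}"
end
# keep: 1/1
# being: 1/6
# love: 1/2
```

So "keep" can never look scrambled, since its two inner letters are the same, and "being" survives about one time in six.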

Cheers,
Tom

Hello,

Here is my solution to the quiz.
It's not a one-liner anymore - I've left the first version in the
comments, for historical purposes.

# 1st try:
# does not scramble abcd123, which may or may not be a good thing
# no support for accented characters
# _ is considered a letter
#puts ARGF.read.gsub(/\b(?=\D+\b)(\w)(\w+)(?=\w\b)/) { $1 + $2.split('').sort_by{rand}.join }

class String
  # returns the string with characters randomly placed
  def randomize
    split('').sort_by{rand}.join
  end

  # character class to identify a word's letter
  # arbitrarily ripped from iso-8859-1
  WordChars = '[a-zA-Z\xc0-\xd6\xd8-\xf6\xf8-\xfd\xff]'
  
  # randomizes each word (defined by +chars+), leaving alone the
  # first and last letters
  # uses a default argument to fit in 80 cols :slight_smile:
  def scramble_words(chars = WordChars)
    gsub(/(#{chars})(#{chars}+)(?=#{chars})/) { $1 + $2.randomize }
  end
end

puts ARGF.read.scramble_words if __FILE__ == $0

···

--
Yoann

It's not part of the challenge this week or next, but you know I'm always for setting your own goals. :slight_smile:

James Edward Gray II

···

On Apr 21, 2006, at 8:10 AM, Florian Groß wrote:

Ruby Quiz wrote:

Your task for this quiz, then, is to take a text as input and output the text in
this fashion. Scramble each word's center (leaving the first and last letters of
each word intact). Whitespace, punctuation, numbers -- anything that isn't a
word -- should also remain unchanged.

What about writing an unscrambler? Could that also be done for this quiz or might that be next week's task? :slight_smile:

Given a word like "there's" or "that's", does the letter before the
apostrophe count as a "last" letter? In other words, could "that's"
become "ttha's"?

In the example above, there's no case where that letter gets
scrambled. It's possible that that's coincidence, but it doesn't look
like it.

Do it whichever way you like it...

I don't know what the study said about contractions, if anything.
Personally, I think I would consider the parts before and after as
separate words, which would be slightly less scrambled, but my
intuition (which could be wrong) says that counting it as one whole
word might throw off legibility more than expected.
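
The two readings can be compared side by side. In this sketch (all names are illustrative) the only difference is the pattern that decides what counts as one word. The apostrophe itself never moves in either case.

```ruby
# Shuffle the letters of +token+ except the first and last letter,
# leaving any non-letter (such as an apostrophe) where it is.
def scramble_letters(token, rng)
  letters = token.scan(/[A-Za-z]/)
  return token if letters.length < 4
  shuffled = [letters.first, *letters[1..-2].shuffle(random: rng), letters.last]
  token.gsub(/[A-Za-z]/) { shuffled.shift }
end

SEPARATE_PARTS = /[A-Za-z]+/                 # "that" and "s" are two words
WHOLE_WORD     = /[A-Za-z]+(?:'[A-Za-z]+)*/  # "that's" is one word

def munge(text, word_pattern, rng = Random.new)
  text.gsub(word_pattern) { |w| scramble_letters(w, rng) }
end

puts munge("that's", SEPARATE_PARTS)  # "that's" or "taht's"
puts munge("that's", WHOLE_WORD)      # may give "ttha's"
```

With SEPARATE_PARTS the letter before the apostrophe is a "last" letter and stays put. With WHOLE_WORD only the final "s" is protected.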

Strictly speaking, any Ruby code can be made into one line with
liberal use of the semi-colon (;). It would just be an extremely long
line!

My full, nicely abstracted solution for this quiz is 36 lines
(including empty lines), but I also wrote a somewhat obfuscated
one-line version which is 104 characters. But it is missing some of
the features of the full one. But overall it solves the quiz. It is
probably possible to make an even shorter version.

Ryan

···

On 4/21/06, Jake McArthur <jake.mcarthur@gmail.com> wrote:

My first participation in Ruby Quiz, and it has to be easy. That
said, I must really be missing something because some of you guys are
mentioning one-liners, and mine is 26 lines. Maybe it's because I
made mine highly abstracted, but I still don't really see how to do
this in one line.

Hi --

···

On Sat, 22 Apr 2006, Adam Shelly wrote:

I know everyone here is Nice(tm), so I'm sure this is not the
intent... but between this quiz and the Markov chain one, it seems we
are building a set of utilities perfect for generating those 'Re:
PHARmudMACY'
spam emails selling 'vigara' and such that have been sneaking through
my spam filter at work recently...

Maybe we can use the techniques to filter those messages out :slight_smile:

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" PDF now on sale! Ruby for Rails
Paper version coming in early May!

I vote we assume the best instead of the worst.

James Edward Gray II

···

On Apr 21, 2006, at 8:55 PM, Adam Shelly wrote:

I know everyone here is Nice(tm), so I'm sure this is not the
intent... but between this quiz and the Markov chain one, it seems we
are building a set of utilities perfect for generating those 'Re:
PHARmudMACY'
spam emails selling 'vigara' and such that have been sneaking through
my spam filter at work recently...

Himadri Choudhury wrote:

In order to see any performance benefit from the 3rd method I had to make up
some horrifically long words which aren't terribly likely in the English
language (maybe I should have tried German :)).

Try Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz :slight_smile:

Is the performance better if you skip swaps when i == j ?

Also, for a swap method to give random results doesn't one need to swap from a random position in the array which has not been passed through yet? (see Shuffling - Wikipedia noting Fisher-Yates shuffling.)
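
For tiny inputs the question can be settled by enumeration rather than sampling. The sketch below feeds the swap loop every possible sequence of "random" picks and counts the resulting orderings for three letters. It compares the loop as posted, where j = rand(i + 1), with a naive variant where j may be any position. All helper names are made up for illustration.

```ruby
# Run the swap loop with an explicit list of picks instead of rand,
# so every possible run can be enumerated.
def swap_shuffle(items, choices)
  a = items.dup
  a.each_index do |i|
    j = choices[i]
    a[i], a[j] = a[j], a[i]
  end
  a
end

# Every pick sequence for j = rand(i + 1): step i may pick 0..i.
def prefix_choices(n)
  (0...n).reduce([[]]) { |acc, i| acc.flat_map { |c| (0..i).map { |j| c + [j] } } }
end

# Every pick sequence for the naive j = rand(n): any position at each step.
def naive_choices(n)
  (0...n).reduce([[]]) { |acc, _| acc.flat_map { |c| (0...n).map { |j| c + [j] } } }
end

# How many pick sequences lead to each final ordering.
def outcome_counts(items, all_choices)
  counts = Hash.new(0)
  all_choices.each { |c| counts[swap_shuffle(items, c).join] += 1 }
  counts
end

letters = %w[a b c]
p outcome_counts(letters, prefix_choices(3))  # 6 runs, each ordering once
p outcome_counts(letters, naive_choices(3))   # 27 runs over 6 orderings
```

With j = rand(i + 1) there are exactly 3! = 6 equally likely runs and each ordering appears once, so for this size the posted loop is unbiased and behaves like a forward-running form of Fisher-Yates. The naive variant has 27 equally likely runs, which cannot be split evenly over 6 orderings, so it has to be biased.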

-a

···

On 23.4.2006, at 09:45, Himadri Choudhury wrote:

print ARGF.read.gsub!(/\B[a-z]+\B/) {|x|
    x.length.times {|i|
        j = rand(i+1)
        x[j], x[i] = x[i] , x[j]
    }
    x
}

Basically, this is an implementation of scrambling that uses swaps. I
remember this method for scrambling from way back, but I can't seem to find
a good reference for it at the moment.
I also figured that this method would be faster since it is linear, while
the sorts are n log(n) (n = length of the word)

To my surprise, I found this method to actually be slower for any normal
text. One possible explanation is that when words are relatively short you
don't gain much from the n vs. n log(n) difference, and you lose because while
this method always has n swaps, sorting may have fewer.

Just a gentle reminder here folks, please remember that Ruby Quiz has a 48 hour no-spoiler period before solutions should be posted. I'm not a big stickler on this, but I know some people do like the time. It's super easy to figure in your head, just look at the quiz date and time and bump it forward two days. That's when it's OK to submit.

For the record, I do consider posting solutions in other languages (like Perl) a spoiler.

Thank you.

James Edward Gray II

···

On Apr 23, 2006, at 4:45 AM, Himadri Choudhury wrote:

Here's my solution.