First the solution and then the comments:
 1  #!/usr/bin/env ruby
 2  require 'unicode'
 3  class String
 4    Diacritic = Regexp.new("[\xcc\x80-\xcd\xaf]",nil,'u')
 5    Specials = "\xc3\x86\xc3\x90\xc3\x98\xc3\x9e\xc3\x9f\xc3\xa6\xc3\xb0\xc3\xb8\xc3\xbe"
 6    Letter = Regexp.new("[A-Za-z#{Specials}](?:#{Diacritic}*)",nil,'u')
 7    Word = Regexp.new("(#{Letter})(#{Letter}+)(?=#{Letter})",nil,'u')
 8    def scramble
 9      Unicode.compose(Unicode.decompose(self).gsub(Word) {
10        m = $~
11        m[1] + m[2].scan(Letter).sort_by{rand}.join})
12    end
13  end
14  if __FILE__ == $0
15    while gets
16      puts $_.chomp.scramble
17    end
18  end
First of all, we want the scramble to handle accented characters.
For this, line 2 requires the unicode package (available as a gem),
whose normalization functions decompose an accented character into a
plain Latin letter and a diacritic.
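To make the decomposition concrete, here is a minimal sketch. It assumes a modern Ruby (2.2 or later) and uses the built-in String#unicode_normalize as a stand-in for the gem's Unicode.decompose and Unicode.compose; that substitution is my assumption, not part of the solution above.

```ruby
# Assumption: String#unicode_normalize (Ruby 2.2+) plays the role of the
# unicode gem's decompose/compose functions used in the solution.
accented   = "\u00e9"                          # e with acute accent, one code point
decomposed = accented.unicode_normalize(:nfd)  # canonical decomposition
recomposed = decomposed.unicode_normalize(:nfc)

# Plain letter followed by the combining acute accent.
p decomposed.codepoints.map { |c| format("U+%04X", c) }  # ["U+0065", "U+0301"]
# Composing again restores the original single code point.
p recomposed == accented                                 # true
```

Once the text is in decomposed form, a letter and its accents are separate code points, which is what lets a regular expression treat them as one unit.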
The letters in ISO Latin-1 that cannot be decomposed into a plain Latin
letter + diacritic are: Thorn, Eth, AE, stroked O, sharp S. The
corresponding 9 forms (all but the last exist in both small and capital
form) must be treated as a "special case" in line 5.
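A quick way to check this claim is sketched below. It again assumes a modern Ruby with String#unicode_normalize instead of the unicode gem, and the constant name SPECIALS is mine.

```ruby
# The nine special letters of line 5, written as code points instead of
# raw UTF-8 bytes: AE, Eth, stroked O, Thorn (capital), sharp s, then
# ae, eth, stroked o, thorn (small).
SPECIALS = "\u00c6\u00d0\u00d8\u00de\u00df\u00e6\u00f0\u00f8\u00fe"

# An accented letter grows to two code points when decomposed ...
p "\u00e9".unicode_normalize(:nfd).length          # 2
# ... while the special letters come back unchanged.
p SPECIALS.unicode_normalize(:nfd) == SPECIALS     # true

# Their UTF-8 encoding is the byte sequence spelled out in line 5.
puts SPECIALS.bytes.map { |b| format("\\x%02x", b) }.join
```

Because normalization leaves these letters alone, the character class has to list them explicitly next to A-Za-z.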
The regular expression in line 6 identifies a possibly accented letter.
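As an illustration of what line 6 matches, here is a sketch of the same pattern as a modern regexp literal. The \u escapes replace the raw bytes and the 'u' flag, and the range U+0300 to U+036F is what the bytes \xcc\x80 to \xcd\xaf in line 4 encode; the constant name LETTER is an assumption of this sketch.

```ruby
# One base letter (ASCII or one of the nine specials) followed by any
# number of combining diacritical marks.
LETTER = /[A-Za-z\u00c6\u00d0\u00d8\u00de\u00df\u00e6\u00f0\u00f8\u00fe][\u0300-\u036f]*/

word    = "cafe\u0301"      # "cafe" + combining acute, i.e. decomposed form
letters = word.scan(LETTER)

p letters.length                  # 4, the accent stays attached to its letter
p letters.last.codepoints.length  # 2, "e" plus the combining acute
```

Scanning with this pattern is what keeps an accent glued to its letter while the letters are shuffled.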
If Ruby had zero-width positive look-behind assertions, i.e.,
Perl's /(?<=pattern)/, a word could be decomposed into letters as
Word = Regexp.new("(?<=#{Letter})(#{Letter}+)(?=#{Letter})",nil,'u')
Unfortunately, Ruby doesn't have a (?<=, so we are forced to capture
the first character with the regular expression in line 7, and we have
to remember to put it back unchanged (m[1] in line 11).
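The capture-and-put-back approach can be sketched as a standalone method. This is not the solution above but a hedged adaptation for a modern Ruby: String#unicode_normalize replaces the unicode gem, Array#shuffle replaces sort_by{rand}, Regexp.last_match is the long name for $~, and scramble here is a plain method taking the text as an argument rather than an extension of String.

```ruby
LETTER = /[A-Za-z\u00c6\u00d0\u00d8\u00de\u00df\u00e6\u00f0\u00f8\u00fe][\u0300-\u036f]*/
# Group 1: first letter (kept). Group 2: inner letters (shuffled).
# The look-ahead leaves the last letter outside the match, so it is untouched.
WORD = /(#{LETTER})(#{LETTER}+)(?=#{LETTER})/

def scramble(text)
  text.unicode_normalize(:nfd).gsub(WORD) {
    m = Regexp.last_match                  # save the match before calling scan
    m[1] + m[2].scan(LETTER).shuffle.join  # put the first letter back unchanged
  }.unicode_normalize(:nfc)
end

puts scramble("Scrambled caf\u00e9s stay readable")
```

Words of fewer than three letters never match WORD, so they pass through unchanged, and a three-letter word has only one inner letter to "shuffle".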
I might have written lines 10 and 11 together as
$1 + $2.scan(Letter).sort_by{rand}.join})
but you don't have to be a C programmer to understand that using
global variables together with functions that alter them between two
sequence points is a bad idea (see
http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-39.16).
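To show why saving $~ first is the cautious choice, here is a small sketch of a later regexp call replacing the match globals. The method name and the sample strings are made up for the illustration.

```ruby
def match_globals_demo
  "hello world" =~ /(\w+) (\w+)/
  saved  = $~       # keep the MatchData object, as line 10 does
  before = $1       # "hello" at this point
  "xyz".scan(/y/)   # a later regexp call replaces $~, and with it $1 and $2
  [before, $1, saved[1]]
end

p match_globals_demo   # ["hello", nil, "hello"]
```

After the scan, $1 no longer refers to the first match, while the saved MatchData still does.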