Different results in command-line vs. TextMate

Hi --

I was working on an answer to James Rasmussen's question, and discovered
the following somewhat puzzling (to me) thing.

James's text file has some non-printing (Word-derived?) characters,
instead of regular spaces:

text = File.read("lines.txt")

=>
"Clark\302\240\302\240\302\240\302\240\302\240\302\240Kent\302\240<super@fakeplace....com>\302\240\r\nPop\302\240Eye\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240\302\240<popeye@fakeplace....com>\302\240\r\n"

puts text

Clark Kent <super@fakeplace....com>
Pop Eye <popeye@fakeplace....com>

What's odd is that when I try to scan these lines, I get different
results depending on whether I'm on the command line or in TextMate.
From the command line:

$ cat parse.rb
lines = File.readlines("lines.txt")
p RUBY_DESCRIPTION
p lines.map {|line| line.scan(/\w+/) }

$ ruby parse.rb
"ruby 1.8.7 (2008-05-31 patchlevel 0) [i686-darwin9.8.0]"
[["Clark", "Kent", "super", "fakeplace", "com"], ["Pop", "Eye",
"popeye", "fakeplace", "com"]]

And from TextMate, using command-r:

"ruby 1.8.7 (2008-05-31 patchlevel 0) [i686-darwin9.8.0]"
[["Clark Kent ", "super", "fakeplace", "com", " "], ["Pop Eye
", "popeye", "fakeplace", "com", " "]]

As you can see, it's not just the display that's different. The scan
operation actually produced different results.

It feels like some kind of Heisenbug but I can't puzzle it out.

David


--
David A. Black, Senior Developer, Cyrus Innovation Inc.

   The Compleat Rubyist: Ruby training with Black/Brown/McAnally
   Philadelphia, PA, October 1-2, 2010
   http://www.compleatrubyist.com

> James's text file has some non-printing (Word-derived?) characters,
> instead of regular spaces:

Those are nonbreak spaces (U+00A0, 0xC2A0) that should be treated as \W.
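
To make the byte sequence concrete, here is a minimal sketch, assuming Ruby 1.9 or later (where strings carry an encoding) and a UTF-8 source file. The sample line and its example.com address are placeholders, not the contents of the real lines.txt.

```ruby
# Hypothetical sample line; the address is a placeholder.
nbsp = "\u00A0"   # NO-BREAK SPACE
line = "Clark#{nbsp}#{nbsp}Kent#{nbsp}<super@example.com>#{nbsp}\r\n"

# The character is stored as two UTF-8 bytes (0xC2 0xA0).
p nbsp.bytes
# The same bytes in octal, matching the \302\240 escapes in the inspect output.
p nbsp.bytes.map { |b| b.to_s(8) }

# Character-class checks for the no-break space.
p nbsp =~ /\w/
p nbsp =~ /[[:space:]]/

# Scanning for word characters splits the line at every no-break space.
tokens = line.scan(/\w+/)
p tokens
```

On 1.9 and later the no-break space is not a word character, so the scan yields separate tokens for each name.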

> What's odd is that when I try to scan these lines, I get different
> results depending on whether I'm on the command line or in TextMate.

I thought the CRLF line endings might have something to do with it, but the
result was the same. Another clue, with 1.9.1-p378, the result from TextMate
was correct, identical to that of the command line.

Ammar


Hi --


On Sat, 17 Jul 2010, Ammar Ali wrote:

> On Sat, Jul 17, 2010 at 3:03 PM, David A. Black <dblack@rubypal.com> wrote:
>
>> James's text file has some non-printing (Word-derived?) characters,
>> instead of regular spaces:
>
> Those are nonbreak spaces (U+00A0, 0xC2A0) that should be treated as \W.
>
>> What's odd is that when I try to scan these lines, I get different
>> results depending on whether I'm on the command line or in TextMate.
>
> I thought the CRLF line endings might have something to do with it, but the
> result was the same. Another clue, with 1.9.1-p378, the result from TextMate
> was correct, identical to that of the command line.

Thanks for checking. It turns out to be an encoding thing: TextMate
invokes Ruby with -KU. Without the -KU (which involved editing an
underlying script file, as well as the Bundle Editor entry, but then
again I'm not a big TextMate bundle expert), it ran the same as the
unadorned command line.
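
For anyone who would rather not edit the bundle: one way to sidestep the difference is to replace the no-break-space byte pair with a plain space before scanning, so the result should not depend on how Ruby was launched. This is only a sketch. The helper name `words` and the example.com address are made up for illustration, and I have not run it under every -K setting.

```ruby
# The two bytes \302\240 are the UTF-8 encoding of U+00A0 (no-break space).
NBSP = "\302\240"

# Hypothetical helper: normalize no-break spaces, then scan for words.
def words(line)
  line.gsub(NBSP, " ").scan(/\w+/)
end

# Placeholder line modeled on the inspect output above.
sample = "Pop\302\240Eye\302\240\302\240<popeye@example.com>\302\240\r\n"
p words(sample)
```

Because the pattern passed to gsub is a plain string rather than a regular expression, the substitution does not depend on how \w is interpreted.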

David
