Smart Quotes

Martin,

Trying to save myself a bit of tedium - has anyone
already written code to replace smart quotes and
other such extensions with their normal ascii
equivalents?

FWIW, pasting into vim and typing 'show ascii' gives
Hex93 for the open " and Hex94 for the close one - I
was hoping someone had already written a tr string
to do the lot (there's an ellipsis, an en- and
em-dash, and a few other punctuation marks too).

    Well, from your description, it looks like you are using the
"Windows Western" character set (Windows-1252). If so, it's easy enough
to generate the String#tr parameters. The Windows Character Map, set to
the Windows Western encoding, reveals:

    0x85 - Ellipsis
    0x91 - Left single quote
    0x92 - Right single quote
    0x93 - Left double quote
    0x94 - Right double quote
    0x96 - En dash
    0x97 - Em dash

    You can look up any other punctuation characters yourself.
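
    If the character map isn't handy, the lookup can probably be done
from Ruby itself. This is only a sketch: it assumes Ruby 1.9 or later
(where strings carry an encoding), and the helper name cp1252_codepoint
is made up for illustration.

```ruby
# Sketch: decode single Windows-1252 bytes to see which character each one is.
# Assumes Ruby 1.9 or later, where strings carry an encoding.

# Hypothetical helper: returns the Unicode code point for one Windows-1252 byte.
def cp1252_codepoint(byte)
  raw = [byte].pack("C")                      # one raw byte, tagged as binary
  raw.force_encoding(Encoding::Windows_1252)  # relabel it as Windows-1252
  raw.encode("UTF-8").ord                     # transcode, then take the code point
end

[0x85, 0x91, 0x92, 0x93, 0x94, 0x96, 0x97].each do |byte|
  printf("0x%02X -> U+%04X\n", byte, cp1252_codepoint(byte))
end
```

    For 0x93 this should report U+201C, the left double quotation mark,
which matches the table above.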

    The String#tr call would be something like:

x = "\x91\x92\x93\x94\x96\x97"
x.tr("\x91\x92\x93\x94\x96\x97", "''\"\"-")
=> "''\"\"--"

    Be careful with the "-" in the second parameter. "-" is used to
indicate ranges in String#tr. By placing the "-" at the end of the
string, we force String#tr to interpret it as a character instead of a
range, and we also take advantage of the fact that String#tr replicates
the last character of the second parameter to make the string as long as
the first parameter.
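
    The two String#tr rules described here (ranges and padding) are
easier to see with plain ASCII input. The following is a small sketch
with made-up variable names, not part of the original solution. Note
that in a double-quoted literal such as "\-", Ruby drops the backslash
before tr ever sees it, so the escaped form below uses single quotes.

```ruby
# String#tr rules illustrated with plain ASCII input.

# "a-c" in the first argument is a range (a, b, c); the literal "-" is untouched.
range_result = "a-c".tr("a-c", "x")       # "x-x"

# A trailing "-" is an ordinary character, so all three characters are replaced.
trailing_result = "a-c".tr("ac-", "x")    # "xxx"

# A backslash that reaches tr also escapes the dash. Single quotes are used so
# the backslash survives the string literal.
escaped_result = "a-c".tr('a\-c', "x")    # "xxx"

# When the second argument is shorter, its last character is repeated.
padded_result = "abcd".tr("abcd", "12")   # "1222"

puts range_result, trailing_result, escaped_result, padded_result
```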

    I'm not sure what you want to change an ellipsis to. My preference
would be "...", but since that is three characters, String#tr can't
produce it. It needs a separate call such as gsub("\x85", "...").
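
    Putting the tr and gsub steps together might look like the sketch
below. The method name asciify_cp1252 is hypothetical, and the code
assumes Ruby 2.0 or later (for String#b) and input that really holds
raw Windows-1252 bytes. Working on a binary-tagged copy is my own
choice here, so that the high bytes are not rejected as invalid UTF-8.

```ruby
# Sketch of a combined helper (the name asciify_cp1252 is made up here).
# Assumes Ruby 2.0+ for String#b, and input that holds raw Windows-1252 bytes.
def asciify_cp1252(text)
  bytes = text.b  # binary-tagged copy, so the high bytes are not checked as UTF-8
  # One-to-one replacements; the trailing "-" covers both the en and em dash.
  bytes = bytes.tr("\x91\x92\x93\x94\x96\x97".b, "''\"\"-")
  # The ellipsis expands to three characters, which tr cannot do.
  bytes.gsub("\x85".b, "...")
end

puts asciify_cp1252("\x93Wait\x85\x94 \x96 it\x92s done")
```

    The returned string is still tagged as binary, so a caller who
needs another encoding would have to relabel or transcode it.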

    I hope this helps.

    - Warren Brown

[snip]

Thanks!

martin
