Smart Quotes

Trying to save myself a bit of tedium - has anyone already written code
to replace smart quotes and other such extensions with their normal
ascii equivalents?

martin

"Martin DeMello" <martindemello@yahoo.com> schrieb im Newsbeitrag
news:SjZxc.723802$oR5.571658@pd7tw3no...

> Trying to save myself a bit of tedium - has anyone already written code
> to replace smart quotes and other such extensions with their normal
> ascii equivalents?

What exactly are "smart quotes"?

    robert

In what character set? While normal quotes fall in the ASCII set, smart
quotes don't, so replacing them will depend on the character set.

For UTF-8, try gsub(/“(.*?)”/, '"\1"') -- though that'll only get one
possible set of quotes. There's also the German-style low-high quotes,
like „this”, and the French like «this» -- I don't know what cases
you're trying to solve.

Ari
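To cover several quoting conventions in one pass, here is a minimal sketch. It assumes the text is already UTF-8 and a Ruby whose String#tr handles multibyte characters (1.9 or later). The helper name straighten_quotes and the character lists are illustrative, not something posted in this thread; extend the lists to match whatever your documents actually contain.

```ruby
# Illustrative character lists (an assumption, not an exhaustive set).
DOUBLE_QUOTES = "\u201C\u201D\u201E\u00AB\u00BB" # “ ” „ « »
SINGLE_QUOTES = "\u2018\u2019\u201A"             # ‘ ’ ‚

# Hypothetical helper: map every listed quote character to its ASCII form.
# String#tr pads a shorter replacement set with its last character, so a
# single '"' stands in for all of the double-quote variants.
def straighten_quotes(text)
  text.tr(DOUBLE_QUOTES, '"').tr(SINGLE_QUOTES, "'")
end

puts straighten_quotes("\u201EGerman\u201D, \u00ABFrench\u00BB, \u2018English\u2019")
# prints: "German", "French", 'English'
```

Unlike a paired-quote regexp, tr replaces each character independently, so unbalanced or nested quotes are still converted.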

···

On Thu, 2004-06-10 at 13:26 +0000, Martin DeMello wrote:

> Trying to save myself a bit of tedium - has anyone already written code
> to replace smart quotes and other such extensions with their normal
> ascii equivalents?

Martin DeMello wrote:

> Trying to save myself a bit of tedium - has anyone already written code
> to replace smart quotes and other such extensions with their normal
> ascii equivalents?

I think _why did, at least RedCloth handles quotes nicely.

irb(main):001:0> require "RedCloth"
=> true
irb(main):002:0> a = RedCloth.new( "\"Quotes\" in a RedCloth string")
=> "\"Quotes\" in a RedCloth string"
irb(main):004:0> a.to_html
=> "<p>&#8220;Quotes&#8221; in a RedCloth string</p>"

Now, the other posts to this thread made me think about character sets...

Happy rubying

Stephan

From a Google search:

"Smart quotes are a feature found in many popular word processing
programs. They're smart because they automatically insert open
quotation marks at the beginning of a word and closed quotation marks
at the end. Unfortunately, HTML is not smart enough for smart quotes
since they aren't plain ASCII, so if you have smart quotes in your
code, you'll end up with some strange characters on your Web page. Be
sure to have smart quotes turned off whenever writing HTML code. "

···

On Fri, 11 Jun 2004 01:53:37 +0900, Robert Klemme <bob.news@gmx.net> wrote:

"Martin DeMello" <martindemello@yahoo.com> schrieb im Newsbeitrag
news:SjZxc.723802$oR5.571658@pd7tw3no...

> Trying to save myself a bit of tedium - has anyone already written code
> to replace smart quotes and other such extensions with their normal
> ascii equivalents?

What exactly are "smart quotes"?

    robert

Quotations from interesting people in history that you can use to make
you look smart.

Gavin

···

On Friday, June 11, 2004, 2:53:37 AM, Robert wrote:

"Martin DeMello" <martindemello@yahoo.com> schrieb im Newsbeitrag
news:SjZxc.723802$oR5.571658@pd7tw3no...

> Trying to save myself a bit of tedium - has anyone already written code
> to replace smart quotes and other such extensions with their normal
> ascii equivalents?

What exactly are "smart quotes"?

Not what I meant - I want to go through an 'extended ascii' document,
and replace every extended quote character with its ascii equivalent.

FWIW, pasting into vim and typing ':ascii' gives Hex93 for the open "
and Hex94 for the close one - I was hoping someone had already written a
tr string to do the lot (there's an ellipsis, an en- and em-dash, and a
few other punctuation marks too).

martin
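A plain tr string cannot do the whole job, because the ellipsis expands to three characters, so a lookup table with gsub is one option. The following is a minimal sketch, assuming the input is raw Windows-1252 bytes and a Ruby with String#b (2.0 or later). The names CP1252_TO_ASCII and asciify_cp1252 are hypothetical, and the table covers only the punctuation mentioned above plus the single quotes; bytes not in the table are left untouched.

```ruby
# Windows-1252 punctuation bytes and the ASCII stand-ins chosen here
# (the stand-ins are a judgment call, e.g. "--" for an em dash).
CP1252_TO_ASCII = {
  "\x85".b => "...", # horizontal ellipsis
  "\x91".b => "'",   # left single quote
  "\x92".b => "'",   # right single quote
  "\x93".b => '"',   # left double quote
  "\x94".b => '"',   # right double quote
  "\x96".b => "-",   # en dash
  "\x97".b => "--"   # em dash
}

# Hypothetical helper: treat the input as raw bytes and replace each
# high byte found in the table; unknown high bytes pass through as-is.
def asciify_cp1252(bytes)
  bytes.b.gsub(/[\x80-\xFF]/n) { |byte| CP1252_TO_ASCII.fetch(byte, byte) }
end

puts asciify_cp1252("\x93Hi\x94 \x96 wait\x85".b)
# prints: "Hi" - wait...
```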

···

Stephan Kämper <Stephan.Kaemper@schleswig-holstein.de> wrote:

Martin DeMello wrote:

> Trying to save myself a bit of tedium - has anyone already written code
> to replace smart quotes and other such extensions with their normal
> ascii equivalents?

I think _why did, at least RedCloth handles quotes nicely.

"Smart quotes are a feature found in many popular word processing
programs. They're smart because they automatically insert open
quotation marks at the beginning of a word and closed quotation marks
at the end. Unfortunately, HTML is not smart enough for smart quotes
since they aren't plain ASCII, so if you have smart quotes in your
code, you'll end up with some strange characters on your Web page. Be
sure to have smart quotes turned off whenever writing HTML code. "

Actually, just put the appropriate character set declaration in your
code, and it works nicely:

<meta http-equiv='Content-type' content='text/html; charset=iso-8859-1' />

If you speak English and the smart quotes are one byte, then iso-8859-1
is for you. If they're two bytes, then UTF-8 is the character set that's
being used.

If you're not in a primarily English-speaking country, it'll be
iso-8859-something-else (-2 for Poland, I know -- there's a list if you
look.)

"Gavin Sinclair" <gsinclair@soyabean.com.au> schrieb im Newsbeitrag
news:110-1972184642.20040611091139@soyabean.com.au...

···

On Friday, June 11, 2004, 2:53:37 AM, Robert wrote:

> "Martin DeMello" <martindemello@yahoo.com> schrieb im Newsbeitrag
> news:SjZxc.723802$oR5.571658@pd7tw3no...
>> Trying to save myself a bit of tedium - has anyone already written code
>> to replace smart quotes and other such extensions with their normal
>> ascii equivalents?

> What exactly are "smart quotes"?

Quotations from interesting people in history that you can use to make
you look smart.

Ah, much better... :-))

    robert

This is wrong. ISO-8859-1 doesn't include smart quotes. You're likely
using the windows-1252 character set (aka cp1252), a Microsoft extension
of ISO-8859-1 that much of their software likes to claim is the actual
standard character set. Please accurately label the character set that
is used following the standards.

A list of characters that actually are in ISO-8859-1, along with the
extensions present in windows-1252 is available at:

  psacake.com (the linked page now reads "404 Page Not Found")
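The difference between the two single-byte character sets can be checked directly. This is a rough sketch using the String encoding API of Ruby 1.9 and later (which did not exist when this thread was written); the variable names are illustrative. It also shows that in UTF-8 a curly double quote takes three bytes.

```ruby
# Bytes 0x93 and 0x94 as they would arrive from a Word-style document.
raw = "\x93quoted\x94".b

# Interpreted as Windows-1252, the bytes are the curly double quotes.
as_cp1252 = raw.dup.force_encoding("Windows-1252").encode("UTF-8")

# Interpreted as ISO-8859-1, the same bytes are C1 control characters.
as_latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

p as_cp1252 == "\u201Cquoted\u201D" # true
p as_latin1[0].ord.to_s(16)         # "93", i.e. U+0093, a control character
p "\u201C".bytesize                 # 3
```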

···

At 02:19 +0900 11 Jun 2004, Aredridel <aredridel@nbtsc.org> wrote:

If you speak English and the smart quotes are one byte, then iso-8859-1
is for you. If they're two bytes, then UTF-8 is the character set that's

"Aaron Schrab" <aaron@schrab.com> schrieb im Newsbeitrag
news:20040611001516.GA11137@frell.qqx.org...

> If you speak English and the smart quotes are one byte, then iso-8859-1
> is for you. If they're two bytes, then UTF-8 is the character set that's

This is wrong. ISO-8859-1 doesn't include smart quotes. You're likely

Can we please stop writing "smart quotes" when we in fact mean "matching
quotes" or "opening and closing quotes"? IMHO smart quotes do not denote
certain characters but a feature of software, namely that the software
inserts matching opening and closing quotes of the user's language
convention whenever the user enters double / single quotes.

From this it's immediately clear that there is no single pair of "smart
quotes" but a whole bunch of different character pairs that are inserted
whenever certain pieces of software think they should replace other quotes
entered by the user.

Thanks!

Kind regards

    robert

···

At 02:19 +0900 11 Jun 2004, Aredridel <aredridel@nbtsc.org> wrote:

Whoops. My apologies there. I've been using Unicode for so long now that
I forgot the curlies weren't in 8859-1. Windows 1252 is so close that
for a while, I assumed they were the same.

Aren't character sets lovely?

···

On Fri, 2004-06-11 at 09:20 +0900, Aaron Schrab wrote:

At 02:19 +0900 11 Jun 2004, Aredridel <aredridel@nbtsc.org> wrote:
> If you speak English and the smart quotes are one byte, then iso-8859-1
> is for you. If they're two bytes, then UTF-8 is the character set that's

This is wrong. ISO-8859-1 doesn't include smart quotes. You're likely
using the windows-1252 character set (aka cp1252), a Microsoft extension
of ISO-8859-1 that much of their software likes to claim is the actual
standard character set. Please accurately label the character set that
is used following the standards.

Hm - I'm referring specifically to the characters that MSWord inserts,
then (since I keep getting them in my email, and have to convert them to
pure ascii). I blame Outlook's use of Word(!) as a mail editor.

martin

···

Robert Klemme <bob.news@gmx.net> wrote:

Can we please stop writing "smart quotes" when we in fact mean "matching
quotes" or "opening and closing quotes"? IMHO smart quotes do not denote
certain characters but a feature of software, namely that the software
inserts matching opening and closing quotes of the user's language
convention whenever the user enters double / single quotes.

From this it's immediately clear that there is no single pair of "smart
quotes" but a whole bunch of different character pairs that are inserted
whenever certain pieces of software think they should replace other quotes
entered by the user.