Writing accented characters into HTML files?

Kenneth_McDonald · 5 January 2009 19:52

I'm having trouble when I write accented characters into HTML files; though the accents appear properly in my terminal, they are badly "messed up" in the HTML output. CGI.escape doesn't fix the problem, because these are not "special" characters like < or >, but simply accented e's, o's, etc. I'm assuming the problem has something to do with a character set mismatch between the file Ruby is writing and what the browser (Firefox) expects, but I'm at a loss as to how to correct it.

Any advice most appreciated,
Thanks,
Ken

Gerald_Murray · 5 January 2009 23:03

Look into using a character reference (a named entity such as &eacute;
for é, or a numeric one such as &#233;). Which named entities are valid
depends on the version of HTML used.
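A minimal Ruby sketch of the character-reference approach, assuming the input string is UTF-8 (the default source encoding since Ruby 2.0). It uses numeric references, which need no entity table and so work in any HTML version. Note that CGI.escape performs URL encoding; CGI.escapeHTML is the HTML escaper, and it only handles &, <, >, and quotes, so accented letters are handled separately here. The helper name `to_ascii_html` is hypothetical.

```ruby
require "cgi"

# Escape HTML-special characters with CGI.escapeHTML, then replace every
# non-ASCII character with a decimal numeric character reference
# (for example "é" becomes "&#233;"). The result is pure ASCII, so it
# displays correctly whatever charset the browser assumes.
def to_ascii_html(text)
  escaped = CGI.escapeHTML(text)
  escaped.each_char.map { |ch| ch.ord > 127 ? "&##{ch.ord};" : ch }.join
end

puts to_ascii_html("Café <crème> brûlée")
# prints: Caf&#233; &lt;cr&#232;me&gt; br&#251;l&#233;e
```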

Gerald

Brian_Candler · 6 January 2009 09:20

Kenneth McDonald wrote:

Any advice most appreciated,

Use hexdump -C on the file to see what the actual byte sequences are. If
each accented character is a single byte, the file is probably
ISO-8859-1; if each takes two bytes, it's probably UTF-8.
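The same byte check can be done from Ruby itself. A minimal sketch using the Ruby 1.9+ encoding API; `hex_bytes` and the file name `page.html` are hypothetical:

```ruby
# Return the bytes of a string as space-separated hex pairs, similar to
# the byte columns of `hexdump -C`.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

utf8   = "é"                        # UTF-8 literal: two bytes
latin1 = "é".encode("ISO-8859-1")   # same character as ISO-8859-1: one byte

puts hex_bytes(utf8)    # c3 a9
puts hex_bytes(latin1)  # e9

# For a file on disk, read it as raw bytes first (hypothetical path):
# puts hex_bytes(File.binread("page.html"))
```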

You can use an XML declaration and/or a <meta> tag in the <head> section
to tell the browser which character set your document is in, and/or get
your web server to set the correct charset in the Content-Type header.
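A minimal Ruby sketch of the <meta> approach, assuming Ruby 2.3+ (for the `<<~` heredoc) and a UTF-8 source file. It writes the page with an explicit UTF-8 external encoding and declares that charset in <head>. The helper name `write_utf8_html` is hypothetical, and this does not cover the server-side Content-Type header.

```ruby
require "cgi"
require "tmpdir"

# Write an HTML page whose bytes are UTF-8 and whose <head> says so.
# The meta tag tells the browser which charset to use when the server
# does not send one in its Content-Type header.
def write_utf8_html(path, title, body_text)
  html = <<~HTML
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>#{CGI.escapeHTML(title)}</title>
    </head>
    <body>
    <p>#{CGI.escapeHTML(body_text)}</p>
    </body>
    </html>
  HTML
  # "w:UTF-8" sets the file's external encoding, so the text is written as UTF-8.
  File.open(path, "w:UTF-8") { |f| f.write(html) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "page.html")
  write_utf8_html(path, "Menu", "Café crème")
  puts File.binread(path).include?("Café".b)  # true
end
```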

--
Posted via http://www.ruby-forum.com/.

marc1 · 6 January 2009 13:34

Start by ensuring that you have a charset declaration, such as
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">,
at the top of <head>.

Also, post the "messed up" characters; they'll tell us something about
the encoding problem.

Oh, and make sure your editor is writing utf-8.

--
Cheers,
Marc

James_Edward_Gray_II · 6 January 2009 16:27

I have some code that detects valid UTF-8 data here:
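This is not that code, but an independent minimal sketch using `String#valid_encoding?` from the Ruby 1.9+ encoding API, applying the same one-byte versus two-byte reasoning as the hexdump check. `guess_encoding` is a hypothetical helper, and it is only a heuristic: Latin-1 text can occasionally happen to form valid UTF-8.

```ruby
# Guess whether raw bytes are UTF-8 or ISO-8859-1. Valid UTF-8 containing
# non-ASCII bytes is very likely UTF-8; anything else is treated as
# ISO-8859-1, since every byte sequence is valid Latin-1.
def guess_encoding(bytes)
  data = bytes.b                   # binary copy, original left untouched
  return "US-ASCII" if data.ascii_only?
  data.force_encoding("UTF-8").valid_encoding? ? "UTF-8" : "ISO-8859-1"
end

puts guess_encoding("Café")                        # UTF-8
puts guess_encoding("Café".encode("ISO-8859-1"))   # ISO-8859-1
puts guess_encoding("Cafe")                        # US-ASCII

# For a file (hypothetical path): guess_encoding(File.binread("page.html"))
```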

James Edward Gray II
