I asked this a little while back but maybe didn't ask the right way, so maybe somebody can help me if I rephrase:
I'm trying to build an RSS feed that takes, in its item descriptions, ISO-8859-1 text. (I'm using REXML for now.) I'd like to be able to take a non-ASCII character and turn it into a usable XML entity. So, for example, "\251" would get turned into "&#169;":
str = "\251 2004 Francis Hwang"
elt = REXML::Element.new( 'elt' )
elt.text = str
elt.to_s
=> "<elt>\251 2004 Francis Hwang</elt>"
# But I want "<elt>&#169; 2004 Francis Hwang</elt>"
Is there some sort of setting I can twiddle in REXML so that I can assign a text that includes these sorts of characters, and REXML will know to turn them into entities on output? I know I can do this by hand and then prevent escaping by using the :raw flag, but I'd like to avoid that if possible.
Francis
I think there's an escapeHTML function on the CGI that might do it. Of course, it will also hit the > and <. You could still lift the code from there.
~ pat
On Friday, October 15, 2004, at 08:38 PM, Francis Hwang wrote:
I'm trying to build an RSS feed that takes, in its item descriptions,
ISO-8859-1 text. (I'm using REXML for now.) I'd like to be able to take
a non-ASCII character and turn it into a usable XML entity. So, for
example, "\251" would get turned into "&#169;"
Not exactly what you're asking for, but you could use Iconv to convert
ISO-8859-1 into UTF-8. It should be perfectly legal to include UTF-8
characters directly in XML, without turning them into character entities.
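
For readers on Ruby 1.9 or later, where Iconv has been superseded by String#encode, the same conversion can be sketched as below. The helper name latin1_to_utf8 is made up for illustration, and the sketch assumes the input bytes really are ISO-8859-1:

```ruby
# Hypothetical helper (not part of REXML or the standard library).
# Assumes Ruby 1.9+ and that the input bytes are ISO-8859-1.
def latin1_to_utf8(bytes)
  # dup first so the caller's string is not re-tagged in place,
  # then label the bytes as ISO-8859-1 and transcode them to UTF-8.
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8 = latin1_to_utf8("\251 2004 Francis Hwang")
puts utf8.encoding      # prints: UTF-8
p utf8.bytes.first(2)   # prints: [194, 169]
```

The single byte 0xA9 becomes the two-byte UTF-8 sequence 0xC2 0xA9, which can be written into a UTF-8 XML document as-is.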
Alternatively, if it's sufficient to convert characters 160-255 straight
into numeric entity refs (which works if the top half of ISO-8859-1 maps
directly into Unicode, as I think it does), then how about
a = "Copyright \251 2004"
a.gsub!(/[\240-\377]/) { |c| "&#%d;" % c[0] }
# => "Copyright &#169; 2004"
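
The snippet above relies on Ruby 1.8 behaviour, where c[0] yields the byte value as an integer. On Ruby 1.9 and later c[0] is a one-character string, so a rough equivalent might look like this. It is only a sketch, assuming Ruby 2.0+ for String#b, and the helper name is invented for illustration:

```ruby
# Hypothetical helper (not part of REXML or the standard library).
# Assumes Ruby 2.0+ and ISO-8859-1 input, whose bytes 160-255 map
# directly onto the Unicode code points with the same numbers.
def latin1_to_numeric_refs(bytes)
  # String#b returns a binary copy, so the /n (binary) regexp
  # matches one byte at a time regardless of the source encoding.
  bytes.b.gsub(/[\xA0-\xFF]/n) { |c| format('&#%d;', c.ord) }
end

puts latin1_to_numeric_refs("Copyright \251 2004")
# prints: Copyright &#169; 2004
```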
Regards,
Brian.
I just tried; it doesn't do it.
irb(main):004:0> CGI.escapeHTML( "<br>")
=> "&lt;br&gt;"
irb(main):005:0> CGI.escapeHTML( "<br>\251")
=> "&lt;br&gt;\251"
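
As the transcript shows, CGI.escapeHTML only touches markup characters and leaves the high byte alone. One way to get both behaviours is to escape the markup first and then turn every remaining non-ASCII character into a numeric reference. This is a sketch assuming Ruby 1.9+ and ISO-8859-1 input; escape_for_xml is a made-up name, not a CGI or REXML method:

```ruby
require 'cgi'

# Hypothetical helper; assumes Ruby 1.9+ and ISO-8859-1 input bytes.
def escape_for_xml(latin1_bytes)
  # Label the bytes as ISO-8859-1 and transcode, so each character
  # carries its Unicode code point.
  utf8 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  # Escape the markup characters, then replace whatever is still
  # non-ASCII with a decimal numeric character reference.
  CGI.escapeHTML(utf8).gsub(/[^\x00-\x7F]/) { |c| format('&#%d;', c.ord) }
end

puts escape_for_xml("<br>\251")
# prints: &lt;br&gt;&#169;
```

The result is already escaped, so passing it to REXML would still need the :raw flag mentioned in the original question to avoid escaping the ampersands a second time.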
On Oct 16, 2004, at 2:15 AM, Patrick May wrote:
I think there's an escapeHTML function on the CGI that might do it. Of course, it will also hit the > and <. You could still lift the code from there.
~ pat