Unknown character printed in irb or at the command prompt

Hi,

I am reading an HTML file using Nokogiri, and that works fine.

But when I print the text after reading it, I get an unknown character like

" " in place of the &nbsp; in <somestarting>hello&nbsp;</somecomplete>

so it looks like "hello ".

The problem seems to be caused by the &nbsp; just before the closing tag.

If anyone knows a solution, please help.

Thanks,
Priyank Shah

--
Posted via http://www.ruby-forum.com/.

Try using
   p str
or
   puts str.inspect
or
   puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.
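
As a minimal sketch of what those calls reveal, assume a string holding "hello" followed by a non-breaking space, roughly what one might expect an HTML parser to hand back for hello&nbsp; (the variable names are only illustrative):

```ruby
# Hypothetical sample: "hello" plus U+00A0 (non-breaking space).
str = "hello\u00A0"

# Raw bytes: the trailing character takes two bytes in UTF-8.
byte_values = str.bytes.to_a

# Code points: the same character is a single code point, 160.
code_points = str.codepoints.to_a

p str           # inspect-style output, shows or escapes the odd character
p byte_values
p code_points
```

Looking at the bytes and code points side by side makes it easier to tell a real character (one code point, several bytes) from genuinely corrupt data.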


Brian Candler wrote:

> Try using
>    p str
> or
>    puts str.inspect
> or
>    puts str.bytes.to_a.inspect
>
> to get a better look at what character codes are in there.

Hi,

Thanks for the reply,

but it is not useful for me. If I use inspect, it converts the string to "hello\302\240".

I want a simple space.

Thanks,
Priyank Shah


Priyank Shah wrote:

> But it is not useful for me. If I use inspect, it converts the string to "hello\302\240".

That is useful.

It shows that the &nbsp; has been converted into the byte sequence
\302\240 (octal), or \xc2\xa0 (hex).

That happens to be the UTF-8 encoding of a non-breaking space,
codepoint 160 (U+00A0):

$ irb19
>> 160.chr("UTF-8")
=> " "
>> 160.chr("UTF-8").bytes.to_a
=> [194, 160]
>> 160.chr("UTF-8").force_encoding("ASCII-8BIT")
=> "\xC2\xA0"

So the terminal you are trying to print it to is non-UTF-8. Perhaps a
Windows box? You didn't say what your platform was.

In that case, you need to re-encode it to the appropriate character set.
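
A rough sketch of both options under Ruby 1.9 or later, assuming the string is tagged as UTF-8: either swap the non-breaking space for an ordinary space, or transcode to the terminal's character set. The helper names and the "Windows-1252" target are assumptions for illustration only; the right target depends on the code page your console actually uses.

```ruby
# Non-breaking space, U+00A0.
NBSP = "\u00A0"

# Hypothetical helper: replace every non-breaking space with a plain space.
# Assumes str is valid UTF-8.
def plain_spaces(str)
  str.gsub(NBSP, " ")
end

# Hypothetical helper: transcode for a non-UTF-8 terminal.
# "Windows-1252" is only an assumed target; characters with no
# equivalent in the target encoding are replaced with "?".
def for_terminal(str, target = "Windows-1252")
  str.encode(target, :invalid => :replace, :undef => :replace, :replace => "?")
end

text = "hello#{NBSP}"
puts plain_spaces(text)            # "hello" followed by an ordinary space
p for_terminal(text).bytes.to_a    # byte values after transcoding
```

If all you want is a simple space, the first helper is enough; the second matters when the text has other non-ASCII characters that should survive on the terminal.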
