Slice! invalid byte sequence in UTF-8

Hello

I have just started my adventures with Ruby and I want to write a simple parser:

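require 'net/http'
require 'uri'

# Escape the dot so the pattern matches only versions starting with "1.9"
if RUBY_VERSION =~ /\A1\.9/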
    Encoding.default_external = Encoding::UTF_8
    Encoding.default_internal = Encoding::UTF_8
end

url = URI.parse('example url')

response = Net::HTTP.start(url.host, url.port) do |http|
  http.get(url.path)
end

main_page = response.body
links = main_page.slice!(/<table class="regions">.+<\/table>/)

I am getting this error: parser.rb:17:in `slice!': invalid byte sequence in
UTF-8 (ArgumentError)

Could somebody explain to me how to resolve this problem?
None of the solutions I found work for me.

Regards

--
Posted via http://www.ruby-forum.com/.

Add this line to inspect the body you received:

puts main_page.inspect
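
Besides inspecting the raw body, it can help to ask the string which encoding Ruby has tagged it with and whether its bytes are actually valid in that encoding. A minimal sketch for Ruby 1.9 or later; the `sample` string is a made-up stand-in for `response.body`, and `encoding_report` is a hypothetical helper, not part of any library:

```ruby
# Stand-in for response.body. "\xB6" is the ISO-8859-2 byte for the
# Polish letter "s with acute"; on its own it is not valid UTF-8.
sample = "<title>Biura nieruchomo\xB6ci</title>".dup
sample.force_encoding(Encoding::UTF_8)

# Hypothetical helper: summarize how Ruby sees a string.
def encoding_report(str)
  { :encoding => str.encoding.name, :valid => str.valid_encoding? }
end

report = encoding_report(sample)
puts report.inspect
```

If `:valid` comes back false, the string is labeled with an encoding that does not match its bytes, which is the situation in which regexp methods such as `slice!` raise "invalid byte sequence".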

2011/3/3 Marek Kis <leisikkeram@gmail.com>:

main_page = response.body
links = main_page.slice!(/<table class="regions">.+<\/table>/)

--
Iñaki Baz Castillo
<ibc@aliax.net>

Everything seems to look OK. The only strange thing might be this:

<title>Biura nieruchomo\xB6ci | Agencje nieruchomo\xB6ci</title>

Are the Polish letters in the page the problem?


Its encoding is ISO-8859-2.


...then why did you say this:

Encoding.default_external = Encoding::UTF_8


I forgot to check the encoding.

I put Encoding.default_external = Encoding::UTF_8 there because without it I
got the same error.

Is there any chance that it will work with ISO-8859-2?
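
Ruby 1.9 can work with ISO-8859-2 data. One common approach, sketched here on the assumption that the body really is ISO-8859-2, is to label the bytes with their true encoding via `force_encoding` and then transcode to UTF-8 with `encode`. The `raw` string and its table markup are made-up stand-ins for `response.body`, and the `/m` flag is an addition not present in the original pattern:

```ruby
# Stand-in for response.body: raw bytes as they arrive from the server.
# "\xB6" is the ISO-8859-2 byte for the Polish letter "s with acute".
raw = "<table class=\"regions\"><tr><td>nieruchomo\xB6ci</td></tr></table><p>rest</p>".dup
raw.force_encoding(Encoding::BINARY)

# Step 1: tell Ruby what the bytes really are (the bytes do not change).
# Step 2: transcode to UTF-8 (the bytes change, the characters do not).
main_page = raw.force_encoding(Encoding::ISO_8859_2).encode(Encoding::UTF_8)

# The /m flag lets "." match newlines, which real HTML tables contain.
links = main_page.slice!(/<table class="regions">.+<\/table>/m)

puts links
puts main_page
```

After the transcoding step the string is valid UTF-8, so `slice!` can scan it without raising, and `main_page` is left with whatever followed the table.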


The problem is solved. My IDE was causing it.

Regards


"Iñaki Baz Castillo" <ibc@aliax.net> wrote in post #985298:

2011/3/3 Marek Kis <leisikkeram@gmail.com>:

main_page = response.body
links = main_page.slice!(/<table class="regions">.+<\/table>/)

Add this line to inspect the body you received:

puts main_page.inspect

Thanks for your answer.

It returns the HTML of the page.


Maybe that page is not encoded in UTF-8.
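
One way to check this is to look at the charset the server declares in its `Content-Type` header. The sketch below parses such a header value with a hypothetical helper, `charset_from_content_type`; the sample header strings are made up, and a page may also declare its charset in a `<meta>` tag, which this does not handle:

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type
# header value and map it to a Ruby Encoding. Returns nil when the header
# is missing, has no charset, or names an encoding Ruby does not know.
def charset_from_content_type(header_value)
  return nil if header_value.nil?
  match = header_value.match(/charset\s*=\s*"?([^\s;"]+)/i)
  return nil unless match
  begin
    Encoding.find(match[1])
  rescue ArgumentError
    nil
  end
end

declared = charset_from_content_type("text/html; charset=iso-8859-2")
puts declared.inspect
puts charset_from_content_type("text/html").inspect
```

With Net::HTTP, the header value would come from `response['content-type']`; the resulting encoding could then be passed to `force_encoding` on the body.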

2011/3/3 Marek Kis <leisikkeram@gmail.com>:

Everything seems to look OK. The only strange thing might be this:

<title>Biura nieruchomo\xB6ci | Agencje nieruchomo\xB6ci</title>

Are the Polish letters in the page the problem?

--
Iñaki Baz Castillo
<ibc@aliax.net>

Sure, but does it print a "strange" symbol on your screen?

2011/3/3 Marek Kis <leisikkeram@gmail.com>:

It returns the HTML of the page.

--
Iñaki Baz Castillo
<ibc@aliax.net>