Hello,
I have just started my adventures with Ruby and I want to write a simple parser:
require 'net/http'
require 'uri'

if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end

url = URI.parse('example url')
response = Net::HTTP.start(url.host, url.port) do |http|
  http.get(url.path)
end
main_page = response.body
links = main_page.slice!(/<table class="regions">.+<\/table>/)
I am getting this error:
parser.rb:17:in `slice!': invalid byte sequence in UTF-8 (ArgumentError)
Could somebody explain to me how to resolve this problem?
None of the solutions I have found work for me.
Regards
--
Posted via http://www.ruby-forum.com/.
Add this line to inspect the body you received:
puts main_page.inspect
2011/3/3 Marek Kis <leisikkeram@gmail.com>:
> main_page = response.body
> links = main_page.slice!(/<table class="regions">.+<\/table>/)
--
Iñaki Baz Castillo
<ibc@aliax.net>
Everything seems to look OK; the only strange thing may be this:
<title>Biura nieruchomo\xB6ci | Agencje nieruchomo\xB6ci</title>
Are the Polish letters in the page the problem?
Its encoding is ISO-8859-2.
7stud
...then why did you say this:
Encoding.default_external = Encoding::UTF_8
I forgot to check the encoding.
I added Encoding.default_external = Encoding::UTF_8 because without it I
got the same error.
Is there any chance that it will work with ISO-8859-2?
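One way a page served as ISO-8859-2 can be made to work with UTF-8 regular expressions is to tag the raw bytes with their real encoding and then transcode them. The following is only a minimal sketch: the helper name to_utf8 is hypothetical, and the sample string stands in for response.body, assuming the server really sends ISO-8859-2.

```ruby
# Hypothetical helper: reinterpret the raw bytes as `source` and transcode to UTF-8.
# The default of ISO-8859-2 is an assumption taken from this thread.
def to_utf8(body, source = Encoding::ISO_8859_2)
  body.dup.force_encoding(source).encode(Encoding::UTF_8)
end

# Stand-in for response.body: the single byte 0xB6 is "ś" in ISO-8859-2.
raw = "<title>Biura nieruchomo\xB6ci</title>"

page = to_utf8(raw)
puts page.valid_encoding?

# With a valid UTF-8 string, slice! no longer raises ArgumentError.
title = page.slice!(/<title>.+<\/title>/)
puts title
```

force_encoding only changes the label on the bytes, while encode rewrites the bytes themselves, so both steps are needed here.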
The problem is solved. My IDE was causing it.
Regards
"Iñaki Baz Castillo" <ibc@aliax.net> wrote in post #985298:
> 2011/3/3 Marek Kis <leisikkeram@gmail.com>:
>> main_page = response.body
>> links = main_page.slice!(/<table class="regions">.+<\/table>/)
> Add this line to inspect the body you received:
> puts main_page.inspect
Thanks for your answer.
It returns the HTML of the page.
Maybe that page is not encoded in UTF-8.
2011/3/3 Marek Kis <leisikkeram@gmail.com>:
> Everything seems to look OK; the only strange thing may be this:
> <title>Biura nieruchomo\xB6ci | Agencje nieruchomo\xB6ci</title>
> Are the Polish letters in the page the problem?
--
Iñaki Baz Castillo
<ibc@aliax.net>
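The suspicion that the page is not UTF-8 can be checked directly instead of guessed at. This is a rough sketch only: charset_from and valid_as? are hypothetical helper names, and the header value is a made-up example of what a Content-Type header might contain.

```ruby
# Hypothetical helper: extract the charset label from a Content-Type header value.
def charset_from(content_type)
  content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]
end

# Hypothetical helper: are these bytes well-formed in the given encoding?
def valid_as?(body, encoding)
  body.dup.force_encoding(encoding).valid_encoding?
end

# Stand-in for response.body, containing the byte seen in the inspect output.
body = "nieruchomo\xB6ci"

puts valid_as?(body, Encoding::UTF_8)
puts valid_as?(body, Encoding::ISO_8859_2)

# Assumed example header; a real one would come from response['content-type'].
puts charset_from("text/html; charset=iso-8859-2")
```

A true result for a single-byte encoding such as ISO-8859-2 does not prove it is the right one, since every byte is well-formed there; the charset declared by the server or the page is the better guide.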
Sure, but does it print a "strange" symbol on your screen?
2011/3/3 Marek Kis <leisikkeram@gmail.com>:
> It returns the HTML of the page.
--
Iñaki Baz Castillo
<ibc@aliax.net>