Nokogiri does not read an HTML file on 32-bit CentOS

Hi all,

I want to read and use the content of an HTML file with Nokogiri on my
32-bit CentOS machine, but it does not read any of the HTML content.

@test = Nokogiri::HTML("abc.html")
puts "#{@test}"

It only shows me the JavaScript from the page source, not any HTML content.

Please reply if anyone knows about this issue.

Thanks,
Priyank Shah

--
Posted via http://www.ruby-forum.com/.

On Nov 12, 2010, at 05:53 , Priyank Shah wrote:
> @test = Nokogiri::HTML("abc.html")

503 % ri Nokogiri.HTML
= Nokogiri.HTML

(from gem nokogiri-1.4.3.1)
----------------------------------------------------------------------------
  HTML(thing, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block)

----------------------------------------------------------------------------

Parse HTML. Convenience method for Nokogiri::HTML::Document.parse

---

thing is not a path.

Ryan Davis wrote in post #961099:
>> @test = Nokogiri::HTML("abc.html")
>
> 503 % ri Nokogiri.HTML
> = Nokogiri.HTML
>
> (from gem nokogiri-1.4.3.1)
> ----------------------------------------------------------------------------
>   HTML(thing, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block)

Hi,

Thanks for the reply, but I still have no solution. I get only

<!DOCTYPE html public \"-//W3C DTD HTML 4.0 Tansitional//EN\" .....

as output, not the actual HTML content of the file.

I checked Nokogiri, and I think it is some HTML character set encoding
issue.

Can you give me some idea about this?

Thanks,
Priyank Shah


What Ryan is telling you: you have to pass a file pointer (an open file object) or the actual HTML as a string, not a string containing a file name.

On 15.11.2010 at 06:55, Priyank Shah <shahpriyank01@gmail.com> wrote:
> Thanks for the reply, but I still have no solution. I get only
>
> <!DOCTYPE html public \"-//W3C DTD HTML 4.0 Tansitional//EN\" .....
>
> as output, not the actual HTML content of the file.

Florian Gilcher wrote in post #961470:
> What Ryan is telling you: you have to pass a filepointer or the actual
> HTML as string, not a string containing a filename.

Hi,

Thanks for the explanation, but I still get the same problem.

I am using the following on CentOS 5.5, 32-bit:

$> nokogiri -v

Ruby
  engine: mri
  version: 1.8.7
  platform: i686-linux

libxml:
  loaded: 2.6.26
  binding: extension
  compiled: 2.6.26

nokogiri: 1.4.3.1

--------
My code looks like this:

f = File.open("test.html")
data = Nokogiri::HTML(f)
puts "#{data}"

p "#{data}"

but both of these give

Output:

"<!DOCTYPE html PUBLIC \"-W3C//DTD HTML 4.0 Transitional//EN\" .......

It shows this type of output and does not get the actual HTML content.

Please help me if you have any more ideas.

Thanks,
Priyank Shah


Can't reproduce your problem. Try this:

  require 'rubygems'
  require 'nokogiri'
  # Make sure the file contains something.
  File.open('test.html', 'w') do |f|
    f.write("<html><body><h1>Foo</h1></body></html>")
  end

  f = File.open('test.html')
  data = Nokogiri::HTML(f)
  puts data
  p data

----- OUTPUT ------

  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
  "http://www.w3.org/TR/REC-html40/loose.dtd">
  <html><body><h1>Foo</h1></body></html>
  #<Nokogiri::HTML::Document:0x3ff244a4fb70 name="document"
  children=[#<Nokogiri::XML::DTD:0x3ff244ad5e14 name="html">,
  #<Nokogiri::XML::Element:0x3ff244adf5b8 name="html"
  children=[#<Nokogiri::XML::Element:0x3ff244b50e0c name="body"
  children=[#<Nokogiri::XML::Element:0x3ff244b50b28 name="h1"
  children=[#<Nokogiri::XML::Text:0x3ff244b508a8 "Foo">]>]>]>]>

On Mon, 2010-11-15 at 17:56 +0900, Priyank Shah wrote:
> f = File.open("test.html")
> data = Nokogiri::HTML(f)
> puts "#{data}"
>
> p "#{data}"

Niklas Cathor wrote in post #961490:
> Can't reproduce your problem. Try this:
>
>   f = File.open('test.html')
>   data = Nokogiri::HTML(f)
>   puts data
>   p data

Hi,

First, thanks to all for helping me with my problem.

I finally got the solution.

I tried

f = open("test.html").read
data = Nokogiri::HTML(f)
puts data

instead of

f = File.open("test.html")
data = Nokogiri::HTML(f)
puts data

and I get the HTML data.

So basically I don't use the File class.

Thanks,
Priyank Shah
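
What differs between the two versions above is the argument that reaches Nokogiri::HTML: a String with the file's contents in the first, an open file object in the second. The File class is still involved either way, since Kernel#open returns a File for a plain file name. A stdlib-only sketch of that, with a temporary file standing in for test.html:

```ruby
require 'tempfile'

# Hypothetical sample file, created only so the sketch is self-contained.
sample = Tempfile.new(['test', '.html'])
sample.write('<html><body><h1>Foo</h1></body></html>')
sample.close

# Kernel#open on a plain file name returns a File object as well.
handle = open(sample.path)
handle_class = handle.class
via_open = handle.read        # a String with the file's contents
handle.close

# File.read returns the same String and closes the file itself.
via_file_read = File.read(sample.path)

puts handle_class
puts via_open == via_file_read
```

This should print File and then true. File.read(path) is a shorter way to get the same String without leaving a file handle open, and that String can be passed to Nokogiri::HTML just like the result of open(...).read.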
