[ANN] nokogiri 1.1.1 Released

nokogiri version 1.1.1 has been released!

* <http://nokogiri.rubyforge.org/>
* <http://github.com/tenderlove/nokogiri/wikis>
* <http://github.com/tenderlove/nokogiri/tree/master>
* <http://rubyforge.org/mailman/listinfo/nokogiri-talk>
* <http://nokogiri.lighthouseapp.com/projects/19607-nokogiri/overview>

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.

Changes:

### 1.1.1

* New features

  * Added XML::Node#elem?
  * Added XML::Node#attribute_nodes
  * Added XML::Attr
  * XML::Node#delete added.
  * XML::NodeSet#inner_html added.

* Bugfixes

  * Not including an HTML entity for \r for HTML nodes.
  * Removed CSS::SelectorHandler and XML::XPathHandler
  * XML::Node#attributes returns an Attr node for the value.
  * XML::NodeSet implements to_xml

## FEATURES:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* Drop in replacement for Hpricot (though not bug for bug)

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

  * http://gist.github.com/24605

Nokogiri also features an Hpricot compatibility layer to help ease the change
to using correct CSS and XPath.

## SUPPORT:

The Nokogiri mailing list is available here:

  * http://rubyforge.org/mailman/listinfo/nokogiri-talk

The bug tracker is available here:

  * http://nokogiri.lighthouseapp.com/projects/19607-nokogiri/overview

## SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'
  
  doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

  ####
  # Search for nodes by css
  doc.css('h3.r a.l').each do |link|
    puts link.content
  end
  
  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
  end
  
  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
  end

## INSTALL:

* sudo gem install nokogiri


--
Aaron Patterson
http://tenderlovemaking.com/

Thanks Aaron for your work on Nokogiri.

I noticed what looked like JRuby support so I tried installing the gem
(worked) and then an example that failed:

irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> require 'open-uri'
=> true
irb(main):003:0> doc = Nokogiri::HTML(open('http://markwatson.com'))
NoMethodError: undefined method `read_memory' for
Nokogiri::HTML::Document:Class
  from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-
java/lib/nokogiri/html.rb:36:in `parse'
  from /Users/markw/bin/jruby/lib/ruby/gems/1.8/gems/nokogiri-1.1.1-
java/lib/nokogiri/html.rb:15:in `HTML'
  from (irb):4
  from /Users/markw/bin/jruby/lib/ruby/1.8/irb.rb:150:in `eval_input'

I am using version jruby 1.1.5 - will a later version of JRuby make
this work?

Thanks,
Mark

No. Unfortunately the jruby release is sort of a lie.... It doesn't
actually work on jruby. I've been releasing a jruby version so that
webrat can use its CSS to XPath conversion code, then fall back on
REXML.
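
The fallback described above could look roughly like the sketch below. This is not webrat's actual code. The helper names `nokogiri_usable?` and `link_texts` are hypothetical, and the `read_memory` check is just one assumed way to detect the broken install shown in the backtrace earlier in the thread.

```ruby
require 'rexml/document'

# True only if nokogiri loads AND its native parser entry point is present.
# (Hypothetical helper; the read_memory probe mirrors the NoMethodError above.)
def nokogiri_usable?
  require 'nokogiri'
  Nokogiri::HTML::Document.respond_to?(:read_memory)
rescue LoadError, NameError
  false
end

# Return the text of every node matching +xpath+ in the XML string +xml+,
# using Nokogiri when it works and REXML otherwise.
def link_texts(xml, xpath)
  if nokogiri_usable?
    Nokogiri::XML(xml).xpath(xpath).map { |node| node.content }
  else
    doc = REXML::Document.new(xml)
    REXML::XPath.match(doc, xpath).map { |node| node.text }
  end
end

# Made-up sample input, shaped like the search results in the synopsis.
xml = '<results>' \
      '<h3 class="r"><a class="l" href="/one">First</a></h3>' \
      '<h3 class="r"><a class="l" href="/two">Second</a></h3>' \
      '</results>'

puts link_texts(xml, '//h3/a[@class="l"]')
```

Because both branches take the same XPath string, the caller does not need to know which parser ended up doing the work.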

We're working on a better jruby solution though. We've got a branch
that uses FFI, and Charles Nutter has a branch with a Java
implementation.

  * headius/nokogiri (GitHub)

I'm not sure what the status is on his branch.

On Thu, Jan 15, 2009 at 05:44:05AM +0900, Mark Watson wrote:
> [snip]

--
Aaron Patterson
http://tenderlovemaking.com/

Thanks Aaron for the update on Charles' and your branches. I am using
nokogiri in 2 examples in a new Ruby book that I am writing for
APress; I had a warning about JRuby incompatibility (and have a little
code using a pure Ruby alternative), but by the time the book is
published (5 months) it looks like we will have a working version for
JRuby. I could be of more help with the pure Java version, so I will
ask Charles if he wants help.

On Jan 14, 2:00 pm, Aaron Patterson <aa...@tenderlovemaking.com> wrote:
> [snip]