[ANN] nokogiri 1.2.0 Released

nokogiri version 1.2.0 has been released!

* <http://nokogiri.rubyforge.org/>
* <http://github.com/tenderlove/nokogiri/wikis>
* <http://github.com/tenderlove/nokogiri/tree/master>
* <http://rubyforge.org/mailman/listinfo/nokogiri-talk>
* <http://nokogiri.lighthouseapp.com/projects/19607-nokogiri/overview>

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser.

Changes:

### 1.2.0 / 2009-02-22

* New features

  * CSS search now supports CSS3 namespace queries
  * Namespaces on the root node are automatically registered
  * CSS queries use the default namespace
  * Nokogiri::XML::Document#encoding get encoding used for this document
  * Nokogiri::XML::Document#url get the document url
  * Nokogiri::XML::Node#add_namespace add a namespace to the node (LH #38)
  * Nokogiri::XML::Node#each iterate over attribute name, value pairs
  * Nokogiri::XML::Node#keys get all attribute names
  * Nokogiri::XML::Node#line get the line number for a node (Thanks Dirkjan Bussink!)
  * Nokogiri::XML::Node#serialize now takes an optional encoding parameter
  * Nokogiri::XML::Node#to_html, to_xml, and to_xhtml take an optional encoding
  * Nokogiri::XML::Node#to_str
  * Nokogiri::XML::Node#to_xhtml to produce XHTML documents
  * Nokogiri::XML::Node#values get all attribute values
  * Nokogiri::XML::Node#write_to writes the node to an IO object with optional encoding
  * Nokogiri::XML::ProcessingInstruction.new
  * Nokogiri::XML::SAX::PushParser for all your push parsing needs.

* Bugfixes

  * Fixed Nokogiri::XML::Document#dup
  * Fixed header detection. Thanks rubikitch!
  * Fixed a problem where invalid CSS would cause the parser to hang

* Deprecations

  * Nokogiri::XML::Node.new_from_str will be deprecated in 1.3.0

* API Changes

  * Nokogiri::HTML.fragment now returns an XML::DocumentFragment (LH #32)

## FEATURES:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* Drop-in replacement for Hpricot (though not bug for bug)

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

Here is a speed test:

  * http://gist.github.com/24605

Nokogiri also features an Hpricot compatibility layer to help ease the change
to using correct CSS and XPath.

## SUPPORT:

The Nokogiri mailing list is available here:

  * http://rubyforge.org/mailman/listinfo/nokogiri-talk

The bug tracker is available here:

  * http://nokogiri.lighthouseapp.com/projects/19607-nokogiri/overview

## SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'
  
  doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

  ####
  # Search for nodes by css
  doc.css('h3.r a.l').each do |link|
    puts link.content
  end
  
  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
  end
  
  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
  end

## REQUIREMENTS:

* ruby 1.8 or 1.9
* libxml
* libxslt

## INSTALL:

* sudo gem install nokogiri


--
Aaron Patterson
http://tenderlovemaking.com/

Aaron Patterson wrote:

nokogiri version 1.2.0 has been released!

grabbed! Tx!

Aaron Patterson wrote:

nokogiri version 1.2.0 has been released!

The following is a question, if I got it wrong, and a code snippet if I didn't.

To add nokogiri to a Merb project's Cucumber featurizer, I...

Added it to features/support/env.rb:

   require "merb-core"
   require "spec"
   require "merb_cucumber/world/simple"
   require 'nokogiri' # <-- with the correct style of quote 'ticks'!

Added its calls to features/steps/result_steps.rb:

   When /^you go to (.*)/ do |text|
     @response = request(text)
     @xdoc = Nokogiri::HTML(@response.body.to_s)
   end

   Then /^you should see an? (.+) element$/ do |searcher|
     @xdoc.css(searcher).should_not be_nil
   end

Called its steps from features/comics.feature:

   Feature: serve webcomics

     Scenario: root page
       When you go to /
       Then you should see an img.comic element

The result is the usual bunch of pale green. (But note that I personally suck at writing customer-facing verbiage. They don't care if they see an img with a class of comic! They want to see a comic image! More verbiage tuning is in order...)

Also note that I suck at writing RSpec matchers. The point of all this is clear error messages at fault time, and my .should_not be_nil is also not particularly exemplary!

So thanks for the lib! It's going to the top of my list from now on...


--
   Phlip
   http://localhost:4000/Pigleg_Too/1

  Then /^you should see an? (.+) element$/ do |searcher|
    @xdoc.css(searcher).should_not be_nil
  end

Nooop. I forgot to test that in the negative, by changing the Then commandment and seeing if it fails cleanly. It did not, most likely because a CSS query that matches nothing returns an empty node set rather than nil.

After changing to should_not be_blank, I then upgraded the verbiage in the featurizer:

   Then /^you should see an? (.+) (.+)$/ do |style, element|
     element = { 'image' => 'img' }.fetch(element, element)
     @xdoc.css("#{element}.#{style}").should_not be_blank
   end

...

     When you go to /
     Then you should see a comic image

The {} is speculative coding; if I had a real client asking for these features, they might write 'panel', which I must then translate to <div>...
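A minimal sketch of how that lookup might grow, kept apart from the step definition so it can be tried on its own. `ELEMENT_WORDS` and `selector_for` are hypothetical names, and the 'panel' entry is only the guess mentioned above.

```ruby
# Hypothetical lookup table: customer-facing word => HTML tag name.
ELEMENT_WORDS = { 'image' => 'img', 'panel' => 'div' }.freeze

# Build a CSS selector such as "img.comic" from the two words
# captured by the step's regular expression.
def selector_for(style, element)
  tag = ELEMENT_WORDS.fetch(element, element) # unknown words pass through unchanged
  "#{tag}.#{style}"
end

puts selector_for('comic', 'image') # img.comic
puts selector_for('comic', 'panel') # div.comic
puts selector_for('comic', 'table') # table.comic
```

The step body would then reduce to `@xdoc.css(selector_for(style, element)).should_not be_blank`.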

huh?


On Feb 22, 2009, at 20:49, Phlip wrote:

require 'nokogiri' # <-- with the correct style of quote 'ticks'!