[ANN] nokogiri 1.4.1 Released

Hey everyone! Have you finished your holiday shopping yet? I know I haven't.
Fortunately for you guys, Mike and I like programming a lot more than
shopping. I mean, don't get me wrong. I *love* shopping for myself, I
just find shopping for other people to be, well, difficult.

Anyway, let's get down to business:

nokogiri version 1.4.1 has been released!

* <http://nokogiri.org>
* <http://github.com/tenderlove/nokogiri/wikis>
* <http://github.com/tenderlove/nokogiri/tree/master>
* <http://groups.google.com/group/nokogiri-talk>
* <http://github.com/tenderlove/nokogiri/issues>

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's
many features is the ability to search documents via XPath or CSS3 selectors.

XML is like violence - if it doesn't solve your problems, you are not using
enough of it.

Changes:

### 1.4.1 / 2009/12/10

* New Features

  * Added Nokogiri::LIBXML_ICONV_ENABLED
  * Alias Node#[] to Node#attr
  * XML::Node#next_element added
  * XML::Node#> added for searching a node's immediate children
  * XML::NodeSet#reverse added
  * Added fragment support to Node#add_child, Node#add_next_sibling,
    Node#add_previous_sibling, and Node#replace.
  * XML::Node#previous_element implemented
  * Rubinius support
  * The CSS selector engine now supports :has()
  * XML::NodeSet#filter() was added
  * XML::Node#next= and #previous= are aliases for add_next_sibling and
    add_previous_sibling. GH #183

* Bugfixes

  * XML fragments with namespaces do not raise an exception
    (regression in 1.4.0)
  * Node#matches? works in nodes contained by a DocumentFragment. GH #158
  * Document should not define add_namespace() method. GH #169
  * XPath queries returning namespace declarations do not segfault.
  * Node#replace works with nodes from different documents. GH #162
  * Adding XML::Document#collect_namespaces
  * Fixed bugs in the SOAP4R adapter
  * Fixed bug in XML::Node#next_element for certain edge cases
  * Fixed load path issue with JRuby under Windows. GH #160.
  * XSLT#apply_to will honor the "output method". Thanks richardlehane!
  * Fragments containing leading text nodes with newlines now parse properly.
    GH #178.

## FEATURES:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder

Nokogiri parses and searches XML/HTML very quickly, and has correctly
implemented CSS3 selector and XPath support.

Here is a speed test:

  * http://gist.github.com/24605

## SUPPORT:

The Nokogiri mailing list is available here:

  * http://groups.google.com/group/nokogiri-talk

The bug tracker is available here:

  * http://github.com/tenderlove/nokogiri/issues

The IRC channel is #nokogiri on freenode.

## SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'

  # Get a Nokogiri::HTML::Document for the page we’re interested in...
  doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

  # Do funky things with it using Nokogiri::XML::Node methods...

  ####
  # Search for nodes by css
  doc.css('h3.r a.l').each do |link|
    puts link.content
  end

  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a[@class="l"]').each do |link|
    puts link.content
  end

  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
    puts link.content
  end

## REQUIREMENTS:

* ruby 1.8 or 1.9
* libxml2
* libxml2-dev
* libxslt
* libxslt-dev

## INSTALL:

* sudo gem install nokogiri

--
Aaron Patterson
http://tenderlovemaking.com/

Good to hear mate -- and I have to admit, Nokogiri is quite possibly
the most elegant XML/HTML parser I've ever used.

Just letting you know your work is thoroughly appreciated.

+1

I recently needed to script up a web scraper and Nokogiri was the
hammer that hit the nail of the problem squarely on the head. Thank
you for an excellent piece of work.

Regards,

Jeremy Henty

On 2009-12-11, Bapabooiee <bapabooiee@gmail.com> wrote:
> Good to hear mate -- and I have to admit, Nokogiri is quite possibly
> the most elegant XML/HTML parser I've ever used.
>
> Just letting you know your work is thoroughly appreciated.

next time I'd suggest using mechanize.

On Dec 11, 2009, at 13:25 , Jeremy Henty wrote:
> On 2009-12-11, Bapabooiee <bapabooiee@gmail.com> wrote:
>> Good to hear mate -- and I have to admit, Nokogiri is quite possibly
>> the most elegant XML/HTML parser I've ever used.
>>
>> Just letting you know your work is thoroughly appreciated.
>
> +1
>
> I recently needed to script up a web scraper and Nokogiri was the
> hammer that hit the nail of the problem squarely on the head. Thank
> you for an excellent piece of work.

Thanks guys! It's good to hear nice things once in a while. :smiley:

Thanks for using nokogiri, and if you run into bugs, make sure to
report them!

On Sat, Dec 12, 2009 at 06:25:05AM +0900, Jeremy Henty wrote:
> On 2009-12-11, Bapabooiee <bapabooiee@gmail.com> wrote:
>> Good to hear mate -- and I have to admit, Nokogiri is quite possibly
>> the most elegant XML/HTML parser I've ever used.
>>
>> Just letting you know your work is thoroughly appreciated.
>
> +1
>
> I recently needed to script up a web scraper and Nokogiri was the
> hammer that hit the nail of the problem squarely on the head. Thank
> you for an excellent piece of work.

--
Aaron Patterson
http://tenderlovemaking.com/

I'll keep it in mind, thanks. *sigh*, so many toys, so little time! :slight_smile:

Jeremy Henty

On 2009-12-11, Ryan Davis <ryand-ruby@zenspider.com> wrote:
> On Dec 11, 2009, at 13:25 , Jeremy Henty wrote:
>> I recently needed to script up a web scraper and Nokogiri was the
>> hammer that hit the nail of the problem squarely on the head. Thank
>> you for an excellent piece of work.
>
> next time I'd suggest using mechanize.

You would actually probably want to use a combination of both,
depending on what you're doing. You could use Mechanize for crawling &
scraping the site, and then you use Nokogiri to pry the information
you want out of the markup.

On Dec 12, 4:26 am, Jeremy Henty <onepo...@starurchin.org> wrote:
> On 2009-12-11, Ryan Davis <ryand-r...@zenspider.com> wrote:
>
>> On Dec 11, 2009, at 13:25 , Jeremy Henty wrote:
>>
>>> I recently needed to script up a web scraper and Nokogiri was the
>>> hammer that hit the nail of the problem squarely on the head. Thank
>>> you for an excellent piece of work.
>>
>> next time I'd suggest using mechanize.
>
> I'll keep it in mind, thanks. *sigh*, so many toys, so little time! :slight_smile:
>
> Jeremy Henty

mechanize already uses nokogiri.

On Dec 14, 2009, at 14:00 , Bapabooiee wrote:
> On Dec 12, 4:26 am, Jeremy Henty <onepo...@starurchin.org> wrote:
>> On 2009-12-11, Ryan Davis <ryand-r...@zenspider.com> wrote:
>>> On Dec 11, 2009, at 13:25 , Jeremy Henty wrote:
>>>> I recently needed to script up a web scraper and Nokogiri was the
>>>> hammer that hit the nail of the problem squarely on the head. Thank
>>>> you for an excellent piece of work.
>>>
>>> next time I'd suggest using mechanize.
>>
>> I'll keep it in mind, thanks. *sigh*, so many toys, so little time! :slight_smile:
>>
>> Jeremy Henty
>
> You would actually probably want to use a combination of both,
> depending on what you're doing. You could use Mechanize for crawling &
> scraping the site, and then you use Nokogiri to pry the information
> you want out of the markup.