Nokogiri v1.13.0 has been released!
This is a feature update focused on Ruby 3.1 native (precompiled) gem
support, including experimental ARM64 Linux support. It also contains bug
fixes, so users are encouraged to upgrade.
Full release notes are at
but they are also included below to save you a click.
---
Nokogiri (鋸) makes it easy and painless to work with XML and HTML from
Ruby. It provides a sensible, easy-to-understand API for reading
<Parsing an HTML/XML document - Nokogiri>,
writing, modifying
<Modifying an HTML/XML document - Nokogiri>, and
querying <Searching a XML/HTML document - Nokogiri>
documents.
It is fast and standards-compliant by relying on native parsers like
libxml2 (CRuby) and xerces (JRuby).
---
1.13.0 / 2022-01-06
Notes
Ruby
This release introduces native gem support for Ruby 3.1. Please note that
Windows users should use the x64-mingw-ucrt platform gem for Ruby 3.1, and
x64-mingw32 for Ruby 2.6–3.0 (see RubyInstaller 3.1.0 release notes
<RubyInstaller 3.1.0-1 released>).
This release ends support for:
- Ruby 2.5, for which official support ended 2021-03-31
<Ruby Maintenance Branches>.
- JRuby 9.2, which is a Ruby 2.5-compatible release.
Faster, more reliable installation: Native Gem for ARM64 Linux
This version of Nokogiri ships experimental native gem support for the
aarch64-linux platform, which should support AWS Graviton and other ARM
Linux platforms. We don't yet have CI running for this platform, and so
we're interested in hearing back from y'all whether this is working, and
what problems you're seeing. Please send us feedback here: Feedback: Have
you used the aarch64-linux native gem?
<Feedback: Have you used the `aarch64-linux` native gem? · Discussion #2359 · sparklemotion/nokogiri · GitHub>
Publishing
This version of Nokogiri opts in to the "MFA required to publish" setting
<MFA requirement opt-in - RubyGems Guides> on Rubygems.org. This
and all future Nokogiri gem files must be published to Rubygems by an
account with multi-factor authentication enabled. This should provide some
additional protection against supply-chain attacks.
A related discussion about Trust exists at #2357
<RFC: Increase the level of trust in released gem files · Discussion #2357 · sparklemotion/nokogiri · GitHub> in which I invite
you to participate if you have feelings or opinions on this topic.
Dependencies
- [CRuby] Vendored libiconv is updated from 1.15 to 1.16. (Note that
libiconv is only redistributed in the native windows and native darwin
gems; see LICENSE-DEPENDENCIES.md for more information.) [#2206
<Upgrade to libiconv 1.16 · Issue #2206 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Upgrade mini_portile2 dependency from ~> 2.6.1 to ~> 2.7.0.
("ruby" platform gem only.)
Improved
- {XML,HTML4}::DocumentFragment constructors all now take an optional
parse options parameter or block (similar to Document constructors). [
#1692 <Add an options parameter to all the methods that allow creation of an HTML document fragment by JackMc · Pull Request #1692 · sparklemotion/nokogiri · GitHub>] (Thanks,
@JackMc <JackMc (Jack McCracken) · GitHub>!)
- Nokogiri::CSS.xpath_for allows an XPathVisitor to be injected, for
finer-grained control over how CSS queries are translated into XPath.
- [CRuby] XML::Reader#encoding will return the encoding detected by the
parser when it's not passed to the constructor. [#980
<Nokogiri::XML::Reader#encoding and Nokogiri::XML::Reader#xml_version giving wrong output · Issue #980 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Handle abruptly-closed HTML comments as recommended by WHATWG.
(Thanks to tehryanx <HackerOne> for
reporting!)
- [CRuby] Node#line is no longer capped at 65535. libxml v2.9.0 and
later support a new parse option, exposed as
Nokogiri::XML::ParseOptions::PARSE_BIG_LINES, which is turned on by
default in ParseOptions::DEFAULT_{XML,XSLT,HTML,SCHEMA}. (Note that JRuby
already supported large line numbers.) [#1764
<investigate using XML_PARSE_BIG_LINES to tell libxml2 to track line numbers bigger than a short int · Issue #1764 · sparklemotion/nokogiri · GitHub>, #1493
<line numbers not working as expected · Issue #1493 · sparklemotion/nokogiri · GitHub>, #1617
<Line number capped at 65535 but libxml2 already returns long. · Issue #1617 · sparklemotion/nokogiri · GitHub>, #1505
<Line number of RelaxNG error seems capped at 65535 · Issue #1505 · sparklemotion/nokogiri · GitHub>, #1003
<Nokogiri::XML::Node#line overflows · Issue #1003 · sparklemotion/nokogiri · GitHub>, #533
<XML line limit · Issue #533 · sparklemotion/nokogiri · GitHub>]
- [CRuby] If a cycle is introduced when reparenting a node (i.e., the
node becomes its own ancestor), a RuntimeError is raised. libxml2 does
no checking for this, which means cycles would otherwise result in infinite
loops on subsequent operations. (Note that JRuby already did this.) [
#1912 <Hangs forever when trying to replace a child node with the parent node · Issue #1912 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Source builds will download zlib and libiconv via HTTPS.
("ruby" platform gem only.) [#2391
<update source links to https where possible by jmartin-r7 · Pull Request #2391 · sparklemotion/nokogiri · GitHub>] (Thanks,
@jmartin-r7 <jmartin-r7 (Jeffrey Martin) · GitHub>!)
- [JRuby] Node#line behavior has been modified to return the line number
of the node in the *final DOM structure*. This behavior is different
from CRuby, which returns the node's position in the *input string*.
Ideally the two implementations would be the same, but at least this is now
officially documented and tested. The real-world impact of this change is
that the value returned in JRuby is greater by 1 to account for the XML
prolog in the output. [#2380
<[bug] Nokogiri/JRuby line numbering off by one when XML prolog is in the source document · Issue #2380 · sparklemotion/nokogiri · GitHub>] (Thanks,
@dabdine <dabdine · GitHub>!)
Fixed
- CSS queries on HTML5 documents now correctly match foreign elements
(SVG, MathML) when namespaces are not specified in the query. [#2376
<[bug] HTML5 foreign element namespaces should not be required by CSS queries · Issue #2376 · sparklemotion/nokogiri · GitHub>]
- XML::Builder blocks restore context properly when exceptions are
raised. [#2372 <[bug] XML::Builder blocks don't restore the parent node when an error is raised from the block · Issue #2372 · sparklemotion/nokogiri · GitHub>]
(Thanks, @ric2b <ric2b (Ricardo Amendoeira) · GitHub> and @rinthedev
<rinthedev (Joana Tavares) · GitHub>!)
- The Nokogiri::CSS::Parser cache now uses the XPathVisitor configuration
as part of the cache key, preventing incorrect cache results from being
returned when multiple XPathVisitor options are being used.
- Error recovery from in-context parsing (e.g., Node#parse) now always
uses the correct DocumentFragment class. Previously
Nokogiri::HTML4::DocumentFragment was always used, even for XML
documents. [#1158 <element name case changes when added to document with different encoding · Issue #1158 · sparklemotion/nokogiri · GitHub>
]
- DocumentFragment#> now works properly, matching a CSS selector against
only the fragment roots. [#1857
<`Node#>` operator does not work in a `DocumentFragment` · Issue #1857 · sparklemotion/nokogiri · GitHub>]
- XML::DocumentFragment#errors now correctly contains any parsing errors
encountered. Previously this was always empty. (Note that
HTML::DocumentFragment#errors already did this.)
- [CRuby] Fix memory leak in Document#canonicalize when inclusive
namespaces are passed in. [#2345
<fix memory leaks by flavorjones · Pull Request #2345 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Fix memory leak in Document#canonicalize when an argument type
error is raised. [#2345
<fix memory leaks by flavorjones · Pull Request #2345 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Fix memory leak in EncodingHandler where iconv handlers were
not being cleaned up. [#2345
<fix memory leaks by flavorjones · Pull Request #2345 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Fix memory leak in XPath custom handlers where string
arguments were not being cleaned up. [#2345
<fix memory leaks by flavorjones · Pull Request #2345 · sparklemotion/nokogiri · GitHub>]
- [CRuby] Fix memory leak in Reader#base_uri where the string returned
by libxml2 was not freed. [#2347
<fix: memory leak in Reader#base_uri by flavorjones · Pull Request #2347 · sparklemotion/nokogiri · GitHub>]
- [JRuby] Deleting a Namespace from a NodeSet no longer modifies the href to
be the default namespace URL.
- [JRuby] Fix XHTML formatting of closing tags for non-container
elements. [#2355 <[bug] Inconsistent behavior on JRuby/CRuby · Issue #2355 · sparklemotion/nokogiri · GitHub>]
Deprecated
- Passing a Nokogiri::XML::Node as the second parameter to Node.new is
deprecated and will generate a warning. This parameter should be a kind of
Nokogiri::XML::Document. This will become an error in a future version
of Nokogiri. [#975 <Node.new should be type-checking the second argument (must be a Document) · Issue #975 · sparklemotion/nokogiri · GitHub>
]
- Nokogiri::CSS::Parser, Nokogiri::CSS::Tokenizer, and
Nokogiri::CSS::Node are now internal-only APIs that are no longer
documented, and should not be considered stable. With the introduction of
XPathVisitor injection into Nokogiri::CSS.xpath_for there should be no
reason to rely on these internal APIs.
- CSS-to-XPath utility classes
Nokogiri::CSS::XPathVisitorAlwaysUseBuiltins and
XPathVisitorOptimallyUseBuiltins are deprecated. Prefer
Nokogiri::CSS::XPathVisitor with appropriate constructor arguments.
These classes will be removed in a future version of Nokogiri.