[ANN] nokogiri v1.13.0 released

Nokogiri v1.13.0 has been released!

This is a feature update focused on native (precompiled) gem support for
Ruby 3.1, including experimental ARM64 Linux support. It also contains bug
fixes, so users are encouraged to upgrade.

Full release notes are at

but they are also included below to save you a click.

---

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from
Ruby. It provides a sensible, easy-to-understand API for reading, writing,
modifying, and querying documents. It is fast and standards-compliant by
relying on native parsers like libxml2 (CRuby) and xerces (JRuby).

---

1.13.0 / 2022-01-06

Notes

Ruby

This release introduces native gem support for Ruby 3.1. Please note that
Windows users should use the x64-mingw-ucrt platform gem for Ruby 3.1, and
x64-mingw32 for Ruby 2.6–3.0 (see the RubyInstaller 3.1.0 release notes).

This release ends support for:

   - Ruby 2.5, for which official support ended on 2021-03-31.
   - JRuby 9.2, which is a Ruby 2.5-compatible release.

Faster, more reliable installation: Native Gem for ARM64 Linux

This version of Nokogiri ships experimental native gem support for the
aarch64-linux platform, which should support AWS Graviton and other ARM64
Linux platforms. We don't yet have CI running for this platform, so we're
interested in hearing back from y'all about whether it's working and what
problems you're seeing. Please send us feedback in the GitHub discussion
"Feedback: Have you used the aarch64-linux native gem?" (sparklemotion/nokogiri
Discussion #2359).

Publishing

This version of Nokogiri opts in to the "MFA required to publish" setting
on RubyGems.org (see "MFA requirement opt-in" in the RubyGems Guides). This
and all future Nokogiri gem files must be published to RubyGems.org by an
account with multi-factor authentication enabled. This should provide some
additional protection against supply-chain attacks.
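For gem authors wondering what opting in looks like: per the RubyGems
Guides, it is a single gemspec metadata key, rubygems_mfa_required. The
sketch below is a hypothetical gemspec for a made-up gem called
example_gem; it is not Nokogiri's actual gemspec.

```ruby
require "rubygems"

# Hypothetical gemspec for a made-up gem; this is NOT Nokogiri's real gemspec.
spec = Gem::Specification.new do |s|
  s.name    = "example_gem"
  s.version = "0.1.0"
  s.summary = "An example gem"
  s.authors = ["Example Author"]
  s.files   = []

  # Opting in to "MFA required to publish": RubyGems.org will then require
  # multi-factor authentication for privileged operations on this gem,
  # such as pushing a new version.
  s.metadata["rubygems_mfa_required"] = "true"
end

puts spec.metadata.inspect
```

The metadata value is the string "true", not a boolean, because gemspec
metadata values are strings.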

A related discussion about trust is taking place in #2357 ("RFC: Increase
the level of trust in released gem files"), in which I invite you to
participate if you have feelings or opinions on this topic.

Dependencies

   - [CRuby] Vendored libiconv is updated from 1.15 to 1.16. (Note that
   libiconv is only redistributed in the native Windows and native Darwin
   gems; see LICENSE-DEPENDENCIES.md for more information.) [#2206]
   - [CRuby] Upgrade mini_portile2 dependency from ~> 2.6.1 to ~> 2.7.0.
   ("ruby" platform gem only.)

Improved

   - {XML,HTML4}::DocumentFragment constructors all now take an optional
   parse options parameter or block (similar to Document constructors).
   [#1692] (Thanks, @JackMc!)
   - Nokogiri::CSS.xpath_for allows an XPathVisitor to be injected, for
   finer-grained control over how CSS queries are translated into XPath.
   - [CRuby] XML::Reader#encoding will return the encoding detected by the
   parser when it's not passed to the constructor. [#980]
   - [CRuby] Handle abruptly-closed HTML comments as recommended by WHATWG.
   (Thanks to tehryanx for reporting!)
   - [CRuby] Node#line is no longer capped at 65535. libxml2 v2.9.0 and
   later support a new parse option, exposed as
   Nokogiri::XML::ParseOptions::PARSE_BIG_LINES, which is turned on by
   default in ParseOptions::DEFAULT_{XML,XSLT,HTML,SCHEMA}. (Note that JRuby
   already supported large line numbers.) [#1764, #1493, #1617, #1505,
   #1003, #533]
   - [CRuby] If a cycle is introduced when reparenting a node (i.e., the
   node becomes its own ancestor), a RuntimeError is raised. libxml2 does
   not check for this, which means cycles would otherwise result in infinite
   loops on subsequent operations. (Note that JRuby already did this.)
   [#1912]
   - [CRuby] Source builds will download zlib and libiconv via HTTPS.
   ("ruby" platform gem only.) [#2391] (Thanks, @jmartin-r7!)
   - [JRuby] Node#line behavior has been modified to return the line number
   of the node in the *final DOM structure*. This behavior is different
   from CRuby, which returns the node's position in the *input string*.
   Ideally the two implementations would be the same, but at least the
   behavior is now officially documented and tested. The real-world impact
   of this change is that the value returned in JRuby is greater by 1 to
   account for the XML prolog in the output. [#2380] (Thanks, @dabdine!)

Fixed

   - CSS queries on HTML5 documents now correctly match foreign elements
   (SVG, MathML) when namespaces are not specified in the query. [#2376]
   - XML::Builder blocks restore context properly when exceptions are
   raised. [#2372] (Thanks, @ric2b and @rinthedev!)
   - The Nokogiri::CSS::Parser cache now uses the XPathVisitor configuration
   as part of the cache key, preventing incorrect cache results from being
   returned when multiple XPathVisitor options are being used.
   - Error recovery from in-context parsing (e.g., Node#parse) now always
   uses the correct DocumentFragment class. Previously
   Nokogiri::HTML4::DocumentFragment was always used, even for XML
   documents. [#1158]
   - DocumentFragment#> now works properly, matching a CSS selector against
   only the fragment roots. [#1857]
   - XML::DocumentFragment#errors now correctly contains any parsing errors
   encountered. Previously this was always empty. (Note that
   HTML::DocumentFragment#errors already did this.)
   - [CRuby] Fix memory leak in Document#canonicalize when inclusive
   namespaces are passed in. [#2345]
   - [CRuby] Fix memory leak in Document#canonicalize when an argument type
   error is raised. [#2345]
   - [CRuby] Fix memory leak in EncodingHandler where iconv handlers were
   not being cleaned up. [#2345]
   - [CRuby] Fix memory leak in XPath custom handlers where string
   arguments were not being cleaned up. [#2345]
   - [CRuby] Fix memory leak in Reader#base_uri where the string returned
   by libxml2 was not freed. [#2347]
   - [JRuby] Deleting a Namespace from a NodeSet no longer modifies the href to
   be the default namespace URL.
   - [JRuby] Fix XHTML formatting of closing tags for non-container
   elements. [#2355]

Deprecated

   - Passing a Nokogiri::XML::Node as the second parameter to Node.new is
   deprecated and will generate a warning. This parameter should be a kind of
   Nokogiri::XML::Document. This will become an error in a future version
   of Nokogiri. [#975]
   - Nokogiri::CSS::Parser, Nokogiri::CSS::Tokenizer, and
   Nokogiri::CSS::Node are now internal-only APIs that are no longer
   documented, and should not be considered stable. With the introduction of
   XPathVisitor injection into Nokogiri::CSS.xpath_for there should be no
   reason to rely on these internal APIs.
   - CSS-to-XPath utility classes
   Nokogiri::CSS::XPathVisitorAlwaysUseBuiltins and
   XPathVisitorOptimallyUseBuiltins are deprecated. Prefer
   Nokogiri::CSS::XPathVisitor with appropriate constructor arguments.
   These classes will be removed in a future version of Nokogiri.