[ANN] Nokogiri v1.11.0 released

Nokogiri version 1.11.0 has been released.

This is a significant release containing installation improvements,
performance improvements, bugfixes, new features, and two behavior changes
which are potentially breaking:

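- one change addresses a low-severity security issue (CVE-2020-26247), see
the GitHub security advisory "Nokogiri::XML::Schema trusts input by default,
exposing risk of an XXE vulnerability"
- one change fixes a bug where the `strict` parsing option was not
respected by the HTML parser, see
[#2130](https://github.com/sparklemotion/nokogiri/issues/2130)

Full release notes are at
https://github.com/sparklemotion/nokogiri/releases/tag/v1.11.0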

Those release notes are reproduced here in markdown format for your
convenience.


-----

# Description

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from
Ruby. It provides a sensible, easy-to-understand API for reading, writing,
modifying, and querying documents. It is fast and standards-compliant by
relying on native parsers like libxml2 (C) and xerces (Java).

## v1.11.0 / 2021-01-03

### Notes

#### Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

"Native gems" contain pre-compiled libraries for a specific machine
architecture. On supported platforms, this removes the need for compiling
the C extension and the packaged libraries. This results in **much faster
installation** and **more reliable installation**, which, as you probably
know, are the biggest headaches for Nokogiri users.

We've been shipping native Windows gems since 2009, but starting in v1.11.0
we are also shipping native gems for these platforms:

- Linux: `x86-linux` and `x86_64-linux`, including musl platforms like
Alpine
- OSX/Darwin: `x86_64-darwin` and `arm64-darwin`

We'd appreciate your thoughts and feedback on this work at [#2075](
https://github.com/sparklemotion/nokogiri/issues/2075).
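If you want a quick check of whether the machine you are installing on matches one of the platforms listed above, a rough sketch like the one below compares RubyGems' view of the local platform against that list. The helper `native_gem_candidate?` is hypothetical, and it deliberately ignores the platform's version field (so a musl-based Linux reports as plain `linux`); RubyGems' own platform matching at install time remains the authority.

```ruby
require "rubygems"

# Platforms named in the v1.11.0 notes as getting precompiled native gems.
# (Windows native gems have shipped since 2009 and are left out of this list.)
NATIVE_GEM_PLATFORMS = %w[x86-linux x86_64-linux x86_64-darwin arm64-darwin].freeze

# Hypothetical helper: true if a cpu/os pair matches one of the platforms above.
# The OS version (e.g. "musl", or a Darwin release number) is ignored.
def native_gem_candidate?(cpu, os)
  NATIVE_GEM_PLATFORMS.include?("#{cpu}-#{os}")
end

local = Gem::Platform.local
puts "#{local}: native gem candidate? #{native_gem_candidate?(local.cpu, local.os)}"
```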

### Dependencies

#### Ruby

This release introduces support for Ruby 2.7 and 3.0 in the precompiled
native gems.

This release ends support for:

* Ruby 2.3, for which [official support ended on 2019-03-31](
https://www.ruby-lang.org/en/news/2019/03/31/support-of-ruby-2-3-has-ended/)
[[#1886](https://github.com/sparklemotion/nokogiri/issues/1886)] (Thanks,
[@ashmaroli](https://github.com/ashmaroli)!)
* Ruby 2.4, for which [official support ended on 2020-04-05](
https://www.ruby-lang.org/en/news/2020/04/05/support-of-ruby-2-4-has-ended/)
* JRuby 9.1, which is the Ruby 2.3-compatible release.

#### Gems

* Explicitly add racc as a runtime dependency. [[#1988](
https://github.com/sparklemotion/nokogiri/issues/1988)] (Thanks, [@voxik](
https://github.com/voxik)!)
* [MRI] Upgrade mini_portile2 dependency from `~> 2.4.0` to `~> 2.5.0`
[[#2005](https://github.com/sparklemotion/nokogiri/issues/2005)] (Thanks,
[@alejandroperea](https://github.com/alejandroperea)!)

### Security

See note below about CVE-2020-26247 in the "Changed" subsection entitled
"`XML::Schema` input is now "untrusted" by default".

### Added

* Add Node methods for manipulating "keyword attributes" (for example,
`class` and `rel`): `#kwattr_values`, `#kwattr_add`, `#kwattr_append`, and
`#kwattr_remove`. [[#2000](
https://github.com/sparklemotion/nokogiri/issues/2000)]
* Add support for CSS queries `a:has(> b)`, `a:has(~ b)`, and `a:has(+ b)`.
[[#688](https://github.com/sparklemotion/nokogiri/issues/688)] (Thanks,
[@jonathanhefner](https://github.com/jonathanhefner)!)
* Add `Node#value?` to better match expected semantics of a Hash-like
object. [[#1838](https://github.com/sparklemotion/nokogiri/issues/1838),
[#1840](https://github.com/sparklemotion/nokogiri/issues/1840)] (Thanks,
[@MatzFan](https://github.com/MatzFan)!)
* [CRuby] Add `Nokogiri::XML::Node#line=` for use by downstream libs like
nokogumbo. [[#1918](https://github.com/sparklemotion/nokogiri/issues/1918)]
(Thanks, [@stevecheckoway](https://github.com/stevecheckoway)!)
* `nokogiri.gemspec` is back after a 10-year hiatus. We still prefer you
use the official releases, but master is pretty stable these days, and YOLO.

### Performance

* [CRuby] The CSS `~=` operator and class selector `.` are about 2x faster.
[[#2137](https://github.com/sparklemotion/nokogiri/issues/2137), [#2135](
https://github.com/sparklemotion/nokogiri/issues/2135)]
* [CRuby] Patch libxml2 to call `strlen` from `xmlStrlen` rather than the
naive implementation, because `strlen` is generally optimized for the
architecture. [[#2144](https://github.com/sparklemotion/nokogiri/issues/2144)]
(Thanks, [@ilyazub](https://github.com/ilyazub)!)
* Improve performance of some namespace operations. [[#1916](
https://github.com/sparklemotion/nokogiri/issues/1916)] (Thanks,
[@ashmaroli](https://github.com/ashmaroli)!)
* Remove unnecessary array allocations from Node serialization methods.
[[#1911](https://github.com/sparklemotion/nokogiri/issues/1911)] (Thanks,
[@ashmaroli](https://github.com/ashmaroli)!)
* Avoid creation of unnecessary zero-length String objects. [[#1970](
https://github.com/sparklemotion/nokogiri/issues/1970)] (Thanks,
[@ashmaroli](https://github.com/ashmaroli)!)
* Always compile libxml2 and libxslt with `-O2`. [[#2022](
https://github.com/sparklemotion/nokogiri/issues/2022), [#2100](
https://github.com/sparklemotion/nokogiri/issues/2100)] (Thanks, [@ilyazub](
https://github.com/ilyazub)!)
* [JRuby] Lots of code cleanup and performance improvements. [[#1934](
https://github.com/sparklemotion/nokogiri/issues/1934)] (Thanks, [@kares](
https://github.com/kares)!)
* [CRuby] `RelaxNG.from_document` no longer leaks memory. [[#2114](
https://github.com/sparklemotion/nokogiri/issues/2114)]

### Improved

* [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for
browsers. [[#2058](https://github.com/sparklemotion/nokogiri/issues/2058)]
(Thanks to HackerOne user mayflower for reporting this!)
* `HTML::Document.parse` and `XML::Document.parse` now accept `Pathname`
objects. Previously this worked only if the referenced file was less than
4096 bytes long; longer files resulted in undefined behavior because the
`read` method would be repeatedly invoked. [[#1821](
https://github.com/sparklemotion/nokogiri/issues/1821), [#2110](
https://github.com/sparklemotion/nokogiri/issues/2110)] (Thanks,
[@doriantaylor](https://github.com/doriantaylor) and [@phokz](
https://github.com/phokz)!)
* [CRuby] Nokogumbo builds faster because it can now use header files
provided by Nokogiri. [[#1788](
https://github.com/sparklemotion/nokogiri/issues/1788)] (Thanks,
[@stevecheckoway](https://github.com/stevecheckoway)!)
* Add `frozen_string_literal: true` magic comment to all `lib` files.
[[#1745](https://github.com/sparklemotion/nokogiri/issues/1745)] (Thanks,
[@oniofchaos](https://github.com/oniofchaos)!)
* [JRuby] Clean up deprecated calls into JRuby. [[#2027](
https://github.com/sparklemotion/nokogiri/issues/2027)] (Thanks, [@headius](
https://github.com/headius)!)

### Fixed

* HTML Parsing in "strict" mode (i.e., the `RECOVER` parse option not set)
now correctly raises an `XML::SyntaxError` exception. Previously the value
of the `RECOVER` bit was being ignored by CRuby and was misinterpreted by
JRuby. [[#2130](https://github.com/sparklemotion/nokogiri/issues/2130)]
* The CSS `~=` operator now correctly handles non-space whitespace in the
`class` attribute. commit e45dedd
* The switch to turn off the CSS-to-XPath cache is now thread-local, rather
than being shared mutable state. [[#1935](
https://github.com/sparklemotion/nokogiri/issues/1935)]
* The Node methods `add_previous_sibling`, `previous=`, `before`,
`add_next_sibling`, `next=`, `after`, `replace`, and `swap` now correctly
use their parent as the context node for parsing markup. These methods now
also raise a `RuntimeError` if they are called on a node with no parent.
[[nokogumbo#160](https://github.com/rubys/nokogumbo/issues/160)]
* [JRuby] XML::Schema XSD validation errors are captured in
`XML::Schema#errors`. These errors were previously ignored.
* [JRuby] Standardize reading from IO-like objects, including StringIO.
[[#1888](https://github.com/sparklemotion/nokogiri/issues/1888), [#1897](
https://github.com/sparklemotion/nokogiri/issues/1897)]
* [JRuby] Make the inference of custom XPath function namespaces less
naive. [[#1890](https://github.com/sparklemotion/nokogiri/issues/1890),
[#2148](https://github.com/sparklemotion/nokogiri/issues/2148)]
* [JRuby] Clarify exception message when custom XPath functions can't be
resolved.
* [JRuby] Comparison of Node to Document with `Node#<=>` now matches
CRuby/libxml2 behavior.
* [CRuby] Syntax errors are now correctly captured in `Document#errors` for
short HTML documents. Previously the SAX parser used for encoding detection
was clobbering libxml2's global error handler.
* [CRuby] Fixed installation on AIX with respect to `vasprintf`. [[#1908](
https://github.com/sparklemotion/nokogiri/issues/1908)]
* [CRuby] On some platforms, avoid symbol name collision with glibc's
`canonicalize`. [[#2105](
https://github.com/sparklemotion/nokogiri/issues/2105)]
* [Windows Visual C++] Fixed compiler warnings and errors. [[#2061](
https://github.com/sparklemotion/nokogiri/issues/2061), [#2068](
https://github.com/sparklemotion/nokogiri/issues/2068)]
* [CRuby] Fixed Nokogumbo integration, which broke in the v1.11.0 release
candidates. [[#1788](https://github.com/sparklemotion/nokogiri/issues/1788)]
(Thanks, [@stevecheckoway](https://github.com/stevecheckoway)!)
* [JRuby] Fixed document encoding regression in v1.11.0 release candidates.
[[#2080](https://github.com/sparklemotion/nokogiri/issues/2080), [#2083](
https://github.com/sparklemotion/nokogiri/issues/2083)] (Thanks, [@thbar](
https://github.com/thbar)!)

### Removed

* The internal method `Nokogiri::CSS::Parser.cache_on=` has been removed.
Use `.set_cache` if you need to muck with the cache internals.
* The class method `Nokogiri::CSS::Parser.parse` has been removed. This was
originally deprecated in 2009 in 13db61b. Use `Nokogiri::CSS.parse` instead.

### Changed

#### `XML::Schema` input is now "untrusted" by default

Address CVE-2020-26247.

In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by
`Nokogiri::XML::Schema` were **trusted** by default, allowing external
resources to be accessed over the network, potentially enabling XXE or SSRF
attacks.

This behavior is counter to the security policy intended by Nokogiri
maintainers, which is to treat all input as **untrusted** by default
whenever possible.

Please note that this security fix was pushed into a new minor version,
1.11.x, rather than a patch release to the 1.10.x branch, because it is a
breaking change for some schemas and the risk was assessed to be "Low
Severity".

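More information and instructions for enabling "trusted input" behavior in
v1.11.0.rc4 and later are available in the public GitHub security advisory,
"Nokogiri::XML::Schema trusts input by default, exposing risk of an XXE
vulnerability".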

#### HTML parser now obeys the `strict` or `norecover` parsing option

(Also noted above in the "Fixed" section) HTML Parsing in "strict" mode
(i.e., the `RECOVER` parse option not set) now correctly raises an
`XML::SyntaxError` exception. Previously the value of the `RECOVER` bit was
being ignored by CRuby and was misinterpreted by JRuby.

If you're using the default parser options, you will be unaffected by this
fix. If you're passing `strict` or `norecover` to your HTML parser call,
you may be surprised to see that the parser now fails to recover and raises
an `XML::SyntaxError` exception. Given the number of HTML documents on the
internet that libxml2 would consider to be ill-formed, this is probably not
what you want, and you can omit setting that parse option to restore the
behavior that you have been relying upon.

Apologies to anyone inconvenienced by this breaking bugfix being present in
a minor release, but I felt it was appropriate to introduce this fix
because it's straightforward to fix any code that has been relying on this
buggy behavior.

#### `VersionInfo`, the output of `nokogiri -v`, and related constants

This release changes the metadata provided in `Nokogiri::VersionInfo`, which
also affects the output of `nokogiri -v`. Some related constants have also
been changed. If you're using `VersionInfo` programmatically, or relying on
constants related to underlying library versions, please read the detailed
changes for `Nokogiri::VersionInfo` at [#2139](
https://github.com/sparklemotion/nokogiri/issues/2139) and accept our
apologies for the inconvenience.

Will you provide a plugin for parsing MS Excel documents? Thanks.


On Mon, Jan 4, 2021 at 12:49 PM Mike Dalessio <mike.dalessio@gmail.com> wrote:

