[YAML] To quote or not to quote (I think it's a bug)

Hi,

I'm not sure this is the right place to report issues about Syck, but it's
the best I've found so far :) I think I've found a problem with the way Syck
quotes strings that could look like floats. Here is a short example:

irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> YAML.dump(["1.2", "1.2.3", "1.2_3"])
=> "--- \n- \"1.2\"\n- 1.2.3\n- 1.2_3\n"

The first element is quoted appropriately, and the second isn't because
there's no ambiguity, but the third should be quoted. YAML floats can
contain underscores: they're used as visual separators and should be
ignored by the parser. So when seeing 1.2_3 the parser should read the
float 1.23, and to disambiguate, the string "1.2_3" should be quoted.

In practice this isn't really a problem for actual floats, since they are
written without underscores: YAML.dump(1.2_3) is written directly as 1.23.
However, it creates an interoperability issue when "1.2_3" is written
unquoted by Syck and then read by another parser as 1.23. The string isn't
a string anymore in that case.

What do you think?

Thanks,
Matthieu Riou
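To make the interoperability concern concrete, here is a minimal Ruby sketch. It is not Syck's code: `emit_scalar`, `read_scalar` and both patterns are hypothetical. `STRICT_FLOAT` (no underscores) stands in for an emitter that, like the irb session above, quotes "1.2" but leaves "1.2_3" plain; `UNDERSCORE_FLOAT` stands in for a reader that follows the "underscores are ignored" rule.

```ruby
# Hypothetical patterns for illustration only; neither is Syck's code.
# STRICT_FLOAT: optional sign, digits, one decimal point, no underscores.
STRICT_FLOAT = /\A[-+]?[0-9]*\.[0-9]+\z/
# UNDERSCORE_FLOAT: digits with optional "_" separators on both sides
# of a single decimal point.
UNDERSCORE_FLOAT = /\A[-+]?[0-9][0-9_]*\.[0-9][0-9_]*\z/

# Emitter side: double-quote a String only when `pattern` would claim it.
def emit_scalar(str, pattern)
  pattern.match?(str) ? "\"#{str}\"" : str
end

# Reader side: a double-quoted scalar is a String (escapes are not handled
# in this sketch); a plain scalar matching `pattern` becomes a Float once
# its underscores are dropped; anything else stays a String.
def read_scalar(scalar, pattern)
  return scalar[1..-2] if scalar.start_with?('"')
  pattern.match?(scalar) ? Float(scalar.delete("_")) : scalar
end

%w[1.2 1.2.3 1.2_3].each do |s|
  written = emit_scalar(s, STRICT_FLOAT) # quotes only "1.2"
  back    = read_scalar(written, UNDERSCORE_FLOAT)
  puts "#{s.inspect} -> #{written} -> #{back.inspect}"
end
# "1.2" -> "1.2" -> "1.2"
# "1.2.3" -> 1.2.3 -> "1.2.3"
# "1.2_3" -> 1.2_3 -> 1.23
```

Switching the emitter to the reader's pattern, `emit_scalar("1.2_3", UNDERSCORE_FLOAT)`, writes "1.2_3" in quotes, and `read_scalar` then returns the String "1.2_3": quoting with the same rule the reader applies is what keeps the value a string.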

Are you certain that 1.2_3 is a valid YAML float?

http://yaml.org/type/float.html

I do not believe underscores are valid in either YAML floats or YAML
integers. Although they are allowed in Ruby, I believe Syck is
handling your example case properly and without ambiguity.

Blessings,
TwP

On 2/19/07, Matthieu Riou <matthieu.riou@gmail.com> wrote:
> The YAML float can have underscores, they're used as visual separators and
> should be ignored by the parser. So when seeing 1.2_3 the parser should read
> the float 1.23.

Only in YAML 1.1. Syck only speaks YAML 1.0. Unfortunately, the
types repository for 1.0 is no longer at yaml.org.

See http://web.archive.org/web/20030815095109/yaml.org/type/float/
for the floats which Syck should support.

_why

On Tue, Feb 20, 2007 at 10:03:25AM +0900, Matthieu Riou wrote:
> The first element is quoted appropriately, the second isn't because there's
> no ambiguity but the third should be quoted. The YAML float can have
> underscores, they're used as visual separators and should be ignored by the
> parser.

Hi Tim,

In that page:

"Any "_" characters in the number are ignored, allowing a readable
representation of large values."

So clearly they're allowed: in 1.2_3 the underscore is simply ignored and
the parser should understand 1.23. If I want to write my own YAML by hand
and make it easily readable (which is kind of the original purpose), I
could write 100_000_000.03, which would be a nice-looking float. Hence the
ambiguity with "1.2_3".

Cheers,
Matthieu

On 2/19/07, Tim Pease <tim.pease@gmail.com> wrote:
> Are you certain that 1.2_3 is a valid YAML float?
>
> http://yaml.org/type/float.html

> "Any "_" characters in the number are ignored, allowing a readable
> representation of large values."
>
> So clearly they're allowed: in 1.2_3 the underscore is simply ignored and
> the parser should understand 1.23. If I want to write my own YAML by hand
> and make it easily readable (which is kind of the original purpose), I
> could write 100_000_000.03, which would be a nice-looking float. Hence the
> ambiguity with "1.2_3".

But the specification is not just in English, it's also in regex form:

[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)? (base 10)

[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]* (base 60)
[-+]?\.(inf|Inf|INF) # (infinity)
\.(nan|NaN|NAN) # (not a number)

For a base 10 number, underscores are permitted only if they appear
BEFORE the decimal point, so for 1.2_3 there is no ambiguity.

Dan.
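Dan's reading of the base-10 pattern can be checked mechanically. The sketch below copies that pattern exactly as quoted above and, as an assumption (the page does not say how it is anchored), wraps it in `\A` and `\z` so the whole scalar has to match; the constant name `YAML11_BASE10_FLOAT` is made up for the illustration. Note that, as printed, the part after the decimal point is `[0-9.]*`, so the pattern also accepts "1.2.3".

```ruby
# Base-10 float pattern from the YAML 1.1 float page, as quoted in this
# thread; \A and \z (an assumption) force the whole scalar to match.
YAML11_BASE10_FLOAT = /\A[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?\z/

%w[1.2 1.2.3 1.2_3 100_000_000.03].each do |s|
  puts "#{s}: #{YAML11_BASE10_FLOAT.match?(s)}"
end
# 1.2: true
# 1.2.3: true
# 1.2_3: false
# 100_000_000.03: true
```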


Hmmmm... I just noticed that the specification
(http://yaml.org/type/float.html) isn't just unclear, it's inconsistent.

The examples listed include:

exponentioal: 685.230_15e+03

Notwithstanding the spelling of exponential, that example is not
consistent with the regular expression, which allows underscores only if
they precede the decimal point:
[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?

Dan.
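The inconsistency can be shown the same way. This sketch, again assuming `\A`/`\z` anchoring, runs the spec page's own example against the base-10 pattern as printed and against a hypothetical variant (named `UNDERSCORE_FRACTION` here, not taken from any spec) whose fraction part is `[0-9_]*` instead of `[0-9.]*`. The variant accepts the example, but it then also accepts 1.2_3, which is exactly the case where an emitter following it would need to quote the string.

```ruby
# Base-10 pattern as printed on the YAML 1.1 float page (anchored here).
AS_PRINTED = /\A[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?\z/
# Hypothetical variant: underscores also allowed after the decimal point.
UNDERSCORE_FRACTION = /\A[-+]?([0-9][0-9_]*)?\.[0-9_]*([eE][-+][0-9]+)?\z/

example = "685.230_15e+03" # the "exponentioal" example from the spec page
puts AS_PRINTED.match?(example)          # false
puts UNDERSCORE_FRACTION.match?(example) # true
puts UNDERSCORE_FRACTION.match?("1.2_3") # true
```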

This is horrible, IMHO. It has to be fixed. I’m very sorry for the
late response.

Clark

On Tue, Feb 20, 2007 at 04:17:20PM +1100, Daniel Sheppard wrote:
> Hmmmm... I just noticed that the specification
> (http://yaml.org/type/float.html) isn't just unclear, it's inconsistent.


Yaml-core mailing list
Yaml-core-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
yaml-core List Signup and Options

