Bug in URI.parse?

Hello all,

looking for the right place to post a bug, hope this is the right mailing
list :)

irb(main):003:0> URI.parse "http://london_airport_transfers.hotels-london.co.uk/transfer.php"
URI::InvalidURIError: the scheme http does not accept registry part: london_airport_transfers.hotels-london.co.uk (or bad hostname?)
from /usr/local/lib/ruby/1.8/uri/generic.rb:194:in `initialize'
from /usr/local/lib/ruby/1.8/uri/http.rb:46:in `initialize'
from /usr/local/lib/ruby/1.8/uri/common.rb:484:in `parse'
from (irb):3

The address is valid, check yourself. I started looking at the code, but
cannot figure it out.

Please help!

Cheers

Hagen

An _ in a hostname is forbidden by RFC 952.

If you still don't believe me, go try to register a domain with an _.

On Nov 7, 2005, at 2:29 PM, <sixtus@gmail.com> wrote:

looking for the right place to post a bug, hope this is the right mailing
list :)

irb(main):003:0> URI.parse "http://london_airport_transfers.hotels-london.co.uk/transfer.php"
URI::InvalidURIError: the scheme http does not accept registry part: london_airport_transfers.hotels-london.co.uk (or bad hostname?)
from /usr/local/lib/ruby/1.8/uri/generic.rb:194:in `initialize'
from /usr/local/lib/ruby/1.8/uri/http.rb:46:in `initialize'
from /usr/local/lib/ruby/1.8/uri/common.rb:484:in `parse'
from (irb):3

The address is valid, check yourself. I started looking at the code, but
cannot figure it out.

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

> URI.parse "http://london_airport_transfers.hotels-london.co.uk/transfer.php"
> URI::InvalidURIError: the scheme http does not accept registry part:

It's an invalid URI: the "_" character is not permitted in a hostname, only letters, digits, and hyphens. See RFC 1035:

http://www.freesoft.org/CIE/RFC/1035/6.htm
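
For illustration, the letters-digits-hyphen rule can be sketched as a small check. `ldh_hostname?` and `LDH_LABEL` are made-up names, not part of Ruby's URI library. RFC 1035's preferred syntax wants each label to start with a letter; RFC 1123 later allowed a leading digit, and this sketch assumes the relaxed rule. It does not check the overall name length.

```ruby
# Hypothetical helper for illustration; not part of Ruby's URI library.
# One label: letters, digits, and hyphens only, 1 to 63 characters,
# starting and ending with a letter or digit.
LDH_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i

# True when every dot-separated label of +host+ passes the check above.
def ldh_hostname?(host)
  labels = host.split(".", -1)
  !labels.empty? && labels.all? { |label| label.match?(LDH_LABEL) }
end

puts ldh_hostname?("london-airport-transfers.hotels-london.co.uk")
puts ldh_hostname?("london_airport_transfers.hotels-london.co.uk")
```

The first call accepts the hyphenated name; the second rejects the name because of the underscores.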

> The address is valid, check yourself.

It might be able to be created within the domain admin tool that you are using to create the subdomain, and it might work with some client software, but it will cause lots of problems for others. You should change the name to use hyphens, or Ruby will only be the first of many problems ;)

a

> irb(main):003:0> URI.parse "http://london_airport_transfers.hotels-london.co.uk/transfer.php"
> URI::InvalidURIError: the scheme http does not accept registry part:
> london_airport_transfers.hotels-london.co.uk

> An _ is forbidden by RFC952.

Okay, so the address is not valid. But Firefox resolves it, and there is a
webserver answering. I do want to access the URI from Ruby.

Any help on how to overload the URI class to accept URIs with _ would be
greatly appreciated.

Cheers

Hagen

> URI.parse "http://london_airport_transfers.hotels-london.co.uk/transfer.php"
> URI::InvalidURIError: the scheme http does not accept registry part:

Interestingly, the site also seems to respond to:

http://london-airport-transfers.hotels-london.co.uk/transfer.php

...with dashes. (Seems to bring up the same page.)

Regards,

Bill

> It might be able to be created within the domain admin tool that you are
> using to create the subdomain, and it might work with some client software,
> but it will cause lots of problems for others. You should change the
> name to use hyphens or Ruby will only be the first of many problems ;)

I'm just writing a crawler, and now I need to worry about Ruby knowing the RFC
and throwing an exception. Whatever happened to Postel's Law?

Cheers

Hagen

It also responds to http://super-chunky-bacon.hotels-london.co.uk/transfer.php

The magic of wildcard DNS.

On 11/7/05, Bill Kelly <billk@cts.com> wrote:

>> URI.parse "http://london_airport_transfers.hotels-london.co.uk/transfer.php"
>> URI::InvalidURIError: the scheme http does not accept registry part:

Interestingly, the site also seems to respond to:

http://london-airport-transfers.hotels-london.co.uk/transfer.php

...with dashes. (Seems to bring up the same page.)

Regards,

Bill

URI follows Postel's Law perfectly.

It is your seatbelt on the sending side by not allowing you to send bad DNS packets around the internet.

It is your seatbelt on the receiving side by giving you a decent message when it encounters bad data allowing you to handle it gracefully rather than breaking.

On Nov 7, 2005, at 3:34 PM, <sixtus@gmail.com> wrote:

It might be able to be created within the domain admin tool that you are
using to create the subdomain, and it might work with some client software,
but it will cause lots of problems for others. You should change the
name to use hyphens or Ruby will only be the first of many problems ;)

I'm just writing a crawler and need to worry about Ruby knowing the RFC and
throwing an exception. Whatever happened to Postel's Law?

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

> It is your seatbelt on the receiving side by giving you a decent
> message when it encounters bad data allowing you to handle it
> gracefully rather than breaking.

I consider throwing an exception to be breaking. As I said, Firefox just
renders the page, while Ruby breaks. I'm willing to just overload URI with my
expected behavior, if someone gives me a hint on where to look (I started
looking and didn't see anything obvious).

Cheers

Hagen

Don't overload. If you get an exception, rescue it and check whether the URI
contains underscores, which are illegal in hostnames. If it does, print a
warning, convert each _ to -, and retry.
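
A minimal sketch of that rescue-and-retry approach, assuming only the host part should be rewritten. `dash_host` and `lenient_parse` are hypothetical names, not part of Ruby's URI library.

```ruby
require "uri"

# Hypothetical helper: replace underscores with hyphens in the authority
# (host) part only, leaving the path and query untouched.
# Caveat: this would also rewrite underscores in a user:password@ prefix.
def dash_host(str)
  str.sub(%r{\A([a-z][a-z0-9+.-]*://)([^/?#]*)}i) { $1 + $2.tr("_", "-") }
end

# Hypothetical helper: parse strictly first, then fall back to the dashed host.
def lenient_parse(str)
  URI.parse(str)
rescue URI::InvalidURIError
  fixed = dash_host(str)
  raise if fixed == str # no underscore in the host, so the error is something else
  warn "invalid hostname in #{str}, retrying as #{fixed}"
  URI.parse(fixed)
end

uri = lenient_parse("http://london_airport_transfers.hotels-london.co.uk/transfer.php")
puts uri.path
```

Two caveats. The retry only helps here because this particular site also answers on the dashed name; in general the dashed hostname is a different host. Also, the exception shown in this thread comes from Ruby 1.8; other Ruby versions may accept the underscore without raising, in which case the rescue branch never runs.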

-austin

On 11/7/05, sixtus@gmail.com <sixtus@gmail.com> wrote:

>> It is your seatbelt on the receiving side by giving you a decent
>> message when it encounters bad data allowing you to handle it
>> gracefully rather than breaking.
> I consider throwing an exception as breaking. As I said, Firefox just
> renders the page, Ruby breaks. I'm willing to just overload URI with my
> expected behavior, if someone gives me a hint on where to look (I started
> and didn't see anything obvious).

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca