Danish Characters in Ruby Uri

Hi,

There would appear to be a deficiency in the common.rb
URI module of uri when processing non-standard
characters

e.g.

c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not URI?): http://fakebase/twiki/bin/view/Main/Østermark (URI::InvalidURIError)
        from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
        from c:/ruby/lib/ruby/1.8/open-uri.rb:85:in `open'
        from testbed.rb:15

I have chcped till I am blue in the face, gsubbed the
string, etc, etc, but the uri parser won't have some
Danish characters in any shape or form. Anyone have
any idea how I can persuade uri to take them, or some
other viable method of getting a page back with Danish
chars in it, or am I going to have to hack my local
copy of common.rb apart to make this happen?

rgds

Steve Callaway

···

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Steve Callaway wrote:

There would appear to be a deficiency in the common.rb
URI module of uri when processing non-standard
characters

[snip]

Hmm... what is your $KCODE set to, and are you
using the jcode lib?

I'm not certain uri honors those, but I would
certainly assume so...

Or maybe it's an issue where certain chars need
to be escaped...

Sorry, I'm not being very helpful, am I?

Hal

Steve Callaway wrote:

[snip]

Just apply URI.encode first. That will change the URI to http://fakebase/twiki/bin/view/Main/%C3%98stermark (or something different if you are not using UTF-8), which is valid.
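
For readers on a current Ruby, where URI.encode is no longer available (it was removed in Ruby 3.0), here is a minimal sketch of the same idea. It assumes the string is UTF-8 and uses URI::RFC2396_Parser#escape as a stand-in, which the thread itself does not mention; the variable names are illustrative only.

```ruby
require 'uri'

# The URL from the traceback; "\u00D8" is the letter Ø.
raw = "http://fakebase/twiki/bin/view/Main/\u00D8stermark"

# Percent-encode every byte of each character that is not allowed in a URI.
# RFC2396_Parser#escape is used here as a stand-in for the old URI.encode.
parser  = URI::RFC2396_Parser.new
encoded = parser.escape(raw)

# In UTF-8, Ø is the two bytes C3 98, so it becomes %C3%98.
puts encoded

# The encoded string can now be handed to URI.parse.
uri = URI.parse(encoded)
puts uri.path

# In ISO-8859-1 the same letter is the single byte D8, so the result differs.
latin1_encoded = parser.escape(raw.encode('ISO-8859-1'))
puts latin1_encoded
```

The last step illustrates the "something different if you are not using UTF-8" caveat: the encoded form depends on the bytes of the input string, so the encoding should match what the server expects.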

Good luck.

Hi Hal,

Thanks for your swift reply. I don't feel $KCODE is the
issue, but I will play with it on the off chance. I have
been down the escape road already, with no success. My
reading of the situation is that the regexp parser in
common.rb can't handle the chars and throws the URL out
on a (misguided) safety-first basis.

rgds

Steve

···

--- Hal Fulton <hal9000@hypermetrics.com> wrote:

Steve Callaway wrote:
>
> There would appear to be a deficiency in the common.rb
> URI module of uri when processing non-standard
> characters

[snip]

Hmm... what is your $KCODE set to, and are you
using the jcode lib?

I'm not certain uri honors those, but I would
certainly assume so...

Or maybe it's an issue where certain chars need
to be escaped...

Sorry, I'm not being very helpful, am I?

Hal


ah cool, thanks very much, that looks a much sweeter
fix :)

rgds

Steve

···

--- Carlos <angus@quovadis.com.ar> wrote:

Just apply URI.encode first. That will change the URI to
http://fakebase/twiki/bin/view/Main/%C3%98stermark (or something
different if you are not using UTF-8), which is valid.

Good luck.


I have a workaround now (which will do for the
moment) which involves a gsub on the string
for each Danish (DK) character, e.g.

str.gsub!(/Ø/, '&#216')
str.gsub!(/Å/, '&#197')

etc...

Fortunately uri seems to be happy with this.
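
A rough sketch of how the substitutions above could be generalised, so that one gsub handles every non-ASCII character rather than one line per letter. The helper name entity_escape is hypothetical. The final lines only inspect how URI splits the result: it appears to treat everything after the '#' as a fragment, so it may be worth checking what the server actually receives.

```ruby
require 'uri'

# Hypothetical helper: replace each non-ASCII character with "&#<codepoint>",
# the same shape as the hand-written substitutions (no trailing semicolon).
def entity_escape(str)
  str.gsub(/[^\x00-\x7F]/) { |ch| format('&#%d', ch.ord) }
end

raw     = "http://fakebase/twiki/bin/view/Main/\u00D8stermark"
escaped = entity_escape(raw)
puts escaped

# URI accepts the result, but splits it at the '#'.
uri = URI.parse(escaped)
puts uri.path
puts uri.fragment
```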


It's not so much a fix as the right way to do it. 'Ø' isn't an allowed
character in URLs:

http://www.ietf.org/rfc/rfc1738.txt

'Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.'
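
To make the quoted rule concrete, here is a small sketch that lists the characters in a string that fall outside that set. The constant, the method name and the regular expression are illustrative assumptions: the character class covers alphanumerics, the listed specials and the RFC 1738 reserved characters, plus '%' and '#' so that already-encoded URLs and fragments are not flagged.

```ruby
# Characters the RFC 1738 passage allows unencoded: alphanumerics, the
# specials $-_.+!*'(), and the reserved characters ; / ? : @ = &.
# '%' and '#' are added here as an assumption so that existing escapes and
# fragment markers are not reported.
ALLOWED_UNENCODED = %r{[A-Za-z0-9$\-_.+!*'(),;/?:@=&%#]}

# Hypothetical helper: return the distinct characters that would need encoding.
def chars_needing_encoding(url)
  url.each_char.reject { |ch| ch.match?(ALLOWED_UNENCODED) }.uniq
end

# The original URL contains one offending character, the Ø.
p chars_needing_encoding("http://fakebase/twiki/bin/view/Main/\u00D8stermark")

# The percent-encoded form contains none.
p chars_needing_encoding("http://fakebase/twiki/bin/view/Main/%C3%98stermark")
```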

Paul.

···

On 07/07/06, Steve Callaway <sjc2000_uk@yahoo.com> wrote:

ah cool, thanks very much, that looks a much sweeter
fix :)

While this doesn't quite apply to the original question, does Ruby perhaps need to add support for IDNA?

matthew smillie.

···

On Jul 7, 2006, at 9:15, Paul Battley wrote:

On 07/07/06, Steve Callaway <sjc2000_uk@yahoo.com> wrote:

ah cool, thanks very much, that looks a much sweeter
fix :)

It's not so much a fix as the right way to do it. 'Ø' isn't an allowed
character in URLs:

http://www.ietf.org/rfc/rfc1738.txt

'Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.'

They may not necessarily be 'legal'; nevertheless, they
will turn up from time to time, particularly in
camelcased wiki & twiki links in Denmark, which is why
I need to be able to get them :) But you're right, it
isn't a fix, it is indeed the correct way of doing it.
I was so happy to be able to kiss the problem
goodbye that I allowed a little terminological
inexactitude to creep in, in my enthusiasm for the
resolution :)

rgds

Steve


A couple of relevant libraries already exist, so it should be a
straightforward task.

http://sourceforge.jp/projects/ruexli/
http://idn.rubyforge.org/

Paul.

···

On 07/07/06, Matthew Smillie <M.B.Smillie@sms.ed.ac.uk> wrote:

While this doesn't quite apply to the original question, does Ruby
perhaps need to add support for IDNA?