URI bug or bad URI?

I have a proof-of-concept application which parses drop texts. One of the
drop texts that I’m processing is a mailto URI. Ruby’s URI library dies
when I try to parse the following URI.

mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)' )

Are ‘@’ symbols allowed in uri’s in this context or do they need to be
escaped?

Here’s the backtrace:
/usr/lib/ruby/1.6/uri/common.rb:230:in `unescape': private method `gsub' called for nil (NameError)
from /usr/lib/ruby/1.6/uri/mailto.rb:184:in `to_mailtext'
from /home/alan/bin/mailto:18

I’m using the following ruby version:
ruby 1.6.7 (2002-03-19) [i386-linux]

···


Alan Chen
Digikata LLC

There’s no “?subject” in mailto:

···

Alan Chen alan@digikata.com wrote:

mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)

In [ruby-talk : No.53493]
mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)

Are ‘@’ symbols allowed in uri’s in this context or do they need to be
escaped?

‘@’ is OK. But ‘=’ cannot be put in hvalue. I think that ‘subject:
test (@test=10)’ should be written as ‘subject=test%20(@test%3D10)’ in
URI.

/usr/lib/ruby/1.6/uri/common.rb:230:in `unescape': private method `gsub' called for nil (NameError)
from /usr/lib/ruby/1.6/uri/mailto.rb:184:in `to_mailtext'

It is a bug in uri/mailto.rb. ‘URI.parse’ should raise
InvalidComponentError for this URI.
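
For code that handles URIs found in the wild, a defensive wrapper around URI.parse is one option. This is only a sketch: `safe_parse` is a made-up helper name, and whether a given Ruby version rejects the mailto URI from this thread is not assumed here. The wrapper simply turns any URI::Error (the parent class of InvalidComponentError and InvalidURIError) into nil.

```ruby
require 'uri'

# Hypothetical helper: returns a URI object, or nil when the
# URI library rejects the text with any URI::Error subclass.
def safe_parse(text)
  URI.parse(text)
rescue URI::Error
  nil
end

# The URI from this thread; the result depends on the Ruby version.
p safe_parse('mailto:user@domain.com?subject=test%20(@test=10)')
```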

···

Alan Chen alan@digikata.com wrote:


akira yamada

Hello,

···

Stefan Scholl stefan.scholl@brave.de wrote:

Alan Chen alan@digikata.com wrote:

mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)

There’s no “?subject” in mailto:

Of course there is. You can add a querystring. But brackets are not
allowed in URIs, neither are @s.

Greetings,
CK

RFC2368: ``Note that all URL reserved characters in "to" must be encoded:
  in particular, parentheses, commas, and the percent sign ("%"), which
  commonly occur in the "mailbox" syntax.

  "hname" and "hvalue" are encodings of an RFC 822 header name and
  value, respectively. As with "to", all URL reserved characters must
  be encoded.''

But that conflicts with RFC1738 which explicitly allows parentheses and
commas as unreserved characters which can always be used unencoded in a URL.
Grr.

Anyway, if you want to err on the safe side, you might choose

subject=test%20%28@test%3D10%29
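
A minimal sketch of that conservative encoding in Ruby. The helper name `encode_hvalue` is hypothetical, and the choice to leave only letters, digits, "-", ".", "_", "~" and "@" unencoded is an assumption made to reproduce the string above, not something either RFC mandates.

```ruby
# Hypothetical helper: percent-encode every byte of a header value
# except a small conservative set of characters.
def encode_hvalue(text)
  text.gsub(/[^A-Za-z0-9\-._~@]/) do |ch|
    # Encode each byte of the character as %XX (uppercase hex).
    ch.bytes.map { |b| format('%%%02X', b) }.join
  end
end

puts "subject=#{encode_hvalue('test (@test=10)')}"
```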

···

On Sun, Oct 20, 2002 at 08:24:07PM +0900, akira yamada wrote:

>>>>> In [ruby-talk : No.53493]
>>>>> Alan Chen <alan@digikata.com> wrote:
> mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)
>
> Are '@' symbols allowed in uri's in this context or do they need to be
> escaped?

'@' is OK. But '=' cannot be put in hvalue. I think that 'subject:
test (@test=10)' should be written as 'subject=test%20(@test%3D10)' in
URI.

mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)

There’s no “?subject” in mailto:

Of course there is. You can add a querystring.

No querystring with mailto scheme.

RFC 1738
3.5. MAILTO

The mailto URL scheme is used to designate the Internet mailing
address of an individual or service. No additional information other
than an Internet mailing address is present or implied.

A mailto URL takes the form:

    mailto:<rfc822-addr-spec>

where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
specified in RFC 822 [6]. Within mailto URLs, there are no reserved
characters.

Note that the percent sign (“%”) is commonly used within RFC 822
addresses and must be encoded.

Unlike many URLs, the mailto scheme does not represent a data object
to be accessed directly; there is no sense in which it designates an
object. It has a different use than the message/external-body type in
MIME.

[…]

; MAILTO (see also RFC822)

mailtourl = “mailto:” encoded822addr
encoded822addr = 1*xchar ; further defined in RFC822

But brackets are not
allowed in URIs, neither are @s.

No, “(”, “)”, and “@” aren’t unsafe.

RFC1738:
Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters “<” and “>” are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (“"”) is used to
delimit URLs in some systems. The character “#” is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character “%” is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are “{”, “}”, “|”, “\”, “^”, “~”,
“[”, “]”, and “`”.

···

Christian Kruse ckruse@wwwtech.de wrote:

Stefan Scholl stefan.scholl@brave.de wrote:

Alan Chen alan@digikata.com wrote:

Neither of these statements is true.

Specifically for mailboxes:
[From RFC 2368, The mailto URL scheme]

Following the syntax conventions of RFC 1738 [RFC1738], a “mailto”
URL has the form:

 mailtoURL  =  "mailto:" [ to ] [ headers ]
 to         =  #mailbox
 headers    =  "?" header *( "&" header )
 header     =  hname "=" hvalue
 hname      =  *urlc
 hvalue     =  *urlc

“#mailbox” is as specified in RFC 822 [RFC822]. This means that it
consists of zero or more comma-separated mail addresses, possibly
including “phrase” and “comment” components. Note that all URL
reserved characters in “to” must be encoded: in particular,
parentheses, commas, and the percent sign (“%”), which commonly occur
in the “mailbox” syntax.

“hname” and “hvalue” are encodings of an RFC 822 header name and
value, respectively. As with “to”, all URL reserved characters must
be encoded.

The special hname “body” indicates that the associated hvalue is the
body of the message. The “body” hname should contain the content for
the first text/plain body part of the message. The mailto URL is
primarily intended for generation of short text messages that are
actually the content of automatic processing (such as “subscribe”
messages for mailing lists), not general MIME bodies.

Within mailto URLs, the characters “?”, “=”, “&” are reserved.

Because the “&” (ampersand) character is reserved in HTML, any mailto
URL which contains an ampersand must be spelled differently in HTML
than in other contexts. A mailto URL which appears in an HTML
document must use “&amp;” instead of “&”.

Also note that it is legal to specify both “to” and an “hname” whose
value is “to”. That is,

 mailto:addr1%2C%20addr2

 is equivalent to

 mailto:?to=addr1%2C%20addr2

 is equivalent to

 mailto:addr1?to=addr2

8-bit characters in mailto URLs are forbidden. MIME encoded words (as
defined in [RFC2047]) are permitted in header values, but not for any
part of a “body” hname.
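
To make the grammar above concrete, here is a rough Ruby sketch that splits a mailto URL into its “to” part and its headers, then collects the recipients from both places. The helper names (`percent_decode`, `split_mailto`, `recipients`) are hypothetical and not part of Ruby's URI library, and the comma split is naive: it would mishandle commas inside quoted phrases or comments.

```ruby
# Hypothetical helper: decode %XX escapes only ("+" is left alone).
def percent_decode(str)
  str.gsub(/%(\h\h)/) { $1.hex.chr }
end

# Split "mailto:" [ to ] [ "?" header *( "&" header ) ] into parts.
def split_mailto(url)
  rest = url.sub(/\Amailto:/i, '')
  to, query = rest.split('?', 2)
  headers = (query || '').split('&').map do |header|
    # Only the first "=" separates hname from hvalue.
    header.split('=', 2)
  end
  { to: to.to_s, headers: headers }
end

# Gather addresses from the "to" part and from any "to" header.
def recipients(url)
  parts = split_mailto(url)
  to_headers = parts[:headers].select { |hname, _| hname.to_s.downcase == 'to' }
  values = [parts[:to]] + to_headers.map { |_, hvalue| hvalue.to_s }
  values.flat_map { |v| percent_decode(v).split(',') }
        .map(&:strip)
        .reject(&:empty?)
end

p recipients('mailto:addr1%2C%20addr2')
p recipients('mailto:?to=addr1%2C%20addr2')
p recipients('mailto:addr1?to=addr2')
```

Under these assumptions, all three forms from the RFC excerpt yield the same two addresses.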

For URIs in general:
[From RFC 1738, Uniform Resource Locators (URL)]

Many URL schemes reserve certain characters for a special
meaning: their appearance in the scheme-specific part of the URL
has a designated semantics. If the character corresponding to an
octet is reserved in a scheme, the octet must be encoded.  The
characters ";", "/", "?", ":", "@", "=" and "&" are the
characters which may be reserved for special meaning within a
scheme. No other characters may be reserved within a scheme.

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.10.18 at 10.03.34

···

On Fri, 18 Oct 2002 17:13:56 +0900, Christian Kruse wrote:

Stefan Scholl stefan.scholl@brave.de wrote:

Alan Chen alan@digikata.com wrote:

mailto = URI.new(
'mailto:user@domain.com?subject=test%20(@test=10)
There’s no “?subject” in mailto:
Of course there is. You can add a querystring. But brackets are
not allowed in URIs, neither are @s.

I really wasn’t choosing this URI encoding; it was something that my
application encountered “in the wild” on the internet. I guess I can
write some sort of pre-filter for the URI or add some code to extend
URI.parse. Maybe I’ll modify it to URI.parse(uristring, parsetype),
where parsetype could be :rfc2368, :rfc1738, :rfc2368_lax or similar.
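
A pre-filter along those lines might look like the sketch below. `prefilter_mailto` is a hypothetical name, and the only repair it attempts is the one discussed in this thread: escaping any “=” that appears inside an hvalue, after the first “=” of each header. Everything else is passed through untouched.

```ruby
# Hypothetical pre-filter: escape stray "=" characters inside each
# header value of a mailto URL before handing it to a stricter parser.
def prefilter_mailto(url)
  address, query = url.split('?', 2)
  return url if query.nil?

  headers = query.split('&').map do |header|
    hname, hvalue = header.split('=', 2)
    next header if hvalue.nil? # no "=" at all; leave it as found
    "#{hname}=#{hvalue.gsub('=', '%3D')}"
  end
  "#{address}?#{headers.join('&')}"
end

puts prefilter_mailto('mailto:user@domain.com?subject=test%20(@test=10)')
```

The filtered string can then be passed to URI.parse; whether the parser accepts the unescaped parentheses is a separate question that this sketch does not address.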

···

On Mon, Oct 21, 2002 at 08:05:46PM +0900, Brian Candler wrote:

On Sun, Oct 20, 2002 at 08:24:07PM +0900, akira yamada wrote:

In [ruby-talk : No.53493]
Alan Chen alan@digikata.com wrote:
mailto = URI.new( 'mailto:user@domain.com?subject=test%20(@test=10)

Are ‘@’ symbols allowed in uri’s in this context or do they need to be
escaped?

‘@’ is OK. But ‘=’ cannot be put in hvalue. I think that ‘subject:
test (@test=10)’ should be written as ‘subject=test%20(@test%3D10)’ in
URI.

RFC2368: ``Note that all URL reserved characters in “to” must be encoded:
in particular, parentheses, commas, and the percent sign (“%”), which
commonly occur in the “mailbox” syntax.

“hname” and “hvalue” are encodings of an RFC 822 header name and
value, respectively. As with “to”, all URL reserved characters must
be encoded.''

But that conflicts with RFC1738 which explicitly allows parentheses and
commas as unreserved characters which can always be used unencoded in a URL.
Grr.

Anyway, if you want to err on the safe side, you might choose

subject=test%20%28@test%3D10%29


Alan Chen
Digikata LLC
http://digikata.com

RFC1738 has been superseded by RFC2368 for mailto.

If URI has been implemented to RFC1738, then it’s broken.

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.10.18 at 10.16.40

···

On Fri, 18 Oct 2002 21:08:45 +0900, Stefan Scholl wrote:

Christian Kruse ckruse@wwwtech.de wrote:

Stefan Scholl stefan.scholl@brave.de wrote:

Alan Chen alan@digikata.com wrote:

mailto = URI.new(
'mailto:user@domain.com?subject=test%20(@test=10)
There’s no “?subject” in mailto:
Of course there is. You can add a querystring.
No querystring with mailto scheme.
RFC 1738