Open-uri bug

Steve_H · 26 February 2008 19:50

Hello all, I'm using open-uri combined with hpricot to make a basic
web crawler that scrapes for the links I need. It seems to be working
perfectly, but I get the following error when it encounters this type
of link:

irb(main):015:0> URI.parse('http://hello.com/a.php?%1')
URI::InvalidURIError: bad URI(is not URI?): http://hello.com/a.php?%1
        from c:/ruby/lib/ruby/1.8/uri/common.rb:436:in `split'
        from c:/ruby/lib/ruby/1.8/uri/common.rb:485:in `parse'
        from (irb):15

Can anyone illuminate why this is a problem? Thanks!

Rob_Biedenharn1 · 26 February 2008 20:20

Probably because %1 looks like an incomplete percent-escape: a % has
to be followed by two hex digits. Try:

?%251

where %25 is an escaped %.
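
A minimal sketch of that rule, using only a regular expression so it
does not depend on how any particular version of the URI library
behaves. The constant and the helper name are hypothetical, made up
for illustration:

```ruby
# A "%" in a URI is only well-formed when two hex digits follow it
# (for example %25 or %2F). This pattern matches any "%" that is not.
MALFORMED_ESCAPE = /%(?![0-9A-Fa-f]{2})/

# Hypothetical helper: true if the string contains a "%" that does not
# start a complete percent-escape.
def malformed_escape?(url)
  (url =~ MALFORMED_ESCAPE) ? true : false
end

p malformed_escape?('http://hello.com/a.php?%1')    # => true
p malformed_escape?('http://hello.com/a.php?%251')  # => false
```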

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

Steve_H · 26 February 2008 21:20

I appreciate the reply. This is a bit unfortunate: I am developing a
tool which has to handle URIs the same way the browser does. While I
realize that is not a "correct" URI, the browser still fetches the
pages without a problem. In some sense, I wish I could mirror the
functionality of the browser fetch using the URI module. Anyhow, thank
you for your help!

Siep_Korteling · 26 February 2008 21:43

Maybe this helps:

URI.escape('http://hello.com/a.php?%1')

=> "http://hello.com/a.php?%251"
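
One caveat: URI.escape also re-escapes a % that already starts a valid
escape (so %20 becomes %2520), and the method was removed in Ruby 3.0.
A rough alternative, assuming the goal is only to repair stray percent
signs and leave well-formed escapes alone, might look like the sketch
below. STRAY_PERCENT, escape_stray_percents and lenient_parse are
hypothetical names, not part of the URI library:

```ruby
require 'uri'

# Matches a "%" that is NOT followed by two hex digits.
STRAY_PERCENT = /%(?![0-9A-Fa-f]{2})/

# Hypothetical helper: escape only the stray "%" characters as %25,
# leaving well-formed escapes such as %20 untouched.
def escape_stray_percents(url)
  url.gsub(STRAY_PERCENT, '%25')
end

# Hypothetical helper: try a strict parse first and retry with the
# stray percents escaped only if the parser rejects the input.
def lenient_parse(url)
  URI.parse(url)
rescue URI::InvalidURIError
  URI.parse(escape_stray_percents(url))
end

puts escape_stray_percents('http://hello.com/a.php?%1')
puts escape_stray_percents('http://hello.com/a%20b.php?x=%41')
puts lenient_parse('http://hello.com/a.php?%1').host
```

Whether the first URI.parse call raises depends on the Ruby version,
so the fallback only kicks in where the parser actually rejects the
string.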

Regards,

Siep

Eric_Hodel1 · 27 February 2008 04:51

What about Mechanize?

Piyush_Ranjan · 1 March 2008 08:45

I too want to know how to handle invalid URIs in Mechanize. Is there
any way to override the URL checking?
