Substring by range parameter (bug?)

A correction to my previous conclusion:
the empty string itself also has one, and only one, empty substring.
And the empty string case is even stranger...

Here is everything I found:

s="ab"; s[-1..-2]="xxx"; p s #=> s = "axxxb"
s="ab"; s[-2..-3]="xxx"; p s #=> s = "xxxab"
s="ab"; s[1..0]="xxx"; p s #=> s = "axxxb", same case as s[-1..-2]
s="ab"; s[2..1]="xxx"; p s #=> s = "abxxx"

"ab"[3..2] #=> nil
"ab"[-3..-4] #=> nil

empty string case: (only 1 empty substring)
""[-1..-2] #=> nil
""[-1..0] #=> nil
""[1..2] #=> nil
""[0..-1] #=> ""
""[0..1] #=> ""
""[0..2] #=> ""
""[0..3] #=> ""
""[0..4] #=> ""
""[0..5] #=> ""
... etc

(All empty substrings of the empty string are the same one; you can re-assign to see it.)
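For anyone who wants to re-run the list above, here is a small sketch. The helper name insert_at is my own and not part of Ruby; it only copies the string before assigning, so each case starts from a fresh "ab". The values in the comments are the ones reported in this post.

```ruby
# insert_at is a hypothetical helper (not a Ruby built-in): it assigns
# text to the given range on a copy, so the original string is untouched.
def insert_at(str, range, text)
  copy = str.dup
  copy[range] = text
  copy
end

p insert_at("ab", -2..-3, "xxx")  # "xxxab"
p insert_at("ab", 1..0, "xxx")    # "axxxb"
p insert_at("ab", -1..-2, "xxx")  # "axxxb"
p insert_at("ab", 2..1, "xxx")    # "abxxx"

p("ab"[3..2])   # nil
p(""[0..-1])    # ""
p(""[1..2])     # nil
```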

···

From: Hal Fulton <hal9000@hypermetrics.com>
Reply-To: ruby-talk@ruby-lang.org
To: ruby-talk@ruby-lang.org (ruby-talk ML)
Subject: Re: substring by range parameter (bug?)
Date: Thu, 22 Jul 2004 07:19:40 +0900

D T wrote:

OK. You got my point.
And your explanation seems logical to me.
Thanks.

Anyway, I still feel this is very strange...
"a"[-1..-2] #=> ""
""[-1..-2] #=> nil

My conclusion is: for any non-empty string, there are exactly (string's length + 1) empty substrings!
(location does matter)

Example: for "ab", there are exactly 3 empty substrings, located at "^a^b^" (^ marks an empty substring position)

s="ab"; s[-1..-2]="xxx"; p s #==> s = "axxxb"
s="ab"; s[-2..-3]="xxx"; p s #==> s = "xxxab"
s="ab"; s[1..0]="xxx"; p s #==> s = "axxxb", it is the same as s[-1..-2]
s="ab"; s[2..1]="xxx"; p s #==> s = "abxxx"

As you can see, there are exactly 3 empty substrings in "ab" (and you can re-assign to each of them). :wink:

This is the most logical analysis of this issue that I remember
seeing. Thank you for that.

And also IMO it's the best justification for this behavior -- after
all, "before the beginning" and "after the end" are valid locations
as far as insertion goes.

I now understand better what matz said long ago, about an imaginary
pointer in between the elements. I understand this better WRT insertion
than WRT accessing data.

Array subranges work much the same way, I believe, correct?

Hal
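One way to look at the "imaginary pointer between the elements" picture, and at the array question, is to count the offsets where a zero-length slice is valid. This is only a sketch: empty_slots is a name I made up, and it uses the (offset, length) form rather than ranges so that it does not depend on the range behavior under discussion.

```ruby
# empty_slots is a hypothetical helper: it lists every offset i at which
# seq[i, 0] is an empty (non-nil) slice, i.e. a valid insertion point.
def empty_slots(seq)
  (0..seq.length + 1).select { |i| seq[i, 0]&.empty? }
end

p empty_slots("ab")       # [0, 1, 2]
p empty_slots("")         # [0]
p empty_slots([:a, :b])   # [0, 1, 2]

a = [:a, :b]
a[1, 0] = [:x]            # zero-length splice inserts without replacing
p a                       # [:a, :x, :b]
```

Both strings and arrays report length + 1 slots here, which matches the "^a^b^" picture.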


if you look at the source, it is clear that rb_range_beg_len intends to fail
with ranges like -1...-42. i think it is a bug that it does not. this patch
should fix it:

--- range.c.org 2004-07-21 15:56:02.000000000 -0600
+++ range.c 2004-07-21 16:41:42.000000000 -0600
@@ -22,6 +22,7 @@
             end = len;
      }
      if (end < 0) end += len;
+ if (beg < end) goto out_of_range; /* b4 including end point require end >= beg */
      if (!EXCL(range)) end++; /* include end point */
      if (end < 0) goto out_of_range;
      len = end - beg;

beg and end are mapped to positive values and then the endpoint is included
iff range is not exclusive. however, even before this step it is clearly a
pre-condition that end >= beg.

-a


···

On Thu, 22 Jul 2004, D T wrote:
--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

oops - i'm an idiot, that should've been:

--- range.c.org 2004-07-21 15:56:02.000000000 -0600
+++ range.c 2004-07-21 16:54:48.000000000 -0600
@@ -22,6 +22,7 @@
             end = len;
      }
      if (end < 0) end += len;
+ if (end < beg) goto out_of_range; /* b4 including end point require end >= beg */
      if (!EXCL(range)) end++; /* include end point */
      if (end < 0) goto out_of_range;
      len = end - beg;

sorry for confusion - i've gotten so used to thinking with 'unless'!

-a


===============================================================================

Hi,

···

In message "[PATCH] was - Re: substring by range parameter (bug?)" on 04/07/22, "Ara.T.Howard" <ahoward@noaa.gov> writes:

if you look at the source, it is clear that rb_range_beg_len intends to fail
with ranges like -1...-42. i think it is a bug that it does not. this patch
should fix it:

I think I understand you, but not sure. Can you show me an example of
wrong behavior?

              matz.

note the second case:

   wrong:

     ~ > irb
     irb(main):001:0> 'foobar'[-1..-1]
     => "r"
     irb(main):002:0> 'foobar'[-1..-2]
     => ""
     irb(main):003:0> 'foobar'[-1..-3]
     => nil

   right (patch applied):

     ~ > irb
     irb(main):001:0> 'foobar'[-1..-1]
     => "r"
     irb(main):002:0> 'foobar'[-1..-2]
     => nil
     irb(main):003:0> 'foobar'[-1..-3]
     => nil

it seems like a simple off by one error caused by mapping

   -n,...,-3,-2,-1 => 0,...,n-3,n-2,n-1

my interpretation of rb_range_beg_len was that the test

   if (len < 0)

is to catch errors made by specifying ranges whose ends are before their
beginnings, like in

   'foobar'[5..2] # => nil

however this breaks down in the case of

   'foobar'[-1..-2] # => ''

because

   end++;

occurs AFTER neg ranges are mapped to pos ranges but before

   if (!EXCL(range)...

i think it should always be the case that

   beg <= end

after beg and end have been mapped positive but BEFORE end is incremented to
include the end point when the range is not exclusive. else we have an off by one problem...

this is, i think, what testing

   len < 0

was supposed to catch - but this fails when

   beg = -n
   end = -n - 1

as is the case with -1..-2
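To make the arithmetic easier to follow, here is a rough Ruby model of the steps described above. It is an assumption-laden sketch: beg_len and its patched: flag are names I invented, and the body mirrors only the lines of range.c quoted in this thread, not the real C source.

```ruby
# Hypothetical Ruby model of the index arithmetic discussed in this thread.
# Returns [start, length] or nil for "out of range".
def beg_len(first, last, exclusive, len, patched: false)
  beg = first
  fin = last
  if beg < 0
    beg += len                        # map negative start to positive
    return nil if beg < 0
  end
  return nil if beg > len
  fin = len if fin > len
  fin += len if fin < 0               # map negative end to positive
  return nil if patched && fin < beg  # the check the proposed patch adds
  fin += 1 unless exclusive           # include end point
  return nil if fin < 0
  n = fin - beg
  return nil if n < 0                 # the existing "len < 0" test
  [beg, n]
end

p beg_len(-1, -1, false, 6)                 # [5, 1]
p beg_len(-1, -2, false, 6)                 # [5, 0]
p beg_len(-1, -3, false, 6)                 # nil
p beg_len(-1, -2, false, 6, patched: true)  # nil
p beg_len(0, -1, false, 0, patched: true)   # nil
```

In this model the unpatched path lets -1..-2 through with length 0 because the increment happens before the length test. The last line suggests the added check would also reject a range like 0..-1 on an empty string, which the model accepts without the patch.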

maybe i've misinterpreted this? if not here is the patch again - please note
my first one was wrong! sorry.

diff -u range.c.org range.c
--- range.c.org 2004-07-21 15:56:02.000000000 -0600
+++ range.c 2004-07-21 16:54:48.000000000 -0600
@@ -22,6 +22,7 @@
             end = len;
      }
      if (end < 0) end += len;
+ if (end < beg) goto out_of_range; /* b4 including end point require end >= beg */
      if (!EXCL(range)) end++; /* include end point */
      if (end < 0) goto out_of_range;
      len = end - beg;

kind regards.

-a

···

On Thu, 22 Jul 2004, Yukihiro Matsumoto wrote:


===============================================================================


note the second case:

  wrong:

    ~ > irb
    irb(main):001:0> 'foobar'[-1..-1]
    => "r"
    irb(main):002:0> 'foobar'[-1..-2]
    => ""
    irb(main):003:0> 'foobar'[-1..-3]
    => nil

This seems right to me... aren't these two expressions equivalent?

   'foobar'[-1..-2]
     ==>""
   'foobar'[5..4]
     ==>""

I admit that the beginning being lower than the end is a little strange at first, but I understood it to be an artifact of the range type. Since 1..2 includes the end number, it makes sense that the only way to get an empty string would be to go backwards...

  right (patch applied):

    ~ > irb
    irb(main):001:0> 'foobar'[-1..-1]
    => "r"
    irb(main):002:0> 'foobar'[-1..-2]
    => nil
    irb(main):003:0> 'foobar'[-1..-3]
    => nil

Correct me if I'm wrong, but doesn't that make it impossible to reference an empty string between characters?

cheers,
Mark

···

On Jul 21, 2004, at 9:32 PM, Ara.T.Howard wrote:


Hi,

···

In message "Re: [PATCH] was - Re: substring by range parameter (bug?)" on 04/07/22, "Ara.T.Howard" <ahoward@noaa.gov> writes:

note the second case:

  wrong:

    ~ > irb
    irb(main):001:0> 'foobar'[-1..-1]
    => "r"
    irb(main):002:0> 'foobar'[-1..-2]
    => ""
    irb(main):003:0> 'foobar'[-1..-3]
    => nil

  right (patch applied):

    ~ > irb
    irb(main):001:0> 'foobar'[-1..-1]
    => "r"
    irb(main):002:0> 'foobar'[-1..-2]
    => nil
    irb(main):003:0> 'foobar'[-1..-3]
    => nil

Now I fully understand. I will commit your fix. Can you prepare
ChangeLog entry for this fix?

              matz.

This seems right to me... aren't these two expressions equivalent?

  'foobar'[-1..-2]
    ==>""
  'foobar'[5..4]
    ==>""

yes - both are broken by exactly the same bug! :wink:

I admit that the beginning being lower than the end is a little strange
at first,

??

-1 > -2 and 5 > 4 so i assume you mean 'beginning being __higher__ than the end'

??

but I understood it to be an artifact of the range type. Since 1..2
includes the end number,

i assume you mean -1..-2 here, and it (the range) does not seem to think it
includes the end number:

   ~ > irb
   irb(main):001:0> (-1..-2).include? -2
   => false

nor the starting point for that matter!

   irb(main):002:0> (-1..-2).include? -1
   => false

how can a range not include its starting point - even if it's going backwards?

it makes sense that the only way to get an empty string would be to go
backwards...

nothing else about ranges can go backwards though:

   irb(main):001:0> (0..-42).to_a
   => []

and everything about backward ranges is broken and/or inconsistent (see
above). if you read the source for rb_range_beg_len you'll see that backward
ranges are actively searched for (len < 0) and an exception thrown (or Qnil
returned if err == 0) if one is found. that's why

   irb(main):001:0> 'foobar'[-1..-3]
   => nil
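The Range observations above can be reproduced on their own, without any string involved. This is a minimal sketch; the variable name backward is just for illustration.

```ruby
# A range whose end is below its start keeps its endpoints
# but covers and iterates over nothing.
backward = (-1..-2)

p backward.include?(-1)   # false
p backward.include?(-2)   # false
p backward.to_a           # []
p((0..-42).to_a)          # []
p backward.first          # -1
p backward.last           # -2
```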

also, the docs claim otherwise with no mention of this special range feature:

   /*
    * a = "hello there"
    * a[1] #=> 101
    * a[1,3] #=> "ell"
    * a[1..3] #=> "ell"
    * a[-3,2] #=> "er"
    * a[-4..-2] #=> "her"
    * a[-2..-4] #=> nil
    * a[/[aeiou](.)\1/] #=> "ell"
    * a[/[aeiou](.)\1/, 0] #=> "ell"
    * a[/[aeiou](.)\1/, 1] #=> "l"
    * a[/[aeiou](.)\1/, 2] #=> nil
    * a["lo"] #=> "lo"
    * a["bye"] #=> nil
    */

so there's nothing in the code that suggests 'backward' ranges where end =
start -1 (-1..-2) should be handled any differently. the reason they are
handled differently appears to be an off by one error caused by mapping negative
indexes to positive ones and then handling them exactly the same - remember the
absolute value of the range of negative indexes is off by one from the absolute
value of the range of positive indexes...

Correct me if I'm wrong, but doesn't that make it impossible to reference an
empty string between characters?

yes. and i think it should. you cannot reference this empty string using
positive indexes at all, nor negative ones EXCEPT for the special case of end =
beg -1 (-1..-2).

you can always reference the empty string using offset plus len == 0

   ~ > irb
   irb(main):001:0> s='foobar'
   => "foobar"

   irb(main):002:0> s[4,0]
   => ""

   irb(main):003:0> s[4,0] = '<empty>'
   => "<empty>"

   irb(main):004:0> s
   => "foob<empty>ar"

cheers.

-a

···

On Thu, 22 Jul 2004, Mark Hubbart wrote:

===============================================================================

In article <1090572876.683170.2693.nullmailer@picachu.netlab.jp>,
  matz@ruby-lang.org (Yukihiro Matsumoto) writes:

Now I fully understand. I will commit your fix. Can you prepare
ChangeLog entry for this fix?

Do you notice that the patch causes many errors in the test?

% make test-all
./miniruby ./runruby.rb --extout=.ext -- -C "./test" runner.rb --runner=console
Loaded suite .
Started
.......................................................................................................................................................................................................................................................................................................................................................................E.............................................................................................................................F..............................................................................................................................................................................................................................................................................................................................................................................................................................
..........................................................................................
...............................................................................................................................................E..EEEE.EEE.E.EF............................EEEEEEEEF, [2004-07-23T18:20:58.717811 #10037] FATAL -- SOAP::Calc::TestCalc: Detected an exception. Stopping ... Address already in use - bind(2) (Errno::EADDRINUSE)
/home/akr/ruby/tmp-ruby/ruby/lib/webrick/utils.rb:62:in `initialize'
/home/akr/ruby/tmp-ruby/ruby/lib/webrick/utils.rb:62:in `new'
/home/akr/ruby/tmp-ruby/ruby/lib/webrick/utils.rb:62:in `create_listeners'

...snip...

63) Error:
test_soapbodyparts(WSDL::soap::TestSOAPBodyParts):
NoMethodError: undefined method `reset_stream' for nil:NilClass
    ./wsdl/soap/test_soapbodyparts.rb:73:in `teardown_client'
    ./wsdl/soap/test_soapbodyparts.rb:63:in `teardown'

1499 tests, 10358 assertions, 4 failures, 59 errors

···

--
Tanaka Akira

tanaka has pointed out that this patch causes test failures. however, it does
seem to be a bug - in fact negative beg/end in ranges does many odd things:

   ~/build/ruby > irb

   irb(main):001:0> (-1..-2).include? -1
   => false

   irb(main):002:0> (-1..-2).include? -2
   => false

so perhaps this problem is bigger than DT initially suggested.

it seems like Range#initialize should throw an error if beg > end (after
mapping negative -> positive for cases like (5..-3)) but this would be a
massive change...

i am happy to contribute, but what do you suggest?

kind regards.

-a

···

On Fri, 23 Jul 2004, Yukihiro Matsumoto wrote:


===============================================================================

i have too. but what to do - it should be fixed?

-a

···

On Fri, 23 Jul 2004, Tanaka Akira wrote:


===============================================================================

In article <Pine.LNX.4.60.0407230631440.13379@harp.ngdc.noaa.gov>,
  "Ara.T.Howard" <ahoward@noaa.gov> writes:

i have too. but what to do - it should be fixed?

I think the patch should be rejected.

I guess it breaks many programs as well as many tests. So
many people will be unhappy with the patch.

Who is happy with the patch?

Do you have a program which will be easier to read/write when
s[n..(n-1)] returns nil?
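To make the question concrete, here is one idiom that would be affected. It is my own example, not taken from the failing tests: split_at is a hypothetical helper that takes "the rest of the string from offset i". When i equals s.length, s[i..-1] maps to the same begin/end pair as s[n..(n-1)], so it currently yields "" rather than nil.

```ruby
# split_at is a hypothetical helper, used here only for illustration.
# It cuts s into the part before offset i and the part from i onward.
def split_at(s, i)
  [s[0...i], s[i..-1]]
end

p split_at("ab", 0)  # ["", "ab"]
p split_at("ab", 1)  # ["a", "b"]
p split_at("ab", 2)  # ["ab", ""]
```

If the last case returned nil for the tail, callers would need an extra nil check at the end of the string.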

···

--
Tanaka Akira