[rcr] String#first / String#last

Doh. I forgot my main point. I wanted to say that one might think Integer()
would take care of all this, but imagine my surprise when I did this:

irb(main):001:0> Integer(:a)
=> 12169

T.
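For readers on a current Ruby: the result above comes from the 1.8 series, where a Symbol could still be converted to an Integer (presumably its internal ID, via the Symbol#to_i/#to_int conversions 1.8 still had). The sketch below assumes Ruby 1.9 or later. There, Symbol no longer has those methods, so Kernel#Integer rejects it. `integer_or_nil` is a hypothetical helper, not a core method:

```ruby
# Hypothetical helper: Kernel#Integer, but returning nil instead of raising.
def integer_or_nil(value)
  Integer(value)
rescue TypeError, ArgumentError
  nil
end

p integer_or_nil(:a)     # => nil  (TypeError: a Symbol can't be converted)
p integer_or_nil("42")   # => 42
p integer_or_nil("4x2")  # => nil  (ArgumentError: not a valid integer string)
p :a.to_s                # => "a"  (a conversion a Symbol still has)
p "a".ord                # => 97   (a character's code, asked for explicitly)
```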


On Sunday 24 October 2004 12:48 pm, trans. (T. Onoma) wrote:

Seems like a lot of overhead though when one considers this kind of thing is
going on throughout the system.

Robert Klemme wrote:

>adding a to_a might be good
"to_a" works line-wise. Perhaps "explode"

RCR for #chars

This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

The problem with String#each_byte is that nobody wants to handle characters as Integers in Ruby, IMHO. (I think one-character Strings are preferred, because they still let you use lots of String's useful methods.)

Kind regards
    robert

More regards,
Florian Gross
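To make the contrast concrete, here is a sketch that assumes a later Ruby with String#each_char (it postdates this thread). each_byte hands the block Integers. each_char hands it one-character Strings that still respond to String methods such as #upcase:

```ruby
s = "Ruby"

# each_byte yields each byte as an Integer
bytes = []
s.each_byte { |b| bytes << b }
p bytes   # => [82, 117, 98, 121]

# each_char yields one-character Strings, so String methods still apply
chars = []
s.each_char { |c| chars << c.upcase }
p chars   # => ["R", "U", "B", "Y"]
```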

"Robert Klemme" <bob.news@gmx.net> wrote in message news:<2u3rh2F22mbfkU1@uni-berlin.de>...

"trans. (T. Onoma)" <transami@runbox.com> schrieb im Newsbeitrag
news:200410250001.59113.transami@runbox.com...
> > Hi,
> >
> > >I think I'd be opposed to that. But adding a to_a might be good -- I
> > >frequently do a gratuitous split operation for that purpose.
> >
> > It already has "to_a" which works line-wise. Perhaps something called
> > "explode" in other language is what you want.
>
> Believe there is an RCR for #chars
>
> def chars
>   split(//)
> end

This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

Kind regards

    robert

I prefer String#unpack("C*"). I'm not sure what encoding issues there
are with that, however.
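On the encoding question, here is a sketch that assumes a modern Ruby and a UTF-8 source file (the default since Ruby 2.0). "C*" unpacks raw bytes, so a multibyte character comes back as several Integers. String#bytes gives the same list:

```ruby
s = "héllo"                # "é" (U+00E9) is two bytes in UTF-8

bytes = s.unpack("C*")     # one unsigned 8-bit Integer per byte
p bytes                    # => [104, 195, 169, 108, 108, 111]
p bytes == s.bytes         # => true

# Packing the bytes back yields a binary (ASCII-8BIT) String;
# re-tag it as UTF-8 to get the original text back.
round_trip = bytes.pack("C*").force_encoding(Encoding::UTF_8)
p round_trip == s          # => true
```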

Dan


> On Sunday 24 October 2004 11:54 pm, Yukihiro Matsumoto wrote:
> > In message "Re: [rcr] String#first / String#last"
> >     on Mon, 25 Oct 2004 09:18:51 +0900, Hal Fulton <hal9000@hypermetrics.com> writes:

Robert Klemme wrote:

"trans. (T. Onoma)" <transami@runbox.com> schrieb im Newsbeitrag

It already has "to_a" which works line-wise. Perhaps something called
"explode" in other language is what you want.

Believe there is an RCR for #chars

def chars
   split(//)
end

This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

In my mind, I blur the distinction between chars and "length 1" strings.

I am fully aware of the difference, but I rarely have need of characters
as Fixnums.

Ruby in the future (as I understand it) will also blur this distinction
further: Apparently ?x will be the same as "x" and "abc"[1] will be "b".

Hal
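Hal's reading turned out to be right: Ruby 1.9 made both changes. The sketch below assumes Ruby 2.0 or later, where String#chars returns an Array rather than an Enumerator:

```ruby
char_literal = ?x
p char_literal             # => "x": a one-character String, not an Integer
p char_literal == "x"      # => true

p "abc"[1]                 # => "b" (Ruby 1.8 gave 98)
p "abc"[1].ord             # => 98, when the character code is really wanted
p "abc".chars              # => ["a", "b", "c"]
```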


On Sunday 24 October 2004 11:54 pm, Yukihiro Matsumoto wrote:

"Florian Gross" <flgr@ccan.de> schrieb im Newsbeitrag
news:2u4cv2F24vocaU1@uni-berlin.de...

Robert Klemme wrote:

>>> >adding a to_a might be good
>>> "to_a" works line-wise. Perhaps "explode"
>>RCR for #chars
> This does not yield characters but strings with length 1. Note also that
> there is String#each_byte which is often sufficient.

The problem with String#each_byte is that nobody wants to handle
characters as Integers in Ruby, IMHO. (I think one-character Strings are
preferred, because they still let you use lots of String's useful
methods.)

Yeah possibly. Another problem with each_byte is that byte != char for
many encodings. But AFAIK #each_byte and #split(//) share this problem.

I believe a drawback of using "foo".split(//) is that it's less
performant: you need to create the tmp array plus all the string instances
(although they share the buffer AFAIK).
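To illustrate the byte/character split, here is a sketch that assumes Ruby 1.9 or later, where Strings carry an encoding (here, UTF-8 source text). On the 1.8 series discussed in this thread, the results depended on $KCODE (e.g. -Ku). each_char also speaks to the performance point: it yields one character at a time without building the intermediate Array, though each character is still its own String.

```ruby
s = "naïve"            # "ï" (U+00EF) takes two bytes in UTF-8

p s.bytesize           # => 6 (bytes)
p s.length             # => 5 (characters)
p s.split(//)          # => ["n", "a", "ï", "v", "e"]: character-based
p s.each_byte.count    # => 6: still byte-based

# Walk the characters without first building split(//)'s Array.
vowels = 0
s.each_char { |c| vowels += 1 if "aeiouï".include?(c) }
p vowels               # => 3
```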

> Kind regards
> robert

More regards,
Florian Gross

Even more regards

    robert

:-)

That is good to hear. I remember when I began learning Ruby I was put off/confused by the fact that String#[] returned a Fixnum when given a single index.

-Charlie


On Oct 25, 2004, at 10:15 AM, Hal Fulton wrote:

In my mind, I blur the distinction between chars and "length 1" strings.

I am fully aware of the difference, but I rarely have need of characters
as Fixnums.

Ruby in the future (as I understand it) will also blur this distinction
further: Apparently ?x will be the same as "x" and "abc"[1] will be "b".

I prefer String#unpack("C*"). I'm not sure what encoding issues there
are with that, however.

If it's a UTF8 string then use unpack("U*")
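By contrast, "U*" decodes UTF-8 into code points, one Integer per character. The sketch below assumes Ruby 1.9 or later, where String#codepoints gives the same result:

```ruby
s = "héllo"                # UTF-8 source text

points = s.unpack("U*")    # one Integer per character (Unicode code point)
p points                   # => [104, 233, 108, 108, 111]
p points == s.codepoints   # => true
p points.pack("U*") == s   # => true: "U*" packs back into a UTF-8 String
```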

As for #first and #last:

  a = "abcdefg"
  a[0...2] # first two characters
  a[0,2] # first two characters
  a[-2..-1] # last two characters
  a[-2,9999] # last two characters (cheating)

I think the last case shows where a sanctioned "infinity" would be nice. But
otherwise:

  class String
    def first(n=1)
      self[0,n]
    end
    def last(n=1)
      return nil if n < 0
      return "" if n == 0
      return self if n > self.size
      self[-n..-1]
    end
  end
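The "infinity" wished for above arrived much later as the endless range (Ruby 2.6 and later), so `str[a..]` means "from a to the end". The sketch below gives the same first/last behaviour using it. The helpers are written as hypothetical module functions rather than by reopening String. Note that `last` still needs its own zero case, because `str[-0..]` is the whole string:

```ruby
# Hypothetical helpers (not part of core Ruby); endless ranges need Ruby 2.6+.
module StringEnds
  def self.first(str, n = 1)
    str[0, n]                       # nil for a negative n
  end

  def self.last(str, n = 1)
    return nil if n < 0
    return "" if n.zero?            # str[-0..] would be the whole string
    return str.dup if n >= str.size # asking for more than there is
    str[-n..]                       # "from -n to the end", no 9999 needed
  end
end

a = "abcdefg"
p a[-2..]                  # => "fg"
p StringEnds.first(a, 2)   # => "ab"
p StringEnds.last(a, 2)    # => "fg"
p StringEnds.last(a, 99)   # => "abcdefg"
p StringEnds.last(a, 0)    # => ""
```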

I still find that string operations sit uncomfortably together.

  string[a..b] # Start pos is a, end pos is b
                 # If either a or b is negative, it's an offset from the end
                 # (which means it's not a Range in a useful sense)
                 # Return nil if a is not within string, or b is to the
                 # 'left' of a, after resolving negative offsets

  string[a,b] # Start pos is a, length is b
                 # If a is negative, it's an offset from the end
                 # If b is negative, nil is returned

  string[a] # return the a'th byte of string as an integer (ick)

I have to remind myself of these with irb each time I use them. "How do I
get from character a to the end of the string?" => str[a..-1]

"How do I get just the a'th character by itself (as a string)?" => str[a,1]

Regards,

Brian.
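Here is a quick check of the indexing rules above, assuming Ruby 1.9 or later (on 1.8, `s[1]` returns the Integer 101 instead). One detail differs from the summary: when the end of an a..b range resolves to the left of its start, current Ruby returns an empty string rather than nil, as long as the start is inside the string.

```ruby
s = "hello"

p s[1..3]    # => "ell"   start 1, end 3 (inclusive)
p s[1...3]   # => "el"    exclusive end
p s[-3..-1]  # => "llo"   negative values count from the end
p s[3..1]    # => ""      end left of start: empty string, not nil
p s[9..10]   # => nil     start beyond the end of the string
p s[1, 3]    # => "ell"   start 1, length 3
p s[1, -1]   # => nil     negative length
p s[1]       # => "e"     a one-character String, not an Integer
p s[1..-1]   # => "ello"  from character 1 to the end
p s[1, 1]    # => "e"     just character 1, as a String
```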

I think this is a very interesting point. I wonder how Ruby 2 will progress in
the area. Isn't i18N (not that I really know what that entails) on the map?
How does that affect things? Is it then prudent to create an actual Character
class, such that a String is essentially an Array of Characters? (not to say
there won't be differences between Array and String, but in essence)

T.


On Monday 25 October 2004 09:49 am, Robert Klemme wrote:

Yeah possibly. Another problem with each_byte is that byte != char for
many encodings. But AFAIK #each_byte and #split(//) share this problem.

I believe a drawback of using "foo".split(//) is that it's less
performant: you need to create the tmp array plus all the string instances
(although they share the buffer AFAIK).

Robert Klemme wrote:

"Florian Gross" <flgr@ccan.de> schrieb im Newsbeitrag
news:2u4cv2F24vocaU1@uni-berlin.de...

Robert Klemme wrote:

>adding a to_a might be good
"to_a" works line-wise. Perhaps "explode"

RCR for #chars

This does not yield characters but strings with length 1. Note also
that there is String#each_byte which is often sufficient.

The problem with String#each_byte is that nobody wants to handle
characters as Integers in Ruby, IMHO. (I think one-character Strings are
preferred, because they still let you use lots of String's useful
methods.)

Yeah possibly. Another problem with each_byte is that byte != char for
many encodings. But AFAIK #each_byte and #split(//) share this problem.

I think .split(//) works correctly (returning characters) with -Ku, but I'm not sure about #each_byte.

I believe a drawback of using "foo".split(//) is that it's less
performant: you need to create the tmp array plus all the string instances
(although they share the buffer AFAIK).

Agreed, and this is a big problem in current Ruby -- we don't really need the Array if Strings themselves let you do character-based operations instead of line-based ones easily.

The overhead required for individual Character Objects could, on the other hand, be quite low. They could also be value Objects, meaning you only need one single Object per character, which you then store in a big hash. Plus they would not need much of the reallocation overhead of Strings. (They would just need to support most of String's interface.) I think it is enough for them to be a wrapper around a char-trait in C.
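Here is a minimal sketch of the value-object idea, assuming nothing beyond core Ruby. `Character` and `String#each_character` are hypothetical names, not real Ruby APIs. Instances only wrap a frozen one-character String rather than a C-level char. Each distinct character gets exactly one shared instance, cached in a Hash:

```ruby
# Hypothetical Character value class (not part of Ruby): one shared,
# frozen instance per distinct character, kept in a class-level Hash.
class Character
  @cache = {}

  # Returns the shared instance for a one-character String.
  def self.[](str)
    unless str.is_a?(String) && str.length == 1
      raise ArgumentError, "expected a one-character String"
    end
    @cache[str] ||= new(str)
  end

  private_class_method :new # construct only through Character[]

  def initialize(str)
    @str = str.dup.freeze
    freeze
  end

  def to_s
    @str
  end

  def inspect
    "#<Character #{@str.inspect}>"
  end
end

class String
  # Hypothetical helper: yields Characters one at a time instead of
  # building the intermediate Array that split(//) returns.
  def each_character
    return enum_for(:each_character) unless block_given?
    each_char { |c| yield Character[c] }
    self
  end
end

chars = "hello".each_character.to_a
p chars.map(&:to_s)          # => ["h", "e", "l", "l", "o"]
p chars[2].equal?(chars[3])  # => true: both "l"s are one shared object
```

Because instances are shared and frozen, identity (`equal?`) is enough to compare them. The trade-off is that the cache grows with every distinct character seen.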

Kind regards
   robert

More regards,
Florian Gross

Even more regards
    robert

Yet more regards,
Florian Gross

;-)

I would like that. Characters are an interesting class in their
own right--and they are _not_ bytes. Strings are much more like arrays
of Characters than Characters are like Integers, so that would be a much
cleaner way to go.

     *smile* We could also resolve the discrepancies by deprecating
strings in favor of Bignums, which I suspect Kurt Gödel would like, if
no one else.

-- Markus


On Mon, 2004-10-25 at 07:25, trans. (T. Onoma) wrote:

On Monday 25 October 2004 09:49 am, Robert Klemme wrote:
> Yeah possibly. Another problem with each_byte is that byte != char for
> many encodings. But AFAIK #each_byte and #split(//) share this problem.
>
> I believe a drawback of using "foo".split(//) is that it's less
> performant: you need to create the tmp array plus all the string instances
> (although they share the buffer AFAIK).

I think this is a very interesting point. I wonder how Ruby 2 will progress in
the area. Isn't i18N (not that I really know what that entails) on the map?
How does that affect things? Is it then prudent to create an actual Character
class, such that a String is essentially an Array of Characters? (not to say
there won't be differences between Array and String, but in essence)