StringScanner question

Dear Ruby,

------------------------------------------------- StringScanner#get_byte
     get_byte()
------------------------------------------------------------------------
     Scans one byte and returns it. Similar to, but not the same as,
     #getch.

       s = StringScanner.new('ab')
       s.get_byte # => "a"
       s.get_byte # => "b"
       s.get_byte # => nil

---------------------------------------------------- StringScanner#getch
     getch()
------------------------------------------------------------------------
     Scans one character and returns it.

       s = StringScanner.new('ab')
       s.getch # => "a"
       s.getch # => "b"
       s.getch # => nil

I'm using StringScanner to process network packets, and want to know
whether I should be using getch or getbyte to decode them, especially
since I have 16 and 32 byte integers and other random binary cruft. Now
I haven't noticed anything out of the ordinary using getch, but the
implied threats in the RI doc have me worried.

Anyone know what the difference is, if any?

Thanks

--
J Lambert

Jon A. Lambert wrote:

> Anyone know what the difference is, if any?

Dear Jon,

If you had bothered to read the source code you would have found a
bunch of slick character encoding tables in regex.c and know that
the lengths of characters in strings are dependent on the encoding
options you be running on. As long as you be using usacii then
you'll be alright, but if you start messing with kanji you'll be bitten on
the ass as StringScanner will suddenly be popping and hopping
through 1,2, or n bytes at a time with getch. So I'd recommend
using getbyte.

There are enough hints about such things dropped in the very first
chapters of the "Coding Ruby: The Canonical Coder's Guide".
Pay attention and do some research before wasting our time.

--
J. Lambert

From looking at strscan.c, getch seems to be able to process multibyte characters.

Use get_byte.
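
A minimal sketch of the difference, assuming a current Ruby in which every String carries its own encoding (this thread predates that). The sample string and the helper name `chunk_sizes` are made up for illustration:

```ruby
require 'strscan'

# Hypothetical helper: byte sizes of the chunks returned by repeatedly
# calling `reader` (:getch or :get_byte) until the scanner is exhausted.
def chunk_sizes(string, reader)
  scanner = StringScanner.new(string)
  sizes = []
  while (chunk = scanner.public_send(reader))
    sizes << chunk.bytesize
  end
  sizes
end

text = "\u65E5a"               # one 3-byte UTF-8 character, then "a"

chunk_sizes(text, :getch)      # => [3, 1]        steps by character
chunk_sizes(text, :get_byte)   # => [1, 1, 1, 1]  steps by byte
chunk_sizes(text.b, :getch)    # => [1, 1, 1, 1]  binary copy: one byte per step
```

For packet data, get_byte (or a scanner over a binary-encoded string) keeps every step at exactly one byte, whatever the bytes happen to look like.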

···

On 17 Sep 2005, at 19:32, Jon A. Lambert wrote:

> I'm using StringScanner to process network packets, and want to know
> whether I should be using getch or getbyte to decode them, especially
> since I have 16 and 32 byte integers and other random binary cruft. Now
> I haven't noticed anything out of the ordinary using getch but the
> implied threats in the RI doc have me worried.
> Anyone know what the difference is, if any?

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

On Sep 17, 2005, at 10:32 PM, Jon A. Lambert wrote:

> [snip docs]
>
> I'm using StringScanner to process network packets, and want to know
> whether I should be using getch or getbyte to decode them, especially
> since I have 16 and 32 byte integers and other random binary cruft. Now
> I haven't noticed anything out of the ordinary using getch but the
> implied threats in the RI doc have me worried.
> Anyone know what the difference is, if any?
> Thanks
> --
> J Lambert

Have you considered looking at String#unpack? It's designed for all that
"random binary cruft".

It's necessary now to read C source code to figure out the API for
StringScanner?

···

On 9/17/05, Jon A. Lambert <jlsysinc@alltel.net> wrote:

> Jon A. Lambert wrote:
>> Anyone know what the difference is, if any?
>
> Dear Jon,
>
> If you had bothered to read the source code you would have found a
> bunch of slick character encoding tables in regex.c and know that
> the lengths of characters in strings are dependent on the encoding
> options you be running on. As long as you be using usacii then
> you'll be alright, but if you start messing with kanji you'll be bitten on
> the ass as StringScanner will suddenly be popping and hopping
> through 1,2, or n bytes at a time with getch. So I'd recommend
> using getbyte.
>
> There are enough hints about such things dropped in the very first
> chapters of the "Coding Ruby: The Canonical Coder's Guide".
> Pay attention and do some research before wasting our time.

Logan Capaldo wrote:

> Have you considered looking at String#unpack ? Its designed for all
> that "random binary cruft"

Yes, I am using String#unpack after gathering up all the bytes together
to do it. Unfortunately, StringScanner doesn't have an unpack method,
which would be a quite handy and fine addition to the class.
StringScanner saves me the hassle of writing a bunch of lexical
navigation code.
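
One way to combine the two, sketched under assumptions: `take_unpacked` and `parse_packet` are hypothetical helpers, not part of StringScanner, and the packet layout (a 16-bit type, a 32-bit length, then the payload, all big-endian) is invented for illustration. It also reads the "16 and 32 byte integers" above as 16- and 32-bit fields. The sketch relies on `peek` and `pos=`, both of which count in bytes:

```ruby
require 'strscan'

# Hypothetical helper: take `length` bytes from the scanner, advance past
# them, and hand them to String#unpack with the given format.
def take_unpacked(scanner, length, format)
  chunk = scanner.peek(length)   # up to `length` bytes, position unchanged
  if chunk.bytesize < length
    raise ArgumentError, "need #{length} bytes, have #{chunk.bytesize}"
  end
  scanner.pos += length
  chunk.unpack(format)
end

# Assumed layout: 16-bit type, 32-bit payload length, payload bytes.
def parse_packet(bytes)
  scanner = StringScanner.new(bytes.b)   # binary copy, so 1 step == 1 byte
  type,    = take_unpacked(scanner, 2, 'n')
  length,  = take_unpacked(scanner, 4, 'N')
  payload, = take_unpacked(scanner, length, "a#{length}")
  [type, length, payload]
end

packet = [0x0102, 5, 'hello'].pack('nNa5')
parse_packet(packet)   # => [258, 5, "hello"]
```

The scanner still does the navigation, and unpack only ever sees a slice of exactly the size its format expects.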

--
J Lambert

To be clear, I believe Jon's harsh response was written in response to himself. He was saying "Oops, I figured it out myself."


Joe Van Dyk wrote:

> It's necessary now to read C source code to figure out the API for
> StringScanner?

Apparently 'twas "necessary" in the practical, "Well, I had to" sense,
rather than the idealistic "Well, I oughta not had to".

--
J. Lambert

Yeah, I noticed that. But still, it shouldn't be necessary to read
source code to figure out API documentation.

···

On 9/18/05, Gavin Kistner <gavin@refinery.com> wrote:

On Sep 18, 2005, at 5:05 PM, Joe Van Dyk wrote:
> On 9/17/05, Jon A. Lambert <jlsysinc@alltel.net> wrote:
>> Jon A. Lambert wrote:
>> Dear Jon,
>>
>> If you had bothered to read the source code you would have found a
>> bunch of slick character encoding tables in regex.c and know that
>> the lengths of characters in strings are dependent on the encoding
>> options you be running on. As long as you be using usacii then
>> you'll be alright, but if you start messing with kanji you'll be
>> bitten on
>> the ass as StringScanner will suddenly be popping and hopping
>> through 1,2, or n bytes at a time with getch. So I'd recommend
>> using getbyte.
>>
>> There are enough hints about such things dropped in the very first
>> chapters of the "Coding Ruby: The Canonical Coder's Guide".
>> Pay attention and do some research before wasting our time.
>>
>
> It's necessary now to read C source code to figure out the API for
> StringScanner?

> To be clear, I believe Jon's harsh response was written in response
> to himself. He was saying "Oops, I figured it out myself."