StringScanner question

Jon_A_Lambert2 · 18 September 2005 02:32

Dear Ruby,

------------------------------------------------- StringScanner#get_byte
     get_byte()
------------------------------------------------------------------------
     Scans one byte and returns it. Similar to, but not the same as,
     #getch.

        s = StringScanner.new('ab')
        s.get_byte # => "a"
        s.get_byte # => "b"
        s.get_byte # => nil

---------------------------------------------------- StringScanner#getch
     getch()
------------------------------------------------------------------------
     Scans one character and returns it.

        s = StringScanner.new('ab')
        s.getch # => "a"
        s.getch # => "b"
        s.getch # => nil

I'm using StringScanner to process network packets, and want to know
whether I should be using getch or getbyte to decode them, especially
since I have 16 and 32 bit integers and other random binary cruft. Now I
haven't noticed anything out of the ordinary using getch but the implied
threats in the RI doc have me worried.

Anyone know what the difference is, if any?

Thanks

--
J Lambert

Jon_A_Lambert2 · 18 September 2005 03:06

Jon A. Lambert wrote:

Anyone know what the difference is, if any?

Dear Jon,

If you had bothered to read the source code you would have found a
bunch of slick character encoding tables in regex.c and know that
the lengths of characters in strings are dependent on the encoding
options you be running on. As long as you be using US-ASCII then
you'll be alright, but if you start messing with kanji you'll be bitten
on the ass as StringScanner will suddenly be popping and hopping
through 1, 2, or n bytes at a time with getch. So I'd recommend
using getbyte.

There are enough hints about such things dropped in the very first
chapters of the "Coding Ruby: The Canonical Coder's Guide".
Pay attention and do some research before wasting our time.

--
J. Lambert

Eric_Hodel1 · 18 September 2005 03:10

From looking at strscan.c, getch seems to be able to process multibyte characters.

Use get_byte.
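
For what it's worth, here is a minimal sketch of the difference. It assumes a current Ruby where string literals are UTF-8 (not the 1.8 of this thread), and the variable names are only for illustration:

```ruby
require 'strscan'

# "\u00e9" (e with an acute accent) takes two bytes in UTF-8: 0xC3 0xA9.
text = "\u00e9!"

by_char = StringScanner.new(text)
first_char = by_char.getch       # consumes the whole two-byte character
pos_after_char = by_char.pos     # scan position is counted in bytes

by_byte = StringScanner.new(text)
first_byte = by_byte.get_byte    # consumes exactly one byte
pos_after_byte = by_byte.pos

puts "getch:    #{first_char.bytes.inspect}, pos #{pos_after_char}"
puts "get_byte: #{first_byte.bytes.inspect}, pos #{pos_after_byte}"
```

With plain ASCII input the two calls behave the same, which would explain not noticing anything odd with getch so far.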

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Logan_Capaldo · 18 September 2005 20:13

[snip docs]

I'm using StringScanner to process network packets, and want to know
whether I should be using getch or getbyte to decode them, especially
since I have 16 and 32 bit integers and other random binary cruft. Now I
haven't noticed anything out of the ordinary using getch but the implied
threats in the RI doc have me worried.

Anyone know what the difference is, if any?
Thanks
--
J Lambert

Have you considered looking at String#unpack? It's designed for all that "random binary cruft".
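
A minimal sketch of what that might look like, assuming a made-up packet layout (a 16-bit length followed by a 32-bit id, both unsigned and in network byte order); the field names are hypothetical:

```ruby
# Build a hypothetical 6-byte packet: 16-bit length, then 32-bit id,
# both unsigned big-endian ("n" = 16-bit, "N" = 32-bit network order).
packet = [258, 16_909_060].pack('nN')

# Decode both integers in a single call.
length, id = packet.unpack('nN')

puts "bytes:  #{packet.bytes.inspect}"
puts "length: #{length}, id: #{id}"
```

If the protocol is little-endian instead, the "v" and "V" directives are the usual counterparts.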

J-Van · 18 September 2005 23:05

It's necessary now to read C source code to figure out the API for
StringScanner?

Jon_A_Lambert2 · 19 September 2005 06:24

Logan Capaldo wrote:

Have you considered looking at String#unpack? It's designed for all
that "random binary cruft"

Yes, I am using String#unpack after gathering up all the bytes together
to do it. Unfortunately StringScanner doesn't have an unpack method,
which would be a quite handy and fine addition to the class.
StringScanner saves me the hassle of writing a bunch of lexical
navigation code.
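
Roughly, the combination looks like the sketch below. Note that `take_bytes` is a hypothetical helper of my own, not part of StringScanner, the packet layout is invented for the example, and it assumes a current Ruby (for String#b):

```ruby
require 'strscan'

# Hypothetical helper: pull `count` raw bytes off the scanner.
# peek and pos both work in bytes, so multibyte characters never matter.
def take_bytes(scanner, count)
  chunk = scanner.peek(count)
  raise ArgumentError, 'packet too short' if chunk.bytesize < count
  scanner.pos += count
  chunk
end

# Invented layout: 16-bit length, 32-bit id (both big-endian), then body.
packet  = "\x00\x05\x00\x00\x01\x00hello".b
scanner = StringScanner.new(packet)

length = take_bytes(scanner, 2).unpack('n').first
id     = take_bytes(scanner, 4).unpack('N').first
body   = take_bytes(scanner, length)

puts "length=#{length} id=#{id} body=#{body.inspect}"
```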

--
J Lambert

Gavin_Kistner2 · 18 September 2005 23:18

To be clear, I believe Jon's harsh response was written in response to himself. He was saying "Oops, I figured it out myself."

Jon_A_Lambert2 · 19 September 2005 06:14

Joe Van Dyk wrote:

It's necessary now to read C source code to figure out the API for
StringScanner?

Apparently 'twas "necessary" in the practical "Well, I had to" sense, rather than the idealistic "Well, I oughta not had to" sense.

--
J. Lambert

J-Van · 18 September 2005 23:30

Yeah, I noticed that. But still, it shouldn't be necessary to read
source code to figure out API documentation.
