String#unpack and null-terminated strings

George5 · 24 April 2004 22:12

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

Regards,

Michael

Index: pack.c
===================================================================
RCS file: /src/ruby/pack.c,v
retrieving revision 1.69
diff -r1.69 pack.c
1287a1288,1290
>   T    | String  | read zero-terminated string
>        |         | (with null char removed)
> -------+---------+-----------------------------------------
1389a1393,1408
>   break;
>
> case 'T':
>         /* read until end of string or until a null character occurs */
>         {
>             char *start = s;
>
>             while (s < send) {     /* don't read more than the whole string */
>               if (*s == '\000') break;
>               s++;
>             }
>
>             rb_ary_push(ary, infected_str_new(start, s-start, str));
>
>             if (s < send && *s == '\000') s++; /* skip null character */
>         }

Mike_Stok · 24 April 2004 22:44

Michael Neumann mneumann@ntecs.de writes:

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

You could use String#split e.g.

irb(main):001:0> "abc\000def\000".split(/\0/)
=> ["abc", "def"]

I know it’s not String#unpack, but hope it helps.
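One caveat with the split approach, sketched here under current String#split semantics: trailing empty fields are dropped unless a negative limit is given, so empty C-strings at the end of a buffer disappear. The sample buffers below are made up for illustration.

```ruby
# Sample buffers (made up for illustration).
two_fields   = "abc\000def\000"
empty_fields = "abc\000\000def\000\000"

# Default split: the terminator of the last string leaves no empty tail.
p two_fields.split("\000")          # => ["abc", "def"]

# Interior empty strings survive, trailing ones are silently dropped.
p empty_fields.split("\000")        # => ["abc", "", "def"]

# A negative limit keeps every field, including the empty remainder
# after the final terminator.
p empty_fields.split("\000", -1)    # => ["abc", "", "def", "", ""]
```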

Mike


–
mike@stok.co.uk | The “`Stok’ disclaimers” apply.
http://www.stok.co.uk/~mike/ | GPG PGP Key 1024D/059913DA
mike@exegenix.com | Fingerprint 0570 71CD 6790 7C28 3D60
http://www.exegenix.com/ | 75D2 9EC4 C1C0 0599 13DA

Daniel_Berger3 · 25 April 2004 02:14

Michael Neumann mneumann@ntecs.de wrote in message news:20040424221220.GA6199@miya.intranet.ntecs.de…

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

Regards,

Michael

"abc\000def\000".unpack("A3xA3") # => ["abc", "def"]

Using the example you later posted…

"\100String\000\100".unpack("CA6xC") # => [64, "String", 64]

Regards,

Dan

daz · 25 April 2004 05:34

Michael Neumann wrote:

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

[snip] diff -r1.69 pack.c

At the risk of being told to clear off and write my own spec.,
I think that an ambiguity has intruded into the designer’s mind.

The A and Z string field formats should IMO be recovered from
left to right. Doesn’t the term “string” relate here to a
string element within a packed field? The packed field just
happens to be a Ruby String.

If this is going to break code, I wish that it could happen
from 1.9.
As it is now, A and Z are behaving the way I would expect
A* and Z* to (i.e. * uses all remaining elements).

There’s String#rstrip for removing spaces and nulls from the
end of a String.
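A minimal sketch of that alternative: read the fixed-width field raw with "a" and strip it afterwards. The 6-byte sample field is made up for illustration.

```ruby
# "a" returns the field bytes untouched, padding included.
field = "abc \000\000".unpack("a6").first
p field          # => "abc \u0000\u0000"

# String#rstrip treats NUL as whitespace, so trailing spaces and
# nulls are both removed.
p field.rstrip   # => "abc"
```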

Unpack is very useful for decoding structures but with the
current behaviour if a structure were to contain a null-
terminated string element it would break the flow …
… as Michael has highlighted.

Please, Matz.

daz

George5 · 24 April 2004 23:06

Sure this works. But I want to mix it with other data-types like:

"\100String\000\100".unpack("CTC") # T=null-term string

# => [64, "String", 64]

Otherwise I have to write:

str = "\100String\000\100"
a, str = str.unpack("Ca*")
b, str = str.split("\000", 2)
c, _ = str.unpack("Ca*")

p [a, b, c] # => [64, "String", 64]

Which is a bit ugly
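The manual steps above can be wrapped in a small helper. This is only a sketch: `read_cstring` is a hypothetical name, not part of Ruby, and it assumes that a buffer without a NUL should be treated as one unterminated string.

```ruby
# Hypothetical helper: split one NUL-terminated string off the front of data.
# Returns [string, rest]. Without a NUL, the whole input is the string.
def read_cstring(data)
  head, rest = data.split("\000", 2)
  [head || "", rest || ""]
end

data = "\100String\000\100".b   # work on raw bytes
a, data = data.unpack("Ca*")    # one unsigned byte, then the remainder
b, data = read_cstring(data)    # the C-string, terminator consumed
c, _    = data.unpack("Ca*")    # the trailing byte

p [a, b, c]                     # => [64, "String", 64]
```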

Python’s struct.unpack has an “s” format specifier which does exactly what
I want. Perl and Ruby don’t have this.

http://www.python.org/doc/current/lib/module-struct.html

Regards,

Michael


On Sun, Apr 25, 2004 at 07:44:05AM +0900, Mike Stok wrote:

Michael Neumann mneumann@ntecs.de writes:

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

You could use String#split e.g.

irb(main):001:0> "abc\000def\000".split(/\0/)
=> ["abc", "def"]

Nobuyoshi_Nakada · 25 April 2004 06:39

Hi,

At Sun, 25 Apr 2004 14:34:03 +0900,
daz wrote in [ruby-talk:98298]:

The A and Z string field formats should IMO be recovered from
left to right. Doesn’t the term “string” relate here to a
string element within a packed field? The packed field just
happens to be a Ruby String.

Sounds nice.

```
Index: pack.c
===================================================================
RCS file: /cvs/ruby/src/ruby/pack.c,v
retrieving revision 1.69
diff -u -2 -p -r1.69 pack.c
--- pack.c 18 Apr 2004 23:19:45 -0000 1.69
+++ pack.c 25 Apr 2004 06:39:33 -0000
@@ -435,5 +435,5 @@ static unsigned long utf8_to_uv _((char*
   X     |  Back up a byte
   x     |  Null byte
-  Z     |  Same as ``A''
+  Z     |  Same as ``a'', except that null is added with *
  */

@@ -524,6 +524,9 @@ pack_pack(ary, fmt)
       case 'A':        /* ASCII string (space padded) */
       case 'Z':        /* null terminated ASCII string */
-        if (plen >= len)
+        if (plen >= len) {
             rb_str_buf_cat(res, ptr, len);
+            if (p[-1] == '*' && type == 'Z')
+                rb_str_buf_cat(res, nul10, 1);
+        }
         else {
             rb_str_buf_cat(res, ptr, plen);
@@ -1174,4 +1177,5 @@ infected_str_new(ptr, len, str)
  "abc \0\0abc \0\0".unpack('A6Z6')   #=> ["abc", "abc "]
  "abc \0\0".unpack('a3a3')           #=> ["abc", " \000\000"]
+ "abc \0abc \0".unpack('Z*Z*')       #=> ["abc ", "abc "]
  "aa".unpack('b8B8')                 #=> ["10000110", "01100001"]
  "aaa".unpack('h2H2c')               #=> ["16", "61", 97]
@@ -1285,4 +1289,5 @@ infected_str_new(ptr, len, str)
  -------+---------+-----------------------------------------
    Z    | String  | with trailing nulls removed
+        |         | upto first null with *
  -------+---------+-----------------------------------------
    @    | ---     | skip to the offset given by the
@@ -1377,5 +1382,13 @@ pack_unpack(str, fmt)
       case 'Z':
         if (len > send - s) len = send - s;
         {
+            if (star) {
+                char *t = s;
+                while (t < send && *t) t++;
+                rb_ary_push(ary, infected_str_new(s, t - s, str));
+                if (t < send) t++;
+                s = t;
+            }
+            else {
             long end = len;
             char *t = s + len - 1;
```

–
Nobu Nakada

George5 · 25 April 2004 10:14

That’s exactly how I expected Z to behave. Thanks!

Regards,

Michael


On Sun, Apr 25, 2004 at 03:39:53PM +0900, nobu.nokada@softhome.net wrote:

Hi,

At Sun, 25 Apr 2004 14:34:03 +0900,
daz wrote in [ruby-talk:98298]:

The A and Z string field formats should IMO be recovered from
left to right. Doesn’t the term “string” relate here to a
string element within a packed field? The packed field just
happens to be a Ruby String.

Sounds nice.

[patch]

daz · 26 April 2004 07:19

Nobu patched:

--- pack.c 18 Apr 2004 23:19:45 -0000 1.69
+++ pack.c 25 Apr 2004 06:39:33 -0000

[…]

```
case 'Z':
  if (len > send - s) len = send - s;
  {
    if (star) {
        char *t = s;
        while (t < send && *t) t++;
        rb_ary_push(ary, infected_str_new(s, t - s, str));
        if (t < send) t++;
        s = t;
    }
    else {
```

Combining that with recognition of the length specifier:

===============================

case 'Z':
  {
    char *t = s;

    if (len > send-s) len = send-s;
    while (t < s+len && *t) t++;
    rb_ary_push(ary, infected_str_new(s, t-s, str));
    if (t < send) t++;
    s = star ? t : s+len;
  }
  break;

===============================

s = "abc\0def\0\0jkl\0"

s.unpack('Z2Z*Z*') #-> ["ab", "c", "def"]
s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]
s.unpack('Z7Z*Z*') #-> ["abc", "", ""]
s.unpack('Z8Z*Z*') #-> ["abc", "", "jkl"]
s.unpack('Z9Z*Z*') #-> ["abc", "jkl", ""]
s.unpack('Z*Z42') #-> ["abc", "def"]
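For readers who would rather trace the rule in Ruby than in C, here is a rough model of the proposed behaviour. It is a sketch only: `unpack_z` and its `:star` argument are hypothetical names, and the function mirrors the logic of the C fragment above rather than reproducing pack.c.

```ruby
# Pure-Ruby model of the proposed 'Z' rule (illustrative, not the C code).
# count is an Integer field width, or :star for a starred field.
# Returns [field_text, next_offset].
def unpack_z(data, offset, count)
  bytes = data.b                                  # work on raw bytes
  remaining = bytes.bytesize - offset
  len = count == :star ? remaining : [count, remaining].min
  field = bytes.byteslice(offset, len)
  nul = field.index("\0")                         # first NUL inside the field
  text = nul ? field.byteslice(0, nul) : field
  consumed =
    if count == :star
      nul ? nul + 1 : len                         # string plus its terminator
    else
      len                                         # fixed width: skip the whole field
    end
  [text, offset + consumed]
end

s = "abc\0def\0\0jkl\0"
offset = 0
a, offset = unpack_z(s, offset, 6)       # fixed width of 6
b, offset = unpack_z(s, offset, :star)   # up to the next NUL
c, offset = unpack_z(s, offset, :star)
p [a, b, c]                              # => ["abc", "f", ""]
```

A fixed width always advances by the full width, while a starred field advances just past its terminator, which is why the second field starts at "f".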

daz

Nobuyoshi_Nakada · 26 April 2004 14:42

Hi,

At Mon, 26 Apr 2004 16:19:04 +0900,
daz wrote in [ruby-talk:98364]:

Combining that with recognition of the length specifier:

===============================

case 'Z':
  {
    char *t = s;
    if (len > send-s) len = send-s;
    while (t < s+len && *t) t++;
    rb_ary_push(ary, infected_str_new(s, t-s, str));
    if (t < send) t++;
    s = star ? t : s+len;
  }
  break;
===============================

I’d also considered it, but

s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can’t round trip with Array#pack, so I discarded this plan.


–
Nobu Nakada

daz · 26 April 2004 21:54

Nobu wrote:

daz wrote in [ruby-talk:98364]:

Combining that with recognition of the length specifier:

I’d also considered it, but

s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can’t round trip with Array#pack, so I discarded this plan.

But the user has specified that the first field is
fixed-width(6) and null-terminated so:

"abc\000de" == "abc\000\000\000" ==> "abc"

Everything from “\000” to the end of the field is junk
because the user told us so by using ‘Z’.

We don’t need to apologise that pack didn’t replace the
exact junk that was there before :-?

Round trip:

s = "abc\000def\000\000jkl\000"
zf = 'Z6Z*Z*'

s.unpack(zf) #-> ["abc", "f", ""]
s.unpack(zf).pack(zf) #-> "abc\000\000\000f\000\000"
s.unpack(zf).pack(zf).unpack(zf) #-> ["abc", "f", ""]

The fixed width consumes the added zero padding bytes so
it doesn’t create bogus extra fields.
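The same point can be checked on a single field. This sketch assumes a Ruby release that includes the change under discussion; the variable names are made up for illustration.

```ruby
fmt = "Z6"

# Two 6-byte fields that differ only after the first NUL.
with_junk = "abc\000de"
padded    = "abc\000\000\000"

# Both decode to the same value: bytes after the NUL are ignored.
p with_junk.unpack(fmt)   # => ["abc"]
p padded.unpack(fmt)      # => ["abc"]

# Re-packing normalises the field to NUL padding, and a second
# round trip reproduces exactly the same bytes.
repacked = with_junk.unpack(fmt).pack(fmt)
p repacked == padded                            # => true
p repacked.unpack(fmt).pack(fmt) == repacked    # => true
```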


To me, the result below seems not to do what was requested:

s.unpack('Z6Z*Z*') #-> ["abc\000de", "f", ""]

I’m probably missing a crucial point here?

daz

Nobuyoshi_Nakada · 27 April 2004 14:40

Hi,

At Tue, 27 Apr 2004 06:54:03 +0900,
daz wrote in [ruby-talk:98456]:

s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can’t round trip with Array#pack, so I discarded this plan.

But the user has specified that the first field is
fixed-width(6) and null-terminated so:

"abc\000de" == "abc\000\000\000" ==> "abc"

Everything from “\000” to the end of the field is junk
because the user told us so by using ‘Z’.

We don’t need to apologise that pack didn’t replace the
exact junk that was there before :-?

Hmmm, sounds reasonable.


–
Nobu Nakada

Yukihiro_Matsumoto2 · 10 May 2004 08:53

Hi,

In message “Re: String#unpack and null-terminated strings” on 04/04/27, nobu.nokada@softhome.net writes:

Hmmm, sounds reasonable.

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

						matz.

Nobuyoshi_Nakada · 12 May 2004 09:49

Hi,

At Mon, 10 May 2004 17:53:35 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:99719]:

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

What about 1.8?


–
Nobu Nakada

Yukihiro_Matsumoto2 · 13 May 2004 01:07

Hi.

In message “Re: String#unpack and null-terminated strings” on 04/05/12, nobu.nokada@softhome.net writes:

At Mon, 10 May 2004 17:53:35 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:99719]:

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

What about 1.8?

Hmm. Go ahead. I now think it’s the only reasonable behavior for “Z”
with NUL containing strings.

						matz.

daz · 13 May 2004 02:18

Yukihiro Matsumoto wrote:

Hi.

At Mon, 10 May 2004 17:53:35 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:99719]:

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

What about 1.8?

Hmm. Go ahead. I now think it’s the only reasonable behavior for “Z”
with NUL containing strings.

matz.

Thanks, Matz.

The plea below is now wasted :))

In message “Re: String#unpack and null-terminated strings” on 04/05/12, nobu.nokada@softhome.net writes:

===============================================================

Hi Nobu,

Good to see your return, as always.

As the change only affects ‘Z’-types in Strings with embedded null(s),
the impact should be extremely low.

I’m trying to think of any kind of string which might contain
significant nulls but also has a null as terminator.

I’ve seen some where null delimits fields and double-null terminates
but that rare case might be the only one to break iff a
programmer had decided that the best method to use on that type of string
was unpack(‘Z*’).

Embedded nulls are common when reading from binary files
(e.g. encoded characters) but I feel that it would never be a good idea
to strip trailing nulls in that context.

Voting +1 for inclusion in 1.8, also. Much more usable.

Thanks,

daz

===============================================================
