Better way to read data from IO into packets?

Hi,

I have a small program that reads data from a serial port and chops
it into packets. A packet has the following format:

  [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]
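To try out the parsers discussed in this thread it helps to be able to build test packets. This is a minimal sketch assuming a modern Ruby and binary (ASCII-8BIT) strings. `build_packet` and `START_BYTES` are hypothetical names. The caller supplies the CRC value because the thread never names the CRC algorithm. The CRC is assumed to be sent big-endian, as the `char << 8` in the state machine and the `n` unpack directive in the replies suggest.

```ruby
START_BYTES = [0x65, 0xEB].freeze # hypothetical constant name

# Build one packet: start bytes, type, counter, length, data, 16-bit CRC.
# The CRC value is passed in by the caller; no CRC algorithm is assumed.
def build_packet(type, counter, data, crc = 0)
  data = data.b # work on raw bytes
  raise ArgumentError, "data too long for an 8-bit length" if data.bytesize > 255
  header = (START_BYTES + [type, counter, data.bytesize]).pack("C5")
  header + data + [crc].pack("n") # "n" = 16-bit unsigned, big-endian
end

pkt = build_packet(1, 7, "ABCD", 0x1234)
p pkt.unpack("C5a4n") # => [101, 235, 1, 7, 4, "ABCD", 4660]
```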

I have a thread that reads the data from an IO object:

  @receiver_thread = Thread.new do
    Thread.abort_on_exception = true
    loop do
      begin
        char = @io.readchar
        add_char_to_packet(char) if char
      rescue EOFError
        Thread.pass # there is currently nothing to read
      end
    end
  end

and a state machine that decodes the format:

    def add_char_to_packet(char)
      # once the data countdown reaches zero, the checksum follows
      @state = :first_checksum if (@state == 0)
      case @state
      when :first_startbyte
        @data = ""
        @state = (char == STARTBYTES[0]) ? :second_startbyte : :first_startbyte
      when :second_startbyte
        @state = (char == STARTBYTES[1]) ? :type :
          # special case: first startbyte is repeated
          (char == STARTBYTES[0] ? :second_startbyte : :first_startbyte)
      when :type
        @type = TYPE.invert[char]
        @state = :counter
      when :counter
        @counter = char
        @state = :length
      when :length
        @length = char
        @state = @length
      when Integer
        @data << char
        @state -= 1
      when :first_checksum
        @checksum = (char << 8)
        @state = :second_checksum
     [...]

This works, but the code is ugly and also a little slow because I have
to process each byte separately. Is there a better way?
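One alternative to the per-byte state machine is to append whole chunks to a byte buffer and cut complete packets out of it. This is a minimal sketch of that idea, assuming Ruby 2.x or later and binary strings. `PacketDecoder`, `feed` and `Packet` are hypothetical names. The CRC is extracted but not verified, since the thread does not name the algorithm.

```ruby
# Hypothetical chunk-based decoder: raw bytes are appended to a buffer and
# complete packets are cut out with String#index and #byteslice, instead of
# running a state machine once per byte.
class PacketDecoder
  START = "\x65\xEB".b
  HEADER_SIZE = 5 # 2 start bytes + type + counter + length
  CRC_SIZE = 2

  Packet = Struct.new(:type, :counter, :data, :crc)

  def initialize
    @buffer = "".b
  end

  # Append a chunk of bytes and return all packets completed by it.
  # NOTE: the CRC is extracted but not verified (algorithm not known here).
  def feed(chunk)
    @buffer << chunk.b
    packets = []
    loop do
      start = @buffer.index(START)
      if start.nil?
        # Keep a trailing 0x65: it may be the first half of a start
        # sequence whose second byte arrives with the next chunk.
        @buffer = @buffer.end_with?("\x65") ? "\x65".b : "".b
        break
      end
      @buffer = @buffer.byteslice(start..-1) # drop junk before the start bytes
      break if @buffer.bytesize < HEADER_SIZE # header not complete yet
      length = @buffer.getbyte(4)
      total = HEADER_SIZE + length + CRC_SIZE
      break if @buffer.bytesize < total # data or CRC not complete yet
      _, _, type, counter, _, data, crc =
        @buffer.byteslice(0, total).unpack("C5a#{length}n")
      packets << Packet.new(type, counter, data, crc)
      @buffer = @buffer.byteslice(total..-1)
    end
    packets
  end
end
```

In the reader thread, something like `decoder.feed(@io.readpartial(4096))` could replace the `readchar` loop. `IO#readpartial` returns whatever is available, up to the given size, and raises `EOFError` at end of file. A production decoder would also check the CRC and, on a mismatch, drop one byte and rescan. Without that check, a stray 0x65 0xEB inside junk can announce a bogus length.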

Thank you,
Levin

Levin Alexander wrote:

Hi,

I have a small program that reads data from a serial port and chops
it into packets. A packet has the following format:

  [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

Why not read enough bytes to make sure you get the length byte:

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Or is the problem that there may be a variable number of "start bytes"?
In that case, maybe you could tell from the first 5 bytes how many start
bytes there are, and then read enough to capture the length byte.
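A version of the read-header-then-body approach, sketched under the assumption that the stream is already aligned on a packet boundary. It uses `IO#read` rather than `sysread`: `read(n)` keeps reading until n bytes arrive or EOF, while `sysread` may return fewer bytes than requested. It also uses `getbyte`, because in Ruby 1.9 and later `s[4]` returns a one-character String rather than an Integer. `read_aligned_packet` is a hypothetical name.

```ruby
require "stringio"

# Assumes the stream is aligned on a packet boundary (no resync).
# IO#read(n) keeps reading until n bytes arrive or EOF, so a short (or nil)
# result means the stream ended.
def read_aligned_packet(io)
  header = io.read(5) # start bytes, type, counter, length
  return nil if header.nil?
  raise EOFError, "truncated header" if header.bytesize < 5
  len = header.getbyte(4) # Ruby >= 1.9: header[4] would be a String
  rest = io.read(len + 2) # data + 16-bit CRC
  raise EOFError, "truncated packet" if rest.nil? || rest.bytesize < len + 2
  (header + rest).unpack("C5a#{len}n")
end

io = StringIO.new([0x65, 0xEB, 2, 9, 3].pack("C5") + "xyz" + [0xBEEF].pack("n"))
p read_aligned_packet(io) # => [101, 235, 2, 9, 3, "xyz", 48879]
```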

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf wrote:

Levin Alexander wrote:

Hi,

I have a small program that reads data from a serial port and chops
it into packets. A packet has the following format:

  [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

Why not read enough bytes to make sure you get the length byte:

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Why do you use sysread? I'd prefer to use #read in this case - I don't
see any advantage of resorting to sysread here - it may even prevent read
buffering => things get slower than necessary.

Or is the problem that there may be a variable number of "start
bytes"? In that case, maybe you could tell from the first 5 bytes how
many start bytes there are, and then read enough to capture the
length byte.

Hm... doesn't seem to be the case.

    robert

> I have a small program that reads data from a serial port and chops
> it into packets. A packet has the following format:
>
> [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

Why not read enough bytes to make sure you get the length byte:

Because the application may be started in the middle of a packet or
the stream may be corrupted due to transmission errors.

But you are right, I should optimize that and only read single bytes
if I need to resynchronize.
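The "single bytes only while resynchronizing" idea might look like the following minimal sketch. It scans byte by byte until the start sequence is seen, then reads type, counter and length, and then data plus CRC, in bulk reads. `read_packet` and `START_BYTES` are hypothetical names. The CRC is returned unchecked, and a corrupted length byte would make the second read wait for bytes that belong to the next packet.

```ruby
require "stringio"

START_BYTES = [0x65, 0xEB].freeze # hypothetical constant name

# Read single bytes only while hunting for the start sequence, then read
# the rest of the packet in bulk. Returns [type, counter, data, crc], or
# nil if the stream ends first. The CRC is returned but not checked.
def read_packet(io)
  prev = nil
  while (byte = io.getbyte)
    break if prev == START_BYTES[0] && byte == START_BYTES[1]
    prev = byte # "\x65\x65\xEB" works: the second 0x65 becomes prev
  end
  return nil if byte.nil? # EOF before a start sequence
  header = io.read(3) # type, counter, length
  return nil if header.nil? || header.bytesize < 3
  type, counter, len = header.unpack("C3")
  rest = io.read(len + 2) # data + 16-bit CRC
  return nil if rest.nil? || rest.bytesize < len + 2
  data, crc = rest.unpack("a#{len}n")
  [type, counter, data, crc]
end

io = StringIO.new("noise\x65".b + [0x65, 0xEB, 3, 1, 2].pack("C5") + "hi" + [0x0102].pack("n"))
p read_packet(io) # => [3, 1, "hi", 258]
```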

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Or is the problem that there may be a variable number of "start bytes"?
In that case, maybe you could tell from the first 5 bytes how many start
bytes there are, and then read enough to capture the length byte.

Hmm, maybe I can use regular expressions to check for the correct format:

  buffer = "bad data".b << [0x65,0xEB,0,1,4,65,66,67,68,00,00].pack("C*")
  buffer.scan( /
    \x65\xEB # startbytes
    (.) # type
    (.) # counter
    (.) # length-byte
    (.*) # data
    (..) # checksum
  /mnx ) # m: "." also matches "\n"; n: match raw bytes

I would need a way to discard old or bad data from the buffer; I
probably need to think about it more.

(btw: is there a way to use the length-byte in the regular expression
itself? Something like /(.)(.{\1})/)

Thank you,
Levin

On 12/18/05, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

Robert Klemme wrote:

Joel VanderWerf wrote:

Levin Alexander wrote:

Hi,

I have a small program that reads data from a serial port and chops
it into packets. A packet has the following format:

[0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

Why not read enough bytes to make sure you get the length byte:

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Why do you use sysread? I'd prefer to use #read in this case - I don't
see any advantage of resorting to sysread here - it may even prevent read
buffering => things get slower than necessary.

You're right. I was thinking about readchar, which the OP used and which I assumed to be unbuffered, but I don't even know if it is!

Or is the problem that there may be a variable number of "start
bytes"? In that case, maybe you could tell from the first 5 bytes how
many start bytes there are, and then read enough to capture the
length byte.

Hm... doesn't seem to be the case.

But there is some logic in the OP's code that looks for a repeated start byte. I'm just not sure what the limit is.

Levin Alexander wrote:

I have a small program that reads data from a serial port and chops
it into packets. A packet has the following format:

  [0x65, 0xEB, <type:8>, <counter:8>, <length:8>, <data:length>, <crc:16>]

Why not read enough bytes to make sure you get the length byte:

Because the application may be started in the middle of a packet or
the stream may be corrupted due to transmission errors.

But you are right, I should optimize that and only read single bytes
if I need to resynchronize.

s = io.sysread(5)
len = s[4]
s << io.sysread(len+2)
ary = s.unpack "C5a#{len}n"

Or is the problem that there may be a variable number of "start
bytes"? In that case, maybe you could tell from the first 5 bytes
how many start bytes there are, and then read enough to capture the
length byte.

Hmm, maybe I can use regular expressions to check for the correct
format:

  buffer = "bad data".b << [0x65,0xEB,0,1,4,65,66,67,68,00,00].pack("C*")
  buffer.scan( /
    \x65\xEB # startbytes
    (.) # type
    (.) # counter
    (.) # length-byte
    (.*) # data
    (..) # checksum
  /mnx ) # m: "." also matches "\n"; n: match raw bytes

I would need a way to discard old or bad data from the buffer; I
probably need to think about it more.

Why not just use a regexp to verify the initial sequence and use the
match offsets? Or do something like

buffer.sub!(/\A.*?(\x65\xEB)/mn, '\\1')
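A minimal sketch of the junk-discarding one-liner above, assuming the buffer is a binary (ASCII-8BIT) string. It swaps the capture-and-reinsert for a lookahead, and uses `/m` so junk containing a 0x0A byte is skipped too. It also uses `/n`, because Ruby 1.9 and later reject a `\xEB` escape in a regexp literal without it. `discard_junk!` and `JUNK_RE` are hypothetical names.

```ruby
# Drop everything before the first start sequence.
# /m: "." also matches "\n" (0x0A can occur in binary junk)
# /n: binary regexp; Ruby >= 1.9 rejects the \xEB escape without it
JUNK_RE = /\A.*?(?=\x65\xEB)/mn

# The buffer must be a binary string. If it contains no start sequence,
# the regexp does not match and the buffer is left unchanged.
def discard_junk!(buffer)
  buffer.sub!(JUNK_RE, "")
  buffer
end

buf = "bad\ndata".b + [0x65, 0xEB, 0, 1, 0, 0, 0].pack("C*")
discard_junk!(buf)
p buf.bytes.first(2) # => [101, 235]
```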

(btw: is there a way to use the length-byte in the regular expression
itself? Something like /(.)(.{\1})/)

No.
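A backreference such as `\1` re-matches the captured text; it cannot serve as a repetition count. A common workaround is two steps: match the fixed-size header with a regexp, then use the length byte to slice the variable part. This is a sketch assuming a binary buffer. `extract_packet` and `HEADER_RE` are hypothetical names.

```ruby
# Step 1: regexp for the fixed-size header (type, counter, length).
# /m lets "." match a 0x0A byte; /n makes the regexp work on raw bytes.
HEADER_RE = /\x65\xEB(.)(.)(.)/mn

# Step 2: use the length byte to slice data + CRC.
# Returns [type, counter, data, crc, end_offset] for the first packet, or
# nil if it is not complete yet. end_offset is the byte index just past
# the packet, i.e. how much the caller can discard.
def extract_packet(buffer)
  m = HEADER_RE.match(buffer)
  return nil if m.nil?
  type, counter, len = m[1].ord, m[2].ord, m[3].ord
  body = buffer.byteslice(m.end(0), len + 2) # data + 16-bit CRC
  return nil if body.nil? || body.bytesize < len + 2
  data, crc = body.unpack("a#{len}n")
  [type, counter, data, crc, m.end(0) + len + 2]
end

buf = "xx".b + [0x65, 0xEB, 4, 2, 3].pack("C5") + "a\nb" + [7].pack("n")
p extract_packet(buf) # => [4, 2, "a\nb", 7, 12]
```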

Kind regards

    robert

On 12/18/05, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

A packet always starts with the two bytes "\x65\xEB"; everything else
resets the state machine.

The special case in the code was needed to correctly handle
"\x65\x65\xEB" (one bad character before the valid start of a packet):
the state machine must always look for "\xEB" after "\x65".
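With a byte buffer instead of a per-byte state machine, this special case disappears. `String#index` tries the two-byte sequence at every offset, so "\x65\x65\xEB" is found at offset 1 without extra states. A small sketch; `start_offset` and `START_SEQ` are hypothetical names.

```ruby
START_SEQ = "\x65\xEB".b # hypothetical constant name

# Offset of the first start sequence in the buffer, or nil if none.
def start_offset(buffer)
  buffer.b.index(START_SEQ)
end

p start_offset("\x65\x65\xEB\x01".b) # => 1
p start_offset("\x65\x65".b)         # => nil
```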

-Levin

On 12/19/05, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:

But there is some logic in the OP's code that looks for a repeated start
byte. I'm just not sure what the limit is.