I have some final results on the problem described.
First, I must correct something in my initial post; I had stated that
there was an occasional high delay between the moment of the 'select'
call and the 'select' return (i.e., although the timeout set in the
select was 50 msec, the delay could be as long as 5 seconds).
Actually, after tracing all the calls in that section of code, I found
that the delay occurs between the return of 'select' and the 'recvfrom'.
The details of what follows may be of interest to anyone using Ruby for
fast communication.
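
For anyone who wants to trace the same interval, here is a minimal
sketch of the kind of measurement I mean. It is not my actual handler:
the helper name timed_receive, the buffer size and the loopback demo
(which stands in for the micro-controller) are assumptions made for
illustration. It uses Time.now, so a wall-clock adjustment during the
measurement would distort the value.

```ruby
require 'socket'

# Hypothetical helper: waits up to `timeout` seconds for a datagram and
# returns [data, delay_sel_rcv], or nil if 'select' timed out.
def timed_receive(sock, timeout = 0.05, maxlen = 1024)
  ready = IO.select([sock], nil, nil, timeout)
  return nil unless ready           # timed out, nothing to read
  t_select = Time.now               # 'select' has just returned
  data, _addr = sock.recvfrom(maxlen)
  t_recv = Time.now                 # 'recvfrom' has just returned
  [data, t_recv - t_select]
end

# Loopback demo: a second socket on the same machine plays the micro.
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0)       # port 0: let the OS pick a free port
sender = UDPSocket.new
sender.send('ping', 0, '127.0.0.1', receiver.addr[1])

data, delay_sel_rcv = timed_receive(receiver, 1.0)
puts "<->: delay_sel_rcv=#{'%.9f' % delay_sel_rcv}" if data
```

In a real handler the value would be compared against a threshold and
the line flagged (with '!!' in my logs) when it is exceeded.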
Test environment:
- pure Ruby 1.9.2 (no gems, just the 'socket' library) on an Ubuntu
machine (lots of memory)
- Ruby sends 4 UDP msgs per second to a micro-controller
- The micro (C/assembler) responds (over UDP) within 10-30 milliseconds
- So it is 4 msgs sent and 4 responses rcvd every second
This is what I saw since midnight in one of the systems (the symbol
'<->' means 1 msg sent and response; the symbol '!!' was inserted to
grep all abnormal results):
# Time as Hour:Min:Sec:Msec; the 'delay_sel_rcv' (the time between
return of 'select' and 'recvfrom') value is in Seconds
# log from midnight; all perfect until 1:21 am
01:21:19:914: <->: !! delay_sel_rcv=10.006525661
01:21:29:928: <->: !! delay_sel_rcv=10.010217133
01:21:39:937: <->: !! delay_sel_rcv=10.004327574
01:21:49:954: <->: !! delay_sel_rcv=10.011541082
01:21:59:972: <->: !! delay_sel_rcv=10.005877574
01:22:05:973: <->: !! delay_sel_rcv= 5.998151639
# then all ok until:
02:22:27:374: <->: !! delay_sel_rcv=10.008022384
02:22:37:394: <->: !! delay_sel_rcv=10.008430684
02:22:47:401: <->: !! delay_sel_rcv=10.004019076
02:22:57:409: <->: !! delay_sel_rcv=10.005836859
02:23:07:580: <->: !! delay_sel_rcv=10.008476556
02:23:17:610: <->: !! delay_sel_rcv=10.007506338
02:23:27:642: <->: !! delay_sel_rcv=10.007311141
02:23:37:655: <->: !! delay_sel_rcv=10.008225368
02:23:47:685: <->: !! delay_sel_rcv=10.018187389
# then all ok until
04:24:08:873: <->: !! delay_sel_rcv=10.006355125
...
We can see from the above:
- the first 80 minutes (from midnight to 01:21) went fine
- then we see several delays of 10 seconds, in the same minute (each 10
seconds apart from the other)
- for 1 hour all was perfect again, exchanging some 12,000 messages with
perfect timing
- then we have 9 delays of 10 seconds (again separated by 10 seconds)
- for 1 hour all went fine again; then the cycle repeats
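
The grouping described above can be checked mechanically. The following
is only a sketch: it parses a few of the flagged lines from the log and
groups them into bursts, where the 60-second gap used to separate one
burst from the next is my own assumption, as are the names LINE and
bursts.

```ruby
# A few of the flagged lines from the log above.
LOG = <<EOS
01:21:19:914: <->: !! delay_sel_rcv=10.006525661
01:21:29:928: <->: !! delay_sel_rcv=10.010217133
01:22:05:973: <->: !! delay_sel_rcv= 5.998151639
02:22:27:374: <->: !! delay_sel_rcv=10.008022384
02:22:37:394: <->: !! delay_sel_rcv=10.008430684
EOS

# Hour:Min:Sec:Msec, then the delay in seconds (possibly space-padded).
LINE = /\A(\d+):(\d+):(\d+):(\d+): <->: !! delay_sel_rcv=\s*([\d.]+)/

entries = LOG.lines.map do |line|
  m = LINE.match(line)
  next unless m                     # skip comments and normal lines
  h, min, s, ms = m[1..4].map { |x| x.to_i }
  { :at => h * 3600 + min * 60 + s + ms / 1000.0, :delay => m[5].to_f }
end.compact

# Start a new burst whenever more than 60 s separate two flagged lines.
bursts = []
entries.each do |e|
  if bursts.empty? || e[:at] - bursts.last.last[:at] > 60
    bursts << [e]
  else
    bursts.last << e
  end
end

bursts.each do |b|
  total = b.inject(0.0) { |sum, e| sum + e[:delay] }
  puts "burst of #{b.size} delays, #{'%.1f' % total} s stalled in total"
end
```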
This pattern can only indicate (in my view) the garbage collector, which
Ruby seems to run for 10 seconds several times in the same minute or so.
I could not wrap the select/recvfrom in calls to GC.disable/GC.enable
(which would give the final proof), so as not to interfere with a real
experiment that was moving heavy equipment. Notice that, if it is the
GC, disabling/enabling it will only shift the problem from one area of
the communication handler to another (and thus have a similar impact
on the applications using the comm handler).
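
A less intrusive check than GC.disable/GC.enable would be to read the
GC run counter before and after the suspect section: if the counter has
moved when a long delay is logged, the GC ran in that interval. This is
a sketch I have not run on the production system; with_gc_check is a
hypothetical helper, the workload in the block is only a stand-in for
the select/recvfrom section, and it assumes GC.count is available in the
Ruby version in use.

```ruby
# Hypothetical helper: runs the block and returns its result, the
# elapsed wall-clock time, and the number of GC runs started meanwhile.
def with_gc_check
  gc_before = GC.count
  t0 = Time.now
  result = yield
  elapsed = Time.now - t0
  [result, elapsed, GC.count - gc_before]
end

result, elapsed, gc_runs = with_gc_check do
  # Stand-in workload that allocates many strings; in the real handler
  # this would be the select/recvfrom section.
  200_000.times.map { |i| "msg #{i}" }.size
end

puts "elapsed=#{'%.6f' % elapsed}s gc_runs=#{gc_runs}"
```

Since this only reads a counter, it does not change when the collector
runs, so it should not move the problem elsewhere the way disabling the
GC would.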
Interestingly, this problem does not happen within a single computer; I
used the identical Ruby program but replaced the firmware with a Ruby
simulator (with the same machines, the same UDP and the same binary
strings exchanged); in a test of 10 hours, I only saw occasional
"delays" between select and recvfrom, but on the order of 100
milliseconds, and never of 5/10 seconds.
This would seem to indicate an inefficiency in the Udp stack (when used
across computers).
My conclusion is that if you want a predictable delay (with values
spread across a 'tight bell' curve, not just increasing the timeout to
cope with 'everything'), you must use (for that section of the software)
a compiled language; at least until garbage collector technology
changes.
I hope that this is useful to others who use Ruby for high speed
communication (and the ones working on garbage collectors).
--
Last note: a year ago, at a party, I met a JPL engineer working on the
Mars exploration program; he admired Ruby, but after some jokes on the
expressiveness and beauty of old and new languages, he added that they
would never use scripting languages because "we don't want the garbage
collector to go into action just when we should begin to slow down the
spacecraft near Mars, and miss the landing! In fact, we don't even use
C++, as we did not find it totally predictable; so we will still use C
for years to come".
I never knew how well I would learn to appreciate his point.
Raul Parolari
raulparolari@gmail.com
--
Posted via http://www.ruby-forum.com/.