[ANN] EventLoop 0.0.20050825.1600

Hi list,

Due to the somewhat popular demand for an event loop for Ruby,
I've recently been working on packaging the one I've written
for a network application of mine (Refusde, an NMDC client).

With the help of Tilman Sauerbeck, I've now managed to put
together some documentation and a gem/tarball of what I've
decided is going to be the first publicly announced version.

Here's the canonical short package overview:

   EventLoop is a simple IO::select-based main event loop
   featuring IO event notification and timeout callbacks.
   It comes with a signal system inspired by that of GLib.

The code is licensed under the GPL and can be found at
<http://www.brockman.se/software/ruby-event-loop/>.

At this point, some of you will probably want to see an
example of how it works --- a kind of screenshot.

For this purpose, I chose to implement a simple asynchronous
buffered IO reader:

   require "event-loop"

   class BufferedReader
     include SignalEmitter

     define_signals :line, :done

     def initialize(io, eol="\n")
       yield self if block_given?
       io = File.new(io) if io.kind_of? String
       buffer = String.new
       io.on_readable do
         begin
           buffer << io.readpartial(1024)
           while i = buffer.index(eol)
             signal :line, buffer.slice!(0, i)
             buffer.slice!(0, eol.size)
           end
         rescue EOFError
           signal :done, buffer
           io.close
         end
       end
     end
   end

   reader = BufferedReader.new("/etc/passwd") do |r|
     r.on_line { |content| puts "Line: #{content}" }
     r.on_done { |leftover| puts "Done: #{leftover}" }
     r.on_done { EventLoop.quit }
   end

   EventLoop.run

See how easy the event loop is to use, and how nicely it
blends into the rest of Ruby?

For good measure, maybe I should also attach a section of
the manual (i.e., the README file) that describes how event
loops fit into the rest of the world:

The Event Loop
==============

This section explains how IO multiplexing works in general
(albeit briefly and not very in-depth), and specifically the
issues relevant for Ruby applications. You may safely skip
it if you (a) already know this subject, or (b) don't care.

Plain ol' blocking IO works well when you're reading from
just a single file descriptor. But when you're interested
in a whole bunch of FDs, you can't wait for any single one
of them to become readable or writable, because then you'll
inevitably miss that happening to the other ones. Instead,
you need a multiplexer that can wait for them *all at once*.
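
To make "all at once" concrete, here is a minimal sketch using plain
‘IO::select’ on two pipes. The pipes and the name ‘select_demo’ are
made up for illustration; this is not code from the package.

```ruby
# Wait on two pipes at once; only the second one has data.
def select_demo
  r1, w1 = IO.pipe
  r2, w2 = IO.pipe
  w2.write("from second\n")

  # Block (for at most one second) until any of the read ends is readable.
  ready, = IO.select([r1, r2], nil, nil, 1)

  # Report which descriptor became readable and what it delivered.
  ready.map { |io| [io.equal?(r1) ? :first : :second, io.gets] }
ensure
  [r1, w1, r2, w2].each { |io| io.close if io && !io.closed? }
end

p select_demo
```

A blocking read on the first pipe would have hung here, since nothing
is ever written to it; ‘select’ reports only the second pipe.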

There are a handful of low-level multiplexing primitives:
‘select’, ‘poll’, ‘epoll’, ‘/dev/poll’, and ‘kqueue’.
In addition, there are portable low-level wrapper libraries
such as libevent, which can use any of those primitives.
The event loop in this package uses the standard ‘select’
wrapper shipped with Ruby, ‘IO::select’. But in the future,
I'd like to use libevent instead, because that'd be cooler.

Most applications use a higher-level abstraction built on
top of the low-level multiplexer, usually called a ‘main
loop’, an ‘event loop’, or an ‘event source’. There are
also libraries such as liboop, which generalizes the event
source and event sink concepts, so that components (event
sinks) written against liboop become event-source-agnostic.

Actually, the combination of blocking IO and Ruby's green
threads works well in most cases where you would normally
use an event loop. When you call ‘IO#read’ on an empty file
descriptor, for instance, Ruby suspends that thread until
its internal event loop, known as the scheduler (currently
based on ‘select’), determines that the file descriptor has
become readable. In particular, Ruby never calls the
low-level ‘read’ function unless it knows that it will not
block (because ‘select’ said it wouldn't, but see below).

There are several reasons why you would use an event loop
such as the one implemented by this library instead of
not-so-plain ol' blocking IO with Ruby's green threads.

First of all, you may consider the event loop API more
pleasant than Ruby's threads and not-quite-blocking IO.
Otherwise, don't listen to me; go on using the latter. :-)

Blocking IO can occasionally cause unexpected problems.
For example, in some cases a blocking read *can* block even
though select said that the file descriptor was readable.
This problem may be rare (it can happen, for instance, when
the checksum of a piece of data fails to match the payload),
but the bottom line is that non-blocking IO is safer.

Perhaps most importantly, while Ruby's threads are green,
they are still effectively preemptively scheduled, with all
the implications thereof — in a word, synchronization hell.
By contrast, event handlers are executed in a strictly
sequential manner; an event loop will never run two event
handlers simultaneously. (Though, of course, all bets are
off if you run multiple event loops in separate threads.)
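
The sequential execution described above can be sketched with a toy
select-based loop. ‘TinyLoop’ and its methods are hypothetical names
chosen for this sketch and are not the API of this package; the point
is only that handlers are called one after another from a single loop.

```ruby
# A toy event loop: one select call, then handlers run strictly in turn.
class TinyLoop
  def initialize
    @handlers = {}   # IO object => block to call when it is readable
  end

  def on_readable(io, &block)
    @handlers[io] = block
  end

  def ignore(io)
    @handlers.delete(io)
  end

  def run
    until @handlers.empty?
      ready, = IO.select(@handlers.keys)
      # No two handlers ever run at the same time.
      ready.each do |io|
        handler = @handlers[io]
        handler.call(io) if handler
      end
    end
  end
end

def tiny_loop_demo
  log = []
  r1, w1 = IO.pipe
  r2, w2 = IO.pipe
  w1.write("a"); w1.close
  w2.write("b"); w2.close

  event_loop = TinyLoop.new
  [r1, r2].each do |io|
    event_loop.on_readable(io) do |ready_io|
      begin
        log << ready_io.readpartial(1024)
      rescue EOFError
        # End of stream: stop watching this descriptor and close it.
        event_loop.ignore(ready_io)
        ready_io.close
      end
    end
  end
  event_loop.run
  log.sort
end

p tiny_loop_demo
```

Because ‘log’ is only ever touched from inside the loop, it needs no
lock, which is the property the paragraph above is getting at.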

--
Daniel Brockman <daniel@brockman.se>

Daniel Brockman wrote:

First of all, you may consider the event loop API more
pleasant than Ruby's threads and not-quite-blocking IO.
Otherwise, don't listen to me; go on using the latter. :-)

I *love* ruby threads. Still, I wish ruby's thread scheduler would
handle more types of blocking than select can handle, such as waiting
for a file lock.

Blocking IO can occasionally cause unexpected problems.
For example, in some cases a blocking read *can* block even
though select said that the file descriptor was readable.
This problem may be rare (it can happen, for instance, when
the checksum of a piece of data fails to match the payload),
but the bottom line is that non-blocking IO is safer.

Well, at least in recent linux 2.4 and 2.6 kernels, this particular
problem is fixed (see the recent ruby-talk discussion entitled "event
driven framework for ruby", particularly comments by Akira Tanaka and
Ralf Horstmann).

Perhaps most importantly, while Ruby's threads are green,
they are still effectively preemptively scheduled, with all
the implications thereof — in a word, synchronization hell.
By contrast, event handlers are executed in a strictly
sequential manner; an event loop will never run two event
handlers simultaneously. (Though, of course, all bets are
off if you run multiple event loops in separate threads.)

But in some cases you *want* preemptive scheduling. One handler's
execution shouldn't block the others, if it might take a significant
time to finish. Take care of synchronization with concurrent data
structures like queues, or if that isn't sufficient, lower level
mechanisms like mutexes.
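
As a small illustration of the queue approach (a sketch only;
‘queue_demo’ and the ‘:done’ sentinel are made up for this example,
and on Ruby 1.8 you would need ‘require "thread"’ first):

```ruby
# One producer, one consumer, with a Queue as the only shared object.
def queue_demo
  queue = Queue.new
  results = []

  consumer = Thread.new do
    # Queue#pop blocks only this thread until an item arrives.
    while (item = queue.pop) != :done
      results << item * 2
    end
  end

  3.times { |i| queue.push(i) }
  queue.push(:done)   # sentinel telling the consumer to stop
  consumer.join
  results
end

p queue_demo
```

Only the consumer thread writes to ‘results’, and the main thread
reads it after ‘join’, so no explicit mutex is needed here.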

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

In article <430DFCD4.6000405@path.berkeley.edu>,
  Joel VanderWerf <vjoel@path.berkeley.edu> writes:

I *love* ruby threads. Still, I wish ruby's thread scheduler would
handle more types of blocking than select can handle, such as waiting
for a file lock.

File#flock works well since Ruby 1.8.2. It blocks only the calling
thread. It doesn't block other threads.

% ruby-1.8.2 -ve '
f1 = open("z", "w")
f1.flock(File::LOCK_EX)
t = Thread.new {
  f2 = open("z", "w")
  p :f2_lock_start
  f2.flock(File::LOCK_EX)
  p :f2_lock_end
}
3.times {|i| p i; sleep 1 }
f1.flock(File::LOCK_UN)
t.join
'
ruby 1.8.2 (2004-12-25) [i686-linux]
:f2_lock_start
0
1
2
:f2_lock_end

--
Tanaka Akira

Joel VanderWerf <vjoel@path.berkeley.edu> writes:

Blocking IO can occasionally cause unexpected problems.
For example, in some cases a blocking read *can* block even
though select said that the file descriptor was readable.
This problem may be rare (it can happen, for instance, when
the checksum of a piece of data fails to match the payload),
but the bottom line is that non-blocking IO is safer.

Well, at least in recent linux 2.4 and 2.6 kernels, this
particular problem is fixed (see the recent ruby-talk
discussion entitled "event driven framework for ruby",
particularly comments by Akira Tanaka and Ralf Horstmann).

Hmm, okay, then I guess I shall have to remove or annotate
that paragraph to avoid spreading FUD.

Perhaps most importantly, while Ruby's threads are green,
they are still effectively preemptively scheduled, with all
the implications thereof — in a word, synchronization hell.
By contrast, event handlers are executed in a strictly
sequential manner; an event loop will never run two event
handlers simultaneously. (Though, of course, all bets are
off if you run multiple event loops in separate threads.)

But in some cases you *want* preemptive scheduling.

Sure. If I want preemptive scheduling, I use threads.
Sometimes, I don't even have a choice. (Ruby's GNU Readline
wrapper only supports blocking calls, for instance.)

One handler's execution shouldn't block the others, if it
might take a significant time to finish.

Event handlers should not take a significant amount of time
to finish. If they do, you have coded them wrong. :-)

Take care of synchronization with concurrent data
structures like queues, or if that isn't sufficient, lower
level mechanisms like mutexes.

Or use a deterministic event loop and avoid the problem of
synchronization altogether.

In a callback-based system, you have to deal with callbacks.
In a preemptively multithreaded system, you have to deal
with synchronization. It's a tradeoff, and largely a matter
of taste, preference and familiarity.

You might also ask yourself, do you really *need* to have
the scheduler arbitrarily switch contexts back and forth?
Do your event handlers really take that much time to run?
If so, fine. Otherwise, why not have determinism instead?

--
Daniel Brockman <daniel@brockman.se>

Well, at least in recent linux 2.4 and 2.6 kernels, this particular
problem is fixed (see the recent ruby-talk discussion entitled "event
driven framework for ruby", particularly comments by Akira Tanaka and
Ralf Horstmann).

I got the impression from that thread that it pertained only to 2.6... ?
It's really fixed in 2.4 too?

Thanks,

Bill

From: "Joel VanderWerf" <vjoel@path.berkeley.edu>

Bill Kelly wrote:

From: "Joel VanderWerf" <vjoel@path.berkeley.edu>

Well, at least in recent linux 2.4 and 2.6 kernels, this particular
problem is fixed (see the recent ruby-talk discussion entitled "event
driven framework for ruby", particularly comments by Akira Tanaka and
Ralf Horstmann).

I got the impression from that thread that it pertained only to 2.6... ?
It's really fixed in 2.4 too?

Apparently:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/151776

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Tanaka Akira wrote:

In article <430DFCD4.6000405@path.berkeley.edu>,
  Joel VanderWerf <vjoel@path.berkeley.edu> writes:

I *love* ruby threads. Still, I wish ruby's thread scheduler would
handle more types of blocking than select can handle, such as waiting
for a file lock.

File#flock works well since Ruby 1.8.2. It blocks only the calling
thread. It doesn't block other threads.

Wow! Thanks.

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Daniel Brockman wrote:

Or use a deterministic event loop and avoid the problem of
synchronization altogether.

In a callback-based system, you have to deal with callbacks.
In a preemptively multithreaded system, you have to deal
with synchronization. It's a tradeoff, and largely a matter
of taste, preference and familiarity.

You might also ask yourself, do you really *need* to have
the scheduler arbitrarily switch contexts back and forth?
Do your event handlers really take that much time to run?
If so, fine. Otherwise, why not have determinism instead?

That's a good point.

I do like what using threads does to the architecture of my program.
It's very easy to separate all the functionality out into components,
each of which performs a specific task, has a ThreadGroup to manage its
own threads, and communicates with other components by queues. The
components can be tested independently and even executed in other
processes/hosts, if you replace Queue with something based on Sockets
and Marshal, or DRb.

So I guess another consideration in making this tradeoff is the degree
to which the system as a whole can be decoupled.

If, for example, the handlers are making atomic updates to some
monolithic data structure, or to a GUI, then decoupling doesn't make
sense: the overhead to make the updates atomic would be too high.

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

In article <87k6i9rifs.fsf@wigwam.deepwood.net>,
  Daniel Brockman <daniel@brockman.se> writes:

In a callback-based system, you have to deal with callbacks.
In a preemptively multithreaded system, you have to deal
with synchronization. It's a tradeoff, and largely a matter
of taste, preference and familiarity.

It seems that a giant lock can be a compromise between them
(like Python's GIL).

Apart from that, Ruby's IO methods are not well suited to an event loop.
You may be frustrated to find that some methods block even if
O_NONBLOCK is set.

The blocking behavior is good for threaded programs. The context
switch behind the blocking call lets other work proceed, because that
work is held by other threads. So the blocking behavior makes
threaded programs happy even if O_NONBLOCK is set. Anyway, O_NONBLOCK
is required to avoid blocking the entire process on a write operation.

However, the behavior is bad for event-loop-style programs, because
the work is held by the event loop in the caller's thread.

So I think it is good to have both blocking methods and nonblocking
methods. The nonblocking methods should make event-loop-style
programs happy. However, this has not been accepted by matz because
good names for the nonblocking methods have not been found yet.
Recently I proposed connect_nonblock, nonblock_connect, and nbconnect
for nonblocking connect, but they were rejected.
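
For readers on newer Rubies: later releases did gain separate
nonblocking methods such as ‘IO#read_nonblock’. A minimal sketch of
how an event-loop-style program might use one (the pipe and the name
‘nonblocking_read_demo’ are made up for illustration):

```ruby
# Try to read without blocking; fall back to select when nothing is there.
def nonblocking_read_demo
  reader, writer = IO.pipe
  events = []

  begin
    reader.read_nonblock(1024)
  rescue IO::WaitReadable
    # Nothing to read yet: an event loop would return to select here.
    events << :would_block
  end

  writer.write("hello")
  IO.select([reader])                  # wait until data has arrived
  events << reader.read_nonblock(1024)
  events
ensure
  reader.close if reader
  writer.close if writer
end

p nonblocking_read_demo
```

The method raises instead of suspending the caller, so control returns
to the loop, while the ordinary blocking ‘read’ keeps its
thread-friendly behavior.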

--
Tanaka Akira

Daniel Brockman <daniel@brockman.se> writes:

You might also ask yourself, do you really *need* to have
the scheduler arbitrarily switch contexts back and forth?
Do your event handlers really take that much time to run?
If so, fine. Otherwise, why not have determinism instead?

To nitpick, neither pre-emptive threading nor cooperative threading
(of which an explicit event-handling loop is a form) has anything to
do with determinism.

It is what is being executed in that thread that determines whether it
is deterministic or not.

YS.

Joel VanderWerf wrote:

Tanaka Akira wrote:

In article <430DFCD4.6000405@path.berkeley.edu>,
Joel VanderWerf <vjoel@path.berkeley.edu> writes:

I *love* ruby threads. Still, I wish ruby's thread scheduler would
handle more types of blocking than select can handle, such as waiting
for a file lock.

File#flock works well since Ruby 1.8.2. It blocks only the calling
thread. It doesn't block other threads.

Wow! Thanks.

I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
on 1.8.2 or better. In code with several processes each with several
threads, I see about a 12%-17% speed boost, because of not having to use
the polling hack.

(*) FSDB

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Hi,

So I think it is good to have both blocking methods and nonblocking
methods. The nonblocking methods should make event-loop-style
programs happy. However, this has not been accepted by matz because
good names for the nonblocking methods have not been found yet.
Recently I proposed connect_nonblock, nonblock_connect, and nbconnect
for nonblocking connect, but they were rejected.

Wow - the main thing holding back progress on this front is
method names? I could embrace connect_nonblock or nbconnect,
there.

The blocking I/O issues are the thorniest problem for me
writing applications in ruby. (Of course, it's 1000 times
worse on Windows, ... where nonblocking I/O is apparently not
supported at all yet. That is just a nightmare.)

But regarding the method names - I'm wondering - are separate
methods really needed? Are there any cases where Ruby can't
just inspect the fcntl() flags of the socket, and if
O_NONBLOCK is set, provide nonblocking behavior? You mentioned
connect(), which is an instance method. Couldn't connect()
just check for O_NONBLOCK? Why would a separate method be
needed? (Sorry if this is a FAQ. :-)

Regards,

Bill

From: "Tanaka Akira" <akr@m17n.org>

Yohanes Santoso <ysantoso-rubytalk@dessyku.is-a-geek.org> writes:

Daniel Brockman <daniel@brockman.se> writes:

You might also ask yourself, do you really *need* to have
the scheduler arbitrarily switch contexts back and forth?
Do your event handlers really take that much time to run?
If so, fine. Otherwise, why not have determinism instead?

To nitpick, neither pre-emptive threading nor cooperative
threading (of which an explicit event-handling loop is a
form) has anything to do with determinism.

To nitpick back, I think you overstated that claim a bit.
Cooperatively threaded systems are deterministic by default;
pre-emptively scheduled ones are probabilistic by default.

If you write a multithreaded program without keeping
synchronization in mind, it is likely to still end up
essentially deterministic under cooperative threading.
If you are using pre-emptive threading, however, you are
very likely to introduce race conditions.

So what I'm saying here is that while I agree that the
determinism of a correctly written program does not depend
fundamentally on the kind of threading in use, I must object
to the claim that ``[neither threading model] has anything
to do with determinism.''

In a cooperatively multithreaded program, control progresses
linearly through the source --- every line of code will be
executed immediately after the previous one has finished.
In a pre-emptively scheduled one, on the other hand, control
jumps around probabilistically. Determinism is clearly
relevant here, IMHO.

But I see your point. I did sort of imply that pre-emptive
threading leads to non-determinism, which might not be the
fairest way of putting it. Sorry about that.

It is what is being executed in that thread that
determines whether it is deterministic or not.

I agree. It's just that you don't have to put anything fancy in
cooperative threads to make them deterministic, because they
already are by default. Unless you put `rand' everywhere.
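
A small sketch of what "deterministic by default" means here, using
Ruby's ‘Fiber’ (available in later Ruby versions) as a stand-in for
cooperative threads. The fibers and the name ‘cooperative_demo’ are
illustrative assumptions, not something from this thread's code.

```ruby
# Two cooperative tasks that hand over control only at explicit points.
def cooperative_demo
  trace = []

  a = Fiber.new do
    trace << "a1"
    Fiber.yield       # the only place where task a can be interrupted
    trace << "a2"
  end

  b = Fiber.new do
    trace << "b1"
    Fiber.yield       # likewise for task b
    trace << "b2"
  end

  # The "scheduler": resume the tasks in a fixed order.
  [a, b, a, b].each(&:resume)
  trace
end

p cooperative_demo
```

The interleaving is fixed entirely by where the tasks yield and by the
order of the ‘resume’ calls, so every run produces the same trace.
With pre-emptive threads the switch points are chosen by the scheduler
instead.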

--
Daniel Brockman <daniel@brockman.se>

Joel VanderWerf wrote:

Joel VanderWerf wrote:

Tanaka Akira wrote:

In article <430DFCD4.6000405@path.berkeley.edu>,
Joel VanderWerf <vjoel@path.berkeley.edu> writes:

I *love* ruby threads. Still, I wish ruby's thread scheduler would
handle more types of blocking than select can handle, such as waiting
for a file lock.

File#flock works well since Ruby 1.8.2. It blocks only the calling
thread. It doesn't block other threads.

Wow! Thanks.

I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
on 1.8.2 or better. In code with several processes each with several
threads, I see about a 12%-17% speed boost, because of not having to use
the polling hack.

(*) FSDB

In other news, 1989 called. They want their version numbering system back.

Please give us a sane version number. :-)

Dan

In article <430E404E.6070303@path.berkeley.edu>,
  Joel VanderWerf <vjoel@path.berkeley.edu> writes:

I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
on 1.8.2 or better. In code with several processes each with several
threads, I see about a 12%-17% speed boost, because of not having to use
the polling hack.

Ruby does the polling. You may call it a hack.

--
Tanaka Akira

In article <033601c5a9fa$26ba7840$6442a8c0@musicbox>,
  "Bill Kelly" <billk@cts.com> writes:

Wow - the main thing holding back progress on this front is
method names? I could embrace connect_nonblock or nbconnect,
there.

Do you have a problem with threads?

If you use threads, nonblocking methods are not required in general.

I'd like to know why people don't use threads.

The blocking I/O issues are the thorniest problem for me
writing applications in ruby. (Of course, it's 1000 times
worse on Windows, ... where nonblocking I/O is apparently not
supported at all yet. That is just a nightmare.)

I heard Windows has nonblocking I/O for sockets.

But regarding the method names - I'm wondering - are separate
methods really needed? Are there any cases where Ruby can't
just inspect the fcntl() flags of the socket, and if
O_NONBLOCK is set, provide nonblocking behavior? You mentioned
connect(), which is an instance method. Couldn't connect()
just check for O_NONBLOCK? Why would a separate method be
needed? (Sorry if this is a FAQ. :-)

1. Threaded programs need blocking methods for an IO object with
  O_NONBLOCK. O_NONBLOCK is required to avoid blocking the entire
  process on write operations. But threaded programs still
  need blocking behavior because most threaded programs don't
  expect EAGAIN. I think nonblocking methods are better than
  implementing an EAGAIN retry loop in every threaded program.

2. There is no F_GETFL on Windows.
  Ruby cannot test whether O_NONBLOCK is set or clear on an fd. So the
  connect method cannot check O_NONBLOCK.

--
Tanaka Akira

Daniel Berger wrote:

Joel VanderWerf wrote:

...

I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
on 1.8.2 or better. In code with several processes each with several
threads, I see about a 12%-17% speed boost, because of not having to use
the polling hack.

(*) FSDB

In other news, 1989 called. They want their version numbering system back.

Please give us a sane version number. :-)

Dan

I'm still living in 1989 in many ways....

It's not quite a three-year old project yet, so I don't think it
deserves 1.0 status ;-)

Or do you mean more digits? (Internally, it is 0.5.5, but, for a minor
project like this, I only release the last in each 0.x series.)

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Daniel Berger wrote:

Please give us a sane version number. :-)

It just dawned on me that you were probably talking about

  EventLoop 0.0.20050825.1600

That version number _is_ a bit ambiguous. It really should have "UTC" in
it somewhere to make clear that the 1600 is not in the poster's local
time zone. ;-)

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Tanaka Akira wrote:

In article <430E404E.6070303@path.berkeley.edu>,
  Joel VanderWerf <vjoel@path.berkeley.edu> writes:

I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
on 1.8.2 or better. In code with several processes each with several
threads, I see about a 12%-17% speed boost, because of not having to use
the polling hack.

Ruby does the polling. You may call it a hack.

Oh, well as long as ruby does it, it's more efficient than me doing it,
so less of a hack.

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Windows actually has plenty of support for nonblocking operations on
sockets and files.
Here's an example hit from MSDN:

The problem is that it doesn't work exactly the way it does on Linux
and BSD, and so "direct port" software like Ruby tends to not have
particularly good support for it. I believe the ActiveState Perl and
Python distributions give you some good hooks into it, but I've never
really gone to that level with Ruby, so I won't spread any bad
information.

--Wilson.

On 8/26/05, Tanaka Akira <akr@m17n.org> wrote:

In article <033601c5a9fa$26ba7840$6442a8c0@musicbox>,
  "Bill Kelly" <billk@cts.com> writes:

> Wow - the main thing holding back progress on this front is
> method names? I could embrace connect_nonblock or nbconnect,
> there.

Do you have a problem with threads?

If you use threads, nonblocking methods are not required in general.

I'd like to know why people don't use threads.

> The blocking I/O issues are the thorniest problem for me
> writing applications in ruby. (Of course, it's 1000 times
> worse on Windows, ... where nonblocking I/O is apparently not
> supported at all yet. That is just a nightmare.)

I heard Windows has nonblocking I/O for sockets.