Is there a standard pattern for threaded access to a file?

1) Where are you checking a return value:

threads.map{|t| t.join}

i'm not, but in a real piece of code longer than 5 lines it would be

In fact, you discard map's return value.

2) How is map's return value ever going to be different than your
threads array?

ah - 'join' should indeed be 'value' there. sorry.

basically one should use Thread.current.abort_on_exception, check the return values, or be prepared that threads may fail and you might not know about it (which is obviously ok sometimes)
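
The join/value distinction above can be shown with a minimal sketch. The squaring workload and the variable names are made up for illustration; `report_on_exception` is switched off only to keep stderr quiet and assumes Ruby 2.4 or later.

```ruby
# Hypothetical workload: each thread squares its input.
threads = [1, 2, 3].map { |n| Thread.new(n) { |x| x * x } }

joined = threads.map { |t| t.join }   # join returns the Thread object itself
values = threads.map { |t| t.value }  # value joins, then returns the block's result

# A thread that fails: value (like join) re-raises the exception in the caller,
# so a failure only goes unnoticed if nobody ever joins the thread.
failing = Thread.new do
  Thread.current.report_on_exception = false
  raise ArgumentError, "boom"
end

error = begin
  failing.value
rescue ArgumentError => e
  e
end

puts values.inspect
puts "caught: #{error.message}"
```

So mapping over `join` just hands back the same threads, while mapping over `value` gives the results and surfaces any exception at the point of collection.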

a @ http://codeforpeople.com/

···

On Oct 13, 2007, at 11:43 AM, 7stud -- wrote:
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama

you win the golf for sure - here's something similar to what i've used in production code:

···

On Oct 13, 2007, at 4:35 AM, Robert Klemme wrote:

Here's my version with all the remarks incorporated.

require 'thread'

MAX_IN_QUEUE = 1024
NUM_THREADS = 5

queue = SizedQueue.new MAX_IN_QUEUE

threads = (1..NUM_THREADS).map do
  # we use the mechanism to pass the queue through
  # the constructor to avoid nasty effects of
  # variable "queue" changing
  Thread.new queue do |q|
    # we use the queue itself as terminator
    until q == (item = q.deq)
      begin
        # whatever processing
      rescue Exception => e
        # whatever error handling
      end
    end
  end
end

# read from files on the command line
ARGF.each do |line|
  queue.enq line
end

threads.each do |th|
  # send the terminator and wait
  queue.enq queue
  th.join
end

Have fun!

  robert

#
# a lines producer feeds chunks of lines to consuming threads. the producer
# itself does not slurp the potentially huge log file into memory at once,
# rather, it reads only 'bufsize' lines at a time. consumers process
# 'bufsize' lines of the file at a time where 'bufsize' means that the number
# of lines yielded to the block will be that big *at most*: near the end of a
# file it's possible that consumers will be given fewer than 'bufsize' lines to
# process
#

   Lines::Producer.new :path => __FILE__, :bufsize => 10 do
     consumer :bufsize => 2 do |lines|
       lines.each{|line| puts line}
     end

     consumer :bufsize => 3 do |lines|
       lines.each{|line| puts line}
     end
   end

#
# Lines module and Producer/Consumer classes
#
   BEGIN {
     require 'thread'

     module Lines
       class Error < ::StandardError
         class Starvation < Error; end
       end

       class Producer
         %w[ path bufsize ].each{|a| attr a}

         def initialize options = {}, &block
           @path = String options[:path]
           @bufsize = Integer options[:bufsize] || 1
           produce &block if block
         end

         def produce &block
           setup
           configure &block
           [ new_buffered_reader, new_buffered_writer ].each{|t| t.join}
           teardown
         end

         def setup
           @consumers = []
           @sq = SizedQueue.new @bufsize
         end

         def configure &block
           instance_eval &block
         end

         # reads the file line by line into the sized queue, then marks the end
         def new_buffered_reader
           Thread.new do
             Thread.current.abort_on_exception = true
             open(@path){|fd| fd.each{|line| @sq.push line}}
             @sq.push(:eof)
           end
         end

         # hands chunks of lines to the consumers in round-robin order
         def new_buffered_writer
           Thread.new do
             Thread.current.abort_on_exception = true
             catch :eof do
               loop do
                 @consumers.each do |consumer|
                   chunk = []
                   consumer.bufsize.times do
                     line = @sq.pop
                     if line == :eof
                       # hand over the final, possibly short, chunk before stopping
                       consumer << chunk unless chunk.empty?
                       throw :eof
                     end
                     chunk << line
                   end
                   consumer << chunk
                 end
               end
             end
             notify_all :eof
           end
         end

         def notify_all msg = :eof
           @consumers.each{|consumer| consumer << msg}
         end

         def teardown
           @consumers.map{|consumer| consumer.wait}
         end

         def consumer options = {}, &block
           @consumers << Consumer.new(self, options, &block)
         end

         class Consumer
           attr 'bufsize'

           def initialize producer, options = {}, &block
             @bufsize = Integer options[:bufsize]
             @producer = producer
             raise Error::Starvation unless @bufsize < @producer.bufsize
             @block = block
             @q = Queue.new
             @thread = new_thread
           end

           def << data
             @q.push data
           end

           def new_chunk
             Array.new bufsize
           end

           def new_thread
             Thread.new do
               Thread.current.abort_on_exception = true
               loop do
                 data = @q.pop
                 break if data == :eof
                 @block.call data
               end
             end
           end

           def wait
             @thread.value
           end
         end
       end
     end
   }

a @ http://codeforpeople.com/
--
share your knowledge. it's a way to achieve immortality.
h.h. the 14th dalai lama

Ok, I guess some posts get dropped in the ruby-talk -> comp.lang.ruby
transition occasionally. Probably the other direction too I suppose.

···

On Oct 13, 5:17 pm, Eric Hodel <drbr...@segment7.net> wrote:

On Oct 13, 2007, at 13:15 , Brian Adkins wrote:

> On Oct 13, 1:32 pm, Eric Hodel <drbr...@segment7.net> wrote:
>> On Oct 13, 2007, at 07:29 , Francis Cianfrocca wrote:

> Eric, are you reading/posting on comp.lang.ruby ? I don't see Francis'
> post, but both you and 7stud quoted him, so I'm wondering if it was
> aggregated from somewhere else.

I use the one, true ruby-talk, the ruby-t...@ruby-lang.org mailing list.

Eric I. schrieb:

That's a very nice solution. It demonstrates a lot of accumulated
wisdom. I think I'd use a symbol in the queue, such as :end_of_data,
rather than the queue itself to mark the end of the data, if only to
avoid a "huh?" moment from those who read the code down the line.

I think there will be a big (and probably long) "huh?" moment when running the
code:

threads.each do |th|
  # send the terminator and wait
  queue.enq queue
  th.join
end

is more likely than not to never terminate. If a thread other than th eats
the terminator, th.join will wait for a long time.

Eric

cheers

Simon

never ever think wisdom will protect you against threads :) except if wisdom
tells you not to use them.

Yes, I've written about this in the past:

Here is the relevant header from the message you are discussing that shows why it wasn't gated:

Content-Type: multipart/alternative; boundary="----=_Part_28483_17627615.1192285743535"

James Edward Gray II

···

On Oct 13, 2007, at 5:45 PM, Brian Adkins wrote:

On Oct 13, 5:17 pm, Eric Hodel <drbr...@segment7.net> wrote:

On Oct 13, 2007, at 13:15 , Brian Adkins wrote:

On Oct 13, 1:32 pm, Eric Hodel <drbr...@segment7.net> wrote:

On Oct 13, 2007, at 07:29 , Francis Cianfrocca wrote:

Eric, are you reading/posting on comp.lang.ruby ? I don't see Francis'
post, but both you and 7stud quoted him, so I'm wondering if it was
aggregated from somewhere else.

I use the one, true ruby-talk, the ruby-t...@ruby-lang.org mailing list.

Ok, I guess some posts get dropped in the ruby-talk -> comp.lang.ruby
transition occasionally. Probably the other direction too I suppose.

ara.t.howard wrote:

1) Where are you checking a return value:

threads.map{|t| t.join}

i'm not, but in a real piece of code longer than 5 lines it would be

In fact, you discard map's return value.

2) How is map's return value ever going to be different than your
threads array?

ah - 'join' should indeed be 'value' there. sorry.

basically one should use Thread.current.abort_on_exception, check the
return values, or be prepared that threads may fail and you might not
know about it (which is obviously ok sometimes)

a @ http://codeforpeople.com/

Ok, so let me get this straight:

First you post a poor example that is needlessly complex for a
beginner--and that won't even work in the op's situation.

Then, when someone points out some flaws in your code, you claim that
the proposed improvements are faulty and that your original code is
superior.

Finally, when someone pointedly asked how it's possible your original
code does the things you claim it does, you refer to some imaginary
example that you would have posted.

···

On Oct 13, 2007, at 11:43 AM, 7stud -- wrote:

--
Posted via http://www.ruby-forum.com/.

Of course you are right! Normally I use two loops here - dunno why I suddenly thought this was a great idea. Thanks for catching that stupid error!

I also agree with Eric that using something else is probably better because the code will be more readable. I just wanted to demonstrate the point of not using a "regular" queue element.
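
The two-loop shutdown mentioned above could look roughly like the following. This is only a sketch under assumptions: the sentinel name `:end_of_data` follows Eric's suggestion, and the upcasing step, the `results` queue, and the sample input are stand-ins for real processing.

```ruby
require 'thread'

num_threads = 5
end_of_data = :end_of_data  # sentinel symbol instead of the queue itself

queue   = SizedQueue.new 16
results = Queue.new

threads = (1..num_threads).map do
  Thread.new queue do |q|
    # each worker stops as soon as it sees one sentinel
    until (item = q.deq) == end_of_data
      results.enq item.upcase  # stand-in for "whatever processing"
    end
  end
end

%w[a b c d e f g].each { |line| queue.enq line }

# first loop: one sentinel per worker, no matter which worker picks it up
num_threads.times { queue.enq end_of_data }

# second loop: only now wait for all of them
threads.each { |th| th.join }

processed = []
processed << results.deq until results.empty?
puts processed.sort.inspect
```

Because every worker exits after consuming exactly one sentinel, it no longer matters which thread eats which terminator, and none of the joins can hang waiting on the wrong thread.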

Kind regards

  robert

···

On 13.10.2007 17:52, Simon Kröger wrote:

Eric I. schrieb:

That's a very nice solution. It demonstrates a lot of accumulated
wisdom. I think I'd use a symbol in the queue, such as :end_of_data,
rather than the queue itself to mark the end of the data, if only to
avoid a "huh?" moment from those who read the code down the line.

I think there will be a big (and probably long) "huh?" moment when running the
code:

threads.each do |th|
  # send the terminator and wait
  queue.enq queue
  th.join
end

is more likely than not to never terminate. If a thread other than th eats
the terminator, th.join will wait for a long time.

Of course! Good catch!

never ever think wisdom will protect you against threads :) except if wisdom
tells you not to use them.

I don't think I was claiming that it would protect you, but it sure
helps.

Eric

···

On Oct 13, 11:52 am, Simon Kröger <SimonKroe...@gmx.de> wrote:

----

Are you interested in on-site Ruby training that uses well-designed,
real-world, hands-on exercises? http://LearnRuby.com

I just checked out your "What is the ruby-talk" gateway; I didn't realize
that the gateway currently dropped multipart/alternative. That's a shame.

Since I bear some responsibility for its evil popularity, I'll volunteer to
update that gateway code to extract the text-part out of the multipart if
you can send it to me...

I should point out, though, that (a) it's really not that hard (text/plain
is supposed to come first, so that even clients who didn't understand MIME
would display the right thing before displaying the wrong thing) and that
(b) SpamAssassin doesn't actually assign any points for HTML e-mail - or,
more accurately, it assigns zero points.

You say that "Some e-mails would be pretty non-trivial to handle
correctly", but I'd be curious to see examples of those; by definition,
multipart/alternative contains a number of equivalent parts, and as long as
one of those parts is text/plain, you only have to extract that part. That
was the whole point of sending multipart/alternative, rather than merely
sending text/html and forcing people to downconvert. If there are clients
that send multipart/alternative, but don't send a text/plain subpart,
they're missing the point.
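
As a rough illustration of "you only have to extract that part", here is a hand-rolled sketch in plain Ruby. It is not the gateway's code and not a full MIME parser: `extract_text_plain`, the sample message, and its boundary are hypothetical, and nested multiparts and transfer encodings are ignored.

```ruby
# Return the body of the first text/plain part of a multipart/alternative
# message, or nil if the message is not multipart/alternative or has no such part.
def extract_text_plain(raw)
  headers, body = raw.split(/\r?\n\r?\n/, 2)
  return nil unless headers && body

  unfolded = headers.gsub(/\r?\n[ \t]+/, " ")  # unfold continuation lines
  return nil unless unfolded =~ /^Content-Type:\s*multipart\/alternative\b/i

  boundary = unfolded[/boundary="?([^";\r\n]+)"?/i, 1]
  return nil unless boundary

  # parts are separated by lines of the form "--<boundary>"
  body.split(/^--#{Regexp.escape(boundary)}(?:--)?[ \t]*\r?\n?/).each do |part|
    part_headers, part_body = part.split(/\r?\n\r?\n/, 2)
    next unless part_headers && part_body
    return part_body.rstrip if part_headers =~ /^Content-Type:\s*text\/plain\b/i
  end
  nil
end

message = <<~MAIL
  From: someone@example.com
  Subject: hello
  Content-Type: multipart/alternative;
   boundary="----=_Part_1"

  ------=_Part_1
  Content-Type: text/plain; charset=us-ascii

  plain body
  ------=_Part_1
  Content-Type: text/html

  <p>html body</p>
  ------=_Part_1--
MAIL

text = extract_text_plain(message)
puts text
```

A message with no text/plain alternative simply yields nil here, which matches the view in the thread that such posts are not worth supporting.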

···

On Sun, 14 Oct 2007 11:17:48 +0900, James Edward Gray II wrote:

Here is the relevant header from the message you are discussing that
shows why it wasn't gated:

Content-Type: multipart/alternative; boundary="----=_Part_28483_17627615.1192285743535"

--
Jay Levitt |
Boston, MA | My character doesn't like it when they
Faster: jay at jay dot fm | cry or shout or hit.
http://www.jay.fm | - Kristoffer

Thanks for the info!

I wonder how many mailing list posters realize this. Do you have any
stats for the percentage of mailing list posts that don't make it to
comp.lang.ruby?

If it's common knowledge that one needs to post in text only, then I
don't mind letting the gateway act as a filter for those who can't
configure their mail client, but if the requirement is not widely
known, then I may be missing posts that I'd like to receive.

I personally much prefer usenet to mailing lists, so I'm reluctant to
switch to the mailing list for just this one group.

Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?

···

On Oct 13, 10:17 pm, James Edward Gray II <ja...@grayproductions.net> wrote:

On Oct 13, 2007, at 5:45 PM, Brian Adkins wrote:
> Ok, I guess some posts get dropped in the ruby-talk -> comp.lang.ruby
> transition occasionally. Probably the other direction too I suppose.

Yes, I've written about this in the past:

Here is the relevant header from the message you are discussing that
shows why it wasn't gated:

Content-Type: multipart/alternative; boundary="----=_Part_28483_17627615.1192285743535"

I just checked out your "What is the ruby-talk" gateway; I didn't realize
that the gateway currently dropped multipart/alternative. That's a shame.

To be totally clear, our gateway doesn't drop them. They are forwarded to our Usenet host. Our host rejects them as invalid Usenet posts.

Since I bear some responsibility for its evil popularity, I'll volunteer to update that gateway code to extract the text-part out of the multipart if you can send it to me...

I have a rewrite in progress that uses TMail for message handling. One of my goals for this was to correctly separate the text portions of multipart/alternative. I've just been distracted with work deadlines and other short term projects, so I haven't completed it yet.

I should point out, though, that (a) it's really not that hard (text/plain is supposed to come first, so that even clients who didn't understand MIME would display the right thing before displaying the wrong thing)

I've seen some pretty crazy things in messages sent to Ruby Talk. One of those is multipart/alternative with no text/plain component. I don't think there's too much loss in not supporting such setups though.

and that
(b) SpamAssassin doesn't actually assign any points for HTML e-mail - or, more accurately, it assigns zero points.

My apologies. I thought for sure I had seen a reference to that sometime in the past, but I've been unable to dig it up this morning. I stand corrected.

James Edward Gray II

···

On Oct 13, 2007, at 9:35 PM, Jay Levitt wrote:

On Sun, 14 Oct 2007 11:17:48 +0900, James Edward Gray II wrote:

Do you have any stats for the percentage of mailing list posts that don't make it to comp.lang.ruby?

I just did a simple grep of the logs for a period of a little over the last month. It looks like we average about eight rejected messages a day (for an "HTML post" reason).

If it's common knowledge that one needs to post in text only, then I
don't mind letting the gateway act as a filter for those who can't
configure their mail client, but if the requirement is not widely
known, then I may be missing posts that I'd like to receive.

Well, I wrote a blog post about it and reference it whenever the discussion comes up. ;)

Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?

Over the same time period, the gateway saw 1,071 posts from Usenet and 5,126 from the mailing list.

James Edward Gray II

···

On Oct 13, 2007, at 11:40 PM, Brian Adkins wrote:

Since I bear some responsibility for its evil popularity, I'll
volunteer to update that gateway code to extract the text-part out
of the multipart if you can send it to me...

I have a rewrite in progress that uses TMail for message handling.
One of my goals for this was to correctly separate the text portions
of multipart/alternative. I've just been distracted with work
deadlines and other short term projects, so I haven't completed it yet.

Sounds like it'd be more useful for me to help on the TMail version (since
that's what I'd end up using anyway).. is that code posted anywhere yet, or
would you be willing to send/post it? It's not in the Gateway topic, and
your search box is, uh, broken.

I've seen some pretty crazy things in messages sent to Ruby Talk.
One of those is multipart/alternative with no text/plain component.
I don't think there's too much loss in not supporting such setups
though.

Yeah, that's just totally broken. I mean, it's technically legal MIME, but
pointless.

> and that
> (b) SpamAssassin doesn't actually assign any points for HTML e-mail
> - or, more accurately, it assigns zero points.

My apologies. I thought for sure I had seen a reference to that
sometime in the past, but I've been unable to dig it up this
morning. I stand corrected.

IIRC the rule used to have some points attached to it, but
somewhere along the way the mass-checks stopped determining it to be a
useful rule. That's often what happens with SA; the scores are all
determined with some fancy AI code I think.

···

On Mon, 15 Oct 2007 00:26:58 +0900, James Edward Gray II wrote:

--
Jay Levitt |
Boston, MA | My character doesn't like it when they
Faster: jay at jay dot fm | cry or shout or hit.
http://www.jay.fm | - Kristoffer

8 msgs/day * ~30 days / 5,126 is ~5%; that's a little higher than I
was hoping :( I'm also surprised by the mailing list : usenet ratio.

···

On Oct 14, 11:52 am, James Edward Gray II <ja...@grayproductions.net> wrote:

On Oct 13, 2007, at 11:40 PM, Brian Adkins wrote:
> Do you have any stats for the percentage of mailing list posts that
> don't make it to comp.lang.ruby?

I just did a simple grep of the logs for a period of a little over
the last month. It looks like we average about eight rejected
messages a day (for an "HTML post" reason).

> Also, do you have stats for the percent of messages originating from
> usenet vs. from the mailing list?

Over the same time period, the gateway saw 1,071 posts from Usenet
and 5,126 from the mailing list.

The log I used actually went back to 9-6-2007, so it was closer to 38 days, but yeah. It was higher than I would have guessed too.
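
For what it's worth, the estimate can be redone with the 38-day window. This is back-of-envelope arithmetic only: "about eight a day" is itself an average, and the thread does not say whether the 5,126 figure includes the rejected posts.

```ruby
# Figures quoted in the thread.
rejected_per_day   = 8
mailing_list_posts = 5_126

# Percentage of mailing list posts rejected over a window of the given length.
estimate = lambda do |days|
  (rejected_per_day * days * 100.0 / mailing_list_posts).round(1)
end

pct_30_days = estimate.call(30)  # the original rough window
pct_38_days = estimate.call(38)  # the log actually started on 9-6-2007

puts "30 days: ~#{pct_30_days}%  38 days: ~#{pct_38_days}%"
```

That puts the share of rejected posts somewhere between roughly 4.7% and 5.9%, depending on the window used.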

James Edward Gray II

···

On Oct 14, 2007, at 2:25 PM, Brian Adkins wrote:

On Oct 14, 11:52 am, James Edward Gray II <ja...@grayproductions.net> wrote:

On Oct 13, 2007, at 11:40 PM, Brian Adkins wrote:

Do you have any stats for the percentage of mailing list posts that
don't make it to comp.lang.ruby?

I just did a simple grep of the logs for a period of a little over
the last month. It looks like we average about eight rejected
messages a day (for an "HTML post" reason).

Also, do you have stats for the percent of messages originating from
usenet vs. from the mailing list?

Over the same time period, the gateway saw 1,071 posts from Usenet
and 5,126 from the mailing list.

8 msgs/day * ~30 days / 5,126 is ~5%; that's a little higher than I
was hoping :(

Since I bear some responsibility for its evil popularity, I'll
volunteer to update that gateway code to extract the text-part out
of the multipart if you can send it to me...

I have a rewrite in progress that uses TMail for message handling.
One of my goals for this was to correctly separate the text portions
of multipart/alternative. I've just been distracted with work
deadlines and other short term projects, so I haven't completed it yet.

Sounds like it'd be more useful for me to help on the TMail version (since that's what I'd end up using anyway).. is that code posted anywhere yet, or would you be willing to send/post it?

It's not yet online. I am happy to put it up, sure. I really need to get through two projects before I get to that though. Please give me a few weeks.

It's not in the Gateway topic, and your search box is, uh, broken.

It seems to work OK for me. Feel free to email me the details off-list and I'll sure try to fix it.

I've seen some pretty crazy things in messages sent to Ruby Talk.
One of those is multipart/alternative with no text/plain component.
I don't think there's too much loss in not supporting such setups
though.

Yeah, that's just totally broken. I mean, it's technically legal MIME, but pointless.

We see quite a few broken posts pass through the gateway in quite a few different ways. Welcome to the Internet. ;)

James Edward Gray II

···

On Oct 14, 2007, at 11:35 AM, Jay Levitt wrote:

On Mon, 15 Oct 2007 00:26:58 +0900, James Edward Gray II wrote:

Sure. I'll e-mail you in a few weeks, and embed a Flash movie reminder
using ActiveX.

···

On Mon, 15 Oct 2007 08:22:38 +0900, James Edward Gray II wrote:

It's not yet online. I am happy to put it up, sure. I really need
to get through two projects before I get to that though. Please give
me a few weeks.

--
Jay Levitt |
Boston, MA | My character doesn't like it when they
Faster: jay at jay dot fm | cry or shout or hit.
http://www.jay.fm | - Kristoffer