I'm trying to upload a number of large files to S3. I want to do this
in parallel using threads, but it looks like the S3 gems for Windows
aren't thread-safe.
I then started looking into fork. Memory and process-creation time
aren't limiting factors.
****
So the question is: how can I create a thread-like experience using
fork (or something similar)? I want it to run only a subset of the
entire program (ideally what's in the block), and I want to wait for
it to finish.
****
I tried win32/process#fork, but it runs the entire program twice. And
open("|-", "r") isn't supported on Windows either.
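Because Windows Ruby has no real fork, one alternative (a swapped-in technique, not something the S3 gems or win32/process provide) is to start a separate Ruby interpreter per file with `Process.spawn` and wait on each child with `Process.wait2`, much like joining threads. This is a minimal sketch assuming each upload can run as an independent script: `CHILD_SCRIPT`, `spawn_uploads` and `wait_for_uploads` are hypothetical names, and the child only prints the path instead of calling an S3 gem.

```ruby
require "rbconfig"

# Hypothetical child program. A real version would require an S3 gem and
# upload the file named in ARGV[0]; this one only prints the path so the
# sketch runs without network access.
CHILD_SCRIPT = 'puts "uploaded #{ARGV[0]}"'

# Start one Ruby interpreter per file and return [path, pid] pairs.
# Unlike fork, spawn launches a fresh program instead of duplicating the
# current one, and Ruby supports it on Windows.
def spawn_uploads(paths)
  paths.map do |path|
    pid = Process.spawn(RbConfig.ruby, "-e", CHILD_SCRIPT, path)
    [path, pid]
  end
end

# Block until each child exits (similar to Thread#join) and return a
# hash of path => true/false based on the child's exit status.
def wait_for_uploads(jobs)
  jobs.map do |path, pid|
    _, status = Process.wait2(pid)
    [path, status.success?]
  end.to_h
end

if __FILE__ == $PROGRAM_NAME
  results = wait_for_uploads(spawn_uploads(%w[a.bin b.bin c.bin]))
  results.each { |path, ok| puts "#{path}: #{ok ? 'ok' : 'failed'}" }
end
```

Since each child is a separate process, the thread-safety of the gem no longer matters; the cost is one interpreter start-up per file.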
Though I wonder whether this is an improvement over serial uploading.
Unless each process's upload rate is capped at a fraction of the total
available bandwidth (on either the uploading or the receiving end),
two files uploaded in parallel will each get roughly half the
bandwidth, so the pair takes about as long as uploading the same two
files one after the other.
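To make the bandwidth argument concrete, here is a toy model. It is an idealized assumption, not a measurement: sizes are in MB, `bandwidth` and the hypothetical `per_stream_cap` are in MB/s, and the link is split evenly among the uploads still running. Without a cap, parallel and serial come out the same; only a per-connection cap below the link bandwidth makes parallel uploading pay off.

```ruby
# Toy model: each transfer runs at min(its share of the link, per_stream_cap).

# Upload files one after another.
def serial_time(sizes, bandwidth, per_stream_cap = Float::INFINITY)
  rate = [bandwidth, per_stream_cap].min.to_f
  sizes.sum { |size| size / rate }
end

# Upload all files at once; the link is split evenly among the uploads
# still running, so the survivors speed up as others finish.
def parallel_time(sizes, bandwidth, per_stream_cap = Float::INFINITY)
  remaining = sizes.map(&:to_f)
  elapsed = 0.0
  until remaining.empty?
    rate = [bandwidth.to_f / remaining.size, per_stream_cap].min
    step = remaining.min / rate # time until the next upload finishes
    elapsed += step
    remaining = remaining.map { |r| r - rate * step }.reject { |r| r <= 1e-9 }
  end
  elapsed
end

p serial_time([100, 100], 10)      # => 20.0
p parallel_time([100, 100], 10)    # => 20.0 (no gain)
p serial_time([100, 100], 10, 5)   # => 40.0
p parallel_time([100, 100], 10, 5) # => 20.0 (the cap makes parallel pay off)
```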
Additionally, the program's complexity rises: instead of keeping tabs
on the successful upload of one file, you now have n uploads that must
be monitored and retried on failure.
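As a rough illustration of that extra bookkeeping, here is a sketch that tracks attempts per file and retries failures a bounded number of times. `upload_all`, `max_attempts` and the `upload` block are hypothetical; a real version would call whatever S3 gem is in use inside the block.

```ruby
# Run the upload block for each path, requeueing failures until they
# succeed or hit max_attempts. Returns path => attempts used, or nil
# for files that never succeeded.
def upload_all(paths, max_attempts: 3, &upload)
  attempts = Hash.new(0)
  pending = paths.dup
  done = {}
  until pending.empty?
    path = pending.shift
    attempts[path] += 1
    if upload.call(path)
      done[path] = attempts[path]
    elsif attempts[path] < max_attempts
      pending.push(path) # try this file again later
    else
      done[path] = nil   # give up on this file
    end
  end
  done
end
```

For example, with a flaky uploader that fails the first attempt for one file, the result records that the file needed two attempts.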
But since engineering challenges are fun, I'd do it like this (if I had to):
- Create an upload queue server containing all unfinished uploads.
- Spawn several worker processes that check the queue for the next
available upload and mark the uploads they are assigned (beware of a
race condition here; a simple block-and-backoff strategy would be
sufficient to prevent it, I think).
- Once an upload is finished, the worker process sends the all-clear,
and the upload is removed from the queue.
- If a worker process cannot finish its upload for whatever reason,
the upload gets marked as unfinished again.
- If a worker finishes its upload, it marks the queue item for deletion.
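The queue side of that design can be sketched as a small object. This is a sketch under assumptions: `UploadQueue` and its methods are hypothetical names, it uses a `Mutex` to make "find and mark" atomic instead of the block-and-backoff strategy, and it keeps the queue inside one process. Sharing it with separate worker processes would need an extra layer (for example DRb) that is not shown.

```ruby
# Hypothetical in-memory upload queue following the steps above.
# Items are :pending or :claimed; finished items are removed.
class UploadQueue
  def initialize(paths)
    @state = paths.map { |path| [path, :pending] }.to_h
    @lock = Mutex.new # serializes access so two workers can't claim one item
  end

  # Atomically pick the next pending upload and mark it claimed (nil if none).
  def claim
    @lock.synchronize do
      path, _state = @state.find { |_path, state| state == :pending }
      @state[path] = :claimed if path
      path
    end
  end

  # Worker sent the all-clear: remove the upload from the queue.
  def complete(path)
    @lock.synchronize { @state.delete(path) }
  end

  # Worker could not finish: mark the upload as unfinished again.
  def release(path)
    @lock.synchronize { @state[path] = :pending if @state.key?(path) }
  end

  def empty?
    @lock.synchronize { @state.empty? }
  end
end
```

A worker would loop on `claim`, call `complete` after a successful upload and `release` after a failure, and stop once `claim` returns nil.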