Seven new VMs, all in a row

> I think the "wal-mart argument" is quite an important one.

I'm not sure exactly what the "wal-mart argument" is. Wal-Mart can be
seen as a big U.S. conglomerate that moves into a town and drives all
the mom-and-pop stores out of business, eventually drying up downtown
business districts. Or it can be seen as a big discount retailer that
provides cheap imported goods using its massive warehousing and
distribution networks, while undercutting domestic manufacturers.

I was alluding to someone's dictum that even Wal-Mart is selling multiprocessor systems nowadays, with the insinuation that hardware is growing more and more trashy every day :slight_smile: . The philosophy of increasing processing power by simply raising clock speeds seems to have reached its limits. Any further advancement will require an intelligent combination of hardware and software -- one which even Wal-Mart customers can handle :wink: .

Obviously, I'm not a big Wal-Mart fan, but maybe their brutal retail
success strategy has lessons for Ruby? :slight_smile:

Neither am I, but still I like your expression "undercutting domestic manufacturers" (= low-level programmers)!

> Apart from explicitly creating threads, it would be nice if
> the Ruby system could be taught to automatically recognize
> parallelizable code and optimally distribute it across a
> multiprocessor system -- implicitly. That would be a big
> advantage for high-level programming in general! I do not know
> the state of the art in this, I only remember that the
> Atari/Inmos guys failed to do this in Occam, back in the 1980s.
> Do you think there is a serious chance to get such a thing working?

The only programming environment I'm familiar with where somebody
implemented automatic parallel optimization is Fortran (although I'm
sure there are others). Fortran's branching and memory models are
constrained enough to allow for some clever analysis. Loops where each
iteration has no impact on the next can be discovered and converted into
short-term fine-grained parallel execution. In that case, the original
code has no concept of threading; it just runs faster during the inner
loops.
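
To fix the idea, here is a Ruby rendering of the kind of loop such an
analysis looks for (a toy illustration; the actual discussion concerns
Fortran):

    # Each iteration reads a[i] and writes b[i] only. No iteration
    # depends on the result of another, so a sufficiently clever
    # compiler could execute them in any order, or in parallel,
    # without changing the outcome.
    a = Array.new(8) { rand }
    b = Array.new(8)
    a.each_index do |i|
      b[i] = 2 * a[i] + 1
    end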

None of that would carry over to a thread-aware language with a dynamic
type system.

Do you really think so?
Fortran has a pretty simple enumerative loop which can be optimized for parallelization, provided your compiler is smart enough.
Higher-level languages, by contrast, contain (or at least may contain) structures such as MAP which tell the compiler/interpreter: "This refers to an entire block of data" and may be extended to: "So distribute the workload as you think fit." This would not even require any analysis but just a fistful of code in the part of the compiler that handles the respective statement.
Of course, it might break backward compatibility as it does away with the tacit assumption that the iterations are executed in any guaranteed order...
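
Just to sketch what I have in mind (a hypothetical pmap, nothing Ruby
actually provides today): the contract would be map's, minus any promise
about evaluation order, leaving the implementation free to farm the block
out as it sees fit.

    # Hypothetical parallel map: one thread per element. The result
    # array keeps element order; only the order in which the block
    # runs is unspecified, so the caller must supply independent
    # iterations.
    module Enumerable
      def pmap(&block)
        map { |item| Thread.new(item, &block) }.map { |t| t.value }
      end
    end

    squares = (1..8).pmap { |n| n * n }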

Ilmari Heikkinen wrote:
> Re: green threads vs native threads, if a green threads implementation
> is 30 times faster, that's like having 29 extra cpus, no?

"You can't get blood from a stone." The only thing that is "like having
29 extra cpus" is actually having 29 extra CPUs. :slight_smile:

You can, if and only if ("iff") the net workload of the thread is negligible when compared to the administrative overhead. But in that case the entire affair is probably not very demanding in terms of CPU power, so there is little point in using a multiprocessor system. On the other hand, if your threads have to do really heavy-duty work, there is no real gain in optimizing the thread model, so what's the buzz...
For bioinformatics work, I'd opt for the 29 extra CPUs.
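
To put toy numbers on that tradeoff (my own made-up figures, nothing
measured): total time is per-thread work plus per-thread administrative
overhead, and shrinking the overhead only shows up when the work itself
is tiny.

    # Toy model: n threads, each doing `work` units of real computation
    # plus `overhead` units of thread administration.
    def total_time(n, work, overhead)
      n * (work + overhead)
    end

    # Heavy-duty threads: a 30x cheaper thread model barely matters.
    total_time(1000, 10.0, 3.0) / total_time(1000, 10.0, 0.1)  # => ~1.29

    # Featherweight threads: overhead dominates, cheap threading shines.
    total_time(1000, 0.1, 3.0) / total_time(1000, 0.1, 0.1)    # => 15.5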

-- Ruediger Marcus

···

On Monday, 11 April 2005 at 17:02, ruby-talk-admin@ruby-lang.org wrote:

flaig@sanctacaris.net wrote:

--
Chevalier Dr Dr Ruediger Marcus Flaig
   Institute for Immunology
   University of Heidelberg
   INF 305, D-69121 Heidelberg

"Drain you of your sanity,
Face the Thing That Should Not Be."


Parallel programming, both local and distributed, is one of the great
research topics of this decade. Some good Google keywords: Orca,
Amoeba, Erlang. For things more complicated than a do-loop, it still
takes a human to break the problem into message-passing or an
equivalent. There are plenty of extensions to C/Fortran/whatever to
help in these things.
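
As a concrete illustration, here is the sort of decomposition that still
takes a human, hand-rolled in Ruby (the queue setup and the squaring are
stand-ins for real work):

    require 'thread'

    jobs    = Queue.new
    results = Queue.new

    # Four workers pull independent jobs off a shared queue. Deciding
    # that the jobs really are independent messages is the programmer's
    # job; no compiler analysis is involved.
    workers = (1..4).map do
      Thread.new do
        while (job = jobs.pop) != :done
          results << job * job
        end
      end
    end

    (1..20).each { |n| jobs << n }
    4.times { jobs << :done }
    workers.each { |t| t.join }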

···

On 4/12/05, flaig@sanctacaris.net <flaig@sanctacaris.net> wrote:

> Fortran has a pretty simple enumerative loop which can be optimized
> for parallelization, provided your compiler is smart enough.
> Higher-level languages, by contrast, contain (or at least may contain)
> structures such as MAP which tell the compiler/interpreter: "This
> refers to an entire block of data" and may be extended to: "So
> distribute the workload as you think fit."

--
spooq

flaig@sanctacaris.net wrote:

> Higher-level languages, by contrast, contain (or at least may contain)
> structures such as MAP which tell the compiler/interpreter: "This
> refers to an entire block of data" and may be extended to: "So distribute
> the workload as you think fit." This would not even require any analysis
> but just a fistful of code in the part of the compiler that handles the
> respective statement.

Would that it were this simple. A "map" function operates on a list, but what are the potential relationships between members of any list? In most cases it's very hard to be sure. And if you can't prove mathematically, at compile (or even execution) time, that there are zero interactions between the list members, then your compiler had best not inject any threaded code.
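
A contrived two-liner shows the hazard; it is exactly the sort of thing
the compiler would have to rule out:

    # Looks map-shaped, but the block mutates shared state: each
    # iteration reads the total left behind by the previous one.
    # Sequentially this yields [1, 3, 6, 10]; run the iterations in
    # parallel and both the array and the final sum are unpredictable.
    sum = 0
    running_totals = [1, 2, 3, 4].map { |n| sum += n }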

"You can't get blood from a stone." The only thing that is "like
having 29 extra cpus" is actually having 29 extra CPUs. :slight_smile:

You can, if and only if ("iff") the net workload of the thread is
negligible when compared to the administrative overhead, ...

I read this as saying, "if your threading overhead is so ridiculous that it consumes 29/30ths of your CPU time, then you are better off not using threads." This is certainly true, but it misses the point. Good threaded software design minimizes threading overhead, thus a (potential) 30x speedup indicates a poor design, not a poor threading system.

> For bioinformatics work, I'd opt for the 29 extra CPUs.

Of course, but can you get 30x work out of them? I suspect that in bioinformatics, many of the "big" problems are easily isolated into non-communicating work units. I'm running the World Community Grid client, which is currently cranking on the Human Proteome Folding Project. My computer, along with thousands more, handles its job with relatively little coordination from the central work server at IBM. For this system, threading would be a waste.

But there are other problems, not purely compute-bound, that require more intricate and fine-grained scheduling to achieve decent scaling.

···

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>