Problem: do threads always run for at least `TIME_QUANTUM_USEC` before terminating, on Linux?


(Saverio Miroddi) #1

I have a GUI project (Tk), which makes use of threads for performing
background filesystem searches based on a filename pattern.

My system has 4 cores and runs a recent Linux; the interpreter is MRI 2.3.

At any given time there should be at most one background search
thread, so a request for a new search causes the currently running one
to be terminated.

The search code is properly structured: it's a simple loop through
directories that exits either when the directories are exhausted or
when a termination flag is passed to the thread (by the project
scheduler).
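A minimal sketch of the design described above, assuming the loop is a recursive directory walk (here the standard library's `Find.find`) that polls a stop flag, and that the pattern is a glob matched against file names. `SearchJob`, `SearchScheduler` and `stop!` are hypothetical names, not the project's actual code. Note that the scheduler starts the new thread without waiting for the old one, so for a short while two search threads coexist.

```ruby
require "find"

# Hypothetical sketch of the thread-per-search design: each search runs in
# its own thread and polls a stop flag while walking the directory tree.
class SearchJob
  attr_reader :results

  def initialize(root, pattern)
    @root = root
    @pattern = pattern
    @stop = false # set by the scheduler to request termination
    @results = []
    @thread = Thread.new { run }
  end

  # Request termination; the walk notices the flag on its next iteration.
  def stop!
    @stop = true
  end

  def join
    @thread.join
    self
  end

  private

  def run
    Find.find(@root) do |path|
      break if @stop # cooperative exit, no Thread#kill involved
      @results << path if File.fnmatch?(@pattern, File.basename(path))
    end
  end
end

# Hypothetical scheduler: at most one active search. A new request flags the
# current search to stop (without joining it) and immediately starts a new thread.
class SearchScheduler
  def initialize
    @current = nil
  end

  def search(root, pattern)
    @current.stop! if @current
    @current = SearchJob.new(root, pattern)
  end
end
```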

The problem I've observed is that search threads take at least 100 ms
to terminate, so when searches arrive in quick succession, they start
to pile up.

I've had a look at the source code, and I suspect this is related to
`TIME_QUANTUM_USEC`. The 2.1 changelog mentions `When running on
uniprocessor systems, every th.kill needs TIME_QUANTUM_USEC time` (I'm
not sure whether "processor" refers to a CPU or a core).

I've tried a heavily simplified version of the project, and on
JRuby/Rubinius the time slice is significantly smaller (by at least
one order of magnitude).

Is this behavior expected, or am I unintentionally causing it? In the
latter case, which conditions could cause it?

Thanks,
Saverio


(Eric Wong) #2

> I have a GUI project (Tk), which makes use of threads for performing
> background filesystem searches based on a filename pattern.
>
> My system is a 4 cores, on recent Linux; the interpreter is MRI 2.3.
>
> At any given time, there should be at most one search background
> thread, so the requests for new searches cause termination of the one
> currently executing.
>
> The search code is properly structured: it's a simple loop through
> directories, that exits either when the directories are exhausted, or
> when a termination flag is passed (by the project scheduler) to the
> thread.
>
> The problem I've observed is that search threads take at least 100ms
> before terminating, so that when searches come in in quick succession,
> they start to pile up.

Is it 100% necessary to join the threads you're killing? And do
you need to kill threads, instead of reusing them with Queue or
similar?
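A minimal sketch of the "reuse them with Queue" suggestion, assuming each unit of work can be expressed as a block; `Worker`, `submit` and `shutdown` are hypothetical names. One long-lived thread blocks on `Queue#pop`, so no thread is created, killed or joined per request. `Queue#close` (available since Ruby 2.3) makes `pop` return `nil` once the queue is drained, which ends the loop.

```ruby
# Hypothetical long-lived worker: jobs are blocks pushed onto a Queue and
# executed one at a time by a single thread that is never killed.
class Worker
  def initialize
    @jobs = Queue.new
    @thread = Thread.new do
      # Queue#pop blocks while the queue is empty; after #close it returns
      # nil once all pending jobs have been taken, ending the loop.
      while (job = @jobs.pop)
        job.call
      end
    end
  end

  def submit(&block)
    @jobs << block
  end

  # Let pending jobs finish, then wait for the thread to exit normally.
  def shutdown
    @jobs.close
    @thread.join
  end
end
```

Usage: `w = Worker.new; w.submit { do_search }; w.shutdown`, where `do_search` stands for whatever work the application performs.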

> I've had a look at the source code, and I have the suspicion that this
> is related to `TIME_QUANTUM_USEC`. In the 2.1 changelog, `When running
> on uniprocessor systems, every th.kill needs TIME_QUANTUM_USEC time`
> is mentioned (I'm not sure if "processor" refers to CPU or core).

Right, this seems to be a problem if you have 3 or more threads
running. If you have only 2 threads (one doing work, the other
doing `th.kill` + `th.join`), it's OK.

In other words, I think your other threads are hogging time
from the background thread you want to kill, as well as from the
thread joining the to-be-killed background thread.
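A rough way to observe the effect described above, assuming CPU-bound threads; `kill_join_latency` is a hypothetical helper. It times `Thread#kill` followed by `Thread#join` on a busy thread while `busy` other CPU-bound threads compete for the GVL. The results depend on the machine and the Ruby version, so no expected numbers are given here.

```ruby
# Hypothetical helper: measure kill + join latency of a CPU-bound thread
# while `busy` extra CPU-bound threads compete for the GVL.
def kill_join_latency(busy)
  stop = false
  hogs = Array.new(busy) { Thread.new { nil until stop } } # GVL competitors
  victim = Thread.new { loop {} } # CPU-bound thread to be killed
  sleep 0.05 # give every thread a chance to start running

  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  victim.kill
  victim.join
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

  stop = true # let the competing threads finish
  hogs.each(&:join)
  elapsed
end

[0, 1, 2].each do |n|
  printf("%d busy thread(s): %.1f ms\n", n, kill_join_latency(n) * 1000)
end
```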

You can try lowering thread priorities to shorten timeslices,
but I'm not sure how effective it'd be...
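A sketch of lowering the priority of CPU-bound background threads with `Thread#priority=`; `start_low_priority` is a hypothetical helper. MRI treats the priority only as a scheduling hint, so, as noted above, the benefit is uncertain. The priority is set from inside the new thread so that it applies before the work starts.

```ruby
# Hypothetical helper: run a block in a new thread with a lowered priority.
# In MRI the priority is only a hint; the effect on timeslices is not guaranteed.
def start_low_priority(priority = -3, &work)
  Thread.new do
    Thread.current.priority = priority # applied before the work begins
    work.call
  end
end
```

Usage: `job = start_low_priority { expensive_scan }` followed later by `job.value`, where `expensive_scan` stands for the application's own CPU-bound work.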

> I've tried a very simplified version of the project, and on
> JRuby/Rubinius the time slice is significantly smaller (at least one
> order of magnitude smaller).

They don't have a GVL; their threading implementations are totally different.

> Is this behavior expected, or I'm unintentionally causing it? In the
> latter case, which conditions could potentially cause it?

Expected, I think so :< Ideal? No...

The GVL is round-robin, and we cannot easily control the runqueue
order because it's in the kernel. Killed threads should probably jump
to the head of the queue...


Sav Erio <saverio.pub2@gmail.com> wrote:


(Eric Wong) #3

Sarkar Chinmoy <chinmoy_sarkar@yahoo.com> wrote:

> Please DO NOT send any more e-mails. I want to UNSUBSCRIBE

You need to unsubscribe yourself; nobody can do it for you. See:

Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>


(Saverio Miroddi) #4

Hello Eric!

Thanks for your help. Following your advice, I think changing the
design to a single search thread will solve this latency problem.
When the search thread receives a “new search” message, it will just
`break` its current loop and restart the loop with the new input,
rather than having the search scheduler send a “stop” message to the
current search thread and create a new thread.
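A minimal sketch of that single-thread design, assuming patterns are glob strings matched against file names and that results are delivered through a callback. `SearchWorker`, `search`, `shutdown` and the callback are hypothetical names. Instead of a separate "stop" message, the walk checks whether a newer request is already waiting in the `Queue` and, if so, `break`s out and starts over with it.

```ruby
require "find"

# Hypothetical single search thread: requests arrive on a Queue; a walk is
# abandoned as soon as a newer request is pending.
class SearchWorker
  def initialize(root, &on_done)
    @root = root
    @on_done = on_done # called as on_done.call(pattern, matches) after a full walk
    @requests = Queue.new
    @thread = Thread.new { run }
  end

  def search(pattern)
    @requests << pattern
  end

  # Finish pending requests, then let the thread exit normally.
  def shutdown
    @requests.close
    @thread.join
  end

  private

  def run
    while (pattern = @requests.pop) # nil once closed and drained
      matches = []
      completed = true
      Find.find(@root) do |path|
        unless @requests.empty? # a newer search is waiting: restart with it
          completed = false
          break
        end
        matches << path if File.fnmatch?(pattern, File.basename(path))
      end
      @on_done.call(pattern, matches) if completed
    end
  end
end
```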

For the sake of clarity, I wasn't joining the killed threads (and I
wasn’t killing them either, if you mean `kill()`): when a new search
was requested while an old one was running, I was sending a “stop”
message to the existing search thread (which, internally, would just
`break` the search loop and terminate normally) and starting a new one
at the same time. This design probably makes sense on VMs other than
MRI… but not on MRI :)

One thing I wonder, out of curiosity, is whether 100 ms still makes
sense. The comment on this value says it was tested against a very old
Linux kernel (it calls 2.6 “recent”); for Windows kernels it is
actually 10 ms. I couldn’t find a detailed rationale for the choice of
the value(s) in the history.

I've microbenchmarked different values, and smaller values do take a
toll, but only with many concurrent threads (dozens), and the loss is
in the single-digit percent range. That said, I’m no systems
developer :)

Thanks!


On 16.07.2018 12:32, Eric Wong wrote:
