Threads, Sockets and OSX


(Peter Hickman) #1

I have a little app that is basically the standard threaded TCPServer with
a couple of mutexes to implement a fifo queue. When it runs it goes flat
out for 15-20 seconds and then freezes for 10-15 seconds. Everything
resumes just fine but it's a bit annoying and occasionally the freeze lasts
long enough to time out connections to the server
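For readers who want something concrete to reproduce against, here is a minimal sketch of that kind of server. It is not the original code: the FifoQueue class, the serve method and the "PUT <value>" / "GET" line protocol are all invented for illustration.

```ruby
require 'socket'

# Hypothetical FIFO queue guarded by a Mutex (names are illustrative only).
class FifoQueue
  def initialize
    @items = []
    @lock  = Mutex.new
  end

  # Append an item at the tail.
  def put(item)
    @lock.synchronize { @items.push(item) }
  end

  # Remove and return the head, or nil when the queue is empty.
  def get
    @lock.synchronize { @items.shift }
  end

  def size
    @lock.synchronize { @items.size }
  end
end

# Accept loop with one thread per connection.
# Assumed protocol: a line is either "PUT <value>" or "GET".
def serve(server, queue)
  loop do
    client = server.accept
    Thread.new(client) do |sock|
      begin
        while (line = sock.gets)
          command, value = line.chomp.split(' ', 2)
          case command
          when 'PUT'
            queue.put(value)
            sock.puts('OK')
          when 'GET'
            sock.puts(queue.get || 'EMPTY')
          else
            sock.puts('ERR')
          end
        end
      ensure
        sock.close
      end
    end
  end
end

# To run it (this call blocks forever):
#   serve(TCPServer.new('127.0.0.1', 4000), FifoQueue.new)
```

Running the same small script on both machines is one way to check whether the freezes depend on the application code at all.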

My first thought was the GC was kicking in and so I have been trying to
find ways to reduce the amount of GC that needs to be done as well as
calling it when I have created or deleted structures so that it does not
accumulate a pile of work and kick in unexpectedly

This has reduced the number of timeout-causing freezes but has not
affected the other freezes, and it has also reduced the
throughput
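One way to check the GC theory with numbers rather than by feel is to compare wall-clock time with the time the interpreter reports spending in GC. The sketch below assumes C Ruby (it relies on GC::Profiler); gc_report is a made-up helper name and the workload is only a stand-in.

```ruby
# Sketch: measure how much of a burst of work is spent in GC.
# gc_report is a hypothetical helper, not part of the original application.
def gc_report
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  started     = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  {
    elapsed_s: elapsed,
    gc_runs:   GC.count - runs_before,
    gc_time_s: GC::Profiler.total_time
  }
ensure
  GC::Profiler.disable
end

report = gc_report do
  # Stand-in workload: allocate many short-lived strings.
  200_000.times { |i| "item #{i}" }
end

puts format('elapsed %.3fs, GC runs %d, GC time %.3fs',
            report[:elapsed_s], report[:gc_runs], report[:gc_time_s])
```

If the reported GC time is a small fraction of a 10-15 second freeze, GC is probably not the main cause.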

This is on a MacMini with 4 CPUs and 8 GB ram

After much fiddling with data structures and the like I just decided to run
this on a Debian box with 4 CPUs and 4 GB of ram

No freezes whatsoever! Not even a little one. Throughput jumps from 500-700
puts/gets per second to 1,000 to 1,500

Where is the problem with OSX?
1) Threads
2) Sockets
3) Mutexes

Anyone seen something like this before?


(Nicola Mingotti) #2

Hi Peter,

I am writing an application that must run on a Linux server where I can't install what I want.
I am developing it under FreeBSD, my machine, where I am free to install stuff.

I observed important differences in the performance (and bugs) of the app running on the server with an old Ruby
and on my machine with a recent Ruby.

So, the first thing you may check is: are the two Ruby versions the same?
Preinstalled versions of Ruby tend to be VERY old.
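A quick way to compare the two installations is to print the interpreter's own build constants on each machine. This is only a sketch; ruby_fingerprint is a made-up helper name, while the constants themselves are standard.

```ruby
# Collect the constants that identify the running interpreter.
def ruby_fingerprint
  {
    engine:      RUBY_ENGINE,      # e.g. "ruby" for C Ruby, "jruby" for JRuby
    version:     RUBY_VERSION,
    patchlevel:  RUBY_PATCHLEVEL,
    platform:    RUBY_PLATFORM,
    description: RUBY_DESCRIPTION
  }
end

ruby_fingerprint.each { |key, value| puts "#{key}: #{value}" }
```

Run it with the same command that starts the server, so you are sure it reports the interpreter that is really in use.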

After a while, to ensure the same behaviour on different OSes I moved the app to JRuby.

Now it is all very consistent, I am happy with that.

You may try JRuby as well, it is trivial to get it working provided you have
a recent (1.8) version of Java installed. You don't need root.

bye
n.

···

On 12/5/18 11:33 AM, Peter Hickman wrote:

Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>


(Eric Wong) #3

My first thought was the GC was kicking in and so I have been trying to
find ways to reduce the amount of GC that needs to be done as well as
calling it when I have created or deleted structures so that it does not
accumulate a pile of work and kick in unexpectedly

This has reduced the number of times that the timeout causing freezes have
happened but not affected the other freezes and has also reduced the
throughput

OK, so it is at least partially memory-management related.

Can you try the OSX build with jemalloc?

And what Ruby versions are you running?

Ruby 2.5 and 2.6 also have a new Mutex implementation which is
less OS-dependent, smaller and faster (at least on Linux):

  https://bugs.ruby-lang.org/issues/13517

  (I just fixed a bug in it, earlier :x
   https://bugs.ruby-lang.org/issues/15383)

Where is the problem with OSX
1) Threads
2) Sockets
3) Mutexes

I'm not familiar with profiling tools for OSX, but I'm sure
there are some which can help you find the problem.

AFAIK, nobody working on ruby-core can fix problems in OSX itself.

In Ruby itself, GNU/Linux and OSX share much of the code in all
those areas. Use process of elimination to isolate the
performance of each element.
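As a rough illustration of that process of elimination, the sketch below times threads on their own and threads contending on a single Mutex, with no sockets involved. The method names and the iteration counts are arbitrary assumptions; the idea is to run the same script on both OSes and compare.

```ruby
require 'benchmark'

# Threads only: each thread counts privately, no shared lock.
def threads_only(threads: 4, iterations: 50_000)
  totals = nil
  elapsed = Benchmark.realtime do
    workers = Array.new(threads) do
      Thread.new do
        n = 0
        iterations.times { n += 1 }
        n
      end
    end
    totals = workers.map(&:value)
  end
  [totals.inject(0, :+), elapsed]
end

# Threads plus Mutex: every increment takes the same lock.
def mutex_contended(threads: 4, iterations: 50_000)
  lock    = Mutex.new
  counter = 0
  elapsed = Benchmark.realtime do
    workers = Array.new(threads) do
      Thread.new { iterations.times { lock.synchronize { counter += 1 } } }
    end
    workers.each(&:join)
  end
  [counter, elapsed]
end

count, seconds = threads_only
puts format('threads only:    %d increments in %.3fs', count, seconds)
count, seconds = mutex_contended
puts format('mutex contended: %d increments in %.3fs', count, seconds)
```

If both numbers look alike on OSX and Linux, the next candidate is the socket layer, which can be timed the same way with a trivial echo client and server.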

And with just about any autoconf program, you don't need root
to install C Ruby; use "./configure --prefix=$HOME/somewhere ..."

I guess stuff like RVM/rbenv is also popular, too, but I don't
use them.

···

Peter Hickman <peterhickman386@googlemail.com> wrote:


(Peter Hickman) #4

The Debian machine is running 2.3.3 and on OSX it's 2.4.2. Both reasonably
modern. On OSX the rubies are handled by rvm so I'll try downgrading OSX to
2.3.3 and see if it behaves the same

I could also try 2.5 on OSX and see if that fixes things too


(Peter Hickman) #5

Using RVM on OSX I've tested this for 2.3.3 and 2.5.1 with no significant
change in the behaviour. I did note that under 2.5.1 the freeze ups were
shorter and the periods of work were longer but the picket fence CPU usage
remains (100% for 15 seconds, 0% for 10 seconds, repeat)

I'm wondering, does Ruby implement its own threading system or call out to
the underlying OS services? This could be down to how OSX schedules stuff
rather than a Ruby version issue

Perhaps I need to implement this in Python and see if I get the same
behaviour


(Eric Wong) #6

Using RVM on OSX I've tested this for 2.3.3 and 2.5.1 with no significant
change in the behaviour. I did note that under 2.5.1 the freeze ups were
shorter and the periods of work were longer but the picket fence CPU usage
remains (100% for 15 seconds, 0% for 10 seconds, repeat)

So that might be down to the Mutex implementation being less OS-dependent
in 2.5.

Care to give 2.6 preview3 a try?

I'm wondering, does Ruby implement its own threading system or call out to
the underlying OS services? This could be down to how OSX schedules stuff
rather than a Ruby version issue

Ruby uses native threads and locking mechanisms; but the GVL
limits the ability of threads to run in parallel to only a few
safepoints. The GVL is a custom lock designed for
high-contention. It's implemented using normal pthreads
primitives, but optimized for contention, whereas normal
pthreads mutexes are optimized for low contention.
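A small experiment makes the effect of the GVL visible. This is a sketch with arbitrary workloads, and timed_threads is a made-up helper: threads that sleep (and so release the GVL) overlap, while CPU-bound threads under C Ruby are expected to take roughly as long as running the work serially. JRuby has no GVL, so its CPU numbers should look different.

```ruby
require 'benchmark'

# Run `count` threads that all execute the same block; return wall-clock seconds.
def timed_threads(count, &work)
  Benchmark.realtime do
    Array.new(count) { Thread.new(&work) }.each(&:join)
  end
end

sleeping = timed_threads(4) { sleep 0.2 }                  # waiting releases the GVL
cpu_one  = timed_threads(1) { 2_000_000.times { |i| i * i } }
cpu_four = timed_threads(4) { 2_000_000.times { |i| i * i } }

puts format('4 sleeping threads:  %.2fs', sleeping)
puts format('1 CPU-bound thread:  %.2fs', cpu_one)
puts format('4 CPU-bound threads: %.2fs', cpu_four)
```

On a multi-core machine, the ratio of the last two figures shows how much parallelism the interpreter actually delivers for pure Ruby code.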

Perhaps I need to implement this in Python and see if I get the same
behaviour

AFAIK, Python's GIL is/was similar to our GVL, but had a fixed timeslice
with more wakeups and higher power consumption as a result.

···

Peter Hickman <peterhickman386@googlemail.com> wrote:


(Peter Hickman) #7

I'm compiling 2.6 preview3 as we speak. I will give it a tryout tonight and
report back tomorrow


(Nicola Mingotti) #8

In JRuby threads are Java threads.
They should behave as you expect out of the box.

Give it a shot if you have time. It is worth it IMO.
It will take you say 10-15 minutes to make a test, you don't need
to change the Ruby code (as long as your code does not depend on "C gems").

JRuby is not a silver bullet for everything, e.g. it takes a while to start,
so it is not ideal for shell programs you want to run by hand
and read the result back immediately.

bye
n.

···

On 12/6/18 1:16 AM, Peter Hickman wrote:



(Peter Hickman) #9

Well the test with 2.6preview3 went ahead. It was similar to the other
versions but had its own issues. It would run 100% cpu for 20 seconds, 50%
cpu for 20 seconds, repeated 4 to 5 times and then, 100% cpu for 15 seconds,
0% cpu for 25 seconds, repeated 4 to 5 times and then back to the start.
Overall the throughput had dropped and the end to end test took much longer

I've tried to eliminate any other processes that could interfere with the
test on my mac but a mac is not something you have that level of control
over (like you do with linux)

I have a few other things to try though


(Tim Hamilton) #10

I’m new here, so hopefully I don’t send you down a rabbit hole. Are you
confident it’s not a hardware or OS issue? Can you verify or recreate the issue
by doing a similar task with other code? Just a thought.

Best
Tim

···

On Fri, Dec 7, 2018 at 3:53 AM Peter Hickman <peterhickman386@googlemail.com> wrote:



(Peter Hickman) #11

I have a second Mac of lower spec and slightly older version of OSX (2 CPUs
vs 4 CPUs, 10.13.6 vs 10.14.1), I'll check that out ...

It does the 100% usage followed by 0% usage that my main computer does but
the 100% usage can last minutes rather than seconds and the 0% usage runs
between 5 and 15 seconds. Overall it completes the test faster because it
spends less time at 0% usage

One possible difference is that the main computer has a normal disk and the
older machine has an SSD. I've been wondering if it's some OSX background
task that is kicking in and slowing things down. If it was disk bound then
the SSD would get past that bottleneck much faster (not to mention far
fewer files on my second mac)

To test this thesis I would need to install an SSD in my main computer,
which is very, very tempting :)


(Walter Lee Davis) #12

To test this thesis I would need to install an SSD in my main computer, which is very, very tempting :)

Don’t think about this at all. I have an 11-year-old Mac Pro with an SSD and 32GB RAM as my “daily driver”, and aside from not being able to update to the latest OS, the performance is exemplary.

Walter

···

On Dec 7, 2018, at 9:04 AM, Peter Hickman <peterhickman386@googlemail.com> wrote:



(Eric Wong) #13

I have a second Mac of lower spec and slightly older version of OSX (2 CPUs
vs 4 CPUs, 10.13.6 vs 10.14.1), I'll check that out ...

It does the 100% usage followed by 0% usage that my main computer does but
the 100% usage can last minutes rather than seconds and the 0% usage runs
between 5 and 15 seconds. Overall it completes the test faster because it
spends less time at 0% usage

Have you tried any profiling tools which can break down where the CPU usage
is coming from (and where the 0% usage is sleeping)?

I'm not familiar with OSX at all, but things like "perf" on
Linux or even iostat or looking at the right columns in "top"
can be helpful.
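Short of a real profiler, a heartbeat thread inside the process can at least show whether the whole process stops during the 0% periods. The sketch below is an assumption-laden illustration: StallWatch is a made-up class, and the interval and threshold are arbitrary.

```ruby
# Hypothetical helper: a heartbeat thread that wakes every `interval` seconds
# and records any wake-up that arrives much later than requested.
class StallWatch
  attr_reader :stalls

  def initialize(interval: 0.1, threshold: 1.0)
    @interval  = interval
    @threshold = threshold
    @stalls    = []
    @thread    = nil
  end

  # Record a stall when the gap between two ticks exceeds the threshold.
  def check(previous, current)
    gap = current - previous
    return false unless gap > @threshold
    @stalls << { at: Time.now, gap_s: gap }
    warn format('heartbeat late: %.2fs since the previous tick', gap)
    true
  end

  def start
    @thread = Thread.new do
      previous = monotonic
      loop do
        sleep @interval
        current = monotonic
        check(previous, current)
        previous = current
      end
    end
    self
  end

  def stop
    @thread.kill if @thread
    @thread = nil
  end

  private

  def monotonic
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

watch = StallWatch.new(interval: 0.1, threshold: 1.0).start
# ... run the server and the load test here ...
watch.stop
puts "stalls seen: #{watch.stalls.size}"
```

If the heartbeat is late during a freeze, the whole process (or the machine) was paused. If it stays on time while the workers sit idle, the workers are more likely blocked on a lock or a socket.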

One possible difference is that the main computer has a normal disk and the
older machine has an SSD. I've been wondering if it's some OSX background
task that is kicking in and slowing things down. If it was disk bound then
the SSD would get past that bottleneck much faster (not to mention far
fewer files on my second mac)

Something like "top" would be useful for checking background activity

To test this thesis I would need to install an SSD in my main computer,
which is very, very tempting :)

And do you know how disk/file-system intensive your application is?
I know Linux does filesystem caching pretty aggressively compared
to other OSes.

···

Peter Hickman <peterhickman386@googlemail.com> wrote:


(Peter Hickman) #14

The application itself is entirely memory bound. Being a test nothing is
being recorded, just thrown at the screen

I use htop for the moment by moment cpu utilisation and to keep an eye on
memory and swap usage

I used the cpu history graph from Activity Monitor to show the trends. When
the cpu hits 0% it is system wide, nothing of significance is running. So
it's not like something else was hogging the cpu at that time. It is 0% on
the whole system

So what it needs is something that does not have significant cpu usage but
can cause processes to wait. Disk activity could do that, but my knowledge
of OSX internals is not good enough to know what else might be the cause


(Peter Hickman) #15

Just as a follow up. For a test I booted the Mac off a usb version of Mint
Linux 19 and ran the tests again. The installed version of ruby was
2.5.1p57. The problem disappeared and the tests ran end to end without any
noticeable freezing. The CPU history graph never really hit more
than 75% usage, yet the test ran much, much faster: roughly five times
faster (from 300-500 TPS to 1500-2500 TPS)

In some respects it wasn't a fair test as Mint was not installed on the
hard drive and was running entirely in memory

The second test was on a new Mac Mini (12 cores, 64 GB RAM and a 1 TB SSD).
The performance was even worse: 5-10 seconds of work that made a small blip
on the CPU history graph and then 30-40 seconds of freezing

At this point I'm going to say this is an OSX or OSX and Ruby issue more
than purely a Ruby issue

A rewrite in Python might clarify things further

I am sorely tempted to boot the new Mac Mini with Mint and see how that
works out :)