Killing sons (Linux)

Maybe this isn't strictly a Ruby question, but I hope someone here can
help:

I have a job-management application with a central daemon that
receives job requests. Upon receiving a request, it forks, and the fork
calls "system" to run bash, which in turn runs the Matlab job. I use bash
so that I can redirect Matlab's input and output. The pstree output looks
like this:

  init-+-apache2---8*[apache2]
       |-atd
      ...
       |-ruby-+-4*[ruby---bash---MATLAB-+-matlab_helper]
       |      |                         `-15*[{MATLAB}]]
       |      `-{ruby}
      ...

  Legend:
   daemon ^         ^ daemon fork

Now, my system also allows a 'kill' command, intended to stop the job in
progress. This has been causing me a lot of trouble, and I suddenly
(after the system has been in production for quite a while, how
embarrassing) realized why: the PID I'm keeping is that of the daemon fork.
Killing it doesn't kill all of its sons; it causes bash to get reparented
to init!

Any idea of a clean, quick way to fix this?

···

--
Posted via http://www.ruby-forum.com/.

Make your parent process the leader of its own process group with
setpgid(0,0). When you fork, add each child to the parent's process group
with setpgid(0, getppid()). If you fork subchildren, make sure they get
added to the same process group. Now, to send a signal to the whole group,
send it to (0 - pid), where pid is that of the parent. If you want them all
to die without killing the leader, use a signal whose default behavior is
terminate-process and ignore it in the parent.
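
For reference, here is a minimal Ruby sketch of the scheme described above.
The helper name group_kill_demo, the worker count, and the readiness pipe are
illustrative assumptions, not anyone's actual code. One detail worth noting:
an ignored signal stays ignored across fork and exec, so the sketch installs
the ignore in the leader only after the workers have been forked.

```ruby
# Hypothetical demo helper. Returns true if every worker was terminated by
# the group-wide SIGTERM while the group leader itself survived.
def group_kill_demo(workers = 3)
  leader = fork do
    Process.setpgid(0, 0)                    # become leader of a new group
    ready_r, ready_w = IO.pipe
    pids = Array.new(workers) do
      fork do
        ready_r.close
        # Explicit join, as suggested above (fork already inherits the group).
        Process.setpgid(0, Process.ppid)
        Signal.trap("TERM", "SYSTEM_DEFAULT") # OS default action: terminate
        ready_w.syswrite("x")                 # tell the leader we are ready
        sleep 300
      end
    end
    ready_w.close
    ready_r.read(workers)                    # wait until all workers are ready

    Signal.trap("TERM", "IGNORE")            # the leader ignores the signal
    Process.kill("TERM", -Process.pid)       # negative PID: the whole group

    statuses = pids.map { |pid| Process.wait2(pid).last }
    term = Signal.list["TERM"]
    ok = statuses.all? { |s| s.signaled? && s.termsig == term }
    exit!(ok ? 0 : 1)
  end
  Process.wait2(leader).last.success?
end
```

The whole demo runs inside a forked leader so that the calling process keeps
its own process group untouched.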

···

Just to be sure - if I run the following Ruby code on a Linux system:

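  child = fork do
    Process::setpgid 0,0            # child becomes leader of a new process group
    system 'bash -c "sleep 300"'    # bash and sleep inherit that group
  end
  Process::kill 9, -child           # negative PID: signal the whole group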

Then I am guaranteed that no child bash, sleep or ruby process will
remain? It works, I just want to be sure I can count on that behaviour.
For contrast, in my original code, bash gets reparented to init:

  child = fork do
    system 'bash -c "sleep 300"'
  end
  Process::kill 9, child

And this code doesn't even work (it raises Errno::ESRCH, "No such process"):

  child = fork do
    system 'bash -c "sleep 300"'
  end
  Process::kill 9, -child

Thank you for your help!

···

With signals, nothing is ever "guaranteed."

Why are you using signal 9 (SIGKILL) instead of something more
system-friendly, such as SIGTERM? SIGKILL should be your last resort when all
else fails, because it can't be caught and so doesn't give your processes a
chance to exit cleanly.
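
A rough Ruby sketch of that "polite first, 9 as a last resort" order might
look like the following. The helper name stop_gracefully and the grace period
are made up for illustration, and the pid is assumed to be a direct child of
the caller (otherwise Process.wait2 cannot reap it).

```ruby
# Hypothetical helper: ask a child to exit with SIGTERM, and fall back to
# SIGKILL only if it is still alive after `grace` seconds.
# Returns the Process::Status of the reaped child.
def stop_gracefully(pid, grace: 2.0)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  loop do
    # WNOHANG makes wait2 return nil while the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status if status
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.05
  end
  Process.kill("KILL", pid)   # last resort: cannot be caught or ignored
  Process.wait2(pid).last
end
```

The returned status tells you which signal actually ended the child, which
is handy for logging jobs that refused to exit cleanly.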

I have the uncomfortable feeling that your code is working by accident,
because the subprocesses which the "system" call creates aren't explicitly
added to the process group of the fork child. If you run this program in a
shell, it will probably work, because of the process-group semantics defined
by most shells. However, if you run it as a headless daemon or from a cron
job, it may not work. Try it and see.

When I have to do what you're trying to do, I usually avoid calling system.
Instead I call fork, and in the child I call setpgid(0, getppid()), and then
exec.
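
As a hedged illustration of fork, setpgid, then exec without "system", here
is a small Ruby sketch. Note one deliberate difference: it makes the child
the leader of its own group with setpgid(0, 0), as in the snippet earlier in
the thread, rather than joining the parent's group with setpgid(0, getppid()),
so that the daemon stays outside the group being signalled. The helper name
launch_in_own_group is hypothetical. The parent repeats the setpgid call so
the group exists by the time the helper returns, whichever process runs first.

```ruby
# Hypothetical helper: start a command directly (no intermediate shell) as
# the leader of its own process group. Returns the child's PID, which is
# also its process group ID.
def launch_in_own_group(*argv)
  pid = fork do
    Process.setpgid(0, 0)   # child: new group, pgid == own pid
    exec(*argv)             # replace the Ruby child with the job itself
  end
  begin
    Process.setpgid(pid, pid)  # parent: same call, closes the startup race
  rescue Errno::EACCES, Errno::ESRCH
    # EACCES: the child has already exec'd, so its own setpgid already ran.
    # ESRCH: the child has already gone away.
  end
  pid
end
```

With that in place, Process.kill("TERM", -pid) reaches the job and anything
it has forked that stayed in the same group.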

···

I've learned how to do redirection from within ruby
($stdwhatever.reopen), and switched to using exec instead of system - so
now I avoid two levels of depth. I also use setpgid and kill the whole
group for good measure.
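
For anyone finding this later, the approach looks roughly like the sketch
below. The helper name run_job, the log path argument, and the choice to send
stderr to the same file are assumptions made for the example, not the exact
code from my application.

```ruby
# Hypothetical helper: run a job with its standard streams redirected from
# within Ruby, so no intermediate bash is needed. Returns the child's PID.
def run_job(argv, log_path)
  fork do
    Process.setpgid(0, 0)          # own process group, so -pid reaches it all
    $stdin.reopen(File::NULL)      # the job gets no interactive input
    $stdout.reopen(log_path, "w")  # stdout goes to the log file
    $stderr.reopen($stdout)        # stderr shares the same file
    exec(*argv)                    # the job inherits the redirected streams
  end
end
```

Since exec replaces the forked Ruby process, the stored PID now belongs to
the job itself rather than to a wrapper two levels up.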

Many thanks for your help!
