Solaris porting problem -- flock failure?

Any solaris gurus out there?

I'm having trouble porting some multi-thread, multi-process code from
linux to solaris. I've already dealt with (or tried to deal with) some
differences in flock (solaris flock is based on fcntl locks), like the
fact that closing any descriptor for a file releases the process's locks
on that file, even locks acquired by other threads.
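
(For anyone unfamiliar with the fcntl semantics being described, here is a
minimal sketch of the pitfall; the file name is made up:)

# With fcntl-based locks, closing *any* descriptor for a file drops the
# process's locks on it, even if the lock was taken through another handle.
a = File.open("/tmp/flock-demo", "w")
b = File.open("/tmp/flock-demo", "w")
a.flock(File::LOCK_EX)   # lock acquired through descriptor a
b.close                  # on fcntl-based systems this releases a's lock too
# ...another process can now grab the lock, even though a is still open...
a.close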

I’ve managed to isolate the problem in a fairly simple test program. It’s at

http://path.berkeley.edu/~vjoel/ruby/solaris-bug.rb

The program creates /tmp/test-file-lock.dat, which holds a marshalled
fixnum starting at 0. Then it creates Np processes each with Nt threads
which do a random sequence of reads and writes using some locking
methods. The writes just increment the counter.

When a process is done, it writes the number of times it incremented the
counter to the file /tmp/test-file-lock.dat#{pid}. Then the main process
adds these up and compares with the contents of the counter file. The
point of this is to test for colliding writers.
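
(Roughly, each worker operation looks like the following; this is a condensed
sketch with made-up helper names, not the actual solaris-bug.rb code:)

# Condensed sketch of one worker operation (hypothetical helpers; the real
# script wraps these in its own locking methods and error handling).
COUNTER = "/tmp/test-file-lock.dat"

def read_counter                      # a "read" operation
  File.open(COUNTER, "rb") do |f|
    f.flock(File::LOCK_SH)
    Marshal.load(f.read)
  end                                 # closing the file releases the lock
end

def increment_counter                 # a "write" operation
  File.open(COUNTER, "r+b") do |f|
    f.flock(File::LOCK_EX)
    n = Marshal.load(f.read)
    f.rewind
    f.truncate(0)
    f.write(Marshal.dump(n + 1))
  end
end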

But the program fails before that final test: it seems a collision between
a reader and a writer causes the reader to see a corrupt file.

A typical run fails like this. The counter 0..3 is a seconds clock:

$ ruby solaris-bug.rb
0
1
2
3
solaris-bug.rb:128:in `load': marshal data too short (ArgumentError)

It looks like a reader and a writer are accessing the file at the same
time, and the writer has just truncated the file (line 137 of the script)
when the reader tries to read it.

This happens:

  • on solaris, quad cpu
    • ruby 1.7.3 (2002-10-30) [sparc-solaris2.7]
  • not on single processor linux
    • ruby 1.7.3 (2002-12-12) [i686-linux]
  • not on dual SMP linux
    • ruby 1.6.7 (2002-03-01) [i686-linux]

Also, the bug requires both of:

  • thread_count >= 2

  • process_count >= 2

Also, the bug requires that there be both reader and writer operations
(i.e., that the random number lead to each branch often enough, say 50/50).

Any solaris gurus out there?

any chance your home directory, or wherever you are running, is nfs mounted?

-a

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:


====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

> I'm having trouble porting some multi-thread, multi-process code from
> linux to solaris.

does your solaris os have this disclaimer?

man flock :

NOTES
     Use of these interfaces should be restricted to only applications
     written on BSD platforms. Use of these interfaces with any of the
     system libraries or in multi-thread applications is unsupported.

-a

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:


> I'm having trouble porting some multi-thread, multi-process code from
> linux to solaris. I've already dealt with (or tried to deal with) some
> differences in flock (solaris flock is based on fcntl locks), like the
> fact that closing a file releases locks on the file held by other threads.

you may want to try something like

f.sync = true

because neither fflush nor close is required to flush data to disk - only to
kernel space - so perhaps the solaris os is buffering too well and you need to
force the data to disk before the next read.
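
(a minimal sketch of forcing data out, with an assumed file name; sync=true
only bypasses ruby's own buffering, while fsync asks the kernel to push its
buffers to disk as well:)

File.open("/tmp/test-file-lock.dat", "w") do |f|
  f.sync = true                  # no userspace buffering inside ruby
  f.write(Marshal.dump(0))
  f.fsync                        # flush the kernel's buffers to disk too
end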

also, and i may be wrong on this, it looks like the implementation of
rb_file_flock should not cause the thread to block, but should send it into a
busy wait - are you sure that failing to acquire the lock blocked all
threads?

-a

ps. i have to agree with guy that this program is a long way from being
fairly simple

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:


from http://path.berkeley.edu/~vjoel/ruby/solaris-bug.rb

ugly hack because waiting for a lock in a Ruby thread blocks the process

this is the problem i think. if you look at the file.c implementation of
rb_file_flock you’ll notice that it used to do something very similar at the
C level - but that this has been removed

file.c:1531 #if defined(EWOULDBLOCK) && 0

which you’ve essentially re-implemented in ruby.

i'm guessing someone smarter than us realized this was not safe. thus, it
appears that the correct behavior for flock is to block the entire process and
that threads should not be using flock at all. i do not understand the exact
reason for this, but searching google for flock/fcntl/thread will bring up a
plethora of problems. i think you can be fairly certain that the behavior of
flock from inside a threaded program will be ill-defined across OSs.

as to a fix : keeping in mind that file locks are only (usually) advisory you
don't really buy anything using flock over a mutex since any other process can
choose to clobber the file anyhow - the lock will not prevent this. this is
troublesome since you also want to fork… the only way i can think of doing
this is to have each thread ask its parent process to lock the file in its
stead (in a critical section), rather than attempting to do so itself. even
this might not be safe and you might have to resort to some sort of IPC to
ensure single writer semantics. the sysvipc module from the raa seems to be
unreachable right now. does anyone have a copy?
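
(a rough sketch of that idea, under the assumption that one Mutex per process
is enough to keep the ruby threads from touching the advisory lock
concurrently; the helper name is made up:)

# One Mutex serializes the ruby threads inside a process, while flock
# arbitrates between processes. Hypothetical helper, not tested on solaris.
require 'thread'

FLOCK_MUTEX = Mutex.new

def with_exclusive_lock(path)
  FLOCK_MUTEX.synchronize do        # only one thread per process gets past here
    File.open(path, 'r+') do |f|
      f.flock(File::LOCK_EX)        # blocks against other *processes*
      begin
        yield f
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end
end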

-a

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:


I’m having trouble porting some multi-thread, multi-process code from
linux to solaris. I’ve already dealt with (or tried to deal with) some
differences in flock (solaris flock is based on fcntl locks), like the
fact that closing a file releases locks on the file held by other threads.

I’ve managed to isolate the problem in a fairly simple test program. It’s at

http://path.berkeley.edu/~vjoel/ruby/solaris-bug.rb

here is a similar program, which works on both linux :

/usr/home/howardat/eg/ruby > uname -srpm
Linux 2.2.14-5.0 i686 unknown
/usr/home/howardat/eg/ruby > ruby -v
ruby 1.6.5 (2001-10-22) [i686-linux]

and solaris

uname -srpm
SunOS 5.8 sun4u sparc
ruby -v
ruby 1.6.6 (2001-12-26) [sparc-solaris2.8]

the concept of the program is that multiple processes spawn multiple threads,
each of which attempts to gain an exclusive lock on a file in order to mark a
time stamp. when all processes have waited for all threads, and all child
processes themselves have been waited for, the file is read back in and
sorted. if any entries are found to be out of order - there was an error.

especially note the need to retry (catch/throw) in the event of interrupted
system calls. this is not needed on linux, but it happens a lot on solaris.
i do not know why. calling "system 'sync'" is absolutely essential
otherwise kernel buffers will still have data in them when the file is read
back in (yes even with "file.sync = true"!). i can't believe how painful this
was to do… if anyone knows any short-cuts i'd be more than happy to hear
them. also, if anyone could run this on their platform that'd be great

file : feeding_frenzy.rb

----CUT----

#!/usr/bin/env ruby

require 'thread'
require 'ftools'

n_process = (ARGV.shift or 2).to_i
n_thread  = (ARGV.shift or 2).to_i
n_mark    = (ARGV.shift or 2 << 9).to_i
out       = (ARGV.shift or 'out')

File.rm_f out
pids = []

n_process.times {
  pid =
    Process.fork {
      sem = Mutex.new
      File.open (out, 'a+') { |file|
        file.sync = true
        n_thread.times {
          t =
            Thread.new {
              (n_mark / n_thread).times {
                sem.synchronize {
                  # poll with a non-blocking flock rather than blocking the
                  # whole process; retry if the call is interrupted (EINTR)
                  catch (:flock_eintr) {
                    begin
                      while ((file.flock File::LOCK_EX | File::LOCK_NB)) do
                        Thread.pass
                      end
                    rescue
                      puts 'FLOCK EINTR'
                      throw :flock_eintr
                    end
                  }
                  catch (:syswrite_eintr) {
                    begin
                      file.syswrite "%10.5f\n" % Time.now.to_f
                    rescue
                      puts 'SYSWRITE EINTR'
                      throw :syswrite_eintr
                    end
                  }
                  catch (:flock_eintr) {
                    begin
                      file.flock File::LOCK_UN
                    rescue
                      puts 'FLOCK EINTR'
                      throw :flock_eintr
                    end
                  }
                }
              }
            }
          t.join
        }
      }
    }

  pids << pid
}

pids.each { |pid|
  Process.waitpid pid, Process::WNOHANG
}

system 'sync'

output = IO.readlines out
sorted = output.sort {|a,b| a.to_f <=> b.to_f}

if output != sorted
  puts 'ERROR!'
  File.open 'sorted', 'w' do |f|
    f.puts sorted
  end
end

----CUT----

-ara

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:


any chance your home directory, or wherever you are running, is nfs mounted?

no,

does your solaris os have this disclaimer?

ruby uses fcntl()

Well, one problem is here :

I've managed to isolate the problem in a fairly simple test program.

                                            ^^^^^^^^^^^^^^^^^^

I'll not call a 'simple program' a program with 200 lines of ruby
code (multi-process and multi-threaded) :-)))

Guy Decoux

also, and i may be wrong on this, it looks like the implementation of
rb_file_flock should not cause the thread to block, but should send it into a
busy wait - are you sure that failing to acquire the lock blocked _all_
threads?

Well perhaps you have not seen this

1531 #if defined(EWOULDBLOCK) && 0

                                ^^^^^

Guy Decoux

ahoward wrote:

any chance your home directory, or wherever you are running, is nfs mounted?

The data files are in /tmp, which is local.

ahoward wrote:

you may want to try something like

f.sync = true

It doesn’t affect the current problem, but I’ll file that one away in
case I see a situation where it appears one process has stale data…

ahoward wrote:

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:

I’m having trouble porting some multi-thread, multi-process code from
linux to solaris.

does your solaris os have this disclaimer?

man flock :

NOTES
     Use of these interfaces should be restricted to only applications
     written on BSD platforms. Use of these interfaces with any of the
     system libraries or in multi-thread applications is unsupported.

It sure does. I should have been more intimidated by that, but I was
thinking that the fact that all ruby threads run in one native thread
would save me.

ahoward wrote:

from http://path.berkeley.edu/~vjoel/ruby/solaris-bug.rb

ugly hack because waiting for a lock in a Ruby thread blocks the process

this is the problem i think. if you look at the file.c implementation of
rb_file_flock you’ll notice that it used to do something very similar at the
C level - but that this has been removed

file.c:1531 #if defined(EWOULDBLOCK) && 0

which you’ve essentially re-implemented in ruby.

Interesting. I checked again using two irb sessions that waiting for a
lock does block all ruby threads. Also, I ran my solaris-bug.rb without
the polling code (the ugly hack), instead just calling #flock. I
expected deadlock. Over several runs, I’ve seen two kinds of results:

  1. solaris-bug.rb:43:in `flock': Interrupted system call -
    "/tmp/test-file-lock.dat" (Errno::EINTR)

  2. solaris-bug.rb:128:in `load': marshal data too short (ArgumentError)

The latter looks like the same collision problem as usual. Could #1 be a
result of deadlock, which I would expect to happen eventually?

Anyway, the fact that #2 happens (no pun intended) at all suggests that
polling isn’t really the cause of the problem, and blocking the whole
process doesn’t prevent the collision.

i'm guessing someone smarter than us realized this was not safe. thus, it
appears that the correct behavior for flock is to block the entire process and
that threads should not be using flock at all. i do not understand the exact
reason for this, but searching google for flock/fcntl/thread will bring up a
plethora of problems. i think you can be fairly certain that the behavior of
flock from inside a threaded program will be ill-defined across OSs.

It’s particularly confusing to me because ruby threads all run in the
same native thread, so I expected to be insulated from any
multithreading problems with locks or with anything else at the system
level, as long as I was careful to manage concurrency among my threads
(keep reader counters and use LOCK_NB, use Thread.critical, etc.).

The test program runs perfectly with thread_count == 1 (that’s ruby
threads, not native threads, and it doesn’t count the “supervisor
thread”, only the “workers”, so there are actually still 2 threads). So
I have trouble believing that the underlying flock implementation (in
ruby source or in the depths of solaris) is bad. It seems to me that as
long as ruby’s threading code is correct, and I use appropriate
threading constructs correctly, this should all work.

as to a fix : keeping in mind that file locks are only (usually) advisory you
don't really buy anything using flock over a mutex since any other process can
choose to clobber the file anyhow - the lock will not prevent this. this is
troublesome since you also want to fork… the only way i can think of doing
this is to have each thread ask its parent process to lock the file in its
stead (in a critical section), rather than attempting to do so itself. even
this might not be safe and you might have to resort to some sort of IPC to
ensure single writer semantics. the sysvipc module from the raa seems to be
unreachable right now. does anyone have a copy?

I don’t mind using advisory locks, since I don’t expect any processes
but my own (which use the still semi-functional locking code) to access
the files of interest. Also, I don’t fork within the context of a lock.

That workaround sounds pretty scary, and potentially less portable, but
what do I know, maybe I should learn how to use inter-process mutexes…

···

On Sun, 26 Jan 2003, Joel VanderWerf wrote:

[dslstat-bvi1-254:~] chrisg% uname -a
Darwin dslstat-bvi1-254.fastq.com 6.3 Darwin Kernel Version 6.3: Sat
Dec 14 03:11:25 PST 2002; root:xnu/xnu-344.23.obj~4/RELEASE_PPC Power
Macintosh powerpc
[dslstat-bvi1-254:~] chrisg% ruby -w feeding_frenzy.rb
feeding_frenzy.rb:20: warning: open (...) interpreted as method call
feeding_frenzy.rb:27: warning: catch (...) interpreted as method call
feeding_frenzy.rb:37: warning: catch (...) interpreted as method call
feeding_frenzy.rb:45: warning: catch (...) interpreted as method call
[dslstat-bvi1-254:~] chrisg%

···

On Sunday, January 26, 2003, at 08:56 PM, ahoward wrote:

also, if anyone could run this on their platform that’d be great


It produced a file named “out”:

1043641128.38338
1043641128.38692
1043641128.38744
1043641128.38777
1043641128.38809
1043641128.38840
1043641128.38883
1043641128.38916
1043641128.38948
1043641128.39006
1043641128.39040
1043641128.39072
1043641128.39108
1043641128.39152
1043641128.39184
1043641128.39234
1043641128.39278
1043641128.39311
1043641128.39343
1043641128.39374
1043641128.39418
1043641128.39450
1043641128.39482
1043641128.39514
1043641128.39557
1043641128.39597
1043641128.39629
1043641128.39673
1043641128.39705
1043641128.39737
1043641128.39769
1043641128.39800
1043641128.39832
1043641128.39863
1043641128.39894
1043641128.39926
1043641128.39969
1043641128.40001
1043641128.40033
1043641128.40082
1043641128.40115
1043641128.40147
1043641128.40179
1043641128.40211
1043641128.40242
1043641128.40277
1043641128.40310
1043641128.40342
1043641128.40384
1043641128.40417
1043641128.40450
1043641128.40483
1043641128.40516
1043641128.40548
1043641128.40581
1043641128.40622
1043641128.40655
1043641128.40687
1043641128.40744
1043641128.40777
1043641128.40810
1043641128.40842
1043641128.40875
1043641128.40907
1043641128.40939
1043641128.40972
1043641128.41004
1043641128.41036
1043641128.41068
1043641128.41100
1043641128.41131
1043641128.41163
1043641128.41194
1043641128.41226
1043641128.41258
1043641128.41290
1043641128.41322
1043641128.41365
1043641128.41398
1043641128.41430
1043641128.41465
1043641128.41497
1043641128.41529
1043641128.41561
1043641128.41599
1043641128.53028
1043641128.53077
1043641128.53110
1043641128.53142
1043641128.53174
1043641128.53205
1043641128.53241
1043641128.53272
1043641128.53304
1043641128.53336
1043641128.53382
1043641128.53415
1043641128.53448
1043641128.53479
1043641128.53511
1043641128.53549
1043641128.53581
1043641128.53613
1043641128.53650
1043641128.53683
1043641128.53715
1043641128.53747
1043641128.53779
1043641128.53811
1043641128.53842
1043641128.53874
1043641128.53906
1043641128.53938
1043641128.53971
1043641128.57766
1043641128.57805
1043641128.57838
1043641128.57870
1043641128.57902
1043641128.57936
1043641128.57970
1043641128.58001
1043641128.58033
1043641128.58065
1043641128.58097
1043641128.58128
1043641128.58160
1043641128.58192
1043641128.58223
1043641128.58255
1043641128.58286
1043641128.58318
1043641128.58350
1043641128.58393
1043641128.58425
1043641128.58457
1043641128.58488
1043641128.58520
1043641128.58552
1043641128.58583
1043641128.58615
1043641128.58647
1043641128.58687
1043641128.58725
1043641128.58757
1043641128.61838
1043641128.61870
1043641128.61902
1043641128.61934
1043641128.61977
1043641128.62009
1043641128.62053
1043641128.62086
1043641128.62118
1043641128.62149
1043641128.62181
1043641128.62213
1043641128.62244
1043641128.62276
1043641128.62308
1043641128.62339
1043641128.62371
1043641128.62403
1043641128.62434
1043641128.62466
1043641128.62498
1043641128.62530
1043641128.62561
1043641128.62593
1043641128.62626
1043641128.65831
1043641128.65869
1043641128.65901
1043641128.65933
1043641128.65987
1043641128.66019
1043641128.66051
1043641128.66083
1043641128.66114
1043641128.66146
1043641128.66180
1043641128.66212
1043641128.66244
1043641128.66276
1043641128.66307
1043641128.66352
1043641128.66390
1043641128.66423
1043641128.66465
1043641128.66498
1043641128.66530
1043641128.66562
1043641128.66593
1043641128.66625
1043641128.66656
1043641128.66688
1043641128.66726
1043641128.66758
1043641128.66789
1043641128.66821
1043641128.66852
1043641128.66884
1043641128.66916
1043641128.66947
1043641128.66991
1043641128.67023
1043641128.67055
1043641128.67098
1043641128.67130
1043641128.67162
1043641128.67193
1043641128.67225
1043641128.67257
1043641128.67288
1043641128.67320
1043641128.67354
1043641128.67386
1043641128.67417
1043641128.67449
1043641128.67481
1043641128.67512
1043641128.67544
1043641128.67575
1043641128.67607
1043641128.67638
1043641128.67670
1043641128.67720
1043641128.67753
1043641128.67790
1043641128.67822
1043641128.67866
1043641128.67899
1043641128.67930
1043641128.67973
1043641128.73258
1043641128.73296
1043641128.73328
1043641128.73360
1043641128.73392
1043641128.73424
1043641128.73455
1043641128.73493
1043641128.73526
1043641128.73558
1043641128.73609
1043641128.73643
1043641128.73675
1043641128.73708
1043641128.73749
1043641128.73781
1043641128.73813
1043641128.73845
1043641128.73891
1043641128.73924
1043641128.73958
1043641128.73990
1043641128.74022
1043641128.74055
1043641128.74087
1043641128.74119
1043641128.74151
1043641128.74183
1043641128.74215
1043641128.80940
1043641128.80980
1043641128.81013
1043641128.81045
1043641128.81077
1043641128.81115
1043641128.81148
1043641128.81180
1043641128.81218
1043641128.81251
1043641128.81283
1043641128.81315
1043641128.81347
1043641128.81379
1043641128.81425
1043641128.81461
1043641128.81493
1043641128.81526
1043641128.81558
1043641128.81602
1043641128.81634
1043641128.81666
1043641128.81698
1043641128.81730
1043641128.81770
1043641128.81803
1043641128.81835
1043641128.81867
1043641128.81899
1043641128.81931
1043641128.87030
1043641128.87066
1043641128.87099
1043641128.87131
1043641128.87163
1043641128.87195
1043641128.87227
1043641128.87271
1043641128.87304
1043641128.87338
1043641128.87371
1043641128.87403
1043641128.87449
1043641128.87481
1043641128.87514
1043641128.87546
1043641128.87578
1043641128.87610
1043641128.87642
1043641128.88831
1043641128.88867
1043641128.88906
1043641128.88963
1043641128.88997
1043641128.89030
1043641128.89062
1043641128.89103
1043641128.89136
1043641128.89168
1043641128.90210
1043641128.90243
1043641128.90276
1043641128.90308
1043641128.90340
1043641128.90372
1043641128.90410
1043641128.90454
1043641128.90487
1043641128.90520
1043641128.90552
1043641128.90584
1043641128.90616
1043641128.90647
1043641128.90679
1043641128.90722
1043641128.90755
1043641128.90794
1043641128.90826
1043641128.90858
1043641128.90893
1043641128.90925
1043641128.90974
1043641128.91007
1043641128.91039
1043641128.91071
1043641128.91103
1043641128.91134
1043641128.91166
1043641128.91198
1043641128.97967
1043641128.98017
1043641128.98049
1043641128.98081
1043641128.98131
1043641128.98165
1043641128.98197
1043641128.98229
1043641128.98261
1043641128.98293
1043641128.98325
1043641128.98357
1043641128.98389
1043641128.98420
1043641128.98452
1043641128.98484
1043641128.98516
1043641128.98547
1043641128.98580
1043641128.98611
1043641128.98644
1043641128.98675
1043641128.98707
1043641128.98751
1043641128.98784
1043641128.98823
1043641128.98855
1043641128.98887
1043641128.98919
1043641128.98950
1043641128.98994
1043641128.99026
1043641128.99058
1043641128.99090
1043641128.99124
1043641128.99156
1043641128.99188
1043641128.99220
1043641128.99252
1043641128.99284
1043641128.99315
1043641128.99358
1043641128.99391
1043641128.99423
1043641128.99455
1043641128.99500
1043641128.99533
1043641128.99571
1043641128.99603
1043641128.99635
1043641128.99666
1043641128.99698
1043641128.99730
1043641128.99762
1043641128.99793
1043641128.99831
1043641128.99863
1043641128.99895
1043641128.99927
1043641128.99969
1043641129.00013
1043641129.00046
1043641129.00078
1043641129.00110
1043641129.00142
1043641129.04229
1043641129.04263
1043641129.04295
1043641129.04326
1043641129.04358
1043641129.04390
1043641129.04422
1043641129.04454
1043641129.04486
1043641129.04517
1043641129.04549
1043641129.04581
1043641129.04613
1043641129.04655
1043641129.04688
1043641129.04721
1043641129.04752
1043641129.04784
1043641129.04816
1043641129.04857
1043641129.04889
1043641129.04920
1043641129.04952
1043641129.04997
1043641129.08216
1043641129.08261
1043641129.08294
1043641129.08326
1043641129.08358
1043641129.08390
1043641129.08421
1043641129.08453
1043641129.08497
1043641129.08545
1043641129.08579
1043641129.08610
1043641129.08642
1043641129.08674
1043641129.08705
1043641129.08737
1043641129.08769
1043641129.08800
1043641129.08832
1043641129.08872
1043641129.08904
1043641129.08936
1043641129.08978
1043641129.09010
1043641129.09042
1043641129.09074
1043641129.09105
1043641129.09149
1043641129.09181
1043641129.09213
1043641129.12278
1043641129.12312
1043641129.12344
1043641129.12376
1043641129.12407
1043641129.12439
1043641129.12470
1043641129.12502
1043641129.12534
1043641129.12566
1043641129.12597
1043641129.12634
1043641129.13691
1043641129.13731
1043641129.13763
1043641129.17989
1043641129.18042
1043641129.18076
1043641129.18107
1043641129.18139
1043641129.18170
1043641129.18202
1043641129.18233
1043641129.18265
1043641129.18297
1043641129.18328
1043641129.18360
1043641129.18392
1043641129.18423
1043641129.18455
1043641129.18486
1043641129.18518
1043641129.21611
1043641129.21648
1043641129.21679
1043641129.21712
1043641129.21743
1043641129.21774
1043641129.21806
1043641129.21837
1043641129.21869
1043641129.22044
1043641129.22259
1043641129.22296
1043641129.22328
1043641129.22360
1043641129.22391
1043641129.22423
1043641129.22454
1043641129.22486
1043641129.22702
1043641129.22737
1043641129.22769
1043641129.25184
1043641129.25218
1043641129.25250
1043641129.25282
1043641129.25313
1043641129.25345
1043641129.25376
1043641129.25408
1043641129.25439
1043641129.25470
1043641129.25502
1043641129.25534
1043641129.25565
1043641129.25597
1043641129.25628
1043641129.25660
1043641129.25692
1043641129.25725
1043641129.25758
1043641129.25790
1043641129.25823
1043641129.25856
1043641129.25889
1043641129.25929
1043641129.25973
1043641129.26021
1043641129.26054
1043641129.26086
1043641129.26119
1043641129.26152
1043641129.32071
1043641129.32110
1043641129.32143
1043641129.32175
1043641129.32208
1043641129.32240
1043641129.32273
1043641129.32306
1043641129.32338
1043641129.32371
1043641129.32403
1043641129.32436
1043641129.32468
1043641129.32501
1043641129.32533
1043641129.32566
1043641129.32598
1043641129.32631
1043641129.32676
1043641129.32710
1043641129.32742
1043641129.32775
1043641129.32807
1043641129.32840
1043641129.32873
1043641129.32905
1043641129.32944
1043641129.32978
1043641129.33011
1043641129.33044
1043641129.36127
1043641129.36162
1043641129.36196
1043641129.36228
1043641129.36261
1043641129.36293
1043641129.36326
1043641129.36359
1043641129.36391
1043641129.36424
1043641129.36511
1043641129.36545
1043641129.36577
1043641129.36609
1043641129.36642
1043641129.36675
1043641129.36707
1043641129.36740
1043641129.36775
1043641129.36808
1043641129.36840
1043641129.36873
1043641129.36905
1043641129.36938
1043641129.36981
1043641129.37013
1043641129.37046
1043641129.37078
1043641129.37111
1043641129.37143
1043641129.37196
1043641129.37232
1043641129.37267
1043641129.37303
1043641129.37338
1043641129.37373
1043641129.37408
1043641129.37443
1043641129.37477
1043641129.37513
1043641129.37547
1043641129.37583
1043641129.37618
1043641129.37652
1043641129.40890
1043641129.40926
1043641129.40976
1043641129.41009
1043641129.41042
1043641129.41075
1043641129.41107
1043641129.41140
1043641129.41172
1043641129.41205
1043641129.41237
1043641129.41270
1043641129.41302
1043641129.41334
1043641129.41367
1043641129.42414
1043641129.42448
1043641129.42481
1043641129.42513
1043641129.42545
1043641129.42578
1043641129.42610
1043641129.42646
1043641129.42678
1043641129.42711
1043641129.42744
1043641129.42776
1043641129.42808
1043641129.42841
1043641129.42874
1043641129.42906
1043641129.42939
1043641129.42973
1043641129.43011
1043641129.43044
1043641129.43077
1043641129.43109
1043641129.43141
1043641129.43174
1043641129.43218
1043641129.43251
1043641129.43283
1043641129.43316
1043641129.43348
1043641129.43381
1043641129.46466
1043641129.46502
1043641129.46534
1043641129.46567
1043641129.46599
1043641129.46632
1043641129.46664
1043641129.46696
1043641129.46729
1043641129.46761
1043641129.46794
1043641129.46826
1043641129.46859
1043641129.46891
1043641129.46923
1043641129.46967
1043641129.47009
1043641129.47042
1043641129.47074
1043641129.47107
1043641129.47139
1043641129.47171
1043641129.47204
1043641129.47236
1043641129.47268
1043641129.47301
1043641129.47333
1043641129.47368
1043641129.47401
1043641129.47433
1043641129.51207
1043641129.51245
1043641129.51278
1043641129.51311
1043641129.51343
1043641129.51375
1043641129.51408
1043641129.51440
1043641129.51472
1043641129.51505
1043641129.51537
1043641129.51569
1043641129.51602
1043641129.51634
1043641129.51666
1043641129.51699
1043641129.51731
1043641129.51763
1043641129.51796
1043641129.51828
1043641129.51867
1043641129.51900
1043641129.51944
1043641129.51979
1043641129.52019
1043641129.52056
1043641129.52089
1043641129.52122
1043641129.52155
1043641129.52187
1043641129.55772
1043641129.55807
1043641129.55840
1043641129.55873
1043641129.55905
1043641129.55938
1043641129.55973
1043641129.56024
1043641129.56058
1043641129.56091
1043641129.56124
1043641129.56157
1043641129.56190
1043641129.56223
1043641129.56256
1043641129.56289
1043641129.56322
1043641129.56354
1043641129.56387
1043641129.56568
1043641129.56605
1043641129.56665
1043641129.56701
1043641129.56775
1043641129.56810
1043641129.56874
1043641129.56922
1043641129.56974
1043641129.57028
1043641129.57062
1043641129.57095
1043641129.57128
1043641129.57161
1043641129.57194
1043641129.57227
1043641129.57260
1043641129.57292
1043641129.57325
1043641129.57358
1043641129.57391
1043641129.57424
1043641129.57457
1043641129.57490
1043641129.57529
1043641129.57562
1043641129.57595
1043641129.57628
1043641129.57661
1043641129.57694
1043641129.57727
1043641129.57760
1043641129.57793
1043641129.57826
1043641129.57859
1043641129.57892
1043641129.57925
1043641129.57972
1043641129.58005
1043641129.61071
1043641129.61105
1043641129.61138
1043641129.61171
1043641129.61204
1043641129.61236
1043641129.61269
1043641129.61314
1043641129.61347
1043641129.61380
1043641129.61413
1043641129.61446
1043641129.61481
1043641129.61514
1043641129.61547
1043641129.61580
1043641129.61612
1043641129.61645
1043641129.61678
1043641129.61710
1043641129.61743
1043641129.61776
1043641129.61808
1043641129.61841
1043641129.61873
1043641129.61906
1043641129.61939
1043641129.61973
1043641129.62005
1043641129.69295
1043641129.69335
1043641129.69369
1043641129.69402
1043641129.69434
1043641129.69467
1043641129.69500
1043641129.69533
1043641129.69566
1043641129.69599
1043641129.69632
1043641129.69665
1043641129.69697
1043641129.69733
1043641129.69766
1043641129.69799
1043641129.69832
1043641129.69864
1043641129.69897
1043641129.69930
1043641129.69975
1043641129.70009
1043641129.70042
1043641129.70083
1043641129.70116
1043641129.70149
1043641129.70182
1043641129.70214
1043641129.70247
1043641129.70280
1043641129.73597
1043641129.73637
1043641129.73670
1043641129.73702
1043641129.73735
1043641129.73767
1043641129.73800
1043641129.73833
1043641129.73865
1043641129.73898
1043641129.73930
1043641129.73975
1043641129.74008
1043641129.74041
1043641129.74074
1043641129.74113
1043641129.74146
1043641129.74178
1043641129.74211
1043641129.74244
1043641129.74276
1043641129.74315
1043641129.74348
1043641129.74381
1043641129.74416
1043641129.74449
1043641129.74481
1043641129.74514
1043641129.74547
1043641129.74579
1043641129.77210
1043641129.77244
1043641129.77276
1043641129.77309
1043641129.77341
1043641129.77374
1043641129.77406
1043641129.77439
1043641129.77471
1043641129.77504
1043641129.77536
1043641129.77569
1043641129.77601
1043641129.77634
1043641129.77666
1043641129.77699
1043641129.77731
1043641129.77764
1043641129.77796
1043641129.77829
1043641129.77874
1043641129.77907
1043641129.77942
1043641129.77976
1043641129.78009
1043641129.78042
1043641129.78074
1043641129.78112
1043641129.78145
1043641129.78177
1043641129.84194
1043641129.84232
1043641129.84265
1043641129.84297
1043641129.84329
1043641129.84361
1043641129.84393
1043641129.84425
1043641129.84457
1043641129.84488
1043641129.84520
1043641129.84552
1043641129.84583
1043641129.84615
1043641129.84666
1043641129.84698
1043641129.84730
1043641129.84762
1043641129.84793
1043641129.84825
1043641129.84857
1043641129.84889
1043641129.84922
1043641129.84966
1043641129.85002
1043641129.85035
1043641129.85068
1043641129.85100
1043641129.85141
1043641129.85174
1043641129.87681
1043641129.87716
1043641129.87748
1043641129.87780
1043641129.87812
1043641129.87845
1043641129.87876
1043641129.87908
1043641129.87940
1043641129.87973
1043641129.88005
1043641129.88036
1043641129.88068
1043641129.88099
1043641129.88138
1043641129.88170
1043641129.88201
1043641129.88232
1043641129.88264
1043641129.88296
1043641129.88328
1043641129.88359
1043641129.88403
1043641129.88436
1043641129.88474
1043641129.88506
1043641129.88541
1043641129.88574
1043641129.88606
1043641129.88638
1043641129.88670
1043641129.90301
1043641129.90480
1043641129.90517
1043641129.90550
1043641129.90582
1043641129.91382
1043641129.92442
1043641129.92475
1043641129.92507
1043641129.92539
1043641129.92570
1043641129.92602
1043641129.92634
1043641129.92665
1043641129.92697
1043641129.92728
1043641129.92760
1043641129.92792
1043641129.92823
1043641129.92855
1043641129.92887
1043641129.92918
1043641129.92975
1043641129.93007
1043641129.93039
1043641129.93070
1043641129.93102
1043641129.93133
1043641129.93170
1043641129.93202
1043641129.93236
1043641129.93268
1043641129.93299
1043641129.93331
1043641129.93362
1043641129.93394
1043641129.93426
1043641129.97506
1043641129.97540
1043641129.97572
1043641129.97604
1043641129.97635
1043641129.97667
1043641129.97698
1043641129.97730
1043641129.97761
1043641129.97792
1043641129.97824
1043641129.97855
1043641129.97887
1043641129.97919
1043641129.97964
1043641129.97998

HTH

When a thing is funny, search it carefully for a hidden truth.
-George Bernard Shaw, writer, Nobel laureate (1856-1950)

???

~ > uname -a && ruby -v
Linux eli.fsl.noaa.gov 2.4.2-2 #1 SMP Fri Nov 23 20:51:15 GMT 2001 i686 unknown
ruby 1.6.7 (2002-03-01) [i686-linux]

file.c :

1531 #if defined(EWOULDBLOCK) && 0
1532 static int
1533 rb_thread_flock(fd, op, fptr)
1534     int fd, op;
1535     OpenFile *fptr;
1536 {
1537     if (rb_thread_alone() || (op & LOCK_NB)) {
1538         return flock(fd, op);
1539     }
1540     op |= LOCK_NB;
1541     while (flock(fd, op) < 0) {
1542         switch (errno) {
1543           case EAGAIN:
1544           case EACCES:
1545 #if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
1546           case EWOULDBLOCK:
1547 #endif
1548             rb_thread_polling();        /* busy wait */
1549             rb_io_check_closed(fptr);
1550             continue;
1551           default:
1552             return -1;
1553         }
1554     }
1555     return 0;
1556 }
1557 #define flock(fd, op) rb_thread_flock(fd, op, fptr)
1558 #endif
1559
1560 static VALUE
1561 rb_file_flock(obj, operation)
1562     VALUE obj;
1563     VALUE operation;
1564 {
1565 #ifndef __CHECKER__
1566     OpenFile *fptr;
1567     int ret;
1568
1569     rb_secure(2);
1570     GetOpenFile(obj, fptr);
1571
1572     if (fptr->mode & FMODE_WRITABLE) {
1573         fflush(GetWriteFile(fptr));
1574     }
1575     TRAP_BEG;
1576     ret = flock(fileno(fptr->f), NUM2INT(operation));
1577     TRAP_END;
1578     if (ret < 0) {
1579         switch (errno) {
1580           case EAGAIN:
1581           case EACCES:
1582 #if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
1583           case EWOULDBLOCK:
1584 #endif
1585             return Qfalse;
1586         }
1587         rb_sys_fail(fptr->path);
1588     }
1589 #endif
1590     return INT2FIX(0);
1591 }
1592 #undef flock

also, it appears to me (just glancing over the sources) that ruby uses
fflush(stream), which only flushes into kernel space and does not guarantee
data is written to disk even on file close !! i.e. sync must also be used.

-a

···

On Mon, 27 Jan 2003, ts wrote:

does your solaris os have this disclaimer?

ruby uses fcntl()


ts wrote:

I’ll not call a ‘simple program’ a program with 200 lines of ruby
code (multi-process and multi-threaded) :-)))

Only by comparison to the original…

no, that's what i'm saying : why has this method been 'commented out'?? it
looks like someone's intention was for attempting to acquire a lock not to
block the current thread… does anyone have the source for the 1.8 version?

also, earlier you said that flock was 'broken' on solaris. what exactly did
you mean by that? supposedly solaris uses fcntl to implement flock so why
does ruby need its own? i'm sure there is an answer i'm just curious…

-a

···

On Mon, 27 Jan 2003, ts wrote:

also, and i may be wrong on this, it looks like the implementation of
rb_file_flock should not cause the thread to block, but should send it into a
busy wait - are you sure that failing to acquire the lock blocked all
threads?

Well perhaps you have not seen this

1531 #if defined(EWOULDBLOCK) && 0


Interesting. I checked again using two irb sessions that waiting for a
lock does block all ruby threads. Also, I ran my solaris-bug.rb without
the polling code (the ugly hack), instead just calling #flock. I
expected deadlock. Over several runs, I’ve seen two kinds of results:

  1. solaris-bug.rb:43:in `flock': Interrupted system call -
    "/tmp/test-file-lock.dat" (Errno::EINTR)

  2. solaris-bug.rb:128:in `load': marshal data too short (ArgumentError)

The latter looks like the same collision problem as usual. Could #1 be a
result of deadlock, which I would expect to happen eventually?

according to the fcntl man page this might be the result of catching a
signal during the period that the process was blocked waiting for the lock;
what signal that could be i do not know…
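
(a minimal retry idiom for that case, sketched on the assumption that the
interrupted call surfaces as Errno::EINTR, as in the traceback above:)

# retry flock if the underlying system call is interrupted by a signal
def flock_retrying(file, op)
  begin
    file.flock(op)
  rescue Errno::EINTR
    retry
  end
end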

Anyway, the fact that #2 happens (no pun intended) at all suggests that
polling isn’t really the cause of the problem, and blocking the whole
process doesn’t prevent the collision.

hmmm. i truly have no idea why it would be but the

file.c:1531 #if defined(EWOULDBLOCK) && 0

line in file.c which prevents the rb_thread_flock method from getting defined
makes me wonder. can anyone shed light on this? i did some searching in this
group and found no mention of it…

It’s particularly confusing to me because ruby threads all run in the
same native thread, so I expected to be insulated from any
multithreading problems with locks or with anything else at the system
level, as long as I was careful to manage concurrency among my threads
(keep reader counters and use LOCK_NB, use Thread.critical, etc.).

maybe the flock implementation uses some sort of process level global state
which gets copied when forking, but which gets clobbered by competing
threads… i realize this is OT but i’d really like to know now.

The test program runs perfectly with thread_count == 1 (that’s ruby threads,
not native threads, and it doesn’t count the “supervisor thread”, only the
“workers”, so there are actually still 2 threads). So I have trouble
believing that the underlying flock implementation (in ruby source or in the
depths of solaris) is bad. It seems to me that as long as ruby’s threading
code is correct, and I use appropriate threading constructs correctly, this
should all work.

do some searching on google and i don't think you'll have so much trouble
believing a flock implementation could be bad. ;) last week i was amazed
to learn that a multi-threaded C program i had written could not use
fprintf/fflush to force writes, and that i had to resort to
write/read/flush/sync to do this…

-a

···

On Mon, 27 Jan 2003, Joel VanderWerf wrote:


~ > uname -a && ruby -v
Linux eli.fsl.noaa.gov 2.4.2-2 #1 SMP Fri Nov 23 20:51:15 GMT 2001 i686 unknown
ruby 1.6.7 (2002-03-01) [i686-linux]

                            ^^^^^^^^^^

This is the latest version of Solaris ? :-)))

moulon% ls flock.o missing/flock.c
flock.o missing/flock.c
moulon%

moulon% grep flock.o Makefile
MISSING = flock.o isinf.o
flock.o: $(srcdir)/missing/flock.c
moulon%

moulon% ruby -v
ruby 1.6.8 (2002-12-24) [sparc-solaris2.7]
moulon%

Guy Decoux

also, earlier you said that flock was 'broken' on solaris. what exactly did
you mean by that? supposedly solaris uses fcntl to implement flock so why
does ruby need its own? i'm sure there is an answer i'm just curious...

Which version of Solaris ?

If I remember correctly, on some versions flock() works only if the file is
open for writing.

You need a #rewind after you have got a lock.

Guy Decoux
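
(a small illustration of the #rewind advice, assuming the counter file from
the original post; the point is that once the lock is finally granted, the
descriptor's position and buffered data may be stale:)

# rewind after acquiring the lock so the read starts at offset 0 with fresh
# data rather than whatever was buffered before the wait (assumed file name)
File.open("/tmp/test-file-lock.dat", "r") do |f|
  f.flock(File::LOCK_SH)
  f.rewind                      # drop the stale position/buffer
  value = Marshal.load(f.read)
  f.flock(File::LOCK_UN)
end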

~ > uname -a && ruby -v
Linux eli.fsl.noaa.gov 2.4.2-2 #1 SMP Fri Nov 23 20:51:15 GMT 2001 i686 unknown
ruby 1.6.7 (2002-03-01) [i686-linux]
^^^^^^^^^^

This is the latest version of Solaris ? :-)))

whoops, wrong terminal

hero.fsl.noaa.gov$ uname -a
SunOS hero.fsl.noaa.gov 5.6 Generic_105181-33 sun4u sparc

moulon% ls flock.o missing/flock.c
flock.o missing/flock.c
moulon%

moulon% grep flock.o Makefile
MISSING = flock.o isinf.o
flock.o: $(srcdir)/missing/flock.c
moulon%

odd that it builds flock when it is not missing on this system?

are you saying that the ruby implementation of flock is safe to use from
multi-threaded applications then?

-a
