Making sure a file isn't being written/copied before moving

So, I'm (still) working on some scripts to rename and reorganize my image files.
Right now, I have a script on my laptop that pulls images off the CF card and stores them locally. I have another script that rsyncs the images to my server at home whenever there's a connection available. These scripts don't step on each other because they look at the process list for instances of rsync.

But I need to write two more scripts. The first takes the image files from an incoming directory, renames them, and drops them into a directory for me to work on. Once I've done whatever I'm going to do (cull, keyword, etc.), I put them into a directory for archiving. The second pulls the images from that directory, puts them in an archive directory, and sets the immutable extended attribute (xattr) on them.

My problem/issue is that I don't want to do anything with a file that is in the process of being moved into one of these directories. I have to be sure that the file is not still being moved/copied. Now, these directories are all on the same filesystem, so it *should* be an atomic change by the filesystem (technically a rename of the file from one directory/file name to another). But I want to be sure--plus I may be dropping files into the incoming directory from elsewhere from time to time. (I plan on going through my backlog of old untagged files eventually.)

Can someone suggest an easy (or at least reliable) way to make sure that any file I'm about to modify isn't being touched by another program?

Paul

Paul Archer wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

This is how Maildir works, so maybe reading up on the semantics of
Maildir will help you.

http://www.qmail.org/qmail-manual-html/man5/maildir.html
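
A minimal sketch of the consumer side of that idea, assuming the incoming and working directories are on the same filesystem. The helper name `claim` and the directory layout are hypothetical, chosen only for illustration. The script claims a file by renaming it into a directory nothing else scans, and only works on it once the rename has succeeded. This guards against two readers picking up the same file; on its own it says nothing about whether the writer has finished.

```ruby
require 'fileutils'

# Claim a file by renaming it into a private working directory.
# File.rename is atomic when both paths are on the same filesystem, so
# at most one process wins the claim. Returns the new path, or nil if
# the file had already been taken (or removed) by someone else.
# Note: an existing file of the same name in work_dir is replaced.
def claim(src, work_dir)
  FileUtils.mkdir_p(work_dir)
  dest = File.join(work_dir, File.basename(src))
  File.rename(src, dest)
  dest
rescue Errno::ENOENT
  nil
end
```

If the two directories turn out to be on different filesystems, `File.rename` raises `Errno::EXDEV` rather than silently falling back to a copy, which is the behaviour you want here.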

--
Posted via http://www.ruby-forum.com/.

I think I solved my problem. I was looking at inotify in order to avoid having the script have to check the directories on a regular basis. Turns out that it can report when a file is closed for writing, *and* return the path and basename of the file.
The only downside is that ruby-inotify doesn't (as far as I can tell) do recursive checks of the directory, so I'm using Open3 to call inotifywait, and parsing its output.

Here's my test program:

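#!/usr/bin/ruby -w

require 'open3'
require 'fileutils'  # ftools (File.move) is gone as of Ruby 1.9

# Run inotifywait in monitor (-m), recursive (-r) mode and yield each
# CLOSE_WRITE event line as it arrives. Passing the arguments separately
# keeps the path away from the shell.
def inwait(path)
  Open3.popen3("inotifywait", "-m", "-r", path) { |stdin, stdout, stderr|
    while line = stdout.gets
      next unless line.include?("CLOSE_WRITE")
      yield line
    end
  }
end

inwait("/tmp") do |line|
  # Default output is "dir/ EVENTS filename"; limit the split to three
  # fields so a file name containing spaces stays in one piece.
  path, action, file = line.chomp.split(" ", 3)
  puts "path: \t #{path}"
  puts "action: \t #{action}"
  puts "file: \t #{file}"
  # Test only: files from subdirectories get moved up into /tmp. A file
  # already directly in /tmp would be moved onto itself, so a real
  # script needs a separate destination directory.
  FileUtils.mv(File.join(path, file), "/tmp")
end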

Paul

Tomorrow, Brian Candler wrote:

Paul Archer wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn't clear), but I may be operating on files that are added arbitrarily. My concern is that the script starts acting on a file that is still being copied. Renaming it won't help there.

Paul

Just use the renaming semantics in the first program (the one doing the copying) also. Basically you are using the filename appearance as a synchronization mechanism between the multiple processing steps.

If you can't control the name of the file itself, then control the directory in which it appears.
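
To illustrate using the file name as the synchronization signal, here is a rough sketch of the reading side. It assumes a convention that is not from this thread: anything still being written has a leading dot or a `.part` suffix. The helper names `in_progress?` and `ready_files` are hypothetical.

```ruby
# Names the producer uses while a file is still being written.
# The leading dot and ".part" suffix are conventions assumed for this sketch.
def in_progress?(name)
  name.start_with?(".") || name.end_with?(".part")
end

# List the regular files in dir that are complete according to the
# naming convention, as full paths in sorted order.
def ready_files(dir)
  Dir.children(dir)
     .reject { |name| in_progress?(name) }
     .map { |name| File.join(dir, name) }
     .select { |path| File.file?(path) }
     .sort
end
```

This only works if every writer follows the convention, which is the whole point of the advice above: the final name must never be visible until the content is complete.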

Gary Wright

On Jul 16, 2009, at 4:48 PM, Paul Archer wrote:

Tomorrow, Brian Candler wrote:

Paul Archer wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn't clear), but I may be operating on files that are added arbitrarily. My concern is that the script starts acting on a file that is still being copied. Renaming it won't help there.

Paul Archer wrote:

Paul Archer wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

The program which drops files into the directory has to work the same
way:

- open temporary file
- write to it
- close it
- fsync if you want to be sure it's on disk even if power is pulled
- rename it to final location

That's why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory, and
then renamed into the new/ directory)
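
The steps above could be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not Maildir itself: the helper name `atomic_drop` is hypothetical, and instead of a separate tmp/ directory it uses a hidden `.part` name inside the destination directory, so the temporary file and the final file are guaranteed to be on the same filesystem. The fsync is done on the still-open handle, just before it is closed.

```ruby
# Copy src into dest_dir so that the final name only ever refers to a
# complete file. The hidden ".<name>.part" temporary name is an
# assumption of this sketch.
def atomic_drop(src, dest_dir)
  name  = File.basename(src)
  final = File.join(dest_dir, name)
  tmp   = File.join(dest_dir, ".#{name}.part")

  File.open(tmp, "wb") do |out|
    File.open(src, "rb") { |inp| IO.copy_stream(inp, out) }
    out.fsync  # push the data to disk while the handle is still open
  end

  # Atomic: both names are in the same directory, hence the same filesystem.
  File.rename(tmp, final)
  final
end
```

A reader that ignores dot-files (or `.part` names) will never see the file until the rename has happened.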

Tomorrow, Gary Wright wrote:

Paul Archer wrote:

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn't clear), but I may be operating on files that are added arbitrarily. My concern is that the script starts acting on a file that is still being copied. Renaming it won't help there.

Just use the renaming semantics in the first program (the one doing the copying) also. Basically you are using the filename appearance as a synchronization mechanism between the multiple processing steps.

If you can't control the name of the file itself, then control the directory in which it appears.

That still leaves me with the same problem: I have to read out of a directory. Plus, there isn't just going to be a script putting files in my incoming directory. I'll be doing that myself as I clean up all my old files.

Paul

5:28pm, Brian Candler wrote:

Paul Archer wrote:

Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

The program which drops files into the directory has to work the same
way:

- open temporary file
- write to it
- close it
- fsync if you want to be sure it's on disk even if power is pulled
- rename it to final location

That's why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory, and
then renamed into the new/ directory)

I see what you're saying. My issue was that I will be moving files into this directory by hand as I go through my old, unmanaged digital images and put them in this directory to be renamed and start the DAM (digital asset management) process.

Of course, this is all moot now that I've found inotify will solve the problem for me. Actually, it solves three problems:

1) It blocks, so I don't have to poll the directory.
2) It lets me know when a file has been moved to or written in the directory (even if it's in a subdirectory).
3) It tells me the name of the file, so I don't have to go out and find it.
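
For anyone wanting to try the same thing, here is a rough sketch of how those three points might look in code. It assumes inotifywait (from inotify-tools) is installed, and it asks for a custom output format so that each line is just the event list followed by the full path. The helper names `parse_event` and `each_finished_file` are made up for this example.

```ruby
require 'open3'

# inotifywait arguments: monitor forever (-m), recurse (-r), stay quiet
# (-q), report only files closed after writing or moved in, and print
# one "EVENTS /full/path" line per event.
INOTIFY_ARGS = ["-m", "-r", "-q",
                "-e", "close_write", "-e", "moved_to",
                "--format", "%e %w%f"].freeze

# Split one output line into [list_of_events, path]. Event names contain
# no spaces, so limiting the split to two fields keeps spaces in the path.
def parse_event(line)
  events, path = line.chomp.split(" ", 2)
  [events.split(","), path]
end

# Block until files are ready, yielding the full path of each one.
# Directories that get moved in are skipped.
def each_finished_file(dir)
  Open3.popen2("inotifywait", *INOTIFY_ARGS, dir) do |_stdin, stdout, _wait_thr|
    stdout.each_line do |line|
      events, path = parse_event(line)
      next if events.include?("ISDIR")
      yield path
    end
  end
end
```

Usage would be something like `each_finished_file("/path/to/incoming") { |path| puts path }`, where the directory is whatever you are watching. Listening for moved_to as well as close_write covers files that arrive by rename rather than by copy.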

Paul