Hi, good people of clr,
I just dipped into the goodness that is ruby for the first time
yesterday, and while this group and the online docs proved useful, I'm
left somewhat bewildered by a few things. Environment: Win XP SP2,
one-click-install 1.8.2 ruby.
1) Current working directories:
I currently use
f = __FILE__
len = -f.length
my_dir = File::expand_path(f)[0...len]
To find the script's current working directory. Snappier alternatives
such as
my_dir = File.dirname(__FILE__)
just report back with ".", which, while true, isn't exactly helpful.
Problem: this only works if the script is invoked from the command line
as "ruby this.rb". Trying to invoke it by double-clicking on the script
in the windows explorer makes the above function return an empty
string. Is there any way, short of embedding the call to ruby in a bat
file, to make ruby read its current working directory even if invoked
by double-clicking?
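
For reference, a minimal sketch of the distinction at play here: the directory that contains the script versus the directory the process was started in. The helper name script_dir is hypothetical; File.expand_path, File.dirname and Dir.pwd are standard calls. How __FILE__ is populated when launched from Explorer is not verified here.

```ruby
# Hypothetical helper: absolute path of the directory holding the given
# script file (defaults to the file this code lives in).
def script_dir(path = __FILE__)
  # File.dirname may return "." for a bare file name; expand_path
  # resolves that against the process's current working directory.
  File.expand_path(File.dirname(path))
end

puts script_dir   # where the script file lives
puts Dir.pwd      # where the process was started; may differ
```

If the script is meant to operate on its own directory however it is launched, calling Dir.chdir(script_dir) before Dir['*'] is one option.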
2) MD5 hashes and file handles:
I currently use something like
Dir['*'].each {|f| print Digest::MD5.hexdigest(open(f, 'rb').read), ' ', f, "\n"}
I tried stuff like
Dir['*'].each {|f| print f, " "; puts Digest::MD5.hexdigest(File.read(f))}
or
dig=Digest::MD5.new
dig.update(file)
and they both seem to suffer from some sort of buffer on the directory
reading; that is, they'll produce the same hash for several files when
scanning a large directory. The first line above bypasses this, I
suppose by the 'rb' reading mode on the file handle. Is there any way
to unbuffer the directory file handle stream (akin to Perl's $|=1)?
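
A minimal sketch of the binary-mode approach, using the block form of File.open so each handle is closed after use. The helper names md5_of_file and digests_in are hypothetical. One plausible explanation for the repeated hashes, offered as an assumption and not confirmed here, is that a text-mode read on Windows stops at the first Ctrl-Z (0x1A) byte, so binary files sharing a header would hash identically; that would point at the file mode, not at directory buffering.

```ruby
require 'digest/md5'

# Hypothetical helper: MD5 hex digest of one file, read in binary mode.
def md5_of_file(path)
  digest = Digest::MD5.new
  # The block form closes the handle as soon as the block returns.
  File.open(path, 'rb') do |io|
    # Read in chunks so large files are not held in memory at once;
    # io.read(n) returns nil at end of file.
    while (chunk = io.read(65_536))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Hypothetical helper: [digest, name] pairs for the regular files in dir.
def digests_in(dir)
  Dir[File.join(dir, '*')].sort.select { |f| File.file?(f) }.map do |f|
    [md5_of_file(f), File.basename(f)]
  end
end
```

Something like digests_in('.').each { |hex, name| puts "#{hex} #{name}" } would then print one line per file.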
3) Finally, I submit my very first ruby script for merciless
criticism. What here could have been done otherwise? What screams for a
better ruby solution? I'm aware that I should probably look into
split instead of relying so much on regexps for splitting. I was also
trying to set up a structure like hash[key]=[a,b], but I found I could
not access the elements with
hash.each_pair { |key,value| puts key, value(0), value(1) }.
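
A minimal sketch of the hash[key]=[a,b] structure, assuming the dir:digest:filename line layout that the script writes to its index file. The helper name parse_index_line and the sample line are made up for illustration. Array elements are read with square brackets, value[0], since value(0) is parsed as a method call.

```ruby
# Hypothetical helper: split one "dir:digest:filename" index line into
# a key and a two-element array value.
def parse_index_line(line)
  # The limit of 3 keeps any further colons inside the file name.
  dir, digest, name = line.chomp.split(':', 3)
  [digest, [name, dir]]
end

index = {}
key, value = parse_index_line("albums:0123abcd:song.mp3\n") # made-up line
index[key] = value

index.each_pair do |digest, pair|
  # pair[0] is the file name, pair[1] the directory name.
  puts digest, pair[0], pair[1]
end
```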
------------------------------------------------------------------
require 'digest/md5'
require 'fileutils'
# Variables to set manually
global_digest_index='C:/srfctrl/indexfile/globalindex.txt'
global_temp_directory='C:/srfctrl/tempstore/'
global_collide_directory='C:/srfctrl/collide/'
# Begin program
f = __FILE__
len = -f.length
my_dir = File::expand_path(f)[0...len]
my_dirname = my_dir.sub(/^.+\/(\w+?)\/$/,'\1')
puts my_dir
puts my_dirname
digest_map_name={}
digest_map_directory={}
IO.foreach(global_digest_index) { |line|
  th_dige=line.sub(/^.+?\:(.+?)\:.+?$/,'\1').chomp
  th_fnam=line.sub(/^.+?\:.+?\:(.+?)$/,'\1').chomp
  th_dir=line.sub(/^(.+?)\:.+?\:.+?$/,'\1').chomp
  digest_map_name[th_dige] = th_fnam
  digest_map_directory[th_dige] = th_dir
}
filecnt = filesuc = 0
outfile = File.new(global_digest_index, "a")
Dir['*'].each do |file_name|
  next unless (file_name =~ /\.mp3$|\.ogg$/i)
  filecnt += 1
  hex = Digest::MD5.hexdigest(open(file_name, 'rb').read)
  if digest_map_name.has_key?(hex) then
    collfilestrip = digest_map_name[hex].sub(/\.mp3$|\.ogg$/i,'')
    id_name = global_collide_directory + digest_map_directory[hex].to_s +
      '_' + collfilestrip + '_' + file_name
    FileUtils.cp(file_name,id_name)
  else
    filesuc +=1
    digest_map_name[hex] = file_name
    digest_map_directory[hex] = my_dirname
    outfile.puts my_dirname + ':' + hex + ':' + file_name
    id_name = global_temp_directory + file_name
    FileUtils.cp(digest_map_name[hex],id_name)
  end
end
outfile.close
puts "Processed " + filecnt.to_s + " files, out of which " +
filesuc.to_s + " were not duplicates."
----------------------------------------------