I am in the middle of writing a quick program which will recursively
scan the files under a given path for a list of keywords stored in a
file. My code so far is below, but before moving ahead I have two
questions.
First: I am passing in a text file called "terms.txt" that holds the
keywords. To search for each keyword, I assume the best way to do so is
as follows:
terms.each do |term|
  # String#include? does a plain substring test; =~ needs a Regexp
  if line.include?(term)
    puts ""
  end
end
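A minimal sketch of this keyword loop, assuming terms.txt holds one keyword per line. The helper names `load_terms`, `matching_terms` and `terms_pattern` are hypothetical and only for illustration. Note that `=~` expects a Regexp on one side, so a plain substring test or a pattern built with `Regexp.union` (which escapes string arguments) is used instead.

```ruby
# Hypothetical helper: read one keyword per line, dropping blank lines.
def load_terms(terms_path)
  File.readlines(terms_path).map { |t| t.strip }.reject { |t| t.empty? }
end

# Hypothetical helper: return the terms that occur in the given line.
def matching_terms(line, terms)
  terms.select { |term| line.include?(term) }
end

# Alternative: build one pattern that matches any of the terms.
# Regexp.union escapes special characters in string arguments.
def terms_pattern(terms)
  Regexp.union(terms)
end
```

With the single pattern, the test inside the line loop could become `if line =~ pattern`, which scans each line once rather than once per term.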
My second question is: This program works well for searching text files,
but what about Word documents and spreadsheets? Do I need some Windows
API in there?
Many thanks
require 'find'

class ESearch
  # Keyword list, one term per line. Single quotes keep the backslashes
  # literal.
  TERMS_FILE = 'C:\Documents and Settings\user\Desktop\terms.txt'

  # Method which is passed the file path from the command line.
  def scanFiles(path)
    terms = File.readlines(TERMS_FILE).map { |t| t.strip }.reject { |t| t.empty? }
    # Process each file under the passed file path.
    Find.find(path) do |curPath|
      next unless File.file?(curPath)
      # Process the contents of each file line by line, counting line
      # numbers.
      File.open(curPath) do |file|
        file.each_with_index do |line, index|
          # Check if the line contains any term and output the path and
          # line number.
          if terms.any? { |term| line.include?(term) }
            puts "#{curPath}:#{index + 1}"
          end
        end
      end
    end
  end
end

# Run from the command line, passing in a file path; this prints a usage
# message if one is not passed.
if __FILE__ == $0
  if ARGV.size != 1
    puts "Use: #{$0} [path]"
    exit
  end
  esearch = ESearch.new()
  esearch.scanFiles(ARGV[0])
end
> My second question is: This program works well for searching text
> files, but what about Word documents and spreadsheets? Do I need some
> Windows API in there?
You can read these files if you open them in binary mode.
However, they will contain so much extra binary crap that
it may not be easy to search in them.
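A rough sketch of the binary-mode approach, under stated assumptions: the helper name `binary_file_contains?` is hypothetical, and the check for a UTF-16LE copy of the term is there because older Word files often store text in that encoding. It only finds text stored uncompressed, so it will likely miss content in zip-based formats such as .docx and .xlsx.

```ruby
# Hypothetical helper: true if the raw bytes of the file contain the term,
# either as UTF-8 bytes or as UTF-16LE bytes.
def binary_file_contains?(path, term)
  # 'rb' reads the file as raw bytes with no newline or encoding conversion.
  data = File.open(path, 'rb') { |f| f.read }
  candidates = [term.dup, term.encode('UTF-16LE')]
  # Compare byte-for-byte by tagging each candidate as binary.
  candidates.any? { |c| data.include?(c.force_encoding(Encoding::BINARY)) }
end
```

This reads the whole file into memory, which is fine for typical documents but may need chunked reading for very large files.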