Finding files with regular expressions

Remco_Hh · 2 October 2007 12:17

Hi, I am having trouble figuring this out:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Who can help me a little?
Thanks in advance.

remco

--
Posted via http://www.ruby-forum.com/.

Richard_Conroy1 · 2 October 2007 12:58

Look at Ruby's Find library. I am not sure if it can take regexp arguments
(haven't tried, but it would be hella cool).
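
For reference, Find.find does not take a regexp itself; it only yields every path under a starting directory, so the matching has to be done inside the block. A minimal sketch, assuming you want regular files only and want to test the file's own name rather than the whole path (find_matching is a made-up helper name, not part of the library):

```ruby
require 'find'

# Hypothetical helper: walk `dir` recursively and collect every regular file
# whose basename matches `pattern`. Find.find only yields paths; the regexp
# test is ours.
def find_matching(dir, pattern)
  results = []
  Find.find(dir) do |path|
    next unless File.file?(path)
    results << path if File.basename(path) =~ pattern
  end
  results
end

# Usage: ruby this_script.rb some_directory
puts find_matching(ARGV.first, /\.rb$/) if ARGV.first
```

The result is an ordinary array of paths, which is what the original question asked for.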

7stud1 · 2 October 2007 13:41

Remco Hh wrote:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Try something like this:

results = []

Dir.foreach("./programs_ruby") do |filename|
  if filename.index("mod")
    results << filename
  end
end

p results

Gavin_Kistner3 · 2 October 2007 13:50

Here's my 'findfile' script that I use daily. It lets you use a regexp
for the filename and for the file content, limit the depth of the search,
choose whether or not to show all matches inside a file, and so on.

(You may need to unwrap some of the longer lines after copy/paste.)

See additional notes at the end.

Slim2:/usr/local/bin phrogz$ cat findfile
#!/usr/bin/env ruby

USAGE = <<ENDUSAGE
Usage:
   findfile [-d max_depth] [-a] [-c] [-i] name_regexp [content_regexp]
   -d,--depth        the maximum depth to recurse to (defaults to no limit)
   -a,--showall      with content_regexp, show every match per file
                     (defaults to only show the first-match per file)
   -c,--usecase      with content_regexp, use case-sensitive matching
                     (defaults to case-insensitive)
   -i,--includedirs  also find directories matching name_regexp
                     (defaults to files only; incompatible with content_regexp)
   -h,--help         show some help examples
ENDUSAGE

EXAMPLES = <<ENDEXAMPLES

Examples:
   findfile foo
   # Print the path to all files with 'foo' in the name

   findfile -i foo
   # Print the path to all files and directories with 'foo' in the name

   findfile js$
   # Print the path to all files whose name ends in "js"

   findfile js$ vector
   # Print the path to all files ending in "js" with "Vector" or "vector"
   # (or "vEcTOr", "VECTOR", etc.) in the contents, and print some of the
   # first line that has that content.

   findfile js$ -c Vector
   # Like above, but must match exactly "Vector" (not 'vector' or 'VECTOR').

   findfile . vector -a
   # Print the path to every file with "Vector" (any case) in it somewhere
   # printing every line in those files (with line numbers) with that content.

   findfile -d 0 .
   # Print the path to every file that is in the current directory.

   findfile -d 1 .
   # Print the path to every file that is in the current directory or any
   # of its child directories (but no subdirectories of the children).
ENDEXAMPLES

ARGS = {}
UNFLAGGED_ARGS = [ :name_regexp, :content_regexp ]
next_arg = UNFLAGGED_ARGS.first
ARGV.each{ |arg|
   case arg
     when '-d','--depth'
       next_arg = :max_depth
     when '-a','--showall'
       ARGS[:showall] = true
     when '-c','--usecase'
       ARGS[:usecase] = true
     when '-i','--includedirs'
       ARGS[:includedirs] = true
     when '-h','--help'
       ARGS[:help] = true
     else
       if next_arg
         if next_arg==:max_depth
           arg = arg.to_i + 1
         end
         ARGS[next_arg] = arg
         UNFLAGGED_ARGS.delete( next_arg )
       end
       next_arg = UNFLAGGED_ARGS.first
   end
}

if ARGS[:help] or !ARGS[:name_regexp]
   puts USAGE
   puts EXAMPLES if ARGS[:help]
   exit
end

class Dir
   def self.crawl( path, max_depth=nil, include_directories=false, depth=0, &block )
     return if max_depth && depth > max_depth
     begin
       if File.directory?( path )
         yield( path, depth ) if include_directories
         files = Dir.entries( path ).select{ |f| true unless f=~/^\.{1,2}$/ }
         unless files.empty?
           files.collect!{ |file_path|
             Dir.crawl( path+'/'+file_path, max_depth, include_directories, depth+1, &block )
           }.flatten!
         end
         return files
       else
         yield( path, depth )
       end
     rescue SystemCallError => the_error
       warn "ERROR: #{the_error}"
     end
   end

end

start_time = Time.new
name_match = Regexp.new(ARGS[:name_regexp], true )
content_match = ARGS[:content_regexp] && Regexp.new( ".{0,20}#{ARGS[:content_regexp]}.{0,20}", !ARGS[:usecase] )

file_count = 0
matching_count = 0
Dir.crawl( '.', ARGS[:max_depth], ARGS[:includedirs] && !content_match ){ |file_path, depth|
   if File.split( file_path )[ 1 ] =~ name_match
     if content_match
       if ARGS[:showall]
         shown_file = false
         IO.readlines( file_path ).each_with_index{ |line_text,line_number|
           if match = line_text[content_match]
             unless shown_file
               puts file_path
               matching_count += 1
               shown_file = true
             end
             # each_with_index counts from 0, so add 1 for a human-readable line number
             puts( ( "%5d: " % ( line_number + 1 ) ) + match )
           end
         }
         puts " " if shown_file
       elsif IO.read( file_path ) =~ content_match
         puts file_path," #{$~}"," "
         matching_count += 1
       end
     else
       puts file_path
       matching_count += 1
     end
   end
   file_count += 1
}
elapsed = Time.new - start_time
puts "Found #{matching_count} file#{matching_count==1?'':'s'} (out of #{file_count}) in #{elapsed} seconds"

You do have to watch for shell escaping of the regexp, either escaping
chars as needed or quoting your regexp:

Slim2:/usr/local/bin phrogz$ findfile \d
./findfile
./index_gem_repository.rb
./p4d
./rdoc
./rdoc-osa
./svnadmin
./svndumpfilter
./update_rubygems
Found 8 files (out of 40) in 0.001228 seconds

Slim2:/usr/local/bin phrogz$ findfile \\d
./p4
./p4d
./rot13
./sqlite3
Found 4 files (out of 40) in 0.001088 seconds

Slim2:/usr/local/bin phrogz$ findfile \\d$
./p4
./rot13
./sqlite3
Found 3 files (out of 40) in 0.001118 seconds

Slim2:/usr/local/bin phrogz$ findfile "\d$"
./p4
./rot13
./sqlite3
Found 3 files (out of 40) in 0.001298 seconds
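
The difference between those runs comes down to what string the script actually receives: an unquoted \d reaches Ruby as a plain "d", while \\d and "\d" both arrive as \d. A small sketch of the effect, using a made-up helper (name_filter) that builds the regexp case-insensitively, as the script above does for file names:

```ruby
# Hypothetical helper: build a case-insensitive regexp from the raw
# command-line string and keep the names that match it.
def name_filter(names, shell_arg)
  names.grep(Regexp.new(shell_arg, Regexp::IGNORECASE))
end

names = %w[findfile p4 rdoc sqlite3]

# What the script sees for:  findfile \d
p name_filter(names, 'd')   # names containing the letter "d"

# What the script sees for:  findfile \\d   or   findfile "\d"
p name_filter(names, '\d')  # names containing a digit
```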

Gavin_Kistner3 · 2 October 2007 13:50

Sorry, I just re-read your request and saw your desire for an array of
filenames. How about this:

Slim2:/usr/local/bin phrogz$ irb
irb(main):001:0> Dir[ '*' ]
=> ["erb", "fastri-server", "findfile", "fri", "gem", "gem_mirror",
"gem_server", "gemlock", "gemri", "gemwhich", "gpgen",
"index_gem_repository.rb", "irb", "lua", "luac", "mate",
"mongrel_rails", "p4", "p4d", "qri", "rails", "rake", "rdoc",
"rdoc-osa", "ri", "ri-emacs", "rot13", "ruby", "sql", "sqlite3", "svn",
"svnadmin", "svndumpfilter", "svnlook", "svnserve", "svnsync",
"svnversion", "swig", "testrb", "update_rubygems"]

irb(main):002:0> Dir[ '*' ].grep /\d$/
=> ["p4", "rot13", "sqlite3"]

You could use Dir.chdir to pick a working directory if you like.
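
A minimal sketch of the Dir.chdir idea, assuming you would rather not change the working directory for the rest of the script. The block form of Dir.chdir switches back when the block ends; names_matching is a made-up helper name:

```ruby
# Hypothetical helper: list the entries of `dir` whose names match `pattern`.
# The block form of Dir.chdir restores the previous working directory.
def names_matching(dir, pattern)
  Dir.chdir(dir) { Dir['*'].grep(pattern) }
end

# Names in the current directory that end in a digit.
p names_matching('.', /\d$/)
```

Because the glob runs inside the target directory, the array holds bare file names rather than paths.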

Forum · 2 October 2007 13:58

Dir.glob("**/**").grep(/filename/)
HTH
Robert
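
One thing to watch with the recursive glob: grep tests the whole relative path, so a directory whose name matches makes everything beneath it match too. A sketch of a variant that tests only the file's own name and skips directories (glob_grep is a made-up helper name, and like any '*' glob it does not see hidden files):

```ruby
# Hypothetical helper: recursive search that tests only each file's basename,
# so a directory called "mod" does not drag in everything below it.
def glob_grep(root, pattern)
  Dir.glob(File.join(root, '**', '*')).select do |path|
    File.file?(path) && File.basename(path) =~ pattern
  end
end

# Usage: ruby this_script.rb some_directory
puts glob_grep(ARGV.first, /mod/) if ARGV.first
```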

Remco_Hh · 2 October 2007 16:37

Everybody, thanks for the good advice.
This is most helpful.

remco

David_A_Black1 · 2 October 2007 13:54

Hi --

On Tue, 2 Oct 2007, 7stud -- wrote:

Remco Hh wrote:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Try something like this:

results = []

Dir.foreach("./programs_ruby") do |filename|
  if filename.index("mod")
    results << filename
  end
end

p results

A little more concise:

results = Dir.entries("./programs_ruby").grep(/mod/)

Or you could do:

results = Dir["*mod*"]

to automatically exclude hidden files, if that's desired.

David
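
To make the hidden-file difference concrete, here is a small sketch that builds a throwaway directory and runs both approaches on it. The helper name (listings) and the file names are made up for the illustration. Note also that a glob is resolved against the current directory unless the pattern includes a directory, and in that case it returns paths rather than bare names:

```ruby
require 'tmpdir'

# Hypothetical helper: return the names found by each approach, sorted.
def listings(dir, word)
  via_entries = Dir.entries(dir).grep(/#{Regexp.escape(word)}/).sort
  via_glob    = Dir[File.join(dir, "*#{word}*")].map { |path| File.basename(path) }.sort
  [via_entries, via_glob]
end

Dir.mktmpdir do |dir|
  # One visible and one hidden file, both containing "mod".
  File.write(File.join(dir, 'module.rb'), '')
  File.write(File.join(dir, '.mod_hidden'), '')

  via_entries, via_glob = listings(dir, 'mod')
  p via_entries  # Dir.entries sees the dot file as well
  p via_glob     # the glob skips it
end
```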

7stud1 · 2 October 2007 14:45

David A. Black wrote:

A little more concise:

results = Dir.entries("./programs_ruby").grep(/mod/)

Or you could do:

results = Dir["*mod*"]

to automatically exclude hidden files, if that's desired.

Thanks. I have some questions though. I notice that a lot of people
who post to this forum don't use iterators to read input as they go.
Instead, they tend to slam everything into memory first and then iterate
over the data, often with no care at all if they happen to create a copy
or two of the data along the way. I always try to ask myself, "What if
the input is 2-3GB?" I realize that's probably not going to be the case
with filenames, but who knows? There are multi-terabyte hard drives now.
As a result, I always try to iterate over input as I go rather than read
it into memory in one chunk. Is there something I am missing about Ruby
in that regard?

I assume that Ruby iterators buffer file I/O. Is that not the case? Is
Ruby so inefficient that you need to read everything into memory in the
biggest chunks possible to get reasonable performance while iterating
over data? Also, on a side note, it seems like it's standard operating
procedure to shuttle as much code as you can into shell commands. Is
that because people want to avoid using the Ruby interpreter?
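
For comparison, a sketch of the iterate-as-you-go style described above, applied to the filename search. It assumes only that Dir.foreach hands over entries one at a time; each_matching is a made-up helper name. Without a block it returns an Enumerator, so the caller decides whether to collect everything or stop early:

```ruby
# Hypothetical helper: yield matching names one at a time instead of
# building the full directory listing first.
def each_matching(dir, pattern)
  return enum_for(:each_matching, dir, pattern) unless block_given?

  Dir.foreach(dir) do |name|
    next if name == '.' || name == '..'
    yield name if name =~ pattern
  end
end

# Stream the matches without keeping them.
each_matching('.', /\.rb$/) { |name| puts name }

# Or stop after the first match; the remaining entries are never examined.
p each_matching('.', /./).first
```

Calling to_a on the enumerator gives the same array as the Dir.entries(...).grep(...) version, so the two styles are interchangeable when the listing is small.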
