Extract date from filenames using regex

I have my code which looks like this:

  delete= 5 + 2 #escape counting in weekends i.e Sat,Sun
  folders = $del_path
   puts delete_date = DateTime.now - delete

   regexp = Regexp.compile(/(\d{4}\d{2}\d{2})/)

   fileData = Struct.new(:name, :size)
   deleted_files = []

  folders.each do |folder|
     Dir.glob(folder+"/*") do |file|
       match = regexp.match(File.basename(file));
       if match
         file_date = DateTime.parse(match[1])

When my file name is in the format 20080331, for example, the script
runs successfully. However, if the filename has additional characters
added to it, say risk20080331, it raises an error, and I reckon the
line above is the cause.

         size = (File.size(file))/1024
         if delete_date > file_date
           deleted_files << fileData.new(file,size)
           FileUtils.rm_r file

           if File.exist?(file)==false
             puts "Files/Folders deleted: #{file} size: #{size} KB"
             end #if
           end #if
       end #if
     end #do
   end #each
end #if

So is there any way I can extract the date using a regex, or some
simpler way, so I can compare it with the deletion date and execute the
rm_r command? Thanks!

···

--
Posted via http://www.ruby-forum.com/.

Hi,

You can use File.mtime(file_name) which will return a Time object.

You can also match with /\d+/ (one or more digits):

("sdf555sadfsdfg")[/\d+/]

=> "555"

But watch out!:

("sdf555sadfs5867dfg")[/\d+/]

=> "555"

For a nice, object-oriented approach to file manipulation in Ruby, you
might want to check out Pathname in the standard library:
http://www.ruby-doc.org/stdlib/libdoc/pathname/rdoc/index.html

Dan
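
A minimal sketch of the File.mtime suggestion, combined with Pathname. The helper name older_than, the constant SECONDS_PER_DAY and the seven-day cutoff are assumptions made for illustration, not part of the original script. The sketch only lists candidates in a throwaway directory and deletes nothing.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: entries directly under +dir+ whose modification
# time is more than +days+ days before +now+.
def older_than(dir, days, now = Time.now)
  cutoff = now - days * SECONDS_PER_DAY
  Pathname.new(dir).children.select { |path| path.mtime < cutoff }
end

# Demonstration in a temporary directory that is removed afterwards.
Dir.mktmpdir do |dir|
  old_file = File.join(dir, 'risk20080331.dat')
  new_file = File.join(dir, 'risk_recent.dat')
  FileUtils.touch([old_file, new_file])

  # Backdate one file by ten days so it falls outside the cutoff.
  ten_days_ago = Time.now - 10 * SECONDS_PER_DAY
  File.utime(ten_days_ago, ten_days_ago, old_file)

  older_than(dir, 7).each { |path| puts path.basename }
end
```

Keep in mind that mtime is the time the file was last modified, which is not necessarily the date written in its name.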

···

Sorry, what is the error? Cause this works for me:

irb(main):001:0> regexp = Regexp.compile(/(\d{4}\d{2}\d{2})/)
=> /(\d{4}\d{2}\d{2})/
irb(main):002:0> match = regexp.match("risk20080331.log")
=> #<MatchData:0xb7ce20f4>
irb(main):003:0> match[1]
=> "20080331"
irb(main):005:0> require 'date'
=> true
irb(main):006:0> DateTime.parse(match[1])
=> #<DateTime: 4909113/2,0,2299161>

So any string that contains 4 digits followed by 2 digits followed by
2 digits will match that regexp, independently of what it has around
the numbers:

irb(main):007:0> regexp.match("12345678")[1]
=> "12345678"
irb(main):008:0> regexp.match("12345678asdfasdf")[1]
=> "12345678"
irb(main):009:0> regexp.match("asdfasdf12345678asdfasdf")[1]
=> "12345678"
irb(main):010:0> regexp.match("asdfasdf12345678")[1]
=> "12345678"

Jesus.
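
One caveat with the pattern above, sketched here under the assumption that a date is never part of a longer run of digits: an unanchored \d{8} also matches the first eight digits of a longer number. Negative lookarounds (available in Ruby 1.9 and later) restrict the match to runs of exactly eight digits. The constant name and the sample strings are made up for illustration.

```ruby
# Eight digits that are neither preceded nor followed by another digit.
EXACTLY_EIGHT_DIGITS = /(?<!\d)\d{8}(?!\d)/

p 'risk20080331.log'[/\d{8}/]              # "20080331"
p 'id1234567890.log'[/\d{8}/]              # "12345678" (part of a longer number)
p 'risk20080331.log'[EXACTLY_EIGHT_DIGITS] # "20080331"
p 'id1234567890.log'[EXACTLY_EIGHT_DIGITS] # nil
```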

···

Thanks Daniel for your input. I tried using /\d+/ but it also matches
files that have as few as 2 digits, so I decided to use
/(\d\d)(\d\d)(\d\d\d\d)/ instead. It enabled me to run the command on
certain files but not all files, and the following error occurred:

Files/Folders deleted:
//sins00114178/mad/Singapore/CubeMorningLDN/HKD_CUBE_risk
report_09042008.dat size: 74 KB
Files/Folders deleted:
//sins00114178/mad/Singapore/CubeMorningLDN/HKD_CUBE_risk
report_10042008.dat size: 81 KB
Files/Folders deleted:
//sins00114178/mad/Singapore/CubeMorningLDN/HKD_CUBE_risk
report_11042008.dat size: 80 KB
Files/Folders deleted:
//sins00114178/mad/Singapore/CubeMorningLDN/HKD_CUBE_risk
report_14042008.dat size: 79 KB
Files/Folders deleted:
//sins00114178/mad/Singapore/CubeMorningLDN/HKD_CUBE_risk
report_15042008.dat size: 77 KB
Files/Folders deleted:
//sins00114178/mad/Singapore/CubeMorningLDN/HKD_CUBE_risk
report_16042008.dat size: 77 KB
c:/ruby/lib/ruby/1.8/date.rb:1536:in `new_by_frags': invalid date (ArgumentError)
        from c:/ruby/lib/ruby/1.8/date.rb:1583:in `parse'
        from testing.conf.rb:166:in `delFiles'
        from testing.conf.rb:163:in `glob'
        from testing.conf.rb:163:in `delFiles'
        from testing.conf.rb:162:in `each'
        from testing.conf.rb:162:in `delFiles'
        from testing.conf.rb:204

Is there anything wrong with my code that prevents deleting all the
files that I want?

···

OK, now I see the problem. The file that is failing has a number like this:
16042008. The DateTime.parse method is trying to parse the date as:

1604-20-08 which is obviously an invalid date (month > 12).
There are two solutions to this problem:

1.- Change DateTime.parse to DateTime.strptime, passing a format that
describes where in the string you have the two digits of the day, the
two digits of the month and the four digits of the year. I haven't been
able to put together a quick example, because I can't find a reference
for the format string (any help here appreciated). The doc refers me to
date/format.rb for details and I don't see anything clear there.

2.- Change the regexp a little bit so you capture the day, the month
and the year in separate groups, and create the DateTime using the
three values:

irb(main):011:0> regexp = Regexp.compile(/(\d{4})(\d{2})(\d{2})/)
=> /(\d{4})(\d{2})(\d{2})/
irb(main):012:0> m = regexp.match("20080103asdfasdf")
=> #<MatchData:0xb7c11a6c>
irb(main):014:0> d = DateTime.civil m[1].to_i, m[2].to_i, m[3].to_i
=> #<DateTime: 4908937/2,0,2299161>
irb(main):015:0> d.to_s
=> "2008-01-03T00:00:00+00:00"

I think you can apply the above changes to the script and it will work.
Let me know,

Jesus.
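
For names that carry the date as ddmmyyyy, such as report_16042008.dat, the same idea needs the groups in day, month, year order, with the values handed to DateTime.civil as year, month, day. A minimal sketch; the constant DDMMYYYY and the method name date_from_ddmmyyyy are hypothetical.

```ruby
require 'date'

DDMMYYYY = /(\d{2})(\d{2})(\d{4})/

# Returns a DateTime for the first ddmmyyyy run in +name+, or nil when
# there is no such run or the digits do not form a valid date.
def date_from_ddmmyyyy(name)
  m = DDMMYYYY.match(name)
  return nil unless m

  day, month, year = m.captures.map(&:to_i)
  DateTime.civil(year, month, day)
rescue ArgumentError
  nil
end

puts date_from_ddmmyyyy('report_16042008.dat') # 2008-04-16T00:00:00+00:00
p date_from_ddmmyyyy('notes.txt')              # nil
```

A yyyymmdd name fed to this method can still yield a valid but wrong date (20080331 reads as day 20, month 08, year 0331), so the format of each file has to be known in advance.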

···

Clement Ow wrote:

Is there anything wrong with my code that prevents deleting all the
files that I want?

require 'date'

str = 'sins00114178'
pattern = /(\d\d)(\d\d)(\d\d\d\d)/

match_obj = pattern.match(str)
puts match_obj[1]

file_date = DateTime.parse(match_obj[1])

--output:--
00
/usr/lib/ruby/1.8/date.rb:1214:in `new_with_hash': invalid date
(ArgumentError)
        from /usr/lib/ruby/1.8/date.rb:1258:in `parse'
        from r1test.rb:9
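
The output above shows that match_obj[1] holds only the first group, two digits, which DateTime.parse then rejects. A short sketch of the difference between the whole match and the individual groups, using a file name from the log earlier in the thread:

```ruby
pattern = /(\d\d)(\d\d)(\d\d\d\d)/
match_obj = pattern.match('report_16042008.dat')

p match_obj[0]       # "16042008" (the whole match)
p match_obj[1]       # "16"       (first group only)
p match_obj.captures # ["16", "04", "2008"]
```

So a pattern with three groups needs either match_obj[0] or all three captures to recover the full date.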

···

After a couple of trial/error tests this seems to work:

DateTime.strptime "16042008", "%d%M%Y"

So any of the two solutions will work for you.

Jesus.

···

You are right, I overlooked the fact that he had added more parens
in the regexp, so he was passing only two digits to DateTime.parse.
Anyway the changes I proposed should work for him.

Jesus.

···

Hi Jesus,
First of all, thanks for your help!
However, despite using

d = DateTime.civil(match[1].to_i, match[2].to_i, match[3].to_i)
file_date = d.to_s

OR

file_date = DateTime.strptime(match[1], "%d%M%Y")

it still gives me invalid date as the error message. But when I run it
in fxri it seems to work fine. This only seems to happen when the date
format is ddmmyyyy; for yyyymmdd it has no problems. Any ideas, anyone?
I have cracked my head but to no avail.

···

Can you post the smallest example that fails? Do you have files with
different date formats?

Jesus.

···

    delete_date = DateTime.now - delete

    regexp = Regexp.compile(/(\d\d)(\d\d)(\d\d\d\d)/)

    fileData = Struct.new(:name, :size)
    deleted_files = []

    folders.each do |folder|
      Dir.glob(folder+"/*") do |file|
      puts match = regexp.match(File.basename(file))
        if match
          file_date = DateTime.strptime(match[1] , fmt='%d%M%Y')
          size = (File.size(file))/1024
          if delete_date > file_date
            deleted_files << fileData.new(file,size)
            FileUtils.rm_r file
            if File.exist?(file)==false
              puts "Files/Folders deleted: #{file} size: #{size} KB"
              end #if
            end #if
        end #if
      end #do
    end #each
  end #if
end #delFiles

it'll show this error:

c:/ruby/lib/ruby/1.8/date.rb:1536:in `new_by_frags': invalid date (ArgumentError)
        from c:/ruby/lib/ruby/1.8/date.rb:1563:in `strptime'
        from testing.conf.rb:166:in `delFiles'

···

I do have files with different date formats, but the format yyyymmdd works when I use DateTime.parse, maybe because DateTime accepts this format? However, if I use strptime it doesn't work either. Any help will be greatly appreciated =)

> I do have files with different date formats, but the format yyyymmdd
> works when I use DateTime.parse, maybe because DateTime accepts this
> format?

Yes, that's exactly the issue.

> However, if I use strptime it doesn't work either. Any help will be
> greatly appreciated =)

If you have files with different formats, you will have to know which
format each file is, because DateTime.parse is expecting yyyymmdd,
while strptime is expecting whatever format you pass it, but only one
format. If the dates are current dates, and are only these two formats
(yyyymmdd or ddmmyyyy) I think this is safe:

regexp = /(\d{8})/
match = regexp.match(file_name)
file_date = nil
begin
  file_date = DateTime.parse(match[1])
rescue ArgumentError
  file_date = DateTime.strptime(match[1], "%d%M%Y")
end

However, if you have arbitrary dates, this can lead to unexpected
results. For example:

19011202

will be parsed as 1901-12-02 while maybe you meant 19-01-1202.
Also, I think the above is safe because the century (20xx for the
year) is not a valid month, but there might be some corner case I
haven't realized.

Jesus.
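
A sketch of the same two-format idea with both formats spelled out for Date.strptime, instead of relying on DateTime.parse for the first attempt (a swapped-in variation, not the code above). It assumes the name contains one eight-digit run in either yyyymmdd or ddmmyyyy order. Note that %m is the month directive, while %M means minutes. The names NAME_DATE_FORMATS and date_from_name are hypothetical.

```ruby
require 'date'

# Formats tried in order: yyyymmdd first, then ddmmyyyy.
NAME_DATE_FORMATS = ['%Y%m%d', '%d%m%Y'].freeze

# Returns a Date taken from the first eight-digit run in +file_name+,
# or nil when there is no such run or no format yields a valid date.
def date_from_name(file_name)
  digits = file_name[/\d{8}/]
  return nil unless digits

  NAME_DATE_FORMATS.each do |format|
    begin
      return Date.strptime(digits, format)
    rescue ArgumentError
      next # not valid in this format, try the next one
    end
  end
  nil
end

puts date_from_name('risk20080331')        # 2008-03-31
puts date_from_name('report_16042008.dat') # 2008-04-16
```

The order of the formats matters: a string that is valid in both readings is taken as yyyymmdd, so the caveat about arbitrary dates still applies.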

···

Hey Jesus,

file_date = DateTime.strptime(match[1], "%d%M%Y")
With the above, the date is parsed as e.g. 28012008 even though the
date is 28032008. So, after some trial and error, I used this:
file_date = DateTime.strptime(match[1], "%d%m%Y")
and bingo, it parses the date correctly, so the delete command can now
run. Thanks a lot for your time and help! =)

Cheers!
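
The difference comes down to the directives: in strptime format strings, %m is the month (01 to 12) and %M is the minute (00 to 59). With "%d%M%Y" no month is read from the string at all, which fits the January date Clement saw. A small sketch to check this, using the date from the post:

```ruby
require 'date'

with_month   = DateTime.strptime('28032008', '%d%m%Y')
with_minutes = DateTime.strptime('28032008', '%d%M%Y')

puts with_month          # 2008-03-28T00:00:00+00:00
puts with_month.month    # 3

# Here the "03" is consumed as minutes, so the month does not come
# from the string; inspect both fields to see where it ended up.
puts with_minutes.month
puts with_minutes.minute
```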
