Parsing data with ruby

Hiya,

I'm trying to figure out how I would go about parsing a group of files
containing raw log data (the output of crontab -l) and converting this
data into a human-readable format. The entries in the files look like this:

10,25,40,55 * * * * /some/cron/here > /dev/null 2>&1
30 */4 * * * /some/cron/here

And so on.

I want to get them into this format so I can enter the data into a
spreadsheet:

Cronjob  | # of Servers | Every minute | Every hour | Every day | Every week | Every month
-----------------------------------------------------------------------------------------
CronHere | 10           | N            | N          | Y         | Y          | Y
CronHere | 8            | Y            | N          | N         | Y          | Y

And so on. Any time field that is a * would get an N in the results,
while any time field that is anything other than a * would get a Y.

Can anyone give me some examples of how I might go about doing this with
ruby? Either doing it with pure ruby, or invoking awk within ruby is
fine.

Thanks!

--
Posted via http://www.ruby-forum.com/.

Well this is a bit of a hack but just to simply parse the lines you could use:

a = "10,25,40,55 * * * * /some/cron/here > /dev/null 2>&1\n30 */4 * * * /some/cron/here"

a.split(/\n/).each do |line|
  # Five whitespace-separated time fields, then the command
  if line =~ /^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$/
    output = Array.new
    output << $6                       # the command
    output << "?"                      # number of servers (not computed here)
    output << ($1 == "*" ? 'N' : 'Y')  # minute
    output << ($2 == "*" ? 'N' : 'Y')  # hour
    output << ($3 == "*" ? 'N' : 'Y')  # day of month
    output << ($4 == "*" ? 'N' : 'Y')  # month
    output << ($5 == "*" ? 'N' : 'Y')  # day of week

    p output
  end
end

Which will output:

["/some/cron/here > /dev/null 2>&1", "?", "Y", "N", "N", "N", "N"]
["/some/cron/here", "?", "Y", "Y", "N", "N", "N"]

Note that I left the '?' for the number of servers, as that is your
problem. Think of it as an exercise for the reader :)

You might also need to match the Paul Vixie extensions for cron,
such as @hourly and @daily.
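One way to deal with those nicknames is to rewrite them into the ordinary five-field form before parsing. Below is a rough sketch, not something from the posts above: NICKNAMES and expand_nickname are made-up names, and the schedules are the equivalents listed in the crontab(5) man page for Vixie cron. @reboot has no five-field equivalent, so it is left alone.

```ruby
# Vixie cron nicknames and the five-field schedules crontab(5) lists for them.
# @reboot has no five-field equivalent, so it is deliberately left out.
NICKNAMES = {
  "@yearly"   => "0 0 1 1 *",
  "@annually" => "0 0 1 1 *",
  "@monthly"  => "0 0 1 * *",
  "@weekly"   => "0 0 * * 0",
  "@daily"    => "0 0 * * *",
  "@midnight" => "0 0 * * *",
  "@hourly"   => "0 * * * *"
}

# Hypothetical helper: rewrite "@hourly /cmd" as "0 * * * * /cmd".
# Lines that do not start with a known nickname are returned unchanged.
def expand_nickname(line)
  nick, command = line.strip.split(/\s+/, 2)
  schedule = NICKNAMES[nick]
  schedule && command ? "#{schedule} #{command}" : line
end

puts expand_nickname("@hourly /some/cron/here")
puts expand_nickname("30 */4 * * * /some/cron/here")
```

Running each line through expand_nickname first means the five-field parsing code does not need a special case for the nicknames.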

Peter Hickman wrote in post #1030933:

[...]

Thanks for the reply. This looks like a step in the right direction. The
only problem I can think of is that the cron jobs aren't always going to
have the same times, so I don't think hard-coding the
a = "10,25,40,55 * * * * ..." string would work. It would probably only
find that one entry, since the times the crons execute are always going
to vary.

split takes a second argument that specifies a limit. It will greatly
simplify parsing the lines in this case.

line.split(/ /, 6)

=> ["10,25,40,55", "*", "*", "*", "*", "/some/cron/here > /dev/null 2>&1"]

Regards,
Ammar
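To show how the limit argument fits with the Y/N mapping from earlier in the thread, here is a minimal sketch. cron_row is a made-up helper name, and skipping blank lines and comments is my assumption about what the files may contain. It splits on " " (a String, not a Regexp) so that tabs or repeated spaces between fields don't produce empty entries, which split(/ /, 6) would.

```ruby
# Hypothetical helper: turn one crontab line into
# [command, minute, hour, day-of-month, month, day-of-week flags].
# Returns nil for blank lines, comments, and lines with fewer than six fields.
def cron_row(line)
  line = line.strip
  return nil if line.empty? || line.start_with?("#")

  # A String pattern of " " splits on runs of whitespace; the limit of 6
  # keeps the command (spaces and all) together as the last element.
  *times, command = line.split(" ", 6)
  return nil unless times.size == 5 && command

  [command] + times.map { |field| field == "*" ? "N" : "Y" }
end

p cron_row("10,25,40,55 * * * * /some/cron/here > /dev/null 2>&1")
p cron_row("30 */4 * * * /some/cron/here")
```

The flags come out in crontab field order (minute, hour, day of month, month, day of week), so they may need reordering to match the spreadsheet columns.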

On Wed, Nov 9, 2011 at 12:49 AM, Ronald Craft <admin@ssihosting.com> wrote:

The a = "..." part was only there to give you a runnable example. You
will have to figure out how to read the crontabs yourself, as I don't
know how your system is set up. Also, Ammar Ali's suggestion is a
considerable improvement on my code; you should go with that.