Awk regexp search

Hi,

The last bit of a bash program is still resisting me. Here is the code I used before:

awk '/scattering efficiency/{print $4}' ../OUTPUTFILES/Output.dat

How would you do that in Ruby? I just need to locate this regexp in the file, and get the following value in the same line. I've tried something like,

output_file=IO.readlines('../OUTPUTFILES/Output.dat').to_s
myarray = output_file.each_line(/scattering efficiency/){|elt| print elt.to_s, '\t'}

but it clearly doesn't work.

Best regards,

baptiste

First off, don't be too quick to abandon awk. I took a dozen lines of Ruby
someone wrote that did almost what I wanted and ported/replaced it with
three lines of sh/awk. For what awk can do, it is excellent.

To set up the equivalent of the awk code above, you want something like
this (note: untested):

ARGF.each { |line|
  case line
  when /scattering efficiency/
    puts line.split[3] # index 3 is awk's $4; a bare split skips leading blanks, as awk does
  end
}

Note that this basic framework does not support awk's /regex1/,/regex2/
notation that captures lines between (and including) the lines matching
those regular expressions.
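
For the range case, Ruby's flip-flop operator (a range of two conditions used inside an `if`) behaves much like awk's /regex1/,/regex2/ pattern. A minimal sketch, assuming the lines are already in an array; the helper name `lines_between` is made up for illustration:

```ruby
# Hypothetical helper: collects every line from one matching start_re
# through the next one matching stop_re, inclusive, similar to awk's
# /start/,/stop/ range pattern.
def lines_between(lines, start_re, stop_re)
  selected = []
  lines.each do |line|
    # The two-dot flip-flop turns on when the left condition is true and
    # turns off after the right condition becomes true.
    selected << line if (line =~ start_re)..(line =~ stop_re)
  end
  selected
end

sample = ["a", "start1", "b", "stop1", "c"]
puts lines_between(sample, /start1/, /stop1/)
```

The same `if (cond1)..(cond2)` test can be dropped into the ARGF loop above in place of the `case`.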

--Greg

···

On Sun, Jul 01, 2007 at 10:13:16PM +0900, baptiste Auguié wrote:

Hi --

Here's a little test/demo (using stdin) -- I've added => to the output
lines:

$ ruby -ne 'puts $1 if /scattering efficiency\s+(\S+)/'
scattering efficiency blah
=> blah
nothing
this has scattering efficiency just like the other one
=> just

David

···

On Sun, 1 Jul 2007, baptiste Auguié wrote:

--
* Books:
   RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
   RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
     & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

The problem is that we have no idea where "scattering efficiency" is
relative to $4 :(
However

ruby -ane 'puts $F[3] if /scattering efficiency/' ../ton/beau/fichier

does the same as the awk script above
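
For readers who do not know the switches: -n wraps the code in a loop over the input lines, -a splits each line on whitespace into $F, and -e supplies the code inline. A rough script-form sketch of the same thing; the helper name `print_fourth_field` and the sample data are assumptions for illustration:

```ruby
require "stringio"

# Hypothetical helper: prints the fourth whitespace-separated field of every
# line of io that matches pattern, like awk '/pattern/{print $4}'.
def print_fourth_field(io, pattern, out = $stdout)
  io.each_line do |line|                    # the loop that -n provides
    fields = line.split                     # what -a would store in $F
    out.puts fields[3] if line =~ pattern   # $F[3] corresponds to awk's $4
  end
end

sample = StringIO.new("  scattering efficiency = 2.8009E-08\n" \
                      "  extinction efficiency = 4.9374E-06\n")
print_fourth_field(sample, /scattering efficiency/)
```

For a real file, the same helper could be called inside File.open(path) { |f| ... }.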

Side Remark:
Pity that one cannot use my favorite options: -anpe ;)

Robert

···

On 7/1/07, dblack@wobblini.net <dblack@wobblini.net> wrote:

--
I always knew that one day Smalltalk would replace Java.
I just didn't know it would be called Ruby
-- Kent Beck

how about something like this simple implementation

# AWK in Ruby

module AWK
  class ClassAwk
      def initialize(filename = "")
        @NR = 0; # record number
        @NF = 0; # number of fields
        @FS = /\s+/; # field separator
        @line = ""; # line matched
        @fields = []; # fields of matched line
        @trace = []; # regexp trace

        if (filename == "")
          @file = ARGF;
        else
          @file = File.open(filename, "r").close;;
        end
      end

      #NOTE: every rule has to be in separate line
      #to get unique rule id
      def rule(regexp1, regexp2 = nil)
        msg = "regexp parameter must be Regexp";
        raise ArgumentError, msg unless regexp1.kind_of?(Regexp);

        if regexp2 == nil
          if @line =~ regexp1
            @fields = @line.split(@FS)
            yield
          end
        else
          raise ArgumentError, msg unless regexp2.kind_of?(Regexp);
          rule_id = /.+:([0-9]+)/.match(caller.first.to_s)[1].to_i;

          @trace[rule_id] = true if @line =~ regexp1;
          if @trace[rule_id]
            @fields = @line.split(@FS)
            yield
          end
          @trace[rule_id] = false if @line =~ regexp2;
        end
      end

      def analyze()
        @NR = 0;
        ARGF.each { |@line|
          @line = @line.chop;
          @NR += 1;
          yield
        }
      end

      #get particular field
      def getField(index)
        output = "";

        if (index == 0)
          output = @line;
        else
          if index - 1 < @fields.length
            return @fields[index - 1];
          end
        end
      end

      #get NR (record number)
      def getNR
        return @NR;
      end

      #get number of fields
      def getNF
        return @fields.length;
      end
  end
end

and an example how to use it:

require "awk.rb"

awk = AWK::ClassAwk.new();

awk.analyze() {
  awk.rule(/start1/, /stop1/) {
    print "1, NR:", awk.getNR(),", ";
    print "NF: ", awk.getNF(),", ", awk.getField(0), "\n";
  };
  awk.rule(/start2/, /stop2/) {
    print "2, NR:", awk.getNR(),", ";
    print "NF: ", awk.getNF(),", ", awk.getField(0), "\n";
  };
  awk.rule(/start1/) {
     print awk.getField(0);
  };
}

···

--
Posted via http://www.ruby-forum.com/.

Thanks everybody,

this piece of code works fine for me,

output_file.each { |line|
    case line
    when /scattering efficiency/
     qsca << line.split(/\s+/)[4]
    end
}

However, I realize now that "output_file" may contain several copies of the line I want to extract. How can I take only the last occurrence?

The file to parse looks something like this:

a few lines ...

      scattering efficiency = 2.8009E-08
      extinction efficiency = 4.9374E-06

some lines in between ...

      scattering efficiency = 2.7957E-08
      extinction efficiency = 4.9374E-06
...
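
One possible way to keep only the last occurrence, as a sketch: collect the matching lines and take the final one. The sample text is modeled on the excerpt above, and the helper name `last_value` is made up for illustration:

```ruby
# Sample input modeled on the excerpt above.
SAMPLE = <<~DATA
  a few lines ...

        scattering efficiency = 2.8009E-08
        extinction efficiency = 4.9374E-06

  some lines in between ...

        scattering efficiency = 2.7957E-08
        extinction efficiency = 4.9374E-06
DATA

# Hypothetical helper: fourth field of the last line matching pattern,
# or nil when no line matches.
def last_value(lines, pattern)
  match = lines.grep(pattern).last
  match && match.split[3]
end

puts last_value(SAMPLE.lines, /scattering efficiency/)  # prints 2.7957E-08
```

With a real file, File.readlines(path) would stand in for SAMPLE.lines.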

> ruby -ane 'puts $F[3] if /scattering efficiency/' ../ton/beau/fichier
>
> does the same as the awk script above

This piece of code needs to be part of a script, not a one-line call. How would that work in a Ruby script?

> Side Remark:
> Pity that one cannot use my favorite options: -anpe ;)

; )

Thanks,

baptiste

···

On 1 Jul 2007, at 15:51, Robert Dober wrote:

Hi --

> The problem is that we have no idea where "scattering efficiency" is
> relative to $4 :(

He said "the following value", so I assumed it would be a \S+ match
after a \s+ match.

> However
>
> ruby -ane 'puts $F[3] if /scattering efficiency/' ../ton/beau/fichier
>
> does the same as the awk script above

Wow -- I really must brush up on 'man ruby' :)

David

···

On Sun, 1 Jul 2007, Robert Dober wrote:


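A few mistakes in the code I posted. Instead of

          @file = File.open(filename, "r").close;;

it should be

          @file = File.open(filename, "r");

(the stray .close closed the file right after opening it), and

        ARGF.each { |@line|

should be replaced with

        @file.each { |@line|

so that analyze reads from the file given to the constructor. :)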


> > ruby -ane 'puts $F[3] if /scattering efficiency/' ../ton/beau/fichier
> >
> > does the same as the awk script above
>
> this piece of code needs to be part of a script, not a one line call -
> how would that work in a Ruby script?

I am not sure I understand. This command can be put into a bash script
just like the awk example above. If, however, you want to replace the bash
script with a Ruby script, you will need to read the file explicitly:

puts File.readlines("/le/beau/fichier").grep(/scattering efficiency/).
    map{|line| line.split[3]}

or only the first

puts File.readlines("/le/beau/fichier").grep(....).first.split[3]

or if performance is an issue and you do not want to read further lines

File.foreach("/le/grand/fichier") do |line|
   next unless /.../ === line
   puts line.split[3]
   break
end
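
A self-contained sketch of the early-exit variant, for anyone who wants to try it without a data file at hand. The helper name `first_value` and the temporary file contents are assumptions for illustration:

```ruby
require "tempfile"

# Hypothetical helper: fourth field of the first line in the file at path
# that matches pattern. Reading stops at that line; nil if nothing matches.
def first_value(path, pattern)
  File.foreach(path) do |line|
    return line.split[3] if line =~ pattern
  end
  nil
end

# Write a small stand-in file and query it.
Tempfile.create("output") do |f|
  f.write("header\n" \
          "  scattering efficiency = 2.8009E-08\n" \
          "  scattering efficiency = 2.7957E-08\n")
  f.flush
  puts first_value(f.path, /scattering efficiency/)  # prints 2.8009E-08
end
```

File.foreach reads line by line, so a large file is never loaded whole, unlike the File.readlines versions.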

Cheers
Robert

···

On 7/1/07, baptiste Auguié <ba208@exeter.ac.uk> wrote:


small update to make it easier

# AWK implementation

module AWK
  class ClassAwk
      def initialize(filename = "")
        @NR = 0; # record number
        @NF = 0; # number of fields
        @FS = /\s+/; # field separator
        @f = []; # fields of matched line, f[0] - line
        @trace = []; # regexp trace

        #input file
        @file = filename == "" ? ARGF : File.open(filename, "r");
      end

      #NOTE: every rule has to be in separate line
      #to get unique rule id
      def rule(regexp1, regexp2 = nil)
        msg = "regexp parameter must be Regexp";
        raise ArgumentError, msg unless regexp1.kind_of?(Regexp);

        if regexp2 == nil
          yield if @f[0] =~ regexp1;
        else
          raise ArgumentError, msg unless regexp2.kind_of?(Regexp);
          rule_id = /.+:([0-9]+)/.match(caller.first.to_s)[1].to_i;

          @trace[rule_id] = true if @f[0] =~ regexp1;
          yield if @trace[rule_id]
          @trace[rule_id] = false if @f[0] =~ regexp2;
        end
      end

      def analyze()
        @NR = 0;
        @file.each { |line|
          @NR += 1;
          @f = line.split(@FS)
          @NF = @f.length
          @f.unshift(line.chop);
          yield
        }
      end

      attr_reader :NR, :NF, :f;
   end
end

example:

require "awk.rb"

awk = AWK::ClassAwk.new();

awk.analyze() {
  awk.rule(/start1/, /stop1/) {
    print "1, NR:#{awk.NR}, NF:#{awk.NF}, #{awk.f[0]}\n";
  };
  awk.rule(/start2/, /stop2/) {
    print "2, NR:#{awk.NR}, NF:#{awk.NF}, #{awk.f[0]}\n";
  };
  awk.rule(/start1/) {
    print "3, NR:#{awk.NR}, NF:#{awk.NF}, #{awk.f[0]}\n";
  };
}


Perfect! Exactly what I wanted (with last instead of first). I just didn't know about the grep method in Ruby.

Thanks,

baptiste

···

On 1 Jul 2007, at 19:09, Robert Dober wrote:

> or only the first
>
> puts File.readlines("/le/beau/fichier").grep(....).first.split[3]