How to strip Ruby comments from a line of Ruby code?

Short description: do you know of any existing method that, given a line
of Ruby code as a string, removes the comments from it?

________________

Long description:

For my DSL project, I load my DSL files and apply a small preprocessing
step to each line before performing a global instance_eval on the
preprocessed file.

Basically, in my DSL, a line may start with a label followed by a colon,
like this:
    my_label: here_is_a_dsl(arg1, arg2)

This label may be followed by a DSL instruction.

The preprocessor transforms the previous line into this one:

    newLabel(:my_label) { here_is_a_dsl(arg1, arg2) }

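using the following code:
    append = ""
    # File.foreach closes the file when done (File.open(file).each did not)
    File.foreach(file) do |line|
      # \w* rather than \w+ so that one-character labels such as "n:" also match
      match = line.match(/^([a-zA-Z_]\w*):\s+(.*)/)
      if match.nil?
        append << line
      else
        append << "newLabel(:#{match[1]}) { #{match[2]} }\n"
      end
    end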

The problem arises when there is a comment at the end of the input line:
    my_label: here_is_a_dsl(arg1, arg2) # my comments

It then generates the following line:
    newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments }

This means the closing "}" of the block is commented out, which causes a
parse error for the whole file.

I could put a newline after the match, like this:
    newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
    }

Unfortunately, I'm then no longer able to debug my DSL, because the line
numbers no longer match those of the preprocessed file.

-----

I would like something really simple, without being forced to use a
full Ruby parser to parse those lines and remove the comments.

Any idea?

Thanks!
--
Posted via http://www.ruby-forum.com/.

Alexandre Mutel wrote:

> Short description: do you know of any existing method that, given a line
> of Ruby code as a string, removes the comments from it?
>
> I would like something really simple, without being forced to use a
> full Ruby parser to parse those lines and remove the comments.
>
> Any idea?
>
> Thanks!

I'm still only learning regular expressions (I'll do another shameless
plug for rubular.com here), but you could do this:
string = string.match(/^.*#/).to_s[0...-1]

Yes, it's a poor solution, but should you have nothing else, it'll do.

-----

Alexandre Mutel wrote:

> I could put a newline after the match, like this:
>     newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
>     }
>
> Unfortunately, I'm then no longer able to debug my DSL, because the line
> numbers no longer match those of the preprocessed file.

However, if you eval each line individually, you can pass the source file
name and line number as the third and fourth arguments to eval.

def foo; end
src = "foo\nfoo\nbar"
src.each_line.with_index do |line,i|
  # chomp the newline; "DSL" and i+1 become the reported file and line
  eval "#{line.chomp} {\n}", binding, "DSL", i+1
end

# Result:
DSL:3: undefined method `bar' for main:Object (NoMethodError)

Otherwise, if every input line maps to exactly two output lines, you can
just patch up the line number in the exception by dividing by two.

src = "foo\nfoo\nbar\n"
begin
  eval src.gsub(/\n/, "{\n}\n"), binding, "DSL", 1
rescue => e
  # entries look like "DSL:5" or, on newer Rubies, "DSL:5:in ..."
  if e.backtrace.first =~ /\A(.*?):(\d+)(.*)\z/
    e.set_backtrace(["#{$1}:#{($2.to_i+1) / 2}#{$3}"] + e.backtrace[1..-1])
  end
  raise e
end

-----

Aldric Giacomoni wrote:

> I'm still only learning regular expressions (I'll do another shameless
> plug for rubular.com here), but you could do this:
> string = string.match(/^.*#/).to_s[0...-1]
>
> Yes, it's a poor solution, but should you have nothing else, it'll do.

The problem with your solution is that it would also remove valid code,
for example from this line:
   myvar_s = "#{myvar}"

The hard part is handling string literals and escape sequences correctly.
It's possible, but it requires much more work... I just want to know
whether someone else has already done this.

-----

Brian Candler wrote:

> def foo; end
> src = "foo\nfoo\nbar"
> src.each_line.with_index do |line,i|
>   # chomp the newline; "DSL" and i+1 become the reported file and line
>   eval "#{line.chomp} {\n}", binding, "DSL", i+1
> end
>
> # Result:
> DSL:3: undefined method `bar' for main:Object (NoMethodError)

Wooo, thanks Brian!

-----

Aldric Giacomoni wrote:

> Alexandre Mutel wrote:
>
>> Short description: do you know of any existing method that, given a
>> line of Ruby code as a string, removes the comments from it?
>>
>> I would like something really simple, without being forced to use a
>> full Ruby parser to parse those lines and remove the comments.
>>
>> Any idea?
>>
>> Thanks!
>
> I'm still only learning regular expressions (I'll do another shameless
> plug for rubular.com here), but you could do this:
> string = string.match(/^.*#/).to_s[0...-1]
>
> Yes, it's a poor solution, but should you have nothing else, it'll do.

It's not possible to do this reliably with regular expressions, because
of the interaction of # with quoting constructs. You'll need a parser
(Treetop can help make one).
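For what it's worth, Ruby 1.9 and later ship a lexer for Ruby itself in the standard library, Ripper, so a real parser is available without writing one. The sketch below assumes the fragment is valid Ruby on its own, so it is meant for the DSL instruction after the label (match[2] in the original preprocessor) rather than the whole "my_label: ..." line; the helper name strip_comments is made up for illustration.

```ruby
require 'ripper'

# Hypothetical helper: strip comments from a fragment of Ruby code using the
# stdlib Ripper lexer (Ruby 1.9+). Assumes the fragment is valid Ruby by itself.
def strip_comments(code)
  # Ripper.lex returns tokens shaped like [[line, column], event, text, ...];
  # drop the :on_comment tokens and glue the remaining text back together.
  tokens = Ripper.lex(code).reject { |token| token[1] == :on_comment }
  tokens.map { |token| token[2] }.join.rstrip
end

puts strip_comments('here_is_a_dsl(arg1, arg2) # my comments')
# prints: here_is_a_dsl(arg1, arg2)
puts strip_comments('myvar_s = "#{myvar}"')
# prints: myvar_s = "#{myvar}"
```

Because it reuses Ruby's own lexer, other quoting constructs that may contain a '#', such as %q{...} or regexp literals, should be handled as well; the trade-off is the Ruby 1.9+ requirement.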

Best,

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

-----

Alexandre Mutel wrote:

> Aldric Giacomoni wrote:
>
>> I'm still only learning regular expressions (I'll do another shameless
>> plug for rubular.com here), but you could do this:
>> string = string.match(/^.*#/).to_s[0...-1]
>>
>> Yes, it's a poor solution, but should you have nothing else, it'll do.
>
> The problem with your solution is that it would also remove valid code,
> for example from this line:
>    myvar_s = "#{myvar}"
>
> The hard part is handling string literals and escape sequences correctly.
> It's possible, but it requires much more work... I just want to know
> whether someone else has already done this.

Actually, no: regexps are greedy by default, so it will match up to the
very last '#' it finds.
The other solution you got is more elegant, though. :)

-----

Alexandre Mutel wrote:

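> Brian Candler wrote:
>
>> def foo; end
>> src = "foo\nfoo\nbar"
>> src.each_line.with_index do |line,i|
>>   # chomp the newline; "DSL" and i+1 become the reported file and line
>>   eval "#{line.chomp} {\n}", binding, "DSL", i+1
>> end
>>
>> # Result:
>> DSL:3: undefined method `bar' for main:Object (NoMethodError)
>
> Wooo, thanks Brian!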

Oops, I was too fast. In fact, I need an eval on the whole file, because
my DSL allows arbitrary Ruby code (and so method definitions, etc.).

-----

Aldric Giacomoni wrote:

> Actually, no: regexps are greedy by default, so it will match up to the
> very last '#' it finds.
> The other solution you got is more elegant, though. :)

Hmm, I'm not sure greediness helps there:

line = "line = \"\#{args}\""
=> "line = \"\#{args}\""
string = line.match(/^.*#/).to_s[0...-1]
=> "line = \""

The expected result is: line = "#{args}"

To strip comments with a regexp, you need to handle string literals and
their escapes.
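Handling the escapes by hand is not much code if only plain '...' and "..." literals matter. A minimal sketch (the name strip_comment is hypothetical): it scans the line character by character, tracks whether it is inside a quoted literal and whether the previous character was a backslash, and cuts the line at the first '#' found outside any literal. It deliberately ignores %q/%Q literals, regexp literals, heredocs, ?# character literals and quotes nested inside #{...}, so it is a heuristic rather than a Ruby lexer.

```ruby
# Hypothetical helper: cut a line at the first '#' that is outside a
# '...' or "..." literal. Not handled: %q()/%Q() literals, regexps,
# heredocs, ?# character literals, strings nested inside #{...}.
def strip_comment(line)
  quote = nil      # the quote character we are currently inside, or nil
  escaped = false  # true right after a backslash inside a literal
  line.each_char.with_index do |ch, i|
    if quote
      if escaped
        escaped = false
      elsif ch == "\\"
        escaped = true
      elsif ch == quote
        quote = nil
      end
    elsif ch == '"' || ch == "'"
      quote = ch
    elsif ch == "#"
      # first '#' outside a literal: everything from here on is a comment
      return line[0...i].rstrip
    end
  end
  line.chomp
end

puts strip_comment('my_label: here_is_a_dsl(arg1, arg2) # my comments')
# prints: my_label: here_is_a_dsl(arg1, arg2)
puts strip_comment('line = "#{args}"')
# prints: line = "#{args}"
```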

-----

Aldric Giacomoni wrote:

> Alexandre Mutel wrote:
>
>> Aldric Giacomoni wrote:
>>
>>> I'm still only learning regular expressions (I'll do another shameless
>>> plug for rubular.com here), but you could do this:
>>> string = string.match(/^.*#/).to_s[0...-1]
>>>
>>> Yes, it's a poor solution, but should you have nothing else, it'll do.
>>
>> The problem with your solution is that it would also remove valid code,
>> for example from this line:
>>    myvar_s = "#{myvar}"
>>
>> The hard part is handling string literals and escape sequences correctly.
>> It's possible, but it requires much more work... I just want to know
>> whether someone else has already done this.
>
> Actually, no: regexps are greedy by default, so it will match up to the
> very last '#' it finds.

file # => array containing each line of the file you want to clean up
file.map! do |line|
  # keep what precedes the last '#'; lines without a '#' are left as they are
  line =~ /(^.*)#/ ? $1 : line
end

-----

Alexandre Mutel wrote:

> Oops, I was too fast. In fact, I need an eval on the whole file, because
> my DSL allows arbitrary Ruby code (and so method definitions, etc.).

Then it sounds like you just need to separate the blocks of code
appropriately. Do you want each line which begins with \w: (a labelled
line) to be treated specially? Then the rest of the code between the
labelled lines can be treated as a single string.

Proof-of-concept:

src = <<EOS
def foo
  puts "XXX"
end
label1: foo # this is a test
def bar
  puts "YYY"
end
label2: bar
EOS

def label(name)
  puts "Executing label #{name} now..."
  yield
end

b = binding
line = 1
src.split(/^(\w+:.*)\n/).each do |chunk|
  # anchored, so code chunks merely containing "word:" (e.g. a URL) are not labels
  if chunk =~ /\A(\w+):(.*)\z/
    eval "label(#{$1.inspect}) { #{$2}\n }", b, "DSL", line
    line += 1
  else
    eval chunk, b, "DSL", line
    line += chunk.split("\n").size
  end
end

-----

Alexandre Mutel wrote:

> The expected result is: line = "#{args}"
>
> To strip comments with a regexp, you need to handle string literals and
> their escapes.

Ah... what if the only '#' isn't a comment? Good point.

-----

Brian Candler wrote:

> Then it sounds like you just need to separate the blocks of code
> appropriately. Do you want each line which begins with \w: (a labelled
> line) to be treated specially? Then the rest of the code between the
> labelled lines can be treated as a single string.
>
> b = binding
> line = 1
> src.split(/^(\w+:.*)\n/).each do |chunk|
>   # anchored, so code chunks merely containing "word:" (e.g. a URL) are not labels
>   if chunk =~ /\A(\w+):(.*)\z/
>     eval "label(#{$1.inspect}) { #{$2}\n }", b, "DSL", line
>     line += 1
>   else
>     eval chunk, b, "DSL", line
>     line += chunk.split("\n").size
>   end
> end

Damn, your solution almost works, but when working on an external
file, the "eval" makes me lose the step in the code and I'm not able to
get back to debugging the DSL...
OK, I'm probably going to forget about this option for now... I'll see
later how to do it.

Thanks again Brian.

-----

Alexandre Mutel wrote:

> Damn, your solution almost works, but when working on an external
> file, the "eval" makes me lose the step in the code

What do you mean by "lose the step"? Is it reporting the wrong line
number? I hacked together that code very quickly, and I'm sure it's
fixable. Here is a more verbose version that is more likely to report the
correct line number.

buf = nil
buf_line = 0
b = binding
src.each_line.with_index do |line,i|
  if line =~ /^(\w+):(.*)$/
    label, code = $1, $2
    if buf
      eval buf, b, "DSL", buf_line+1
      buf = nil
    end
    eval "label(#{label.inspect}) { #{code}\n}", b, "DSL", i+1
  else
    unless buf
      buf = ""
      buf_line = i
    end
    buf << line
  end
end
if buf
  eval buf, b, "DSL", buf_line+1
  buf = nil
end

-----

Brian Candler wrote:

> Alexandre Mutel wrote:
>
>> Damn, your solution almost works, but when working on an external
>> file, the "eval" makes me lose the step in the code
>
> What do you mean by "lose the step"? Is it reporting the wrong line
> number? I hacked together that code very quickly, and I'm sure it's
> fixable. Here is a more verbose version that is more likely to report the
> correct line number.

I mean that before the eval of a chunk, I'm still in the DSL code, but
after I press "F8", the debugger jumps back to the line just after the
eval (line += chunk.split("\n").size), although I didn't set any
breakpoint there... It's weird, and after that I'm not able to get back
and step into the DSL code (even if I set some breakpoints).
I don't know if it's a bug or a limitation of my debugger (I'm using
RubyMine), or if I'm missing something...

-----

Oh I see - eval doesn't work with a ruby debugger. I guess the debugger
is assuming that the line number in the exception backtrace is an offset
from the start of the eval string, which it isn't here.

I did think of another, simpler solution for you, though. When you
insert the newline and close-brace, follow the brace with a semicolon
instead of another newline, e.g.

  n: foo(bar) # comment
  nextline

becomes:

  label(:n) { foo(bar) # comment
  }; nextline

How would that be?
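Here is a sketch of the original preprocessing loop adapted to this suggestion. Assumptions: newLabel is the DSL method from the first post, the label regex mirrors the original one (with \w* so that one-letter labels like n: match), and preprocess is a hypothetical name. The block is left open on the labelled line and "}; " is prefixed to the following line, so the output keeps one line per input line (apart from an extra "}" when the very last line is labelled) and a trailing comment can no longer swallow the brace.

```ruby
# Hypothetical helper sketching the "close the brace on the next line,
# followed by a semicolon" idea. newLabel is the DSL method from the first
# post; the label regex mirrors the original preprocessor.
LABEL_LINE = /\A([a-zA-Z_]\w*):\s+(.*)\z/

def preprocess(source)
  out = []
  close_pending = false # true while a label block is still open
  source.each_line do |raw|
    line = raw.chomp
    prefix = close_pending ? "}; " : ""
    if (m = LABEL_LINE.match(line))
      # Keep the instruction (and any trailing comment) on this line and
      # leave the block open: the "}" is emitted at the start of the next line.
      out << "#{prefix}newLabel(:#{m[1]}) { #{m[2]}"
      close_pending = true
    else
      out << "#{prefix}#{line}"
      close_pending = false
    end
  end
  # A label on the very last line still needs its block closed.
  out << "}" if close_pending
  out.join("\n") + "\n"
end

src = "my_label: here_is_a_dsl(arg1, arg2) # my comments\nnextline\n"
puts preprocess(src)
# prints:
# newLabel(:my_label) { here_is_a_dsl(arg1, arg2) # my comments
# }; nextline
```

One caveat: if a labelled instruction continues onto the next line (an unclosed parenthesis, say), the "}; " prefix lands in the middle of that expression, so this sketch assumes each labelled instruction fits on one line.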

-----

Brian Candler wrote:

>   n: foo(bar) # comment
>   nextline
>
> becomes:
>
>   label(:n) { foo(bar) # comment
>   }; nextline
>
> How would that be?

YES! It seems to work perfectly... the ; doesn't alter the line numbering
seen by the debugger. In fact, I tried this solution this morning without
the semicolon... but yes, it makes sense with the semicolon now!

Thanks very much Brian, this is helping me a lot.
