ANN: Regexador - A mini-language for regular expressions

This is a new project, but is reasonably mature for its age. :wink:

See http://github.com/hal9000/regexador

When a regular expression grows too complex to read or maintain,
construct a small script to describe it instead.

Example from the README (see below).

Comments welcome.

Thanks!
Hal Fulton

Suppose we want to match a string consisting of a single IP address.
(Remember that the numbers can only range as high as 255.)

Here is traditional regular expression notation:

    /^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/

And here is Regexador notation:

    dot = "."
    num = "25" D5 | `2 D4 D | maybe D1 1,2*D
    match BOS num dot num dot num dot num EOS end

In your Ruby code, you can create a Regexador "script" or "program"
(probably by means of a here-document) that you can then pass into
the Regexador class. At minimum, you can convert this into a "real"
Ruby regular expression; there are a few other features and functions,
and more may be added.

So here is a complete Ruby program:

    require 'regexador'

    program = <<-EOS
      dot = "."
      num = "25" D5 | `2 D4 D | maybe D1 0,2*D
      match WB num dot num dot num dot num WB end
    EOS

    pattern = Regexador.new(program)

    puts "Give me an IP address"
    str = gets.chomp

    rx = pattern.to_regex # Can retrieve the actual regex

    if pattern.match?(str) # ...or use in other direct ways
      puts "Valid"
    else
      puts "Invalid"
    end

From the README:

"I'm thinking of ignoring these features for now:
Unicode chars"
And out. This is not a serious endeavour.

···


This looks like a fun project which I’ll look into.

I think you’ve made regexes look worse than they need to (though that might well be how a person unfamiliar with regexes actually uses them). The comments below are about regexes rather than your project.

I think it is possible to achieve a lot with interpolation in Ruby’s regular expressions, remembering that \A and \z are the “real” start-of-string and end-of-string anchors, and using the x modifier.

    #!/usr/bin/env ruby

    BYTE = / (?:
              25[0-5]      | # 250 .. 255
              2[0-4]\d     | # 200 .. 249
              [01]?\d{1,2}   # 0 .. 199
             )
           /x

    IP_ADDR4 = / \A #{BYTE} \. #{BYTE} \. #{BYTE} \. #{BYTE} \z /x

    # p IP_ADDR4

    print "Give me an address: "
    if IP_ADDR4 =~ gets.chomp
      puts "Good"
    else
      puts "Bad"
    end

    __END__

Of course my Perl history makes the regex version seem “clear to me”.
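
To illustrate the point about anchors, here is a minimal sketch (not from the original post) of how `^` and `$` match at line boundaries in Ruby, while `\A` and `\z` match only at the very start and end of the string. The constant names and the sample input are made up for illustration, and the pattern is deliberately simplified: it omits the 0..255 range check.

```ruby
# Simplified pattern: four dot-separated groups of 1-3 digits.
# (Range checking is left out to keep the focus on the anchors.)
QUAD = /\d{1,3}(?:\.\d{1,3}){3}/

LINE_ANCHORED   = /^#{QUAD}$/    # ^ and $ anchor at any line boundary
STRING_ANCHORED = /\A#{QUAD}\z/  # \A and \z anchor at the whole string

# Hypothetical input with an embedded newline before the address.
sneaky = "not an address\n10.0.0.1"

line_result   = LINE_ANCHORED.match?(sneaky)
string_result = STRING_ANCHORED.match?(sneaky)
clean_result  = STRING_ANCHORED.match?("10.0.0.1")

puts "line-anchored accepts multi-line input:   #{line_result}"
puts "string-anchored accepts multi-line input: #{string_result}"
puts "string-anchored accepts plain address:    #{clean_result}"
```

With the line anchors, the second line of the input is enough for a match, so the multi-line string is accepted; with the string anchors it is rejected.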

I would usually decompose the text using a regular expression and then do the validation using code, for example something like:

    def ipv4_address?(string)
      md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
      md && md.captures.all? { |num| num.to_i.between?(0, 255) }
    end
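
A short usage sketch of the method above, with sample inputs chosen purely for illustration. It repeats the method so that the snippet runs on its own.

```ruby
def ipv4_address?(string)
  # Step 1: decompose the string into four groups of 1-3 digits.
  md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match(string)
  # Step 2: validate each group numerically.
  md && md.captures.all? { |num| num.to_i.between?(0, 255) }
end

# Illustrative inputs (not from the original post).
samples = ["192.168.2.100", "192.168.2.257", "1.2.3", "010.001.000.255"]

results = samples.map { |s| [s, ipv4_address?(s)] }.to_h
results.each { |s, ok| puts "#{s.ljust(16)} #{ok.inspect}" }
```

Two details are worth noticing. The method returns nil rather than false when the string does not have the four-part shape at all, because `md` is nil in that case; both are falsy, so an `if` behaves the same. It also accepts zero-padded groups such as "010", since `to_i` reads them as decimal numbers.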

Regards,

Mike

···


--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

Reminds me a bit of something I did almost exactly six years and one month ago:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/263785

:wink:

Cheers

robert

···


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

On Sep 7, 2013, at 11:02, Mike Stok <mike@stok.ca> wrote:

> I would usually decompose the text using a regular expression and then do the validation using code, for example something like:
>
> def ipv4_address?(string)
>   md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
>   md && md.captures.all? { |num| num.to_i.between?(0, 255) }
> end

You know what's even better? Not writing anything:

    >> require 'ipaddr'
    => true
    >> i = IPAddr.new("192.168.2.100")
    => #<IPAddr: IPv4:192.168.2.100/255.255.255.255>
    >> i = IPAddr.new("192.168.2.257")
    IPAddr::InvalidAddressError: invalid address
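
For use outside irb, the same idea can be wrapped in a small predicate. This is only a sketch: `valid_ipv4?` is a hypothetical helper name, not something from the thread or from the ipaddr library. It relies on `IPAddr.new` raising a subclass of `IPAddr::Error` for input it cannot parse, and on `IPAddr#ipv4?` to exclude IPv6 addresses.

```ruby
require 'ipaddr'

# Hypothetical helper: true only if IPAddr parses the string as IPv4.
def valid_ipv4?(string)
  IPAddr.new(string).ipv4?
rescue IPAddr::Error
  # Covers IPAddr::InvalidAddressError and the other parse errors.
  false
end

good = valid_ipv4?("192.168.2.100")
bad  = valid_ipv4?("192.168.2.257")
v6   = valid_ipv4?("::1")

puts "192.168.2.100 -> #{good}"
puts "192.168.2.257 -> #{bad}"
puts "::1           -> #{v6}"
```

One caveat: `IPAddr.new` is more permissive than the regex-based checks in this thread. It also accepts an address followed by a prefix length, such as "192.168.2.0/24", so this helper answers a slightly different question than "is this string exactly one dotted-quad address".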

···


Suit yourself. I put off working on that until September, i.e.,
I started three days ago.

But as you are "out," I suppose you will never see this anyway.

Hal

···

On Sat, Sep 7, 2013 at 12:46 AM, Florian Gilcher <flo@andersground.net> wrote:

> From the README:
>
> "I'm thinking of ignoring these features for now:
>
>    - Unicode chars"
>
> And out. This is not a serious endeavour.


Mike,

You're correct, of course. Multiline regular expressions are
much more readable in general.

Many would argue that the entire project is not worthwhile at all.

My personal opinion is that there is a threshold (which is itself a matter
of opinion) where regexes become needlessly difficult to read.

Hal

···


In this case, very true. I have only touched the ipaddr lib
once, but I see it to be very useful.

Hal

···

On Sat, Sep 7, 2013 at 8:36 PM, Ryan Davis <ryand-ruby@zenspider.com> wrote:

> You know what's even better? Not writing anything:
>
> >> require 'ipaddr'
> => true
> >> i = IPAddr.new("192.168.2.100")
> => #<IPAddr: IPv4:192.168.2.100/255.255.255.255>
> >> i = IPAddr.new("192.168.2.257")
> IPAddr::InvalidAddressError: invalid address

I will look at this when I have time.

It would not be the first time you were six years ahead of me. :slight_smile:

Hal

···


Hm, maybe then I should ask whether you would take over maintenance of
my grave - then I'm sure it will look nice for at least six years. :wink:

Cheers

robert

···


The newest Ruby Weekly pointed out two other more-friendly ways of doing
regexes:

http://krainboltgreene.github.io/hexpress/?utm_source=rubyweekly&utm_medium=email

···


I had seen Verbal Expressions before I started my own project, but
never saw hexpress until a couple of weeks ago.

I think they're both worthy projects, as the concept itself is a worthy
one (in my opinion).

All three projects are similar in spirit and intent, but in implementation
they are different. Obviously I like my own better. Arguably it is "more
different" from these other two than they are from each other.

Hal

···

On Thu, Sep 26, 2013 at 6:24 PM, Eric Christopherson <echristopherson@gmail.com> wrote:

> The newest Ruby Weekly pointed out two other more-friendly ways of doing
> regexes:
>
> Experimenting with Verbal Expressions
>
> Hexpress by krainboltgreene


+1 for the chosen name Regexador!

Abinoam Jr.
(From Brazil :wink: )
