ANN: Regexador - A mini-language for regular expressions

HAL_90001 · 6 September 2013 22:50

This is a new project, but is reasonably mature for its age.

See http://github.com/hal9000/regexador

When a regular expression grows too complex to read or maintain,
construct a small script to describe it instead.

Example from the README (see below).

Comments welcome.

Thanks!
Hal Fulton

Suppose we want to match a string consisting of a single IP address.
(Remember that the numbers can only range as high as 255.)

Here is traditional regular expression notation:

    /^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/

And here is Regexador notation:

    dot = "."
    num = "25" D5 | `2 D4 D | maybe D1 1,2*D
    match BOS num dot num dot num dot num EOS end

In your Ruby code, you can create a Regexador "script" or "program"
(probably by means of a here-document) that you can then pass into
the Regexador class. At minimum, you can convert this into a "real"
Ruby regular expression; there are a few other features and functions,
and more may be added.

So here is a complete Ruby program:

    require 'regexador'

    program = <<-EOS
      dot = "."
      num = "25" D5 | `2 D4 D | maybe D1 0,2*D
      match WB num dot num dot num dot num WB end
    EOS

    pattern = Regexador.new(program)

    puts "Give me an IP address"
    str = gets.chomp

    rx = pattern.to_regex # Can retrieve the actual regex

    if pattern.match?(str) # ...or use in other direct ways
      puts "Valid"
    else
      puts "Invalid"
    end
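
A quick way to see what the traditional pattern accepts is to run it over a few known inputs. The sketch below uses only the regex quoted above, so it runs without the gem; the sample strings and the constant name `TRADITIONAL` are made up for illustration. The same loop could presumably be pointed at the regex returned by `pattern.to_regex`, though that is an assumption and is not tried here.

```ruby
# The traditional regex from the README example, copied as-is.
TRADITIONAL = /^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/

# Illustrative inputs: two valid addresses, then three invalid ones.
samples = ["192.168.2.100", "255.255.255.255", "192.168.2.257", "1.2.3", "a.b.c.d"]

# Pair each input with a strict true/false result.
results = samples.map { |s| [s, !(TRADITIONAL =~ s).nil?] }

results.each do |s, ok|
  puts "#{s.ljust(16)} #{ok ? 'Valid' : 'Invalid'}"
end
```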

Florian_Gilcher · 7 September 2013 05:46

From the README:

"I'm thinking of ignoring these features for now:

   - Unicode chars"

And out. This is not a serious endeavour.

Mike_Stok1 · 7 September 2013 18:02

This looks like a fun project which I’ll look into.

I think you’ve made regexes look worse than they need to (though that might well be how a person unfamiliar with regexes actually uses them). The comments below are about regexes rather than your project.

I think it is possible to achieve a lot with interpolation in Ruby’s regular expressions, remembering that \A and \z are the “real” end of string anchors, and using the x modifier.

    #!/usr/bin/env ruby

    BYTE = / (?:
              25[0-5] | # 250 .. 255
              2[0-4]\d | # 200 .. 249
              [01]?\d{1,2} # 0 .. 199
             )
           /x

    IP_ADDR4 = / \A #{BYTE} \. #{BYTE} \. #{BYTE} \. #{BYTE} \z /x

    # p IP_ADDR4

    print "Give me an address: "
    if IP_ADDR4 =~ gets.chomp
      puts "Good"
    else
      puts "Bad"
    end

    __END__
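
The point about anchors can be seen with a multi-line input. This is only a sketch: `BYTE` is the same octet pattern written on one line, and the constant names `LINE_ANCHORED` and `STRING_ANCHORED` and the sample input are invented for the illustration.

```ruby
# Same octet pattern as BYTE above, written on one line.
BYTE = /(?:25[0-5]|2[0-4]\d|[01]?\d{1,2})/

# ^ and $ anchor at line boundaries; \A and \z anchor at the ends of the string.
LINE_ANCHORED   = /^#{BYTE}\.#{BYTE}\.#{BYTE}\.#{BYTE}$/
STRING_ANCHORED = /\A#{BYTE}\.#{BYTE}\.#{BYTE}\.#{BYTE}\z/

# A valid address on the first line, followed by unrelated text.
input = "10.0.0.1\nsomething else"

line_hit   = !(LINE_ANCHORED =~ input).nil?
string_hit = !(STRING_ANCHORED =~ input).nil?

puts "line anchors:   #{line_hit}"    # true: the first line alone satisfies ^...$
puts "string anchors: #{string_hit}"  # false: the whole string is not an address
```

With `gets.chomp` the input is normally a single line, so the difference only shows up when the string comes from somewhere else.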

Of course my Perl history makes the regex version seem “clear to me”.

I would usually decompose the text using a regular expression and then do the validation using code, for example something like:

    def ipv4_address?(string)
      md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
      md && md.captures.all? { |num| num.to_i.between?(0, 255) }
    end
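
A short usage sketch of that method, assuming a strict true/false result is wanted (the version above returns nil when the regex does not match, so a `!!` is added here). The sample strings are illustrative only.

```ruby
def ipv4_address?(string)
  # Step 1: split the text into four groups of one to three digits.
  md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match(string)
  # Step 2: check the numeric range in code; `!!` forces a boolean result.
  !!(md && md.captures.all? { |num| num.to_i.between?(0, 255) })
end

%w[192.168.2.100 192.168.2.257 1.2.3 1.2.3.4.5].each do |s|
  puts "#{s}: #{ipv4_address?(s)}"
end
```

Note that this accepts leading zeros such as "001.002.003.004", since `to_i` reads them as plain decimal numbers.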

Regards,

Mike

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

Robert_K1 · 8 September 2013 12:29

Reminds me a bit of something I did almost exactly six years and one month ago:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/263785

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Ryan_Davis1 · 8 September 2013 01:36

On Sep 7, 2013, at 11:02 , Mike Stok <mike@stok.ca> wrote:

> I would usually decompose the text using a regular expression and then
> do the validation using code, for example something like:
>
> def ipv4_address?(string)
>   md = /\A (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \. (\d{1,3}) \z/x.match string
>   md && md.captures.all? { |num| num.to_i.between?(0, 255) }
> end

You know what's even better? Not writing anything:

    >> require 'ipaddr'
    => true
    >> i = IPAddr.new("192.168.2.100")
    => #<IPAddr: IPv4:192.168.2.100/255.255.255.255>
    >> i = IPAddr.new("192.168.2.257")
    IPAddr::InvalidAddressError: invalid address
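
To use this as a yes/no check rather than in irb, the exception can be rescued. A minimal sketch follows; the helper name `ipv4_string?` is hypothetical and not part of the ipaddr library.

```ruby
require 'ipaddr'

# Hypothetical helper, not part of the ipaddr library.
# Returns true when the string parses as an IPv4 address, false otherwise.
def ipv4_string?(str)
  IPAddr.new(str).ipv4?
rescue IPAddr::Error # parent class of IPAddr::InvalidAddressError
  false
end

["192.168.2.100", "192.168.2.257", "::1"].each do |s|
  puts "#{s}: #{ipv4_string?(s)}"
end
```

One caveat: `IPAddr.new` also accepts a prefix length such as "192.168.2.0/24", so this is looser than a strict single-address check.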

HAL_90001 · 9 September 2013 14:59

Suit yourself. I put off working on that until September, i.e.,
I started three days ago.

But as you are "out," I suppose you will never see this anyway.

Hal

On Sat, Sep 7, 2013 at 12:46 AM, Florian Gilcher <flo@andersground.net> wrote:

> From the README:
>
> "I'm thinking of ignoring these features for now:
>
>    - Unicode chars"
>
> And out. This is not a serious endeavour.

HAL_90001 · 9 September 2013 15:03

Mike,

You're correct, of course. Multiline regular expressions are
much more readable in general.

Many would argue that the entire project is not worthwhile at all.

My personal opinion is that there is a threshold (which is itself a matter
of opinion) where regexes become needlessly difficult to read.

Hal

HAL_90001 · 9 September 2013 15:04

In this case, very true. I have only touched the ipaddr lib
once, but I can see that it is very useful.

Hal

HAL_90001 · 9 September 2013 15:06

I will look at this when I have time.

It would not be the first time you were six years ahead of me.

Hal

Robert_K1 · 9 September 2013 16:53

Hm, maybe then I should ask whether you would take over maintenance of
my grave - then I'm sure it will look nice for at least six years.

Cheers

robert

Eric_Christopherson · 26 September 2013 23:24

The newest Ruby Weekly pointed out two other more-friendly ways of doing
regexes:

Experimenting with Verbal Expressions

Hexpress by krainboltgreene
http://krainboltgreene.github.io/hexpress/?utm_source=rubyweekly&utm_medium=email

HAL_90001 · 27 September 2013 19:59

I had seen Verbal Expressions before I started my own project, but
never saw hexpress until a couple of weeks ago.

I think they're both worthy projects, as the concept itself is a worthy
one (in my opinion).

All three projects are similar in spirit and intent, but in implementation
they are different. Obviously I like my own better. Arguably it is "more
different" from these other two than they are from each other.

Hal

abinoam · 28 September 2013 07:04

+1 for the chosen name Regexador!

Abinoam Jr.
(From Brazil)

