More questions about =~

class String
  alias oldRegex =~
  def =~(o)
    p "string regex"
    oldRegex(o)
  end
end

class Regexp
  alias oldRegex =~
  def =~(o)
    p "regexp regex"
    oldRegex(o)
  end
end

"abc" =~ /a/
=> 0

Eh? Where is the print statement?
Which =~ is being called?

If I do:

irb(main):040:0> "abc".=~ /a/
"string regex"
=> 0

irb(main):041:0> /a/.=~("abc")
"regexp regex"
=> 0

All of that seems logical, though.

I am obviously missing some other method that =~ uses, right?

Yes. Internally, Ruby is optimized to call the C function directly rather
than calling the Ruby method.

Guy Decoux

ts decoux@moulon.inra.fr wrote in message news:200401011438.i01Ecb824151@moulon.inra.fr

I am obviously missing some other method that =~ uses, right?

Yes. Internally, Ruby is optimized to call the C function directly rather
than calling the Ruby method.

Hmm… so, is there a way to overload/redefine it from Ruby?
Also, how do you find out about the existence of this optimization?
By browsing the Ruby C code? Or are there some docs available
somewhere?

GGarramuno wrote:

ts decoux@moulon.inra.fr wrote in message news:200401011438.i01Ecb824151@moulon.inra.fr

Yes. Internally, Ruby is optimized to call the C function directly rather
than calling the Ruby method.

Hmm… so, is there a way to overload/redefine it from ruby?

There is no way of overloading or redefining it from Ruby that I know of,
but you do have some other options. This optimization is used when the
regexp is a literal, which means you can get around it by not using a
regexp literal, for example:

pattern = /a/

or

pattern = Regexp.new('a')

"abc" =~ pattern

You could also prepend . to the =~ operator, which still looks
operator-like but is interpreted as a method call:

"abc" .=~ /a/

It would be best if Ruby first checked whether =~ has been
redefined before applying this optimization.
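The two workarounds can be checked with a version of the experiment from the start of the thread that records calls instead of printing them. This is a sketch, and `CALLS` and `old_match` are made-up names. Whether the bare `"abc" =~ /a/` form reaches the override depends on the interpreter. The thread describes the Ruby of early 2004, and newer interpreters may check for a redefined `=~` before taking the shortcut. For that reason the sketch relies only on the explicit-call form and the variable form, which are ordinary method dispatches.

```ruby
# Hypothetical call log, used only for this demonstration.
CALLS = []

class String
  alias_method :old_match, :=~   # keep the built-in String#=~
  def =~(other)
    CALLS << :string
    old_match(other)
  end
end

class Regexp
  alias_method :old_match, :=~   # keep the built-in Regexp#=~
  def =~(other)
    CALLS << :regexp
    old_match(other)
  end
end

# Workaround 1: explicit method-call syntax is a normal method dispatch.
CALLS.clear
explicit_result = "abc".=~(/a/)
explicit_calls = CALLS.dup

# Workaround 2: match against a variable rather than an in-place literal.
pattern = Regexp.new("a")
CALLS.clear
variable_result = "abc" =~ pattern
variable_calls = CALLS.dup

# Calling Regexp#=~ explicitly reaches the Regexp override.
CALLS.clear
regexp_result = /a/.=~("abc")
regexp_calls = CALLS.dup

p [explicit_result, explicit_calls]
p [variable_result, variable_calls]
p [regexp_result, regexp_calls]
```

Each override delegates to the saved original, so match results are unchanged; only the call log shows which path was taken.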

Also, how do you find out about the existence of this optimization?
By browsing the ruby C code? Or are there some docs available
somewhere?

Yep, the Ruby source code is the primary source of documentation for
this kind of stuff.

Dave

Look here if you want to overwrite $1…$9, $', $`, $+
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/86167

BTW: Why are you interested in overloading the regexp methods?
Is it because you are writing your own regexp engine?


On Thu, 01 Jan 2004 12:46:54 -0800, GGarramuno wrote:


Simon Strandgaard

“Dave Wilson” davel@canuck.com wrote in message

GGarramuno wrote:

ts decoux@moulon.inra.fr wrote in message

Yes. Internally, Ruby is optimized to call the C function directly
rather than calling the Ruby method.

Also, how do you find out about the existence of this optimization?
By browsing the ruby C code? Or are there some docs available
somewhere?

Yep, the Ruby source code is the primary source of documentation for
this kind of stuff.

Moreover, keep in mind that it is Guy who is answering your
questions. I strongly suspect, and I am sure I am not the only one on
this ML, that he knows the source code by heart!

Dave Wilson wrote:

This optimization is used when the regexp is a
literal, which means you can get around it by not using a regexp
literal, for example:

pattern = /a/

or

pattern = Regexp.new('a')

"abc" =~ pattern

I should clarify: you can use a literal regexp, just not “in-place”. The
optimization only happens when you match against a regexp literal written
“in-place”; if you match against a variable that has been assigned a
literal regexp, the optimization won’t be used.

Dave

Simon Strandgaard neoneye@adslhome.dk wrote in message news:pan.2004.01.02.12.28.42.15247@adslhome.dk

BTW: Why are you interested in overloading the regexp methods?
Is it because you are writing your own regexp engine?

Kind of.

I was just trying to have an extended string class that stored the
position of the last match, to allow supporting something
similar to Perl’s pos() function (which allows continuing matches from
the last match done, or setting where to begin from, like index(),
only it works for subs and all regex methods, too). I am porting some
nasty Perl code that makes heavy use of that feature. Ruby’s regex
engine already has the building blocks needed to support this feature,
but the interface to it is not as easy to use as Perl’s.
I had thought I could use Ruby’s excellent OO to add this seamlessly
into the String class (and also make it work for any other string that
used my module, too).
But the fact that =~ on regex literals works in a non-OO manner has
kind of put a damper on that idea.
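Here is a minimal sketch of the kind of wrapper described above. It assumes Ruby 1.9 or later, for the two-argument `Regexp#match(str, pos)`. `PosString` is a hypothetical name, not a standard class. It keeps a Perl-style `pos` next to the string and moves it to the end of each successful match. On a failed match it resets `pos` to 0, as Perl’s `m//g` does without `/c`. Because it wraps the string instead of redefining `=~`, the literal-regexp optimization never comes into play.

```ruby
# PosString: hypothetical wrapper (not a standard class) that keeps a
# Perl-style pos() next to a string.
class PosString
  attr_reader :string
  attr_accessor :pos   # can be read or set, like Perl's pos()

  def initialize(string)
    @string = string
    @pos = 0
  end

  # Search from the stored position. On success, move pos to the end of the
  # match; on failure, reset it to 0 (Perl's m//g behaviour without /c).
  # Zero-length matches are not special-cased in this sketch.
  def match(regexp)
    m = regexp.match(@string, @pos)   # two-argument form: Ruby 1.9+
    @pos = m ? m.end(0) : 0
    m
  end
end

s = PosString.new("hello world, hello world")
first  = s.match(/orl/)   # found at offset 7, pos becomes 10
second = s.match(/orl/)   # search resumes at 10, found at offset 20
p [first.begin(0), second.begin(0), s.pos]   # => [7, 20, 23]
```

Setting `s.pos = 13` before the next `match` makes the search start there, which covers the "setting where to begin from" half of `pos()`.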


On Thu, 01 Jan 2004 12:46:54 -0800, GGarramuno wrote:

Simon Strandgaard neoneye@adslhome.dk wrote in message news:pan.2004.01.02.12.28.42.15247@adslhome.dk

BTW: Why are you interested in overloading the regexp methods?
Is it because you are writing your own regexp engine?

Kind of.

I was just trying to have an extended string class that stored the
position of the last match, to allow supporting something
similar to Perl’s pos() function (which allows continuing matches from
the last match done, or setting where to begin from, like index(),
only it works for subs and all regex methods, too). I am porting some
nasty Perl code that makes heavy use of that feature.

#begin/#end can tell you the offset… is that what you want?

irb(main):001:0> txt = "hello world, hello world"
=> "hello world, hello world"
irb(main):002:0> m = /orl/.match(txt)
=> #<MatchData:0x81c8754>
irb(main):003:0> m.begin(0)
=> 7
irb(main):004:0> m.end(0)
=> 10
irb(main):005:0>
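Those offsets are enough to resume a search by hand. `String#index` accepts a regexp and a start offset, so `m.end(0)` can be fed back in. Here is a small sketch on the same string (the variable names are illustrative):

```ruby
txt = "hello world, hello world"
m = /orl/.match(txt)

# String#index(regexp, offset) starts searching at offset and returns the
# index of the next match, or nil if there is none.
next_start = txt.index(/orl/, m.end(0))
p [m.begin(0), m.end(0), next_start]   # => [7, 10, 20]
```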

Ruby’s regex
engine already has the building blocks needed to support this feature,
but the interface to it is not as easy to use as Perl’s.
I had thought I could use Ruby’s excellent OO to add this seamlessly
into the String class (and also make it work for any other string that
used my module, too).
But the fact that =~ on regex literals works in a non-OO manner has
kind of put a damper on that idea.

It sounds interesting; I like challenges (but I don’t do Perl).
Can you show us some of the Perl code you are porting?


On Fri, 02 Jan 2004 10:16:45 -0800, GGarramuno wrote:

On Thu, 01 Jan 2004 12:46:54 -0800, GGarramuno wrote:


Simon Strandgaard

Simon Strandgaard neoneye@adslhome.dk wrote in message news:pan.2004.01.02.19.24.35.766160@adslhome.dk

#begin/#end can tell you the offset… is that what you want?

Kind of. Those are what I call the building blocks.

The differences are that:
a) That’s an index that gets stored with the string variable, as if
it were an attribute of it.
b) It can also be set, like String#pos(number), so that any
further matches or substitutions begin from that position on (this is
somewhat akin to ruby’s index())
c) It interacts with the \G anchor of regular expressions.
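Ruby’s standard library does ship something close to points a) through c), although nobody in the thread mentions it. `StringScanner`, from `strscan`, keeps its own position, which can be read or assigned. Its `scan` method only matches at that position, much like a `\G`-anchored pattern. The position lives on the scanner object rather than on the string itself, so this is an alternative, not the String extension described here. Here is a small sketch with a made-up input string:

```ruby
require "strscan"

ss = StringScanner.new("key=value; other=thing")

first = ss.scan(/\w+/)     # "key": scan only tries to match AT the current position
after_first = ss.pos       # 3: the scanner remembers where the last match ended
miss = ss.scan(/\w+/)      # nil: "=" sits at position 3 and scan never skips ahead
ss.scan(/=/)               # consumes "=", so pos becomes 4

ss.pos = 11                # set the position directly, much like assigning Perl's pos()
jumped = ss.scan(/\w+/)    # "other"
p [first, after_first, miss, jumped, ss.pos]
```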

It sounds interesting; I like challenges (but I don’t do Perl).
Can you show us some of the Perl code you are porting?

Well, you really don’t want to look at it. It is a library written by
Damian Conway (one of Perl’s top gurus and designers) and as such it
is brilliant.
But unless you’ve been coding Perl for some years, it will most likely
look like part of an obfuscated code contest :)

Anyway, here it is:
http://www.cpan.org/modules/01modules.index.html
Module is: Getopt-Declare.

Simon Strandgaard neoneye@adslhome.dk wrote in message news:pan.2004.01.02.19.24.35.766160@adslhome.dk

#begin/#end can tell you the offset… is that what you want?

Kind of. Those are what I call the building blocks.

The differences are that:
a) That’s an index that gets stored with the string variable, as if
it were an attribute of it.
b) It can also be set, like String#pos(number), so that any
further matches or substitutions begin from that position on (this is
somewhat akin to ruby’s index())

How about #scan?

c) It interacts with the \G anchor of regular expressions.

Never seen \G before… is that global?

It sounds interesting; I like challenges (but I don’t do Perl).
Can you show us some of the Perl code you are porting?

Well, you really don’t want to look at it. It is a library written by
Damian Conway (one of Perl’s top gurus and designers) and as such it
is brilliant.
But unless you’ve been coding Perl for some years, it will most likely
look like part of an obfuscated code contest :)

Agreed, it looks obfuscated… but less so than other Perl modules I
have seen.

Anyway, here it is:
http://www.cpan.org/modules/01modules.index.html
Module is: Getopt-Declare.

I am curious how it differs from GetoptLong?


On Fri, 02 Jan 2004 15:25:04 -0800, GGarramuno wrote:


Simon Strandgaard

Simon Strandgaard neoneye@adslhome.dk wrote in message news:pan.2004.01.03.00.31.06.853624@adslhome.dk

Simon Strandgaard neoneye@adslhome.dk wrote in message news:pan.2004.01.02.19.24.35.766160@adslhome.dk

#begin/#end can tell you the offset… is that what you want?

Kind of. Those are what I call the building blocks.

The differences are that:
a) That’s an index that gets stored with the string variable, as if
it were an attribute of it.
b) It can also be set, like String#pos(number), so that any
further matches or substitutions begin from that position on (this is
somewhat akin to ruby’s index())

How about #scan?

No, different thing.
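For comparison, here is a sketch of what `String#scan` offers. It walks every match in one call, and inside the block `$~` exposes each match’s offsets. Once it returns, though, nothing is left on the string for a later match to resume from, which is presumably the difference meant here. The input string is made up:

```ruby
text = "a1 b22 c333"

# scan returns every match at once...
numbers = text.scan(/\d+/)

# ...and inside the block $~ holds the current MatchData, so offsets are
# available while scanning. Nothing is stored on text afterwards.
offsets = []
text.scan(/\d+/) { offsets << $~.begin(0) }

p numbers   # => ["1", "22", "333"]
p offsets   # => [1, 4, 8]
```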

c) It interacts with the \G anchor of regular expressions.

Never seen \G before… is that global?

From the Perl manual…
\G Match only at pos() (e.g. at the end-of-match position
of prior m//g)

Ruby supposedly supports \G, but from what I can see so far there is
no way to read or set the position (i.e. it is missing pos()).
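In current Ruby, `\G` is available and tied to the iteration inside methods such as `String#scan` and `String#gsub`. I have not checked this against the Ruby of 2004. Each attempt is anchored where the previous match ended, starting at the beginning of the string. What is missing, as noted above, is a per-string `pos()` to read or set. Here is a small sketch:

```ruby
s = "aaaba"

# Without \G, scan finds every "a" anywhere in the string.
anywhere = s.scan(/a/)

# With \G, each match must start where the previous one ended, so the
# scan stops at the "b" and the final "a" is never reached.
contiguous = s.scan(/\Ga/)

# The same anchoring applies to gsub.
replaced = s.gsub(/\Ga/, "x")

p [anywhere, contiguous, replaced]
```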

Agreed, it looks obfuscated… but less so than other Perl modules I
have seen.

Don’t be fooled. It is one of the most obfuscated modules once you
realize all it does with so little code.

Anyway, here it is:
http://www.cpan.org/modules/01modules.index.html
Module is: Getopt-Declare.

I am curious how it differs from GetoptLong?

You can read the full docs at the end of the module (if you have Perl,
perldoc is better, though). The docs are 1500 lines long.

Overall, it is vastly superior and makes all other option-parsing
modules obsolete and primitive, IMHO.

Among the less standard features:

  • It also allows using a config file for options, and reading parameters
    from places other than the command line (files, for example).
  • It keeps the flags and docs as a single string (i.e. you basically
    type the help message ONLY and the module extracts the flags
    from that). It makes for extremely clean code while still allowing
    you to format the help text as you wish. A help message is generated
    automatically, too, with special characters and blocks removed.
  • It supports arbitrary user-created types for matching, not just
    strings, numerics, etc.
  • For numbers, it supports matching positive or negative values, with
    or without zero.
  • Allows parsing arrays, and parsing/expanding ranges.
  • Allows matching parameters with a specific manual regex.
  • It supports all sorts of user shortcuts for flags (not just two).
  • Supports aliases for flags easily.
  • It generates regex code that can be dumped out for matching if needed.
  • It allows code blocks to be embedded (i.e. when flags are seen, full
    blocks of Perl code can be run, which is MUCH more powerful than
    other similar getopts).
  • Allows case to be ignored on a parameter or globally.
  • Allows options to be exclusive, inclusive, strict, etc.
  • Allows clustering of flags in a couple of forms.
  • Allows parameters to be put on a queue, so that they only get
    interpreted after all others have been.
  • Can check file parameters to verify their existence.

On Fri, 02 Jan 2004 15:25:04 -0800, GGarramuno wrote:

A lot of those things are provided by the Ruby package optparse (which
I’ve used with great effect), and I was wondering if you could compare
optparse with Getopt-Declare; perhaps Nobu will add the missing
features :)

You can find optparse documentation here:

http://www.ruby-doc.org/stdlib/libdoc/optparse/rdoc/index.html
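Here is a rough sketch of the overlap Nathaniel mentions. optparse has built-in argument conversions such as `Integer` and comma-separated `Array`, and `OptionParser#accept` registers a user-defined type. Between them they cover a few items on the feature list above. The option names, the `parse_typed` helper and the `Range` parsing are assumptions made for illustration, not Getopt::Declare equivalents.

```ruby
require "optparse"

# parse_typed is a hypothetical helper; the option names are made up.
def parse_typed(argv)
  opts = {}
  parser = OptionParser.new do |o|
    # Built-in conversions: Integer, and Array (comma-separated values).
    o.on("--count N", Integer, "Repeat count") { |n| opts[:count] = n }
    o.on("--ids A,B,C", Array, "List of ids") { |list| opts[:ids] = list }

    # A user-defined argument type registered with OptionParser#accept.
    # It must be registered before any option that names Range as its type.
    o.accept(Range, /\A\d+\.\.\d+\z/) do |s|
      first, last = s.split("..").map(&:to_i)
      first..last
    end
    o.on("--lines RANGE", Range, "Line range, e.g. 3..7") { |r| opts[:lines] = r }
  end
  parser.parse(argv)
  opts
end

p parse_typed(%w[--count 3 --ids 1,2,3 --lines 3..7])
```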

Thanks,

Nathaniel

<:((><


On Jan 3, 2004, at 06:51, GGarramuno wrote:

Nathaniel Talbott nathaniel@talbott.ws wrote in message news:380B42FA-3E16-11D8-9233-000A95CD7A8E@talbott.ws

A lot of those things are provided by the Ruby package optparse (which
I’ve used with great effect), and I was wondering if you could compare
optparse with Getopt-Declare; perhaps Nobu will add the missing
features :)

I can most likely compare the features. But the first thing that
quickly turns me off about it is how parameters are passed in,
inefficiently, one at a time.

Compare that to the simplicity and elegance of Perl’s
Getopt::Declare.

For a simple example:

$args = new Getopt::Declare (<<'EOPARAM');

============================================================
Required parameter:

-in Input file [required]


Optional parameters:

(The first two are mutually exclusive) [mutex: -r -p]

-r[and[om]] Output in random order
-p[erm[ute]] Output all permutations


-out Optional output file


Note: this program is known to run very slowly of files with
long individual lines.

EOPARAM

The beauty of the system is that the syntax definition can look almost
like the help text itself (from which a default -h flag printout is
extracted), so it is very easy to understand, even for newbies who
never read the module’s docs.
You just need to remember that anything within [] is optional or a
special command to the engine, while {} is code, etc.
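For comparison, here is a rough optparse sketch of the simple specification above. It is an approximation, not a faithful port:

- optparse spells long options with `--`, so `-in` and `-out` become `--in`/`-i` and `--out`/`-o`.
- `--random` and `--permute` can be abbreviated (for example `--rand`) through optparse’s unique-prefix completion.
- The `[required]` and `[mutex: -r -p]` rules, which Getopt::Declare reads from the help text, have to be checked by hand after parsing.

`parse_args` is a hypothetical helper name.

```ruby
require "optparse"

# parse_args is a hypothetical helper; returns [options_hash, remaining_args].
def parse_args(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: #{File.basename($0)} --in FILE [options]"
    opts.on("-i", "--in FILE", "Input file [required]") { |f| options[:in] = f }
    opts.on("-r", "--random", "Output in random order") { options[:random] = true }
    opts.on("-p", "--permute", "Output all permutations") { options[:permute] = true }
    opts.on("-o", "--out FILE", "Optional output file") { |f| options[:out] = f }
  end
  rest = parser.parse(argv)   # non-option arguments are returned, not consumed

  # Constraints that Getopt::Declare takes from the help text are explicit here.
  raise ArgumentError, "--in is required" unless options[:in]
  if options[:random] && options[:permute]
    raise ArgumentError, "-r and -p are mutually exclusive"
  end

  [options, rest]
end

options, rest = parse_args(%w[--in data.txt --rand extra])
p options
p rest
```

Each option needs its own `on` call plus a block, which is the "one at a time" style being objected to. The help text then follows from those declarations, rather than the other way round.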

For a more complex case (involving complex switches, embedded code,
multiple file parsing, ranges, arrays, etc.), look at this one:

$args = new Getopt::Declare <<'EOARGS';
($0 version $VERSION)
General options:

    -e <f:i>..<t:i> Set expansion factor to specified range
                    [requires: <file>]
                            { print "k = [$f..$t]\n"; }

    -e [<k:n>...]   Set expansion factor to <k> (or 2 by default)
                    [required]
                            { @k = (2) unless @k;
                              print "k = [", join(',', @k), "]\n";
                            }

    -b <blen:i>     Use byte length of <blen>
                    [excludes: -a +c]
                            { print "byte len: $blen\n"; }

    <file>...       Process files [required] [implies: -a]
                            { print "files: \@file\n"; }

    -a [<N:n>]      Process all data [except item <N>]
                            { print "proc all\n"; print "except $N\n" if $N; }

    -fab            The fabulous option (is always required :-)
                    [required]
                            { defer { print "fabulous!\n" } }

File creation options:

    +c <file>       Create file [mutex: +c -a]
                            { print "create: $file\n"; }

    +d <file>       Duplicate file [implies: -a and -b 8]
                    This is a second line
                            { print "dup (+d) $file\n"; }
    --dup <file>    [ditto] (long form)
                            { print "dup (--dup) $file\n"; }

    -how <N:i>      Set height to <N>       [repeatable]

Garbling options:

    -g [<seed:i>]   Garble output with optional seed [requires: +c]
                            { print "garbling with $seed\n"; }
    -i              Case insensitive garbling [required]
                            { print "insensitive\n"; }
    -s              Case sensitive garbling
    -w              WaReZ m0De 6aRBL1N6

    [mutex: -i -s -w]

EOARGS

Writing something like that with other getopt parsers would take me long
enough that I’d get frustrated. It’s probably not an issue if you write
command-line tools once in a while, but if you write many of them (or
expect some program to keep gaining complex switches), it makes sense to
use something like Getopt::Declare.

All flags eventually end up stored in a public hash in the object, so
you can extract them later, of course.
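optparse can do something similar. Since Ruby 2.4, `OptionParser#parse` accepts an `into:` hash and stores each option’s value there. The key is a symbol taken from the long option name, and plain flags get `true`. This is a minimal sketch with made-up options; `parse_into_hash` is a hypothetical helper.

```ruby
require "optparse"

# parse_into_hash is a hypothetical helper; values land in `flags`
# instead of being handled in per-option blocks.
def parse_into_hash(argv)
  flags = {}
  parser = OptionParser.new do |opts|
    opts.on("-i", "--in FILE", "Input file")
    opts.on("-r", "--random", "Output in random order")
    opts.on("-o", "--out FILE", "Optional output file")
  end
  parser.parse(argv, into: flags)
  flags
end

p parse_into_hash(%w[--in data.txt -r])
```

Options that do not appear on the command line simply have no key, so `flags[:out]` is `nil` here.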