String#to_rx?

Alex_Fenton2 · 22 November 2005 23:47

Possible RCR: would anyone else find this a useful addition to the core?

class String
  def to_rx
    Regexp.new( Regexp.escape(self) )
  end
end

as a more straightforward & readable alternative to interpolation:

/#{Regexp.escape(a_string)}/

I turned up a few references to this sort of thing in Ruby code on the web.
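A minimal runnable sketch of the proposal, with a check of why the escaping matters. `to_rx` is only the name suggested here, not a core method. Because `+` is a regexp metacharacter, the same text used unescaped as a pattern no longer matches itself.

```ruby
class String
  # Proposed in this thread; Ruby's core String has no such method.
  # Builds a Regexp that matches this string literally.
  def to_rx
    Regexp.new(Regexp.escape(self))
  end
end

text = "1+1=2"
p "1+1".to_rx =~ text         # => 0   (escaped: /1\+1/)
p Regexp.new("1+1") =~ text   # => nil (/1+1/ means "one or more 1s, then 1")
```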

alex

7rans · 23 November 2005 02:04

Yes, I agree. In Facets, it's more like:

  class String
    def to_re( esc=true )
      Regexp.new( esc ? Regexp.escape(self) : self )
    end
  end
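A self-contained sketch of the variant above, with usage. It mirrors the snippet in this message and is not necessarily the code Facets actually ships. With the default `esc = true` the string is matched literally; with `false` it is used as regexp source.

```ruby
class String
  # Sketch of the to_re variant quoted above (the real Facets code may differ).
  # esc = true  -> every character is matched literally
  # esc = false -> the string is treated as regexp source
  def to_re(esc = true)
    Regexp.new(esc ? Regexp.escape(self) : self)
  end
end

p "a.c".to_re =~ "abc"         # => nil ("." must match a literal dot)
p "a.c".to_re =~ "a.c"         # => 0
p "a.c".to_re(false) =~ "abc"  # => 0   ("." matches any character)
```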

T.

Nikolai_Weibull · 24 November 2005 13:28

Alex Fenton wrote:

> Possible RCR: would anyone else find this a useful addition to the
> core
>
> class String
>   def to_rx
>     Regexp.new( Regexp.escape(self) )
>   end
> end
>
> as a more straightforward & readable alternative to interpolation:
>
> /#{Regexp.escape(a_string)}/
>
> I turned up a few references to this sort of thing in ruby code on the
> web.

Please, everyone read Apocalypse 5 (http://www.perl.com/pub/a/2002/06/04/apo5.html) and
let's use that as a base for regexes in Ruby 2. Trying to put a string
literal in a regex shouldn't be hard and Ruby should just "do the right
thing" for you. I think that regular expressions have been grossly
misused in a lot of places and a lot of ways in the applications of
computer science. I realize that we can't go back, but to be able to go
forward, perhaps we need to stop and look around - more and more
metacharacters, funky escapes, and so on isn't sustainable in the long
run. Apocalypse 5 is a good base and I think that a lot of nice things
can come out of it (well, we're here, three years later, and still
nothing, but bear with me).

I also want a way to feed data to a regular expression for Christmas, so
if you're into the whole thing of "the joy of giving", then I'm
certainly into the whole thing of "the joy of receiving".

nikolai

--
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}

Austin_Ziegler5 · 23 November 2005 14:21

I disagree that either #to_re or #to_rx would be a good name for this
construct. I think that it would be better to have a method on Regexp
itself (a new, alternative constructor) that does this. Maybe:

Regexp.compile_escaped(str)

I don't think I've actually ever seen Regexp.compile used, so maybe it
can be repurposed in 1.9 to do this.
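Austin's alternative can be sketched as a class-level constructor. `compile_escaped` is the hypothetical name from this message, not an existing Ruby method. Note that in Ruby as it stands, `Regexp.compile` is simply a synonym for `Regexp.new`, so repurposing it would change existing behaviour.

```ruby
class Regexp
  # Hypothetical constructor named in this thread; not part of Ruby's Regexp API.
  # Escapes +str+ so every character is matched literally; +options+ is
  # passed through to Regexp.new (e.g. Regexp::IGNORECASE).
  def self.compile_escaped(str, options = 0)
    new(escape(str), options)
  end
end

re = Regexp.compile_escaped("(c)")
p re                                                          # => /\(c\)/
p re =~ "Copyright (c) 2005"                                  # => 10
p Regexp.compile_escaped("(C)", Regexp::IGNORECASE) =~ "(c)"  # => 0
```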

FWIW, I don't tend to use the construct that Alex did -- I tend to
either anchor my strings or insert them in the middle of a larger
regexp, which is why I don't particularly think that this is a method
that belongs on String.

If it's to be on String, though, it should probably be on a few others
as well (Fixnum) and it should be explicit: #to_regexp.

-austin

--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca

Jeff_Wood · 26 November 2005 00:24

I believe the Facets project already contains a method like this for String
objects.

facets.rubyforge.org

j.

--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood

7rans · 23 November 2005 17:22

Austin Ziegler wrote:

> I disagree that either #to_re or #to_rx would be a good name for this
> construct. I think that it would be better to have a method on Regexp
> itself (a new, alternative constructor) that does this. Maybe:
>
> Regexp.compile_escaped(str)

The problem here is largely one of brevity. Who wants to type all that
when #to_re will do?

> I don't think I've actually ever seen Regexp.compile used, so maybe it
> can be repurposed in 1.9 to do this.
>
> FWIW, I don't tend to use the construct that Alex did -- I tend to
> either anchor my strings or insert them in the middle of a larger
> regexp, which is why I don't particularly think that this is a method
> that belongs on String.
>
> If it's to be on String, though, it should probably be on a few others
> as well (Fixnum) and it should be explicit: #to_regexp.

#to_regexp would imply that the object was some type of Regexp already.
#to_re or #to_rx clearly indicate it is a conversion. It's a pretty
innocent method and I don't think it is too much trouble to have even
if it's not the most useful method in the world.

I think even more useful, though, would just be a method on String that
does the Regexp escaping. I had been using my own small Kernel method
#resc(str) for this, but now I see it would be much more useful as a
String method:

/^#{foo.resc}/ =~ bar

All things being the same, I'll put that in the next version of Facets.
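A minimal sketch of such a String method. `regesc` (the name Austin suggests later in the thread) and `resc` are both hypothetical here; neither is a core method. The sketch simply delegates to `Regexp.escape`, and the example string contains a `.` so that the unescaped version produces a false match.

```ruby
class String
  # Hypothetical helper (name suggested in this thread): returns the string
  # with regexp metacharacters escaped, ready for interpolation.
  def regesc
    Regexp.escape(self)
  end
end

foo = "1.5"
bar = "1x5 is not 1.5"
p(/^#{foo.regesc}/ =~ bar)  # => nil (escaped ".", so "1x5" does not match)
p(/^#{foo}/ =~ bar)         # => 0   (unescaped "." matches the "x")
```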

T.

Alex_Fenton2 · 23 November 2005 18:17

Hi Austin

Austin Ziegler wrote:

> I disagree that either #to_re or #to_rx would be a good name for this
> construct.

I don't feel strongly about the name, though compare #to_f, #to_i, and #to_s already in the core - I don't see those as obviously more 'expressive' than #to_rx, simply more familiar.

After all, #to_s *could* mean 'to_symbol', and #to_f *could* mean 'to_file', if one were being perverse.

> I don't think I've actually ever seen Regexp.compile used, so maybe it
> can be repurposed in 1.9 to do this.

Yes, the special constructor always makes me think it should do something "special" - perhaps like the /o modifier in perlre. However, I'd prefer an instance method in String to a new or repurposed class method in Regexp.

> FWIW, I don't tend to use the construct that Alex did -- I tend to
> either anchor my strings or insert them in the middle of a larger
> regexp, which is why I don't particularly think that this is a method
> that belongs on String.

I also commonly use them anchored within a larger regexp, but it would still be nicer to be able to write

/before #{str.to_rx} after/

than to have to wedge a long call to a class function into an interpolated section.

I suppose the point I'm making is that Strings have a 'natural' affinity to or representation as Regexps - viz their mutually substitutable uses in #split, #sub and friends, and so it would be nice to make conversion between the two less unwieldy and verbose.
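The substitutability mentioned here is easy to see with `split` and `sub`: a String pattern is matched literally, which is exactly what an escaped Regexp does, whereas the same characters as an unescaped Regexp mean something else. A short illustration using only core methods:

```ruby
s = "a.b.c"

# A String pattern is matched literally...
p s.split(".")                              # => ["a", "b", "c"]
# ...which is also what an escaped Regexp does.
p s.split(Regexp.new(Regexp.escape(".")))   # => ["a", "b", "c"]
# Unescaped, "." matches every character, so only empty fields remain
# (and split drops trailing empty fields).
p s.split(/./)                              # => []

p "1+1".sub("+", "-")                               # => "1-1"
p "1+1".sub(Regexp.new(Regexp.escape("+")), "-")    # => "1-1"
```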

Regexp seems to me a 'major' core class, with its own literal syntax (as do Float, Integer, String, Symbol, etc.). I wouldn't like to make anyone write "#{an_integer}" to do the work of Integer#to_s, unless they really wanted to.

I agree there is some ambiguity about the semantics re anchoring - should #to_rx mean

/#{Regexp.escape(a_string)}/
or
/\A#{Regexp.escape(a_string)}\z/

The strongest argument for the former is that the latter doesn't do anything useful that #== doesn't already do.
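A quick check of the two readings, using only core Ruby: the unanchored form finds the string anywhere, while the `\A...\z` form only accepts exact matches, so it always agrees with `String#==`.

```ruby
s = "cat"
unanchored = /#{Regexp.escape(s)}/
anchored   = /\A#{Regexp.escape(s)}\z/

p unanchored =~ "concatenate"   # => 3 (substring match)
p anchored   =~ "concatenate"   # => nil
p anchored   =~ "cat"           # => 0

# For each candidate, the anchored match result agrees with String#==.
%w[cat cats scat concatenate].each do |t|
  p([t, !!(anchored =~ t), t == s])
end
```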

> If it's to be on String, though, it should probably be on a few others
> as well (Fixnum) and it should be explicit: #to_regexp.

Perhaps, yes. It's not something I've ever yearned for personally.

cheers
alex

Nikolai_Weibull · 26 November 2005 08:46

Jeff Wood wrote:

> [me discussing the merits of a better regex syntax over having #to_*
> methods]
>
> I believe the Facets project already contains a method like this for String
> objects.

What it may contain is a #to_re method. It doesn't include anything
relating to my message.

nikolai


Austin_Ziegler5 · 23 November 2005 17:38

On 11/23/05, Trans <transfire@gmail.com> wrote:
> Austin Ziegler wrote:
> > I disagree that either #to_re or #to_rx would be a good name for this
> > construct. I think that it would be better to have a method on Regexp
> > itself (a new, alternative constructor) that does this. Maybe:
> >
> > Regexp.compile_escaped(str)
>
> The problem here is largely one of brevity. Who wants to type all that
> when #to_re will do?

Because neither #to_re nor #to_rx is expressive enough. As such, they
don't belong in the core.

> I think even more useful, though, would just be a method on String that
> does the Regexp escaping. I had been using my own small Kernel method
> #resc(str) for this, but now I see it would be much more useful as a
> String method:
>
> /^#{foo.resc}/ =~ bar
>
> All things being the same, I'll put that in the next version of Facets.

Another name without expressiveness. #escape_regexp would be better.
But that doesn't suit your apparent need for brevity. Maybe #regesc,
but I would still oppose its inclusion in the core, so it really does
probably belong in Facets, where I don't have to even care that it
exists.

-austin

7rans · 26 November 2005 22:32

nikolai,

Could you give us a summary of how it would apply. I really don't have
time to weed through all that material.

Thanks,
T.

7rans · 23 November 2005 18:17

> Maybe #regesc

#regesc is better, thanks.

> but I would still oppose its inclusion in the core, so it really does
> probably belong in Facets, where I don't have to even care that it
> exists.

Uhuh, like you haven't scoured through its source for what suits you
;-p

T.

Nikolai_Weibull · 26 November 2005 23:47

Trans wrote:

> Could you give us a summary of how it would apply. I really don't have
> time to weed through all that material.

There are two possible solutions as I see it that make more sense than
#to_rx: a) strings are interpreted as just that - strings of symbols
that you want to interpret literally and b) add syntax to allow you to
easily embed strings inside a regular expression.

In Perl 6, the suggestion is to interpret $string as a string and
<$string> as a regular expression. In Ruby that'd be #{string} and
<#{string}>, I suppose. (#{regex} would still mean what it means today
in both cases.)

Another thing that'd be nice to have is a way to insert literal strings
directly, e.g., /<'common regex operators include *, ?, and .'>/, where
<'...'> is syntax for a literal string. A way to embed a string
variable would then perhaps be /<'#{var}'>/, but I think that the
solution in the previous paragraph may make more sense.
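For comparison, a sketch of the closest equivalents in current Ruby, which has no syntax for this: `Regexp.escape` turns a string into a pattern that matches it character for character, `Regexp.union` escapes String arguments while combining them, and a Regexp object interpolated into a literal is embedded as a group rather than as raw text. This is only an approximation of the proposal, not the proposed syntax itself.

```ruby
literal = "common regex operators include *, ?, and ."

# Escaped interpolation matches the text literally.
re = /#{Regexp.escape(literal)}/
p re =~ "Note: #{literal}"         # => 6

# Regexp.union escapes String arguments and alternates them.
p Regexp.union("*", "?", ".")      # => /\*|\?|\./

# Interpolating a Regexp object (not a String) embeds it as a group.
word = Regexp.union("a.b")
p(/before #{word} after/ =~ "before a.b after")   # => 0
p(/before #{word} after/ =~ "before axb after")   # => nil
```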

I'm probably not making myself clear enough, but if what I'm saying
seems interesting I'd suggest you at least read "Synopsis 5" [1].

nikolai

[1] Synopsis 5


Austin_Ziegler5 · 23 November 2005 18:25

I've looked through the docs. There's so much there that I can get
elsewhere...

-austin


Jeff_Wood · 27 November 2005 00:02

How does this differ from embedding variables in regular expressions now with

a = "hello world"
b = "hello"
c = /#{ b }/
c.match( a ).to_a
#=> ["hello"]

Let me know if I'm missing something...

j.


7rans · 27 November 2005 00:17

Thanks, I glanced over the synopsis...and whoa! that's a lot of
changes --basically remaking regular expressions. Looks like they're
good changes mostly, but still, that's a major shift.

As for the interpolation itself, I totally agree. It would be better to
have some standard construct. Your proposal corresponding to Perl 6
seems reasonable to me.

T.

7rans · 27 November 2005 00:22

Jeff,

  a = "hello world"
  b = "w.*"
  c = /#{ b }/
  c.match( a ).to_a
  #=> ["world"]

Characters aren't escaped.
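The same example with the interpolated string passed through `Regexp.escape`, for contrast: `.` and `*` become literal characters, so "hello world" no longer matches, while text that actually contains "w.*" does.

```ruby
a = "hello world"
b = "w.*"

p(/#{b}/.match(a).to_a)                         # => ["world"] (b used as a pattern)
p(/#{Regexp.escape(b)}/.match(a))               # => nil       (b matched literally)
p(/#{Regexp.escape(b)}/.match("a w.* b").to_a)  # => ["w.*"]
```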

T.

Jeff_Wood · 27 November 2005 00:51

I didn't want them to be ... I wanted the body of the string to be
passed in literally ...

Yes, I understand if I wanted otherwise I would have to do just a
touch more work...

c = /#{ Regexp.escape( b ) }/

but, it's all literal, and doesn't surprise me... as it shouldn't anyone else.

/#{ b }/ where b = "w.*" should be /w.*/

... guess I'm not understanding the original point ... somebody
wanting an additional wrapper for strings that auto-escapes them?

Why not write one? %e( ) or something like that.

...

j.


Jeff_Wood · 27 November 2005 01:00

Although I am surprised there isn't a String#escape ( or maybe #escaped ) method

/#{ b.escaped }/

... or something like that. would be literal, and make sense ...

j.


Nikolai_Weibull · 27 November 2005 10:06

Jeff Wood wrote:

> Although I am surprised there isn't a String#escape ( or maybe
> #escaped ) method
>
> /#{ b.escaped }/
>
> ... or something like that. would be literal, and make sense ...

Did you even read this thread? That's what was being proposed, see the
subject. The point is that that's not the right way to solve this
problem.

nikolai


Jeff_Wood · 27 November 2005 11:35

Yes I did read the original thread.

And although from time to time I have been "thumbs-up" for adding
punctuation soup into Ruby, I'm learning and growing into the fact that it's
not the Ruby way.

I was simply trying to suggest something that felt more Ruby-like to me.

Adding <blah> syntax (or anything taken directly from Perl) just keeps
feeding the source of most of the complaints I've ever heard about Ruby... It
doesn't need to feel any more Perlish than it already does in some places.

The only opinion that matters to me is Matz. If he likes it, I'm sure I'll
get used to it, otherwise, bleck, I'll pass.

j.

