"Clifford Heath" cjh_nospam@managesoft.com wrote in news post
news:1052444464.885588@excalibur.osa.com.au…
While trying to build an RE to parse a shell-style regexp into an
array of non-wild, wild, non-wild, wild, etc I found (again) that
the grouping operator (), when followed by *, returns only the last
match into the MatchData:
matches = regex.match(str)
p matches[1..(matches.length-1)]
yields:
["baz"]
That’s standard behavior on all platforms. A possible reason is that
substitutions such as gsub would otherwise need an additional argument
saying which element of the array is meant. If you want to have the
complete thing, just place another set of brackets around the repeated group.
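A small sketch of the behaviour described above. The pattern from the original post is not shown, so the input string and the regexps here are assumptions made up for illustration: a repeated group keeps only its last repetition, an extra outer group captures the whole run, and String#scan is one way to collect every piece.

```ruby
# Hypothetical input; the pattern in the original post is not shown.
str = "foo*bar?baz"

# The inner group is overwritten on each repetition, so only the
# final repetition is left in the MatchData.
m = /\A(?:([^*?]+)[*?]?)*\z/.match(str)
last_only = m[1]

# An extra pair of parentheses around the repetition captures the whole run.
whole = /\A((?:[^*?]+[*?]?)*)\z/.match(str)[1]

# String#scan returns every successive match instead of a single MatchData.
pieces = str.scan(/[^*?]+|[*?]/)

p last_only  # "baz"
p whole      # "foo*bar?baz"
p pieces     # ["foo", "*", "bar", "?", "baz"]
```

Whether scan is the simplest route depends on what else the larger expression has to match; it only covers the splitting step.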
It’s invalid because, for some reason, it’s seeing it as a \]. It
seems that interpolation is being done twice; once by the implicit
string constructor and once by Regexp#new. Thus, your string
'[a-z\\]' is seen by Regexp#new as '[a-z\]'. If you double the
backslashes, you’ll get the desired result (e.g., '[a-z\\\\]').
Alternatively, you can explicitly escape the right bracket and it
will work as well: '[a-z\\\]'.
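A minimal sketch of the two levels of backslash handling, assuming a current Ruby. The variable names are made up for illustration. The string literal consumes one level of escaping and the regexp engine consumes another.

```ruby
# In a single-quoted literal, \\ is one backslash, so this is the
# 6-character string [a-z\] and the engine reads \] as an escaped bracket.
broken = '[a-z\\]'
error = begin
  Regexp.new(broken)
  nil
rescue RegexpError => e
  e
end

# Four backslashes in the source leave two in the string; the regexp
# engine then reads them as one literal backslash inside the class.
fixed = '[a-z\\\\]'
re = Regexp.new(fixed)

p broken.length   # 6
p error.class     # RegexpError
p fixed.length    # 7
p re =~ "\\"      # 0
```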
As to why, Robert, one might not actually be able to do this, as the
regexp could be specified through user input.
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.05.09 at 11:48:08
···
On Fri, 9 May 2003 17:58:11 +0900, Robert Klemme wrote:
Clifford Heath:
Annoying. I wanted [“foo”, “*”, “bar”, “?”, “baz”]. How to do
this most simply?
Again, if and only if one is specifying the regular expression
statically.
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.05.12 at 09:15:33
···
On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:
Clifford Heath:
Robert Klemme wrote:
Clifford Heath:
re = Regexp.new('[a-z\\]')
Why not simply do
re = /[a-z\\]/
Because it was part of a larger extended re.
But you can still use re = /…/ instead of re = Regexp.new ‘…’
pat = "ana"
str = "banana"
index = /#{pat}/ =~ str
p index # >> 1
Regards,
Brian.
···
On Mon, May 12, 2003 at 10:18:00PM +0900, Austin Ziegler wrote:
On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:
Clifford Heath:
Robert Klemme wrote:
Clifford Heath:
re = Regexp.new('[a-z\\]')
Why not simply do
re = /[a-z\\]/
Because it was part of a larger extended re.
But you can still use re = /…/ instead of re = Regexp.new ‘…’
Again, if and only if one is specifying the regular expression
statically.
Am I missing something or just misunderstanding what you mean by
specifying it statically?
def check( name, string )
  puts /Mr. (#{name.capitalize})/.match(string)[1]
end
check "nelson", "Hello Mr. Nelson, how are you?" #=> Nelson
···
On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:
–
([ Kent Dahl ]/)_ ~[ http://www.stud.ntnu.no/~kentda/ ]/~
))_student/(( _d L b_/ NTNU - graduate engineering - 5. year )
( __õ|õ// ) )Industrial economics and technological management(
_/ö____/ (_engineering.discipline=Computer::Technology)
1 | pat = "[a-z\\]"
2 | str = "abcd\efgh"
3 | index = /#{pat}/ =~ str
4 | p index
line 3 results in:
RegexpError: premature end of regular expression: /[a-z\]/
Change the pattern to "[\\a-z]" and the error goes away.
When building a pattern that includes a character class specifier,
it needs to be specified carefully or Regexp.new(pat) and /#{pat}/
won’t work.
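A hedged sketch of the point above, with made-up variable names. Regexp.new(pat) and /#{pat}/ receive the same source text, so a pattern string has to carry regexp-level escaping either way. When the string is meant to be matched literally, Regexp.escape adds that escaping.

```ruby
# Pattern text with regexp-level escaping already applied.
pat = "[a-z\\\\]"     # the 7 characters [a-z\\]
str = "abcd\\efgh"    # the 9 characters abcd\efgh

via_new    = Regexp.new(pat)
via_interp = /#{pat}/

p via_new.source == via_interp.source   # true
p str.scan(via_new).length              # 9, the backslash is in the class

# For text that should be matched as-is, let Regexp.escape do the quoting.
literal = "[a-z\\]"                     # the 6 characters [a-z\]
exact   = Regexp.new(Regexp.escape(literal))
p exact =~ "x[a-z\\]"                   # 1
```

Regexp.escape is only appropriate when no character in the input should keep its regexp meaning; it does not help when the user is supposed to supply a character class.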
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.05.12 at 18:23:38
···
On Mon, 12 May 2003 23:51:44 +0900, Brian Candler wrote:
On Mon, May 12, 2003 at 10:18:00PM +0900, Austin Ziegler wrote:
On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:
Clifford Heath:
Robert Klemme wrote:
Clifford Heath:
re = Regexp.new('[a-z\\]')
Why not simply do
re = /[a-z\\]/
Because it was part of a larger extended re.
But you can still use re = /…/ instead of re = Regexp.new ‘…’
Again, if and only if one is specifying the regular expression
statically.
It doesn’t have to be static:
pat = "ana"
str = "banana"
index = /#{pat}/ =~ str
p index # >> 1
It’s the pattern specified specifically: "[a-z\\]". So far as I can
tell, the backslash is being interpolated twice (once by String#new
and once by Regexp#new). Is this a bug? I’m not sure, but I think
so; it doesn’t do it twice with "[\\a-z]", "[a-z\\\]", or
"[a-z\\\\]".
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.05.12 at 18:29:34
···
On Tue, 13 May 2003 00:02:34 +0900, Kent Dahl wrote:
Austin Ziegler wrote:
On Mon, 12 May 2003 17:39:19 +0900, Robert Klemme wrote:
Clifford Heath:
Robert Klemme wrote:
Clifford Heath:
re = Regexp.new('[a-z\\]')
Why not simply do
re = /[a-z\\]/
Because it was part of a larger extended re.
But you can still use re = /…/ instead of re = Regexp.new
‘…’
Again, if and only if one is specifying the regular expression
statically.
Am I missing something or just misunderstanding what you mean by
specifying it statically?
It’s the pattern specified specifically: "[a-z\\]". So far as I can
tell, the backslash is being interpolated twice (once by String#new
and once by Regexp#new). Is this a bug? I’m not sure, but I think
so; it doesn’t do it twice with "[\\a-z]", "[a-z\\\]", or
"[a-z\\\\]".
Is this with 1.8? I thought I remembered a thread about some new character
class "warnings" code that does all sorts of weird, unusual things that the
pre-1.8 versions didn’t have.
1 | pat = "[a-z\\]"
2 | str = "abcd\efgh"
3 | index = /#{pat}/ =~ str
4 | p index
line 3 results in:
RegexpError: premature end of regular expression: /[a-z\]/
Double-quoted strings interpolate backslashes: so in your example
pat contains:
[a-z\]
No:
irb(main):001:0> "[a-z\\]"
=> "[a-z\\]"
It’s rather irrelevant anyway, as the exact same behaviour happens
with single-quoted strings (change line 1 to '' instead of "").
Change the pattern to "[\\a-z]" and the error goes away.
Yes, that’s
[ \ a - z ]
No:
irb(main):008:0> "[\\a-z]"
=> "[\\a-z]"
I guess ‘\a’ is treated as just ‘a’ in a character class.
No:
irb(main):009:0> /[\a]/ =~ "abc\a"
=> 3
This behaviour is in both 1.6.8 and 1.8.0 (2003-05-09). I really
think that it’s a bug because [\\a-z] and [a-z\\] should be
semantically equivalent. It works when done as a literal; it doesn’t
work when substitution is done.
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.05.12 at 22:54:26
···
On Tue, 13 May 2003 07:54:02 +0900, Brian Candler wrote:
On Tue, May 13, 2003 at 07:29:24AM +0900, Austin Ziegler wrote:
As I noted to Brian Candler, both 1.6.8 and 1.8.0/2003-05-09.
-austin
– Austin Ziegler, austin@halostatue.ca on 2003.05.12 at 23:02:05
···
On Tue, 13 May 2003 07:56:07 +0900, Mike Campbell wrote:
It’s the pattern specified specifically: "[a-z\\]". So far as I
can tell, the backslash is being interpolated twice (once by
String#new and once by Regexp#new). Is this a bug? I’m not sure,
but I think so; it doesn’t do it twice with "[\\a-z]",
"[a-z\\\]", or "[a-z\\\\]".
Is this with 1.8? I thought I remembered a thread about some new
character class "warnings" code that does all sorts of weird, unusual
things that the pre-1.8 versions didn’t have.
1 | pat = "[a-z\\]"
2 | str = "abcd\efgh"
3 | index = /#{pat}/ =~ str
4 | p index
line 3 results in:
RegexpError: premature end of regular expression: /[a-z\]/
Double-quoted strings interpolate backslashes: so in your example
pat contains:
[a-z\]
No:
irb(main):001:0> "[a-z\\]"
=> "[a-z\\]"
Now try:
a = "[a-z\\]"
p a.length # => 6
a.each_byte { |c| puts "%c" % c } # => [ a - z \ ]
irb is deceiving you, because ‘inspect’ outputs strings in a way which can
be re-input as strings into Ruby. A single backslash is printed as two
backslashes.
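To make the inspect point concrete, here is a short sketch (variable names are illustrative). The string holds one backslash, while its inspect form, which is what irb echoes, shows two.

```ruby
a = "[a-z\\]"          # source literal with an escaped backslash
shown = a.inspect      # the re-readable form that irb displays

puts a                 # prints [a-z\]
puts shown             # prints "[a-z\\]"

p a.length               # 6
p shown.length           # 9 (two quotes plus a doubled backslash)
p a.chars.count("\\")    # 1
```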
It’s rather irrelevant anyway, as the exact same behaviour happens
with single-quoted strings (change line 1 to '' instead of "").
And single-quoted strings have the same issue it turns out:
a = '\\'
p a.length # => 1
I am not sure why that should be, since ‘\n’.length is 2. I think you may
have uncovered a bug here.
No:
irb(main):009:0> /[\a]/ =~ "abc\a"
=> 3
"\a" in a double-quoted string is a 'BEL' (a for Audible), ASCII code 7
irb(main):001:0> "\a"[0]
=> 7
Brian.
···
On Tue, May 13, 2003 at 12:02:06PM +0900, Austin Ziegler wrote:
On Tue, 13 May 2003 07:54:02 +0900, Brian Candler wrote:
On Tue, May 13, 2003 at 07:29:24AM +0900, Austin Ziegler wrote:
I just realised, it’s not a bug: it’s necessary to give a mechanism for
inserting a single quote within a single-quoted string. This is done by
escaping it with backslash:
'\''.length #>> 1 (just a single quote)
But that in turn means that to get a literal backslash it also needs to be
escaped:
'\\'.length #>> 1 (just a backslash)
All other backslash-X sequences are inserted as the backslash and the X.
Regexps are subject to certain quoting rules too. Try:
z = /ab\c/
puts z.inspect
This program crashes under both ruby 1.6 ("unterminated regexp meets end of
file") and 1.8 ("unterminated string meets end of file").
But z = /ab\d/ works. Somebody care to explain that one?
Regards,
Brian.
···
On Tue, May 13, 2003 at 04:10:36PM +0900, Brian Candler wrote:
And single-quoted strings have the same issue it turns out:
a = '\\'
p a.length # => 1
I am not sure why that should be, since ‘\n’.length is 2. I think you may
have uncovered a bug here.
When the RE is being built, a is being interpolated again. If I’ve
built my regular expression source string properly, it should NOT be
interpolated again. Is there any way to build a RE from a string
which does not reinterpolate the string? Is there a way to add such
a functionality if it does not exist?
Frankly, /#{a}/ for the above string should be no different than
/[a-z\\]/.
When the RE is being built, a is being interpolated again. If I’ve
built my regular expression source string properly, it should NOT be
interpolated again. Is there any way to build a RE from a string
which does not reinterpolate the string? Is there a way to add such
a functionality if it does not exist?
I’m not sure what you mean by “interpolate again” (or reinterpolate).
I think everything is happening just once: you’ve created a string
([a-z\]), and you’re interpolating it into a regex.
Frankly, /#{a}/ for the above string should be no different than
/[a-z\\]/.
Except… a is a 6-char string ([a-z\]), and the contents of the regex
there is 7 characters. There’s no way for Ruby to backtrack and
know that, when you created the string, you typed \ twice. You might
have produced the string this way:
I think what he wants is reasonable in a way…
it would be nice if there were a way to specify
a string (or a regex) that did not expand
backslashes. If there were, that would solve his
minor dilemma.
“Interpolate” is not the right word here… but
there are definitely two levels of processing
going on. For example: The sequence of characters
“\\n” gets mapped internally to “\n” and if that
in turn were used in a regex, it would be collapsed
again into \n. Correct?
What about a naive solution like this?
class String
  def raw
    self.inspect[1..-2]
  end
end
And then /#{myvar.raw}/ or some such. Am I way
off base here? This is untested.
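A quick check of the idea above, written as a plain method rather than a String extension. The name raw_source is hypothetical. It assumes the intent is to strip only the surrounding quotes that inspect adds.

```ruby
# Hypothetical helper mirroring the String#raw idea.
def raw_source(s)
  s.inspect[1..-2]   # drop the double quotes that inspect wraps around the text
end

a   = "[a-z\\]"          # the 6 characters [a-z\]
src = raw_source(a)      # the 7 characters [a-z\\]
re  = Regexp.new(src)

p src.length    # 7
p re =~ "\\"    # 0
```

This works for the backslash case because inspect doubles each backslash. inspect follows string-literal escaping rules rather than regexp rules, so other characters may not round-trip the way a regexp expects.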
Hal
···
----- Original Message -----
From: dblack@superlink.net
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, May 13, 2003 1:21 PM
Subject: Re: Regexp: why does (re)* return only last repetition?
I’m not sure what you mean by “interpolate again” (or reinterpolate).
I think everything is happening just once: you’ve created a string
([a-z\]), and you’re interpolating it into a regex.
Frankly, /#{a}/ for the above string should be no different than
/[a-z\\]/.
Except… a is a 6-char string ([a-z\]), and the contents of the regex
there is 7 characters. There’s no way for Ruby to backtrack and
know that, when you created the string, you typed \ twice. You might
have produced the string this way: