Look-behind regexp?

CT1 · 30 March 2005 13:48

Hi!
Are there any plans to support look-behinds in the core regexp engine?

I'm curious as to why we don't have it.
Thanks
Shajith

PS: I found an old request about this in the archives:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/417

Yukihiro_Matsumoto2 · 30 March 2005 13:58

Hi,

In message "Re: look-behind regexp ?" on Wed, 30 Mar 2005 22:48:18 +0900, Shajith <demerzel@gmail.com> writes:

Are there any plans to support look-behinds in the core regexp engine?

1.9 Oniguruma regexp engine already has one.

matz.
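
For later readers, here is a minimal sketch of what look-behind looks like in practice, assuming Ruby 1.9 or newer (where Oniguruma, or its successor Onigmo, is the built-in engine). The sample string and variable names are made up for illustration:

```ruby
# Hypothetical sample text, not taken from the thread.
line = "cost: $42, count: 17"

# Positive look-behind: digits only when preceded by a dollar sign.
price = line[/(?<=\$)\d+/]

# Negative look-behind: digits NOT preceded by a dollar sign or another digit.
count = line[/(?<![\$\d])\d+/]

puts price  # prints 42
puts count  # prints 17
```

The look-behind itself consumes nothing, so the dollar sign is not part of the match.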

CT1 · 30 March 2005 14:11

Thanks!

Shajith

On Wed, 30 Mar 2005 22:58:08 +0900, Yukihiro Matsumoto <matz@ruby-lang.org> wrote:

1.9 Oniguruma regexp engine already has one.

matz.

B_K_Oxley_binkley · 30 March 2005 14:44

Yukihiro Matsumoto wrote:

Shajith <demerzel@gmail.com> writes:

Are there any plans to support look-behinds in the core regexp
engine?

1.9 Oniguruma regexp engine already has one.

Where can I read more about this regexp engine? How does it compare to:

* Perl's own regexps
* regexp-engine from AEditor (http://aeditor.rubyforge.org/)
* PCRE (http://www.pcre.org/)

Thanks,
--binkley

Florian_Gross · 30 March 2005 15:24

B. K. Oxley (binkley) wrote:

1.9 Oniguruma regexp engine already has one.

Where can I read more about this regexp engine?

[link no longer available] seems to have a fairly complete listing of its features.

B_K_Oxley_binkley · 30 March 2005 15:29

Florian Gross wrote:

[link no longer available] seems to have a fairly complete listing of its features.

Ah, yes. Thanks. I should have Googled first.

But reading through that and the documentation on the same site, I am still looking for a rationale document. Why Oniguruma and not, say, PCRE? Why a new regexp parser?

Cheers,
--binkley

George_Ogata1 · 2 April 2005 19:06

Florian Gross <flgr@ccan.de> writes:

[link no longer available] seems to have a
fairly complete listing of its features.

How about adding a metachar reference to the rdoc for Regexp? (Or
Regexp.new?)

Austin_Ziegler5 · 30 March 2005 15:58

1. Licensing. PCRE's licensing has been somewhat fluid. The current
release seems OK.
2. Control. In many ways, such a core feature to Ruby should be native to Ruby.
3. Native concepts. Ruby REs are a bit different because they end up
being objects.

-austin
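
To illustrate point 3, a small sketch (assuming any reasonably recent Ruby; the pattern and sample text are hypothetical) showing that both the pattern and the result of a match are ordinary objects:

```ruby
# A regexp is an instance of Regexp, built here from a plain string.
re = Regexp.new('(\d+)-(\d+)')

# Matching returns a MatchData object (or nil when nothing matches).
m = re.match("pages 10-20")

puts re.class   # prints Regexp
puts m.class    # prints MatchData
puts m[1]       # prints 10
puts m[2]       # prints 20
puts re.source  # prints (\d+)-(\d+)
```

Whatever engine sits underneath has to be wrapped so that it feeds these objects.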

On Thu, 31 Mar 2005 00:29:03 +0900, B. K. Oxley (binkley) <binkley@alumni.rice.edu> wrote:

Florian Gross wrote:
> [link no longer available] seems to have a
> fairly complete listing of its features.
Ah, yes. Thanks. I should have Googled first.

But reading through that and the documentation on the same site, I am
still looking for a rationale document. Why Oniguruma and not, say,
PCRE? Why a new regexp parser?

--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca

Yukihiro_Matsumoto2 · 30 March 2005 16:05

Hi,

In message "Re: look-behind regexp ?" on Thu, 31 Mar 2005 00:29:03 +0900, "B. K. Oxley (binkley)" <binkley@alumni.rice.edu> writes:

But reading through that and the documentation on the same site, I am
still looking for a rationale document. Why Oniguruma and not, say,
PCRE? Why a new regexp parser?

PCRE only supports UTF-8 (as far as I know), not multiple
encodings like Ruby does. Oniguruma supports UTF-8, UTF-16,
ISO-8859-*, EUC-JP, Shift_JIS, and a lot more.

matz.
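
As a rough illustration of matching in a non-UTF-8 encoding, here is a minimal sketch assuming Ruby 1.9 or newer, where strings and regexps carry their own encoding. The sample text is hypothetical, and the magic comment on the first line only matters on 1.9:

```ruby
# encoding: utf-8

# Hypothetical sample text, transcoded from UTF-8 to EUC-JP.
text = "日本語".encode("EUC-JP")

# Building the pattern from an EUC-JP string yields an EUC-JP regexp.
pattern = Regexp.new("本".encode("EUC-JP"))

# The match runs directly on the EUC-JP bytes; nothing is converted to UTF-8 first.
m = pattern.match(text)

puts pattern.encoding      # prints EUC-JP
puts m[0].encode("UTF-8")  # prints 本
```

The matched substring stays in EUC-JP and is transcoded here only for display.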

B_K_Oxley_binkley · 30 March 2005 16:03

Austin Ziegler wrote:

1. Licensing. PCRE's licensing has been somewhat fluid. The current
release seems OK.
2. Control. In many ways, such a core feature to Ruby should be native to Ruby.
3. Native concepts. Ruby REs are a bit different because they end up
being objects.

Hrm.

In all honesty, these objections seem weak to me.

If the licensing is not a problem right now, why would it necessarily become one in the future? (Although I don't know the history of licensing in PCRE, so perhaps it has a record of arbitrariness.)

Control is not so important when you have the source code. And Ruby can contribute to the development of PCRE.

I'm unsure what you mean in point three. I presume that a Ruby regexp implementation would use PCRE for implementation, wrapping any details so that the implementation is not visible, and only objects remain.

Not to be so nitpicky, I only used PCRE as an example. I have an inherent dislike of wheel-reinvention (my natural laziness at play), so my ears perk up when I see something like a rewrite of regexp parsers when so many fine ones are already around.

Cheers,
--binkley

B_K_Oxley_binkley · 30 March 2005 16:08

Yukihiro Matsumoto wrote:

PCRE only supports UTF-8 (as far as I know), not multiple
encodings like Ruby does. Oniguruma supports UTF-8, UTF-16,
ISO-8859-*, EUC-JP, Shift_JIS, and a lot more.

Ah. I inferred as much from the prominence given the list of encodings, but wanted to find out more.

Thanks,
--binkley

Yukihiro_Matsumoto2 · 30 March 2005 16:33

Here's the list of encodings supported by default:

   ASCII BIG5 EUC-KR EUC-JP EUC-TW
   ISO8859-1 ISO8859-2 ISO8859-3
   ISO8859-4 ISO8859-5 ISO8859-6
   ISO8859-7 ISO8859-8 ISO8859-9
   ISO8859-10 ISO8859-11 ISO8859-13
   ISO8859-14 ISO8859-15 ISO8859-16
   KOI8 KOI8-R Shift_JIS UTF-8
   UTF-16BE UTF-16LE UTF-32BE UTF-32LE

And more importantly, its encoding support is pluggable: you can add
new encoding support by writing callback routines.

matz.

In message "Re: look-behind regexp ?" on Thu, 31 Mar 2005 01:08:50 +0900, "B. K. Oxley (binkley)" <binkley@alumni.rice.edu> writes:

Ah. I inferred as much from the prominence given the list of encodings,
but wanted to find out more.

Christian_Neukirche1 · 31 March 2005 12:30

Yukihiro Matsumoto <matz@ruby-lang.org> writes:

>Ah. I inferred as much from the prominence given the list of encodings,
>but wanted to find out more.

Here's the list of encodings supported by default:

   ASCII BIG5 EUC-KR EUC-JP EUC-TW
   ISO8859-1 ISO8859-2 ISO8859-3
   ISO8859-4 ISO8859-5 ISO8859-6
   ISO8859-7 ISO8859-8 ISO8859-9
   ISO8859-10 ISO8859-11 ISO8859-13
   ISO8859-14 ISO8859-15 ISO8859-16
   KOI8 KOI8-R Shift_JIS UTF-8
   UTF-16BE UTF-16LE UTF-32BE UTF-32LE

And more importantly, its encoding support is pluggable: you can add
new encoding support by writing callback routines.

All this sounds very good. Is there any reason not to use Oniguruma
for 1.8.3?

In message "Re: look-behind regexp ?" on Thu, 31 Mar 2005 01:08:50 +0900, "B. K. Oxley (binkley)" <binkley@alumni.rice.edu> writes:

matz.

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Nikolai_Weibull · 31 March 2005 13:25

* Christian Neukirchen (Mar 31, 2005 14:45):

Is there any reason not to use Oniguruma for 1.8.3?

It's still under very heavy development,
nikolai

--
::: name: Nikolai Weibull :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA :: loc atm: Gothenburg, Sweden :::
::: page: minimalistic.org :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
