regex

Hi

How can I build the regex
/(a and (b or c))/ # (in one term)? => Is there an
'and' symbol, like | ?

Opti

In regexps, "and" is implicit. For example:

    /foo/

means: find an "f" somewhere, followed by an "o", followed in turn
by another "o".

Hi!

All contiguous tokens in a regex are implicitly joined by your 'and' symbol.

/ab/ matches 'ab'
/ac/ matches 'ac'
/a[bc]/ matches 'ab' and 'ac'
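
The implicit "and" described above can be checked with a few lines of Ruby. This is only a minimal sketch; the method name and the sample strings are made up for illustration.

```ruby
# Concatenation is the implicit "and then": an "a" immediately followed
# by "b" or "c". For single characters, /a(b|c)/ matches the same
# strings as /a[bc]/.
def a_then_b_or_c?(text)
  text.match?(/a(b|c)/)
end

%w[ab ac ad ba].each do |sample|
  puts "#{sample}: #{a_then_b_or_c?(sample)}"
end
```

The alternation form also works when b and c are longer strings, which a character class cannot express.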

Yunzhe

···

-----Original Messages-----
From: "Die Optimisten" <inform@die-optimisten.net>
Sent Time: 2022-08-09 17:08:33 (Tuesday)
To: Ruby-Talk <ruby-talk@ruby-lang.org>
Cc:
Subject: regex

Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>

There's no "and" operator in regexps. You can use a lookahead, which kind of simulates one, though it is probably slower.

[4] pry(main)> /(?=hello)(hi|howdy|hello)/.match? "hi"
=> false
[5] pry(main)> /(?=hello)(hi|howdy|hello)/.match? "hello"
=> true
[6] pry(main)> /(?=hello)(hi|howdy|hello)/.match? "howdy"
=> false
[7] pry(main)>

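It is only "kind of" an "and", because length matters: a lookahead consumes no text, so it need not cover the same characters as the pattern after it. Note that I used + in the lookahead, yet "Hello" still matches.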

[14] pry(main)> /(?=[A-Z]+)((?i)hello|world)/.match? "hello"
=> false
[15] pry(main)> /(?=[A-Z]+)((?i)hello|world)/.match? "HELLO"
=> true
[16] pry(main)> /(?=[A-Z]+)((?i)hello|world)/.match? "Hello"
=> true
[17] pry(main)>

To simulate "and not" you can use negative lookahead:

[17] pry(main)> /(?![A-Z]+)((?i)hello|world)/.match? "Hello"
=> false
[18] pry(main)> /(?![A-Z]+)((?i)hello|world)/.match? "HELLO"
=> false
[19] pry(main)> /(?![A-Z]+)((?i)hello|world)/.match? "hello"
=> true
[20] pry(main)>
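
Building on the lookahead idea, one way to get an order-independent "a and (b or c)" is to anchor one lookahead per condition at the start of the string. A minimal sketch, assuming the hypothetical strings "foo", "bar" and "baz" stand in for a, b and c:

```ruby
# Each lookahead starts at \A and scans the whole string on its own,
# so the order in which the substrings appear does not matter.
# The /m flag lets "." cross newlines.
def a_and_b_or_c?(text)
  text.match?(/\A(?=.*foo)(?=.*(?:bar|baz))/m)
end

["foo then bar", "baz before foo", "foo alone", "bar baz"].each do |sample|
  puts "#{sample.inspect}: #{a_and_b_or_c?(sample)}"
end
```

Because lookaheads consume nothing, the pattern only tests the string; it does not capture where a, b or c were found.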


Hello!
So for (a and (b or c)) we have to search for (a before (b|c)) OR ((b|c)
before a).

Can this be reduced somehow? (There is no 'before-or-after' operator...)
Are performance and RAM usage better when using separate calls
( /a/.match?... and /b|c/.match?... )?

Thank you
Opti

You can test both orders in one regex:

"a something b".match(/a.*[bc]|[bc].*a/)

"c something a".match(/a.*[bc]|[bc].*a/)

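For comparison, here is a sketch that puts the single regex next to the "separate calls" variant asked about earlier. The method names are made up; the two agree on single-line input, but /a.*[bc]/ without the /m flag does not cross newlines, while the two independent tests do not care about them.

```ruby
# One regex that spells out both orders.
def one_regex?(text)
  text.match?(/a.*[bc]|[bc].*a/)
end

# Two independent tests joined with Ruby's own &&.
def two_tests?(text)
  text.match?(/a/) && text.match?(/[bc]/)
end

["a something b", "c something a", "a something", "b only"].each do |sample|
  puts "#{sample.inspect}: #{one_regex?(sample)} #{two_tests?(sample)}"
end
```

Which variant is faster or uses less memory is not something this sketch settles; that would have to be measured on real input.
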
Hi,

How can I build the Regex
   /(a and (b or c)/ # (in one term) ? => is there an
'and'-symbol, like | ?

Such an operator does not exist in Regular Expressions (as defined by
Kleene) and also does not exist in any Regexp engine I have ever heard
of. The reason is quite simple: it does not make sense because it can
never match.

A symbol can never be two symbols at the same time. Your proposed
Regex is looking for a symbol that is at the same time an `a` and also
either a `b` or a `c`. But that is not possible: if the symbol is an
`a`, then it cannot possibly be a `b` or a `c`, and if the symbol is a
`b` or a `c`, then it cannot possibly be an `a`.

To formalize it a little:

Let A be a regular expression that matches a single symbol.
Let B ≠ A be a regular expression that matches a different single symbol.

Then there can never be a string that is recognized by the regular
expression A∧B.

This only holds when A and B recognize disjoint sets of strings. In
general, A∧B recognizes exactly the strings recognized by both A and B,
and that set need not be empty: `a*` and `a+` are different
expressions, yet both recognize "a".

OTOH, it is easy to see that for A=B, i.e. for the regular expression
A∧A, there *are* strings that are recognized by it, namely exactly the
set of strings that are recognized by A.

Cheers

···

Die Optimisten <inform@die-optimisten.net> wrote:

Hello,
thanks for your answer;
I should have written that a, b, c are placeholders for strings;
but even if they are one-char strings: why can it never match?
# Also, a could be the same as b or c...
But what I meant is: / (a.*(b|c)) | ((b|c).*a) / # with or
without '.*' ...
* Can this be simplified (without having to write a, b, c twice)?
Opti
PS: What does OTOH mean?
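
One way to avoid typing a, b and c twice is to build the pattern from sub-patterns with interpolation. A minimal sketch; the helper name either_order and the strings "foo", "bar", "baz" are hypothetical stand-ins:

```ruby
# Build "first and second, in either order" from two sub-patterns, so
# each one is written only once in the source. Interpolating a Regexp
# wraps it in its own (?-mix:...) group, so alternations inside the
# sub-patterns stay intact.
def either_order(first, second)
  /#{first}.*#{second}|#{second}.*#{first}/
end

pattern = either_order(/foo/, Regexp.union("bar", "baz"))
puts pattern.match?("baz ... foo")
puts pattern.match?("foo ... bar")
puts pattern.match?("foo only")
```

The compiled regex still contains both orders; only the source avoids the repetition.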

···

On 09.08.22 at 21:14, Jörg W Mittag wrote:


Hello,

Not sure if I understood you correctly, nor whether this is an appropriate solution, but

you could do something like ((a).*(b|c)) | ((?3).*(?2))

In Ruby it would look like

/((a).*(b|c))|((\g<3>).*(\g<2>))/

Here (a) would be capture group 2 and (b|c) would be capture group 3; that's why we later call them as (\g<3>) and (\g<2>) to reverse their order.

I'm in no way an expert (just learning, tbf), so others should say if this is a good idea.

Cheers
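
A quick way to try the subexpression-call pattern is a small script like the one below. It is only a sketch: the method name and the sample strings are made up, and a, b, c are taken literally as single characters.

```ruby
# Group 2 is (a) and group 3 is (b|c). \g<3> and \g<2> re-run those
# sub-patterns (not the text they captured), in the opposite order.
def either_order_with_calls?(text)
  text.match?(/((a).*(b|c))|((\g<3>).*(\g<2>))/)
end

["a x b", "c x a", "a x", "b x c"].each do |sample|
  puts "#{sample.inspect}: #{either_order_with_calls?(sample)}"
end
```

Note that numbered calls such as \g<2> are only available while the regex has no named groups.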


I think the answer is much simpler than many have put it.

First, assuming no “to-be-ignored” strings between a, b, and c: /a(b|c)/. Note that /(a)b|c/ is not equivalent: it matches “ab” or a lone “c”.

Second, if you want to allow extra characters between the two strings: /a.*(b|c)/. Again, /(a.*)b|c/ would also match a lone “c”.

The parentheses around “b|c” are required so that the “a” (or “a.*”) is not grouped with the “b” in the alternation. However, if a, b, or c are complex Regexps themselves, you may need to add parentheses around them to make them atomic relative to the “|”.
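
How "|" binds can be checked directly. In this sketch (method names made up for illustration) the alternation is grouped in one pattern and left ungrouped in the other:

```ruby
# With the group, "a" must precede either alternative.
def grouped?(text)
  text.match?(/a(b|c)/)
end

# Without it, "|" splits the whole pattern into "(a)b" and "c".
def ungrouped?(text)
  text.match?(/(a)b|c/)
end

%w[ab ac c].each do |sample|
  puts "#{sample}: grouped=#{grouped?(sample)} ungrouped=#{ungrouped?(sample)}"
end
```

The lone "c" is the interesting case: only the ungrouped pattern accepts it.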

···

On Aug 9, 2022, at 4:11 PM, Gludek <gludekpl@gmail.com> wrote:
