Regex boundaries+Umlauts

Hello!

a='abc ded ghi'
p a.gsub('\w','B')
=> nothing found
p a.gsub(/\w/,'B')
=> BBBBBB...
How come?!
Does \w match an "empty" character?
How does the replacement work with a regex?

What can be used for $1 $2 ... in gsub ?
e.g. /(ab)(c)/ => "cab" # want $2$1

Which of these is correct, and when should each be used (I do know regex):
/:alnum:+/ # wrong?
/[:alnum:]+/
and /[[:alnum:]+]/ or ...:]]+/ # ??
Is that the same as [\w]+ ?

Is there a definition for
[:alnum:äÄ..üÜßčàáâ₣éšç...] # including all 'Umlauts' ?

Is there a method equivalent to str=~/b/,
  something like "abc".re("b") ?

Thanks
Berg

Hello!

a='abc ded ghi'
p a.gsub('\w','B')
=> nothing found
p a.gsub(/\w/,'B')
=> BBBBBB...
how that?!
Does match \w an "emtpy" character?
what does regex as replacement?

What can be used for $1 $2 ... in gsub ?
e.g. /(ab)(c)/ => "cab" # want $2$1

Have you read Class: String (Ruby 2.3.0) and Class: Regexp (Ruby 2.3.0)? They will at least get you started.

When to use, what is (knowing regex):
/:alnum:+/ # wrong?
/[:alnum:]+/
and /[[:alnum:]+]/ or ...:]]+/ # ??
same is [\w]+ ?

Is there a definition for
[:alnum:äÄ..üÜßčàáâ₣éšç...] # including all 'Umlauts' ?

Is there a method for str=~/b/
  sg. like. "abc".re("b") ?

[1] pry(main)> "abc".match(/b/)
=> #<MatchData "b">
[2] pry(main)> "abc".match(/x/)
=> nil

Thanks
Berg

Unsubscribe: <mailto:ruby-talk-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>

Hope this helps,

Mike

···

On Feb 3, 2016, at 3:29 AM, A Berger <aberger7890@gmail.com> wrote:

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

Thanks Mike

Hello!

a='abc ded ghi'
p a.gsub('\w','B')

WHY not implement gsub(string,...) where string is the regex?

=> nothing found
p a.gsub(/\w/,'B')
=> BBBBBB...
how that?!
Does match \w an "emtpy" character?
what does regex as replacement?

What can be used for $1 $2 ... in gsub ?
e.g. /(ab)(c)/ => "cab" # want $2$1

IT'S confusing that sometimes it is $1 and sometimes \1.

Have you read Class: String (Ruby 2.3.0) and Class: Regexp (Ruby 2.3.0)?
They will at least get you started.

When to use, what is (knowing regex):
/:alnum:+/ # wrong?
/[:alnum:]+/
and /[[:alnum:]+]/ or ...:]]+/ # ??
same is [\w]+ ?

IN an old example I found [:...:]; it seems this is wrong.
Wouldn't it be nicer than [[:...:]] ?

Is there a definition for
[:alnum:äÄ..üÜßčàáâ₣éšç...] # including all 'Umlauts' ?

I couldn't find any. Perhaps someone wants to implement this?

Is there a method for str=~/b/
  sg. like. "abc".re("b") ?

[1] pry(main)> "abc".match(/b/)
=> #<MatchData "b">
[2] pry(main)> "abc".match(/x/)
=> nil

SO it seems there is no method str.xxx(regex) like =~ (returning the
position)

···

On 03.02.2016 at 10:43, "Mike Stok" <mike@stok.ca> wrote:

On Feb 3, 2016, at 3:29 AM, A Berger <aberger7890@gmail.com> wrote:

>> Hello!
>>
>> a='abc ded ghi'
>> p a.gsub('\w','B')

WHY not implement gsub(string,...) where string is the regex?

Three reasons:

1) If you don't want to have to deal with a regular expression and just
want to match the string, you can, without doing any regex escaping. Handy
when the string might look like a regexp (e.g. when it's part of a URL)

2) Because the Regexp class exists, so there's no need to coerce Strings.
If you want to do regex substitution, just use an instance of Regexp.

3) Because it's easier to implement if all Strings are replaced as-is, and
all Regexps are matched. There's no need to attempt to parse all Strings as
Regexps just in case they look like "/.../".

Also it's called 'gsub' for 'globally substitute', not 'preg_replace' for
'perl-compatible regular expression engine replace'. Not all replaced
things are regexps, or even regexp-like.
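
The String-versus-Regexp distinction described above can be sketched as follows. This is only an illustration using core String#gsub and Regexp.new; the method names (literal_replace, regex_replace, dynamic_replace) are made up for the example.

```ruby
# String pattern: matched literally, so '\w' means a backslash followed by "w".
def literal_replace(text)
  text.gsub('\w', 'B')
end

# Regexp pattern: \w matches one word character.
def regex_replace(text)
  text.gsub(/\w/, 'B')
end

# Regex source held in a String: convert it explicitly with Regexp.new.
def dynamic_replace(text, source)
  text.gsub(Regexp.new(source), 'B')
end

p literal_replace('abc def ghi')        # unchanged: the text contains no literal \w
p regex_replace('abc def ghi')          # "BBB BBB BBB"
p dynamic_replace('abc def ghi', '\w')  # same result as the Regexp version
p literal_replace('a\wb')               # "aBb": the two characters \w are replaced
p 'a.b'.gsub('.', '-')                  # "a-b": the dot is literal, not "any character"
```

Keeping the conversion explicit (Regexp.new) means the caller always decides whether a piece of text is a literal or a pattern.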

>> => nothing found
>> p a.gsub(/\w/,'B')
>> => BBBBBB...
>> how that?!
>> Does match \w an "emtpy" character?

irb(main):001:0> a='abc def ghi'
=> "abc def ghi"
irb(main):002:0> a.gsub(/\w/,'B')
=> "BBB BBB BBB"
irb(main):003:0>
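
On the "empty character" question: \w consumes exactly one word character per match, so each letter is replaced on its own and the spaces are left alone. A small sketch to make that visible (word_char_count is a name invented for this example):

```ruby
# \w matches exactly one word character, never an empty string.
def word_char_count(text)
  text.scan(/\w/).length
end

a = 'abc def ghi'
p word_char_count(a)     # 9 single-character matches
p a.gsub(/\w/, 'B')      # "BBB BBB BBB" (one B per character)
p a.gsub(/\w+/, 'B')     # "B B B" (one B per run of word characters)
```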

​​

>> what does regex as replacement?
>>
>> What can be used for $1 $2 ... in gsub ?
>> e.g. /(ab)(c)/ => "cab" # want $2$1
IT'S confusing that once it is $1, once it is \1

Well, $1 is a special global variable, and '\1' is a substitution
sequence in a string, so they're really quite different.

Note that you do have options available to you:

    'bar'.gsub(/a(.)/, '@\1')
    'bar'.gsub(/a(.)/) { |match| '@' + match[1] }
    'bar'.gsub(/a(.)/) { '@' + $1 }
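
Applied to the original "want $2$1" example, those options might look like the sketch below (the method names are invented for illustration). Two details are worth knowing: a replacement string is built before gsub runs, so $1 cannot be used inside it, and the block parameter is the matched text as a String rather than a MatchData, so match[1] above is simply the second character of "ar", which happens to equal the capture.

```ruby
# Swap two captured groups: "abc" -> "cab".
def swap_with_backrefs(text)
  # Single quotes keep \2 and \1 intact for gsub to interpret.
  text.gsub(/(ab)(c)/, '\2\1')
end

def swap_with_block(text)
  # Inside the block, $1 and $2 refer to the current match.
  text.gsub(/(ab)(c)/) { $2 + $1 }
end

def swap_with_names(text)
  # Named groups are referenced as \k<name> in the replacement string.
  text.gsub(/(?<first>ab)(?<second>c)/, '\k<second>\k<first>')
end

p swap_with_backrefs('abc abc')
p swap_with_block('abc abc')
p swap_with_names('abc abc')

# The block parameter is the whole matched text:
'bar'.gsub(/a(.)/) { |m| p m; m }   # prints "ar"
```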

>
> Have you read Class: String (Ruby 2.3.0) and Class: Regexp (Ruby 2.3.0)?
> They will at least get you started.
>
>> When to use, what is (knowing regex):
>> /:alnum:+/ # wrong?
>> /[:alnum:]+/
>> and /[[:alnum:]+]/ or ...:]]+/ # ??
>> same is [\w]+ ?
>>
IN an old example I found [:...:] seems this is wrong.
Wouldnt it be nicer than [[:...:]] ?

The simple response here is: don't use an old example.

I believe the POSIX bracket expressions use double-brackets so they can be
distinguished from non-POSIX bracket expressions:

    irb(main):001:0> 'abc:def'.gsub(/[:alnum:]/, '_')
    => "_bc_def"
    irb(main):002:0> 'abc:def'.gsub(/[[:alnum:]]/, '_')
    => "___:___"
    irb(main):003:0>

Also remember: /[[:alnum:]]/ and /\w/ are different. The former is strictly
alphabetic and numeric characters (includes Unicode, excludes underscore)
while the latter is [a-zA-Z0-9_]
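
A quick way to see the differences, including for umlauts, is to test a few sample characters against each class. This is only a sketch: the CLASSES table and matching_classes helper are invented for the example, and \p{L} (the Unicode "Letter" property) is added as one more option for matching letters such as ä, ü or ß.

```ruby
# Compare what each class matches for a few sample characters.
CLASSES = {
  '\w'          => /\w/,
  '[[:alnum:]]' => /[[:alnum:]]/,
  '[[:word:]]'  => /[[:word:]]/,
  '\p{L}'       => /\p{L}/
}.freeze

# Names of all classes in CLASSES that match the given character.
def matching_classes(char)
  CLASSES.select { |_name, regex| char.match(regex) }.keys
end

%w[a 7 _ ü ß :].each do |char|
  puts "#{char.inspect}: #{matching_classes(char).join(', ')}"
end

# Whole words with umlauts can be collected with a POSIX bracket class:
p 'Grüße aus Köln'.scan(/[[:alpha:]]+/)
```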

>> Is there a definition for
>> [:alnum:äÄ..üÜßčàáâ₣éšç...] # including all 'Umlauts' ?
I Couldnt find any. Perhaps s.o. who wants to implement this?

From the Regexp doc [http://ruby-doc.org/core-2.3.0/Regexp.html]:

"Ruby also supports the following non-POSIX character classes:

   - /[[:word:]]/ - A character in one of the following Unicode general
     categories Letter, Mark, Number, Connector_Punctuation
   - /[[:ascii:]]/ - A character in the ASCII character set"

    irb(main):001:0> 'ü'.match( /[[:word:]]/ )
    => #<MatchData "ü">
    irb(main):002:0> 'ü'.match( /[[:alnum:]]/ )
    => #<MatchData "ü">

···

On 3 February 2016 at 23:11, A Berger <aberger7890@gmail.com> wrote:

On 03.02.2016 at 10:43, "Mike Stok" <mike@stok.ca> wrote:
>> On Feb 3, 2016, at 3:29 AM, A Berger <aberger7890@gmail.com> wrote:

>> Is there a method for str=~/b/
>> sg. like. "abc".re("b") ?
>>
>>
>
> [1] pry(main)> "abc".match(/b/)
> => #<MatchData "b">
> [2] pry(main)> "abc".match(/x/)
> => nil
SO it seems there is no method str.xxx(regex) like =~ (returning the
position)

Yep, it's:

    "abc" =~ regex
    # or if you really want dots and parens:
    "abc".=~(regex)

Incidentally, a quick scan of the doc for String also gives:

    "abc".index(regex)
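
Side by side, the position-returning variants look roughly like this. It is a sketch using only core String methods; the wrapper method names are invented for the example.

```ruby
# Index of the first match via the =~ operator (nil when nothing matches).
def position_via_operator(text, regex)
  text =~ regex
end

# Same result via String#index, which also accepts a start offset.
def position_via_index(text, regex, start = 0)
  text.index(regex, start)
end

# String#match returns a MatchData; begin(0) is the offset of the whole match.
def position_via_match(text, regex)
  found = text.match(regex)
  found && found.begin(0)
end

p position_via_operator('abc', /b/)   # 1
p position_via_index('abc', /b/)      # 1
p position_via_index('abcb', /b/, 2)  # 3: the search starts at offset 2
p position_via_match('abc', /b/)      # 1
p position_via_operator('abc', /x/)   # nil
```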

Please remember to read the documentation first; it's generally of a pretty
high quality. We're happy to answer questions and help clear up confusion
if you've done the initial groundwork yourself.

​Cheers
--
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Hello

>> a='abc ded ghi'
>> p a.gsub('\w','B')
WHY not implement gsub(string,...) where string is the regex?

Three reasons:

1) If you don't want to have to deal with a regular expression and just
want to match the string, you can, without doing any regex escaping. Handy
when the string might look like a regexp (e.g. when it's part of a URL)

So it's working with strings already!
In the doc it only says pattern, not gsub(string,...)! Where do I find that
a pattern can be a string??

2) Because the Regexp class exists, so there's no need to coerce Strings.
If you want to do regex substitution, just use an instance of Regexp.

3) Because it's easier to implement if all Strings are replaced as-is,
and all Regexps are matched. There's no need to attempt to parse all
Strings as Regexps just in case they look like "/.../".

Also it's called 'gsub' for 'globally substitute', not 'preg_replace' for
'perl-compatible regular expression engine replace'. Not all replaced
things are regexps, or even regexp-like.

>> => nothing found
>> p a.gsub(/\w/,'B')
>> => BBBBBB...
>> how that?!
>> Does match \w an "emtpy" character?

irb(main):001:0> a='abc def ghi'
=> "abc def ghi"
irb(main):002:0> a.gsub(/\w/,'B')
=> "BBB BBB BBB"
irb(main):003:0>

>> what does regex as replacement?
>>
>> What can be used for $1 $2 ... in gsub ?
>> e.g. /(ab)(c)/ => "cab" # want $2$1
IT'S confusing that once it is $1, once it is \1

Well, $1 is a special global variable, and '\1' is a substitution
sequence in a string, so they're really quite different.

Note that you do have options available to you:

    'bar'.gsub(/a(.)/, '@\1')
    'bar'.gsub(/a(.)/) { |match| '@' + match[1] }
    'bar'.gsub(/a(.)/) { '@' + $1 }

Sometimes having only one solution would be easier (to remember). (Why not
use the same notation everywhere, either $1 or \1?)

>
> Have you read http://ruby-doc.org/core-2.3.0/String.html#method-i-gsub
> and http://ruby-doc.org/core-2.3.0/Regexp.html ?
> They will at least get you started.

Yes, I have read them already!

>
>> When to use, what is (knowing regex):
>> /:alnum:+/ # wrong?
>> /[:alnum:]+/
>> and /[[:alnum:]+]/ or ...:]]+/ # ??
>> same is [\w]+ ?
>>
IN an old example I found [:...:] seems this is wrong.
Wouldnt it be nicer than [[:...:]] ?

The simple response here is: don't use an old example.

I believe the POSIX bracket expressions use double-brackets so they can
be distinguished from non-POSIX bracket expressions:

    irb(main):001:0> 'abc:def'.gsub(/[:alnum:]/, '_')
    => "_bc_def"
    irb(main):002:0> 'abc:def'.gsub(/[[:alnum:]]/, '_')
    => "___:___"
    irb(main):003:0>

Also remember: /[[:alnum:]]/ and /\w/ are different. The former is
strictly alphabetic and numeric characters (includes Unicode, excludes
underscore) while the latter is [a-zA-Z0-9_]

>> Is there a definition for
>> [:alnum:äÄ..üÜßčàáâ₣éšç...] # including all 'Umlauts' ?
I Couldnt find any. Perhaps s.o. who wants to implement this?

From the Regexp doc [http://ruby-doc.org/core-2.3.0/Regexp.html]:

"Ruby also supports the following non-POSIX character classes:

/[[:word:]]/ - A character in one of the following Unicode general
categories Letter, Mark, Number, Connector_Punctuation

/[[:ascii:]]/ - A character in the ASCII character set"

    irb(main):001:0> 'ü'.match( /[[:word:]]/ )
    => #<MatchData "ü">
    irb(main):002:0> 'ü'.match( /[[:alnum:]]/ )
    => #<MatchData "ü">

I see - both are valid!

>> Is there a method for str=~/b/
>> sg. like. "abc".re("b") ?
>>
>>
>
> [1] pry(main)> "abc".match(/b/)
> => #<MatchData "b">
> [2] pry(main)> "abc".match(/x/)
> => nil
SO it seems there is no method str.xxx(regex) like =~ (returning the
position)

Yep, it's:

    "abc" =~ regex
    # or if you really want dots and parens:
    "abc".=~(regex)

Incidentally, a quick scan of the doc for String also gives:

    "abc".index(regex)

Please remember to read the documentation first; it's generally of a
pretty high quality. We're happy to answer questions and help clear up
confusion if you've done the initial groundwork yourself.

I assumed .index can't work with a regex. Surprised!
I never thought of combining a dot with =~

What is .itself for??
Do I want to make my code look longer?

Thanks
Berg

···

On 03.02.2016 at 22:33, "Matthew Kerwin" <matthew@kerwin.net.au> wrote:

​Cheers
--
  Matthew Kerwin
  http://matthew.kerwin.net.au/


Hello

>> >> a='abc ded ghi'
>> >> p a.gsub('\w','B')
>> WHY not implement gsub(string,...) where string is the regex?
>
> Three reasons:
>
> 1) If you don't want to have to deal with a regular expression and just
> want to match the string, you can, without doing any regex escaping. Handy
> when the string might look like a regexp (e.g. when it's part of a URL)

So its working with strings already!
In the doc it only says pattern, not gsub(string,...)! Where do i find
that a pattern can be a string??

Read: Class: String (Ruby 2.3.0)

"pattern" is just a word, it doesn't mean it's interpreted as a regular
expression. It's explained in the prose.

> [snip]

>> >> what does regex as replacement?
>> >>
>> >> What can be used for $1 $2 ... in gsub ?
>> >> e.g. /(ab)(c)/ => "cab" # want $2$1
>> IT'S confusing that once it is $1, once it is \1
>
> Well, $1 is a special global variable, and '\1' is a substitution
> sequence in a string, so they're really quite different.
>
> Note that you do have options available to you:
>
> 'bar'.gsub(/a(.)/, '@\1')
> 'bar'.gsub(/a(.)/) { |match| '@' + match[1] }
> 'bar'.gsub(/a(.)/) { '@' + $1 }
>

Sometimes only one solution would be easier (to remember). (Why not each
of the same $1 OR \1)

So only use one. Choose the one that will serve you best. But note:

    'abc'.gsub(/a(.)/, 'q\1')
    #easy to read

    'abc'.gsub(/a(.)/) { 'q' + $1.upcase }
    # impossible using \1
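
The reason the second form needs a block can be sketched like this (method names are invented for the example): a replacement string is a finished value before gsub ever looks at the text, so any method called on it acts on the literal characters \1, not on what the group captured.

```ruby
# Block form: runs once per match, so the capture can be transformed.
def shout_after_a(text)
  text.gsub(/a(.)/) { 'q' + $1.upcase }
end

# Same thing without the $1 global, using Regexp.last_match.
def shout_after_a_explicit(text)
  text.gsub(/a(.)/) { 'q' + Regexp.last_match(1).upcase }
end

# String form: '\1'.upcase is evaluated first and leaves '\1' unchanged,
# so the capture is inserted as-is.
def upcase_too_early(text)
  text.gsub(/a(.)/, 'q' + '\1'.upcase)
end

p shout_after_a('abc')           # "qBc"
p shout_after_a_explicit('abc')  # "qBc"
p upcase_too_early('abc')        # "qbc"
```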

> [snip]
>
>> >> Is there a method for str=~/b/
>> >> sg. like. "abc".re("b") ?
>> >>
>> >>
>> >
>> > [1] pry(main)> "abc".match(/b/)
>> > => #<MatchData "b">
>> > [2] pry(main)> "abc".match(/x/)
>> > => nil
>> SO it seems there is no method str.xxx(regex) like =~ (returning the
>> position)
>
> Yep, it's:
>
> "abc" =~ regex
> # or if you really want dots and parens:
> "abc".=~(regex)
>
> Incidentally, a quick scan of the doc for String also gives:
>
> "abc".index(regex)
>
> Please remember to read the documentation first; it's generally of a
> pretty high quality. We're happy to answer questions and help clear up
> confusion if you've done the initial groundwork yourself.

Assumed .index cant work with regex. Surprized!

Read: Class: String (Ruby 2.3.0)

Never combined to .=~

What is .itself for??
I want to make my code looking longer?

Every instruction in Ruby is a method -- even things that look like
operators*.

So the "actual" method call is:

    'abc'.=~(/b/)

But Ruby provides "syntactic sugar"** so we can write:

    'abc' =~ /b/

You said "SO it seems there is no method str.xxx(regex) like =~" and I
pointed out that `=~` *is* the method. You can even make it look like one
if you really want.

* except for some, like assignment
** i.e. makes the syntax sweeter
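
A short sketch of what "operators are methods" means in practice. The Label class is hypothetical and exists only to show that =~ can be defined like any other method.

```ruby
# Hypothetical class with its own =~ method.
class Label
  def initialize(text)
    @text = text
  end

  # Defining =~ makes `label =~ /regex/` work for this class.
  def =~(pattern)
    @text.downcase =~ pattern
  end
end

p 'abc' =~ /b/                    # operator form
p 'abc'.=~(/b/)                   # explicit method-call form
p 'abc'.public_send(:=~, /b/)     # same call, method name given as a symbol
p 1.+(2)                          # + is a method too
p Label.new('ABC') =~ /b/         # uses Label#=~
```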

···

On 05/02/2016 11:39 AM, "A Berger" <aberger7890@gmail.com> wrote:

On 03.02.2016 at 22:33, "Matthew Kerwin" <matthew@kerwin.net.au> wrote: