Hello,
I am trying to find a regex that matches only the domain name, without the .com or .nl.
I tried:
\w{3+}
But on http://www.tamarawobben.nl/testhier it finds http, tamarawobben and testhier.
How can I solve this?
Roelof
You need to restrict your expression more. Your example will find ANY run of 3 or more "word" characters, which includes digits and underscores as well as letters, even if there are MORE word characters around them.
You could do this with a capture group (here, named "tld"):
regex = %r{
  \.            # one dot,
  (?<tld>       # capture as "tld":
    [a-z]{2,}   # 2+ alpha characters, so .nl matches too (note: not \w)
  )
  $             # at the end of a line/string
}x
"http://example.com".match(regex)[:tld]
Or with a positive look-behind:
regex = %r{
  (?<=\.)       # lookbehind for one dot,
  [a-z]{2,}     # match 2+ alpha characters
  $             # at the end of a line/string
}x
"http://example.com".match(regex)
Another approach is using the URI library:
require "uri"
URI.parse("http://example.com/").host.split(".").last
Andrew Vit
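To make the over-matching described above concrete, here is a minimal sketch. It assumes the pattern that was actually run was \w{3,} (with a comma), since {3+} is not quantifier syntax in Ruby; the variable names are only illustrative.

```ruby
url = "http://www.tamarawobben.nl/testhier"

# \w matches letters, digits and the underscore, so a bare \w{3,}
# returns every run of three or more such characters in the URL.
words = url.scan(/\w{3,}/)
p words  # => ["http", "www", "tamarawobben", "testhier"]

# Digits and underscores count as "word" characters too.
p "a_1".match?(/\A\w{3}\z/)  # => true
```

Nothing in the pattern ties the match to the host part of the URL, which is why the scheme and the path show up in the result.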
On 14-06-02, 10:51, Roelof Wobben wrote:
I am trying to find a regex that matches only the domain name, without the .com or .nl.
I tried:
\w{3+}
OP seems to want the domain name without the TLD or subdomains. Andrew's
URI solution actually seems the best (there is really no sense in rewriting
well-written regexps), but instead of the last part of the host, you'll
want the penultimate part. There are several ways to get that. Here's one:
URI.parse("http://www.tamarawobben.nl/testhier").host.split(".")[-2] #=> "tamarawobben"
Andrew Vit wrote on 2-6-2014 21:18:
regex =
Thanks,
But when I try all three in an online Ruby interpreter, they do not give any answer.
Roelof
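Two things could plausibly explain the missing output, though the thread does not confirm either: a script prints nothing unless told to, and a pattern anchored with $ does not match a URL that has a path after the host. The sketch below assumes both and uses illustrative variable names.

```ruby
require "uri"

tld_regex = /\.(?<tld>[a-z]{2,})$/

# A script only shows what it is asked to print, so wrap the result in p.
match = "http://example.com".match(tld_regex)
p match[:tld]   # => "com"

# With a path after the host, the $ anchor prevents any match;
# match returns nil, so check before indexing into it.
with_path = "http://www.tamarawobben.nl/testhier".match(tld_regex)
p with_path     # => nil

# The URI approach is not affected by the path.
host_parts = URI.parse("http://www.tamarawobben.nl/testhier").host.split(".")
p host_parts[-2]  # => "tamarawobben"
```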
That depends on your input. Do you want to find those domain names in
a larger text? Are you trying to parse URIs? Do you have fully qualified
domain names from which you want to extract a portion?
Kind regards
robert
--
[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/
rubular.com is a great site for testing regexes. Here is one for the last
regex given by Andrew Vit:
Good luck
I'm a little bit further.
I have this: (?<=\.)(.*?)(?=\.)
It seems to work, except that I still have to make sure the .*? does not include the /.
And in the (<?=/.) I have to find a way to include the //.
When I try (<?=/[.|/]) or (<?=/[.|//]) I get a message that I have to escape the /.
Roelof
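One way the lookaround idea above might be completed is sketched below. It relies on two Ruby details: inside %r{...} a / needs no escaping (inside /.../ it would be written \/), and a lookbehind may contain top-level alternatives of different lengths. The name domain_label is only illustrative, and the pattern assumes a plain host with no port or credentials.

```ruby
domain_label = %r{
  (?<=\.|//)            # preceded by a dot, or by the // after the scheme
  [^./]+                # one label: anything except dots and slashes
  (?=\.[^./]+(?:/|$))   # followed by one final label, then a slash or the end
}x

p "http://www.tamarawobben.nl/testhier"[domain_label]  # => "tamarawobben"
p "http://tamarawobben.nl/index.html"[domain_label]    # => "tamarawobben"
p "http://example.com"[domain_label]                   # => "example"
```

Using the negated class [^./] instead of .*? is what keeps the match from running across a slash. Note also that inside a character class the | is a literal character, so [.|/] matches a dot, a pipe or a slash.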
On 14-06-02, 23:41, Robert Klemme wrote:
That depends on your input. Do you want to find those domain names in
a larger text? Are you trying to parse URIs? Do you have fully qualified
domain names from which you want to extract a portion?
URI can extract from larger texts (URI.extract), parse URIs (URI.parse), and after that it's easy to split the domain parts from the fully-qualified hostnames. Really, I don't think there's any point in reinventing this using a Regexp... unless it's just a learning exercise.
Andrew Vit
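A rough sketch of the URI.extract plus URI.parse route mentioned in this exchange follows. The sample text and variable names are made up for illustration, and URI.extract is marked obsolete in newer Ruby versions, where it may emit a warning in verbose mode.

```ruby
require "uri"

text = "See http://www.tamarawobben.nl/testhier and http://example.com/ for details"

# Pull URL-looking strings with the given schemes out of free text.
urls = URI.extract(text, %w[http https])

# Parse each URL, take its host, and keep the label just before the TLD.
names = urls.map { |u| URI.parse(u).host.split(".")[-2] }
p names
```

Taking index -2 assumes a single-label TLD such as .nl or .com; hosts under multi-part suffixes would need extra handling.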
I tried this one: (?<=\[.|\//\)(.*?)(?=\.)
but I still get the error message that there are unescaped backslashes.
Roelof
This is a learning exercise from Codewars.
But I think I will use a regex to find the full domain and then use split to get only the part before the .com and so on.
I tried, and I think it is very difficult to find a single regex that handles all of these cases:
http://www.tamarawobben.nl/index.html
http://tamarawobben.nl/index.html
http://<subdomain>.tamarawobben.nl/index.html
where tamarawobben.nl must be found in all three.
Roelof
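The two-step plan described above (a regex for the host, then split) could look roughly like this. The method name registered_domain is hypothetical, "blog" stands in for an arbitrary subdomain, and the sketch assumes a single-label TLD such as .nl or .com.

```ruby
# Returns the last two labels of the host, or nil if no host is found.
def registered_domain(url)
  # Capture everything between "//" and the next "/" (or the end of the string).
  host = url[%r{//([^/]+)}, 1]
  return nil unless host

  host.split(".").last(2).join(".")
end

urls = [
  "http://www.tamarawobben.nl/index.html",
  "http://tamarawobben.nl/index.html",
  "http://blog.tamarawobben.nl/index.html" # "blog" is a placeholder subdomain
]

urls.each { |u| p registered_domain(u) }  # prints "tamarawobben.nl" three times
```

Splitting the work this way keeps the regex trivial; dropping the join and taking only the first of the two labels would give the name without the TLD.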
Roelof Wobben wrote on 3-6-2014 8:55:
Robert Klemme wrote on 3-6-2014 8:41:
...
I tried this one (?<=\[.|\//\)(.*?)(?=\.)
but still the error message that there are unescaped backslashes.
Please stop fullquoting - especially if you are not referring in any
way to the quoted text. Thank you.
Regards
robert
I tried to do it with a single regexp and couldn't get anything
useful, so I first used a regexp to extract the part
between the slashes (between http:// and the following /) and then
called split on "." on the result. This way is much simpler. I'm not going
to give you the solution, so that you can try this approach yourself,
as this is a learning exercise.
Let me know if you get stuck.
Jesus.