Regular expression to parse out "host" part of URL string

Wes_Gamble · 17 April 2006 15:39

All,

I am trying to write a regex to parse out the "host" part of a potential
URL.

So if presented with

www.cnn.com

I want to get

www.cnn.com

If presented with

www.cnn.com/

I want to get

www.cnn.com

If presented with

www.cnn.com/some/other/stuff

I want to get

www.cnn.com

This regex: /(.*)?\/*/ matches everything
This regex: /(.*)?\// matches everything including the "/"

How can I just get the part of the string before the slash?

Thanks,
Wes

--
Posted via http://www.ruby-forum.com/.

David_A_Black3 · 17 April 2006 15:41

Hi --

On Tue, 18 Apr 2006, Wes Gamble wrote:

All,

I am trying to write a regex to parse out the "host" part of a potential
URL.

So if presented with

www.cnn.com

I want to get

www.cnn.com

If presented with

www.cnn.com/

I want to get

www.cnn.com

If presented with

www.cnn.com/some/other/stuff

I want to get

www.cnn.com

This regex: /(.*)?\/*/ matches everything
This regex: /(.*)?\// matches everything including the "/"

How can I just get the part of the string before the slash?

How about:

/[^\/]+/ # match one or more non-slash characters
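A quick way to check that pattern against the three inputs from the question, as a minimal sketch. The helper name host_part is made up for illustration; String#[] with a Regexp argument returns the first matching substring, or nil if nothing matches.

```ruby
# host_part is a hypothetical helper name, not something from the thread.
# String#[] with a Regexp returns the first matching substring (or nil).
def host_part(str)
  str[/[^\/]+/]
end

inputs = ["www.cnn.com", "www.cnn.com/", "www.cnn.com/some/other/stuff"]

inputs.each do |input|
  # Each of these three inputs yields "www.cnn.com".
  puts "#{input.inspect} -> #{host_part(input).inspect}"
end
```

This assumes the input has no scheme prefix. Given "http://www.cnn.com/", the first run of non-slash characters is "http:", so a scheme would have to be stripped first.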

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" PDF now on sale!
Paper version coming in early May!

Wes_Gamble · 17 April 2006 15:50

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/", though?

unknown wrote:

Hi --

On Tue, 18 Apr 2006, Wes Gamble wrote:

If presented with
How can I just get the part of the string before the slash?

How about:

/[^\/]+/ # match one or more non-slash characters

--
Posted via http://www.ruby-forum.com/.

Tom_Moertel · 17 April 2006 16:41

Wes Gamble wrote:

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/" ?

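The entire regexp matches everything up to a slash *and* the slash itself. Because .* is greedy, that is the last slash in the string, which in the example below is the only one. Recall, *this* is the regexp, and it ends by matching a slash: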

(.*)?\/

The first parenthesized group of the regex, however, matches only up to, but not including, the trailing slash. So, if the entire regex matches, you want to get the portion of the match that corresponds to the first parenthesized group, which will be stored in $1:

irb(main):009:0* ("example.com/" =~ /(.*)?\//) && $1
=> "example.com"

Or, you can capture the regex-matching operation's result as a MatchData object and query it to retrieve the desired portion:

irb(main):010:0> if matchdata = /(.*)?\//.match("example.com/")
irb(main):011:1> matchdata[1]
irb(main):012:1> end
=> "example.com"

Or, you can use the zero-width positive lookahead regexp extension to make sure the entire regexp matches only what you want. Then you can use the entire match as your result:

irb(main):024:0* if matchdata = /.*?(?=\/)/.match("example.com/")
irb(main):025:1> matchdata.to_s
irb(main):026:1> end
=> "example.com"
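The sessions above all use an input with exactly one slash. As a rough sketch, here is how the capturing pattern and the lookahead pattern behave on the multi-slash and no-slash inputs from the original question. The variable names are illustrative only.

```ruby
url = "www.cnn.com/some/other/stuff"

# Greedy .* grabs the whole string, then backtracks only as far as
# the LAST slash, so the capture group still contains earlier slashes.
greedy = /(.*)?\//.match(url)
greedy_whole = greedy[0]   # "www.cnn.com/some/other/"
greedy_group = greedy[1]   # "www.cnn.com/some/other"

# Lazy .*? followed by a lookahead stops just before the FIRST slash.
lazy_whole = /.*?(?=\/)/.match(url)[0]   # "www.cnn.com"

# Both patterns require a slash somewhere, so a bare host does not match.
bare_capture   = /(.*)?\//.match("www.cnn.com")      # nil
bare_lookahead = /.*?(?=\/)/.match("www.cnn.com")    # nil

puts greedy_whole.inspect, greedy_group.inspect, lazy_whole.inspect
puts bare_capture.inspect, bare_lookahead.inspect
```

So for inputs that may or may not contain a slash, the character-class pattern /[^\/]+/ suggested earlier in the thread covers all three cases, while these two need a slash to be present.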

Cheers,
Tom

David_A_Black3 · 22 April 2006 20:57

Hi --

On Tue, 18 Apr 2006, Wes Gamble wrote:

unknown wrote:

Hi --

On Tue, 18 Apr 2006, Wes Gamble wrote:

If presented with
How can I just get the part of the string before the slash?

How about:

/[^\/]+/ # match one or more non-slash characters

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/", though?

Because you told it to. You've got a / in the pattern.

(Keep in mind that the *whole* pattern matches, not just the part in
parentheses. The parentheses are just for capturing submatches.)
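One way to see the difference between the whole match and the captured submatch, as a small sketch: String#[] accepts a Regexp plus an optional capture index. The variable names here are just for illustration.

```ruby
input   = "www.cnn.com/"
pattern = /(.*)?\//

# With only the pattern, String#[] returns the ENTIRE match,
# which includes the slash because the pattern contains one.
whole_match = input[pattern]

# With a capture index, it returns only what that group captured.
first_group = input[pattern, 1]

puts whole_match.inspect   # "www.cnn.com/"
puts first_group.inspect   # "www.cnn.com"
```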

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" PDF now on sale!
Paper version coming in early May!
