Regular expression to parse out "host" part of URL string

All,

I am trying to write a regex to parse out the "host" part of a potential
URL.

So if presented with

www.cnn.com

I want to get

www.cnn.com

If presented with

www.cnn.com/

I want to get

www.cnn.com

If presented with

www.cnn.com/some/other/stuff

I want to get

www.cnn.com

This regex: /(.*)?\/*/ matches everything
This regex: /(.*)?\// matches everything including the "/"

How can I just get the part of the string before the slash?

Thanks,
Wes

--
Posted via http://www.ruby-forum.com/.

Hi --

On Tue, 18 Apr 2006, Wes Gamble wrote:

All,

I am trying to write a regex to parse out the "host" part of a potential
URL.

So if presented with

www.cnn.com

I want to get

www.cnn.com

If presented with

www.cnn.com/

I want to get

www.cnn.com

If presented with

www.cnn.com/some/other/stuff

I want to get

www.cnn.com

This regex: /(.*)?\/*/ matches everything
This regex: /(.*)?\// matches everything including the "/"

How can I just get the part of the string before the slash?

How about:

   /[^\/]+/ # match one or more non-slash characters
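A quick way to check this pattern against the three inputs from the question. This is only a minimal sketch; the helper name host_part is made up for illustration and is not part of any library:

```ruby
# Hypothetical helper: returns the first run of one or more
# non-slash characters in str, or nil if there is no such run.
def host_part(str)
  str[/[^\/]+/]
end

inputs  = ["www.cnn.com", "www.cnn.com/", "www.cnn.com/some/other/stuff"]
results = inputs.map { |s| host_part(s) }

results.each { |r| puts r }   # prints www.cnn.com three times
```

Because the character class excludes the slash, the match has to stop at the first slash, and it also succeeds when the string contains no slash at all.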

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" PDF now on sale!
Paper version coming in early May!

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/", though?

Wes Gamble wrote:

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/" ?

The entire regexp matches everything up to a slash *and* the slash itself. (Because .* is greedy, that is the last slash in the string, not the first.) Recall, *this* is the regexp, and it ends with a literal slash:

     (.*)?\/

The first parenthesized group of the regex, however, matches only up to, but not including, that slash. So, if the entire regex matches, you want the portion of the match that corresponds to the first parenthesized group, which is stored in $1:

irb(main):009:0* ("example.com/" =~ /(.*)?\//) && $1
=> "example.com"

Or, you can capture the regex-matching operation's result as a MatchData object and query it to retrieve the desired portion:

irb(main):010:0> if matchdata = /(.*)?\//.match("example.com/")
irb(main):011:1> matchdata[1]
irb(main):012:1> end
=> "example.com"

Or, you can use the zero-width positive lookahead regexp extension to make sure the entire regexp matches only what you want. Then you can use the entire match as your result:

irb(main):024:0* if matchdata = /.*?(?=\/)/.match("example.com/")
irb(main):025:1> matchdata.to_s
irb(main):026:1> end
=> "example.com"
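One thing worth checking is how these patterns behave on the other inputs from the question, with several slashes or none. The following is a small sketch of that check; the variable names are arbitrary:

```ruby
url = "www.cnn.com/some/other/stuff"

# Greedy .* runs as far as it can, so the group ends before the LAST slash.
greedy = /(.*)?\//.match(url)
puts greedy[1]   # www.cnn.com/some/other

# Lazy .*? with a lookahead stops before the FIRST slash.
lazy = /.*?(?=\/)/.match(url)
puts lazy[0]     # www.cnn.com

# Both patterns require a slash, so neither matches a bare host name.
no_slash_greedy = /(.*)?\//.match("www.cnn.com")
no_slash_lazy   = /.*?(?=\/)/.match("www.cnn.com")
p no_slash_greedy   # nil
p no_slash_lazy     # nil
```

So for input that may or may not contain a slash, a pattern that does not require one, such as /[^\/]+/, is the simpler fit.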

Cheers,
Tom

Hi --

On Tue, 18 Apr 2006, Wes Gamble wrote:

That works well.

Can you explain to me why

/(.*)?\//

matched the text in front of the "/" AND the "/", though?

Because you told it to :-) You've got a / in the pattern.

(Keep in mind that the *whole* pattern matches, not just the part in
parentheses. The parentheses are just for capturing submatches.)
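The difference between the whole match and the captured submatch can be seen directly on a MatchData object. A minimal sketch, with arbitrary variable names:

```ruby
m = /(.*)?\//.match("www.cnn.com/")

whole   = m[0]   # the text matched by the entire pattern
capture = m[1]   # the text matched by the first parenthesized group

puts whole     # www.cnn.com/
puts capture   # www.cnn.com
```

Index 0 always refers to the full match, and indexes 1 and up refer to the capture groups in order.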

David
