Using a Regex to extract the domain from a URL

Adam_Wenham · 5 June 2014 15:03

Hi guys,

I'm having some problems on good old Codewars, writing a method that can
take a URL and return just the domain.

I've managed to create a Regex in Rubular (http://rubular.com/r/C7wAZRq8OA)
that passes my tests, but I'm having trouble implementing it properly.

Here are my tests:
Test.assert_equals(domain_name("http://github.com/carbonfive/raygun"), "github")
Test.assert_equals(domain_name("http://www.zombie-bites.com"), "zombie-bites")
Test.assert_equals(domain_name("https://www.cnet.com"), "cnet")

Here's my method:
def domain_name(url)
  url.match(/https*:\/\/w*\.*(\w*\-*\w*)./)
end

As far as I can tell, this should work. Any ideas on what I'm doing wrong?
Thanks!

--
== People often come up to me and ask "What the heck are you doing in my
shed!?" ==

Stu1 · 5 June 2014 16:46

/^https?:\/\/(www.)?[a-zA-Z0-9_-]*\.(com|net|org)\/?((([a-zA-Z\/0-9_-])+)?)$/
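
A minimal sketch of how a pattern like this might be used from Ruby. The named group `domain`, the constant name `URL_PATTERN` and the escaped dot after `www` are assumptions added for illustration; the pattern as posted does not capture the domain itself, and both versions only cover .com, .net and .org:

```ruby
# Variation on the pattern above: escaped dot after "www" and a named
# capture group around the domain label. Both changes are assumptions.
URL_PATTERN = %r{\Ahttps?://(?:www\.)?(?<domain>[a-zA-Z0-9_-]+)\.(?:com|net|org)(?:/[a-zA-Z0-9_/-]*)?\z}

def domain_name(url)
  match = URL_PATTERN.match(url)
  # Return only the named group, or nil when the URL does not fit the pattern.
  match ? match[:domain] : nil
end

puts domain_name("http://github.com/carbonfive/raygun")
puts domain_name("http://www.zombie-bites.com")
puts domain_name("https://www.cnet.com")
```

Any URL whose ending is not in the alternation, such as a .co.uk address, simply returns nil here.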


Panagiotis_Atmatzidi · 5 June 2014 15:44

Hello,

I tested your code, and it returns the 'https://' part too, so your regexp is wrong:
--
$ cat test.rb && ruby test.rb

def domain_name(url)
  puts url.match(/https*:\/\/w*\.*(\w*\-*\w*)./).to_s
end

list = %w{https://github.com/carbonfive/raygun http://www.zombie-bites.com https://www.cnet.com}

list.each {|x| domain_name(x)}

=> https://github.
=> http://www.zombie-bites.
=> https://www.cnet.
--
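
One possible reading of the output above: `MatchData#to_s` returns the entire matched text, while the parenthesised group is available separately as `[1]`. A minimal sketch, using the three URLs from the original tests; the tightened pattern and the name `DOMAIN_PATTERN` are illustrative assumptions, not the original poster's code:

```ruby
# Hypothetical pattern: optional "www." prefix, then capture word characters
# and hyphens up to the next dot.
DOMAIN_PATTERN = %r{\Ahttps?://(?:www\.)?([\w-]+)\.}

def domain_name(url)
  match = url.match(DOMAIN_PATTERN)
  # match[0] (the same text as match.to_s) is the whole match,
  # e.g. "https://www.cnet."; match[1] is only the captured group.
  match && match[1]
end

urls = %w[http://github.com/carbonfive/raygun http://www.zombie-bites.com https://www.cnet.com]
urls.each { |url| puts domain_name(url) }
```

Making the `www.` prefix an explicit optional group avoids a side effect of `w*`, which would also swallow the leading "w" of a host such as wikipedia.org.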

You could adjust the regexp to match a given set of domain names[1]. But the general problem is unsolvable using regular expressions alone[2]. In theory you would need a list of all the available domain names (a Google search may even turn up a TXT file) and then a complicated set of instructions to match against it. Only that way might you get a complete solution, IMHO.
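
A rough sketch of the list-based idea, assuming the list holds domain endings. `KNOWN_SUFFIXES` is a hypothetical, deliberately tiny sample (a real list would be far longer), and parsing with the standard library's `URI` instead of a regexp is a choice made for this example only:

```ruby
require "uri"

# Hypothetical sample list; not a complete set of domain endings.
KNOWN_SUFFIXES = %w[co.uk com net org].freeze

def domain_name(url)
  host = URI.parse(url).host
  return nil if host.nil?

  host = host.downcase.sub(/\Awww\./, "")
  # Try longer suffixes first so a multi-part ending such as "co.uk"
  # is preferred over any shorter entry.
  suffix = KNOWN_SUFFIXES.sort_by { |s| -s.length }.find { |s| host.end_with?(".#{s}") }
  return nil if suffix.nil?

  # Drop the suffix, then keep the last remaining label (ignores subdomains).
  host.chomp(".#{suffix}").split(".").last
end

puts domain_name("http://github.com/carbonfive/raygun")
puts domain_name("https://www.bbc.co.uk/news")
```

Hosts whose ending is missing from the list return nil, which is the limitation described above: the result is only as complete as the list. `URI.parse` raises `URI::InvalidURIError` on malformed input, so a caller may want to rescue that.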

Best regards,

[1] regex match main domain name - Stack Overflow

[2] regex match main domain name - Stack Overflow

Panagiotis (atmosx) Atmatzidis

email: atma@convalesco.org
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

"As you set out for Ithaca, hope the voyage is a long one, full of adventure, full of discovery [...]" - C. P. Cavafy
