How to extract the domain name without the subdomain from a URL

Hi everyone,

Does anyone know how to extract the domain name, without the subdomain, from a URL?

Example: http://test.domain.com => http://domain.com

Please give me some example code in Ruby.

Thanks,
Leakhina

--
Posted via http://www.ruby-forum.com/.

This is actually quite difficult, because there is a multitude of possible second-level domains which can be used (such as .co.uk), and they are not really standardized. Just picking one at random, the country of Jordan has .com.jo, .net.jo, .gov.jo, .edu.jo, .org.jo, .mil.jo, .name.jo, and .sch.jo.

If one were to ignore such things, then it becomes easier:

$ irb
irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u = URI.parse "http://test.domain.com/"
=> #<URI::HTTP:0xb7bbf848 URL:http://test.domain.com/>
irb(main):003:0> u.host
=> "test.domain.com"
irb(main):004:0> u.host.split(".")[-2,2]
=> ["domain", "com"]
irb(main):005:0> u.host.split(".")[-2,2].join(".")
=> "domain.com"

However, as mentioned above, there are a lot of domains this will not work for.
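
One way to cover at least some of those cases is to check the last two labels of the host against a list of known multi-label suffixes before deciding how many labels to keep. The following is only a rough sketch: the method name registrable_domain and the short KNOWN_SUFFIXES list are made up for illustration, and a real list (the Public Suffix List, for example) is far longer.

```ruby
require 'uri'

# Hypothetical, deliberately incomplete list of two-label suffixes.
KNOWN_SUFFIXES = %w[co.uk org.uk com.jo net.jo gov.jo edu.jo org.jo].freeze

# Returns the host reduced to "name + suffix", or nil if the URL has no host.
def registrable_domain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.downcase.split('.')
  # Keep three labels when the last two form a known suffix, otherwise two.
  keep = KNOWN_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

puts registrable_domain('http://test.domain.com/')      # domain.com
puts registrable_domain('http://www.example.co.uk/')    # example.co.uk
puts registrable_domain('http://mail.example.com.jo/x') # example.com.jo
```

Any suffix missing from the list falls back to the plain "last two labels" behaviour shown in the irb session, so the result is only as good as the list.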

-Justin

We can get better results by ignoring particular known domain prefixes
such as "ftp" and "www":

# this works with 1.8 and 1.9
%w{
  www.google.com
  google.co.uk
  www.google.co.uk
  foo.bar
  }.each do |domain|
  dom = domain.sub(/^(?:www|ftp)\./, '')[/^[^.]+/]
  printf "%p -> %p\n", domain, dom
  # alternative
  dom = domain[/^(?:(?:ftp|www)\.)?([^.]+)/, 1]
  printf "%p -> %p\n", domain, dom
end
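
If the goal is a URL rather than a bare name (as in the original example), the same prefix idea can be applied to the host and the URL rebuilt with the uri library. This is just a sketch under that assumption; the method name strip_known_prefix and the PREFIXES list are hypothetical.

```ruby
require 'uri'

# Hypothetical list of host prefixes to drop; extend as needed.
PREFIXES = %w[www ftp].freeze

# Returns the URL with one leading known prefix removed from its host.
def strip_known_prefix(url)
  uri = URI.parse(url)
  return url if uri.host.nil?

  # Remove a single leading "www." or "ftp." label, if present.
  uri.host = uri.host.sub(/\A(?:#{PREFIXES.join('|')})\./, '')
  uri.to_s
end

puts strip_known_prefix('http://www.google.co.uk/')  # http://google.co.uk/
puts strip_known_prefix('http://google.com/search')  # http://google.com/search
```

Note that a host such as test.domain.com is left alone here, because "test" is not in the prefix list.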

Kind regards

robert

2009/6/23 Justin Collins <justincollins@ucla.edu>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/