Finding filename from a URL

Hi all,

This is just a basic parsing question, really. I'm trying to work out
how I would process a URL such as
"http://www.example.com/x/y/z/myfile.txt" and get back the filename
"myfile". Basically the pattern is to get the last part of the string
after the final /, and then strip off the filetype.

Any help would be much appreciated,
Thanks!

Sam

···

--
Posted via http://www.ruby-forum.com/.

Sam Fent wrote:

I'm trying to work out how I would process a URL such as
"http://www.example.com/x/y/z/myfile.txt" and get back the filename
"myfile".

File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for URLs ;)

Sam Fent wrote:

I'm trying to work out how I would process a URL such as
"http://www.example.com/x/y/z/myfile.txt" and get back the filename
"myfile".

$ irb
irb(main):001:0> x = "http://www.example.com/x/y/z/myfile.txt"
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> File.basename(x)
=> "myfile.txt"
irb(main):003:0> File.basename(x, '.txt')
=> "myfile"

···

--
RMagick: http://rmagick.rubyforge.org/

IMHO it is not a good idea to use a File method for URLs, because File.basename follows file system path rules rather than URL rules:

irb(main):003:0> File.basename 'http://test.com/aaa\\bbb.txt'
=> "bbb.txt"

Although I am not sure whether a backslash is allowed there, this is what I'd do:

irb(main):001:0> url = 'http://www.example.com/x/y/z/myfile.txt'
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> name = url[%r{[^/]+\z}]
=> "myfile.txt"
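
To also drop the file type, as the original question asks, the match above could be followed by one more substitution. This is a minimal sketch, assuming the file type is whatever follows the last dot; url_filename is a made-up helper name, not part of any library:

```ruby
# Hypothetical helper: last path segment of a URL string, without its extension.
# Splits on "/" only, so a backslash is never treated as a separator.
def url_filename(url)
  name = url[%r{[^/]+\z}]      # text after the final "/", or nil if the URL ends in "/"
  return nil if name.nil?
  name.sub(/\.[^.]*\z/, "")    # drop the last ".ext", if there is one
end

puts url_filename("http://www.example.com/x/y/z/myfile.txt")   # prints myfile
```

Note that this works on the raw string, so a query string or fragment would end up in the result.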

Kind regards

  robert

···

On 04.01.2009 17:29, Sam Fent wrote:

This is just a basic parsing question, really. I'm trying to work out
how I would process a URL such as
"http://www.example.com/x/y/z/myfile.txt" and get back the filename
"myfile". Basically the pattern is to get the last part of the string
after the final /, and then strip off the filetype.

Jan-Erik R. wrote:

File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for URLs ;)

Thanks a lot! I added ".txt" to the arguments of File.basename to get
rid of the filetype, but besides that, that was what I was looking for.

Thanks!
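
If the file type is not known in advance, File.basename can also strip it generically. A small sketch using two standard calls, the ".*" suffix wildcard and File.extname; it assumes the URL has no query string, and the concerns about using File methods on URLs still apply:

```ruby
x = "http://www.example.com/x/y/z/myfile.txt"

# ".*" tells File.basename to remove whatever extension is present.
puts File.basename(x, ".*")               # prints myfile

# Equivalent: look the extension up first, then pass it as the suffix.
puts File.basename(x, File.extname(x))    # prints myfile
```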

···


Rather than jump to a Regexp, just use the right tool for the job.

> require 'uri'
=> true
> u=URI.parse 'http://www.example.com/x/y/z/myfile.txt'
=> #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfile.txt>
> u.path
=> "/x/y/z/myfile.txt"
> File.basename u.path, '.txt'
=> "myfile"

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Jan 4, 2009, at 1:04 PM, Robert Klemme wrote:

On 04.01.2009 17:29, Sam Fent wrote:

I considered URI as well, but what makes your code the "right tool for the job"? Basically you use URI only to extract the path and then use File.basename to get the last bit of the path. But while a URI path consists of elements separated by "/", File.basename also considers "\\" a delimiter. So IMHO it is by no means "the right tool", at least no more so than a regular expression that extracts exactly the part needed from the string at hand (and is likely faster as well).

The situation would be different if URI provided a method which returns the last path element but as far as I can see this does not exist.
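
The backslash behaviour is easy to check on your own machine. As far as I know it depends on the platform: Windows builds of Ruby accept "\\" as a path separator, while on Linux or macOS a backslash is an ordinary filename character. A quick sketch comparing the two approaches on the URL from above:

```ruby
url = 'http://test.com/aaa\\bbb.txt'   # single quotes: the string contains one backslash

by_file  = File.basename(url)          # file system rules; result may differ per platform
by_regex = url[%r{[^/]+\z}]            # splits on "/" only, same result everywhere

puts "File.basename: #{by_file}"
puts "Regexp:        #{by_regex}"
# Typically "\\" on Windows and nil on Unix-like systems.
puts "ALT_SEPARATOR: #{File::ALT_SEPARATOR.inspect}"
```

If the two printed names differ, File.basename is applying separator rules that a URL path does not have.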

Kind regards

  robert

···


I guess it depends on what your url might look like. For example, if it contains a query string:

> str = 'http://a.b.c/root/sub/dir/file?param=a'
=> "http://a.b.c/root/sub/dir/file?param=a"
> File.basename str
=> "file?param=a"

Oops! File.basename just doesn't fit.

> require 'uri'
=> true
> url = URI.parse(str)
=> #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?param=a>
> url.path
=> "/root/sub/dir/file"
> File.basename url.path
=> "file"
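
Wrapped up as a method, the URI approach might look like the following. It is only a sketch: basename_from_url is an invented name, and the ".*" suffix (strip any extension) is an addition that does not appear in the session above:

```ruby
require 'uri'

# Hypothetical helper: parse the URL, keep only its path, then take the basename.
# The query string and fragment never reach File.basename.
def basename_from_url(str)
  path = URI.parse(str).path
  File.basename(path, ".*")   # ".*" strips any extension
end

puts basename_from_url('http://a.b.c/root/sub/dir/file?param=a')     # prints file
puts basename_from_url('http://www.example.com/x/y/z/myfile.txt')    # prints myfile
```

URI.parse raises URI::InvalidURIError on input it cannot parse, so real code would probably want to rescue that.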

The OP will have to make the final tool selection, but there may be lurkers that have similar problems who find URI a better fit than File.

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Jan 4, 2009, at 4:44 PM, Robert Klemme wrote:

On 04.01.2009 21:46, Rob Biedenharn wrote:

On Jan 4, 2009, at 1:04 PM, Robert Klemme wrote:

On 04.01.2009 17:29, Sam Fent wrote:

Certainly. I do have to say that I get the impression we are talking a bit past each other. I wasn't advocating using File.basename at all, neither alone nor in combination with URI!

For the URL with query part I would still rather do

name = URI.parse(str).path[%r{[^/]+\z}]
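
Combined with the extension handling from the original question, this variant could be sketched as follows. last_path_element is a hypothetical name, and the final sub assumes the file type is everything after the last dot:

```ruby
require 'uri'

# URI takes care of the query string; the Regexp picks the last "/"-separated element.
def last_path_element(str)
  URI.parse(str).path[%r{[^/]+\z}]   # nil when the path is empty or ends in "/"
end

name = last_path_element('http://a.b.c/root/sub/dir/file?param=a')
puts name                            # prints file

# Assumption: the file type is everything after the last dot.
full = last_path_element('http://www.example.com/x/y/z/myfile.txt?v=2')
puts full.sub(/\.[^.]*\z/, "")       # prints myfile
```

No File method is involved here, so a backslash in the path is never treated as a separator.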

Kind regards

  robert

···

On 04.01.2009 22:58, Rob Biedenharn wrote:


--
remember.guy do |as, often| as.you_can - without end