Finding filename from a URL

Sam_Fent · 4 January 2009 16:29

Hi all,

This is just a basic parsing question, really. I'm trying to work out
how I would process a URL such as
"http://www.example.com/x/y/z/myfile.txt" and get back the filename
"myfile". Basically, the pattern is to take the last part of the string
after the final /, and then strip off the file extension.

Any help would be much appreciated,
Thanks!

Sam
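The two steps described above can be sketched with plain String methods. This is a minimal sketch, not taken from the thread; the helper name filename_without_extension is hypothetical, and it assumes the URL has no query string or fragment.

```ruby
# Hypothetical helper (not from the thread): keep only the text after the
# final "/" and then drop a trailing ".ext" suffix, if there is one.
# Assumes the URL has no query string or fragment.
def filename_without_extension(url)
  start = (url.rindex('/') || -1) + 1   # index just past the final "/"
  last_segment = url[start..-1]         # e.g. "myfile.txt"
  last_segment.sub(/\.[^.]*\z/, '')     # e.g. "myfile"
end

puts filename_without_extension('http://www.example.com/x/y/z/myfile.txt')   # prints myfile
```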

--
Posted via http://www.ruby-forum.com/.

Jan-Erik_R · 4 January 2009 16:30

File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for URLs.

Tim_Hunter4 · 4 January 2009 16:35

$ irb
irb(main):001:0> x = "http://www.example.com/x/y/z/myfile.txt"
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> File.basename(x)
=> "myfile.txt"
irb(main):003:0> File.basename(x, '.txt')
=> "myfile"
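The session above hard-codes '.txt' as the suffix. If the extension isn't known in advance, File.basename also accepts '.*' as its second argument, which strips whatever extension is present, and File.extname returns the extension itself. A quick sketch with the same URL:

```ruby
x = 'http://www.example.com/x/y/z/myfile.txt'

# ".*" tells File.basename to strip any extension, not just ".txt"
puts File.basename(x, '.*')   # prints myfile

# File.extname returns the extension, including the leading dot
puts File.extname(x)          # prints .txt
```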

--
RMagick: http://rmagick.rubyforge.org/

Robert_K1 · 4 January 2009 18:04

IMHO it is not a good idea to use a File method for URLs, because File.basename uses different criteria for splitting a path:

irb(main):003:0> File.basename 'http://test.com/aaa\\bbb.txt'
=> "bbb.txt"

Although I am not sure whether a backslash is allowed there, this is what I'd do:

irb(main):001:0> url = 'http://www.example.com/x/y/z/myfile.txt'
=> "http://www.example.com/x/y/z/myfile.txt"
irb(main):002:0> name = url[%r{[^/]+\z}]
=> "myfile.txt"
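A note on the backslash example: File.basename treats "\" as a separator only on Windows, where File::ALT_SEPARATOR is "\\"; on Linux or macOS the same call returns "aaa\\bbb.txt". RFC 3986 does not allow a raw backslash in a URL path anyway (it would have to be percent-encoded as %5C). The regex above behaves the same on every platform, as this small comparison sketch shows:

```ruby
url = 'http://test.com/aaa\\bbb.txt'   # single quotes: the string holds one backslash

# The regex only treats "/" as a separator, on every platform.
regex_name = url[%r{[^/]+\z}]
p regex_name                  # prints "aaa\\bbb.txt"

# File.basename also splits on "\" where File::ALT_SEPARATOR is set.
p File::ALT_SEPARATOR         # "\\" on Windows, nil elsewhere
p File.basename(url)          # "bbb.txt" on Windows, "aaa\\bbb.txt" elsewhere
```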

Kind regards

robert

On 04.01.2009 17:29, Sam Fent wrote:

Sam_Fent · 4 January 2009 18:59

Jan-Erik R. wrote:

File.basename("http://www.example.com/x/y/z/myfile.txt")
works perfectly for urls

Thanks a lot! I added ".txt" as the second argument to File.basename to get
rid of the file extension, but apart from that, it was exactly what I was looking for.

Thanks!


Rob_Biedenharn1 · 4 January 2009 20:46

Rather than jump to a Regexp, just use the right tool for the job.

> require 'uri'
=> true
> u=URI.parse 'http://www.example.com/x/y/z/myfile.txt'
=> #<URI::HTTP:0x1cac14 URL:http://www.example.com/x/y/z/myfile.txt>
> u.path
=> "/x/y/z/myfile.txt"
> File.basename u.path, '.txt'
=> "myfile"

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

On Jan 4, 2009, at 1:04 PM, Robert Klemme wrote:

Robert_K1 · 4 January 2009 21:44

I considered URI as well, but what makes your code the "right tool for the job"? Basically, you use URI only to extract the path and then use File.basename to get the last bit of the path. But while the URI path consists of elements separated by "/", File.basename also treats "\\" as a delimiter. So IMHO it is by no means "the right tool", at least not more so than a regular expression that extracts exactly the part needed from the string at hand (and is likely faster as well).

The situation would be different if URI provided a method that returns the last path element, but as far as I can see, no such method exists.
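Since URI#path is a plain String, one way to get the last path element without File.basename is to split it on "/" only. A minimal sketch; the method name last_segment is hypothetical and not part of URI:

```ruby
require 'uri'

# Hypothetical helper: split only the path component of the URL on "/".
def last_segment(url)
  URI.parse(url).path.split('/').last
end

puts last_segment('http://www.example.com/x/y/z/myfile.txt')   # prints myfile.txt
```

One design consequence: String#split drops trailing empty fields, so a path such as "/x/y/z/" yields "z" (a directory name) rather than nil.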

Kind regards

robert

On 04.01.2009 21:46, Rob Biedenharn wrote:

Rob_Biedenharn1 · 4 January 2009 21:58

I guess it depends on what your URL might look like. For example, if it contains a query string:

> str = 'http://a.b.c/root/sub/dir/file?param=a'
=> "http://a.b.c/root/sub/dir/file?param=a"
> File.basename str
=> "file?param=a"

Oops! File.basename just doesn't fit.

> require 'uri'
=> true
> url = URI.parse(str)
=> #<URI::HTTP:0x1c8446 URL:http://a.b.c/root/sub/dir/file?param=a>
> url.path
=> "/root/sub/dir/file"
> File.basename url.path
=> "file"

The OP will have to make the final tool selection, but there may be lurkers that have similar problems who find URI a better fit than File.
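The same applies to a fragment: URI.parse splits the URL into components, and #path excludes both the query (#query) and the fragment (#fragment). A short sketch using the URL from above with an illustrative "#top" fragment appended (the fragment is an assumption, not part of the original example):

```ruby
require 'uri'

str = 'http://a.b.c/root/sub/dir/file?param=a#top'   # "#top" added for illustration
url = URI.parse(str)

puts url.path                  # prints /root/sub/dir/file
puts url.query                 # prints param=a
puts url.fragment              # prints top
puts File.basename(str)        # prints file?param=a#top (raw string, no URI parsing)
puts File.basename(url.path)   # prints file
```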

-Rob

On Jan 4, 2009, at 4:44 PM, Robert Klemme wrote:

Robert_K1 · 5 January 2009 17:45

Certainly. I do have to say that I get the impression we are talking a bit past each other. I wasn't advocating using File.basename at all, neither on its own nor in combination with URI!

For the URL with a query part, I would still rather do

name = URI.parse(str).path[%r{[^/]+\z}]
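Wrapped in a helper, that expression needs one guard: the regex returns nil when the path ends in "/" or is empty, since there is no file name to match. A sketch that also drops the extension; the method name url_file_stem and the choice to return nil for directory-style URLs are assumptions for illustration:

```ruby
require 'uri'

# Hypothetical helper around the expression from the post above.
# Returns nil when the URL path ends in "/" (no file name present).
def url_file_stem(str)
  name = URI.parse(str).path[%r{[^/]+\z}]
  return nil if name.nil?
  name.delete_suffix(File.extname(name))   # "myfile.txt" -> "myfile" (Ruby 2.5+)
end

p url_file_stem('http://www.example.com/x/y/z/myfile.txt')   # "myfile"
p url_file_stem('http://a.b.c/root/sub/dir/file?param=a')    # "file"
p url_file_stem('http://a.b.c/root/sub/dir/')                # nil
```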

Kind regards

robert

On 04.01.2009 22:58, Rob Biedenharn wrote:

--
remember.guy do |as, often| as.you_can - without end
