Validating an Image file is an image file

I know how to validate a file based only on the file name dot extension, but this seems wholly insecure to me.
I feel that just testing for .jpg, .png, .jpeg, .gif, etc... is not enough.
Clearly renaming a file to anything at all is easy to do.
How can I read into the file and check to see if it is actually a file of a given image type? Is there file header info to look for?
Such as a particular byte sequence at a particular location in the file?

John Joyce

The canonical solution is to delegate to this library:

   http://grub.ath.cx/filemagic/

-- fxn

···

On Jul 18, 2007, at 4:25 PM, John Joyce wrote:

I know how to validate a file based only on the file name dot extension, but this seems wholly insecure to me.
I feel that just testing for .jpg, .png, .jpeg, .gif, etc... is not enough.
Clearly renaming a file to anything at all is easy to do.
How can I read into the file and check to see if it is actually a file of a given image type? Is there file header info to look for?
Such as a particular byte sequence at a particular location in the file?

John Joyce

Use the unix file command `file #{file_name}`

example:

> file the.gif
the.gif: GIF image data, version 89a, 91 x 91

···

On Jul 18, 2007, at 10:25 , John Joyce wrote:

--
Wayne E. Seguin
Sr. Systems Architect & Systems Admin
wayneseguin@gmail.com

The file command and bindings to it are OK, but results are not consistent across common image file types. What's worse is that the code would be unportable. Ideally, the solution would rely simply on the file format internally and thus be portable.
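
Relying on the format itself starts with looking at a file's first few bytes. A minimal sketch in core Ruby (the helper name `header_hex` is hypothetical, not part of any library mentioned in this thread) that prints a file's leading bytes in hex, so they can be compared with a format's published magic number without shelling out to `file`:

```ruby
# Hypothetical helper: return the first `count` bytes of a file as
# space-separated hex pairs. The file is read in binary mode, so the
# result does not depend on platform text-mode handling.
def header_hex(path, count = 8)
  bytes = File.binread(path, count) || "".b # guard against nil for an empty file
  bytes.unpack('C*').map { |byte| format('%02X', byte) }.join(' ')
end
```

For a GIF89a file the output starts with `47 49 46 38 39 61`, which is simply the ASCII text "GIF89a".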

···

On Jul 18, 2007, at 9:33 AM, Wayne E. Seguin wrote:

John Joyce wrote:

The file command and bindings to it are OK, but results are not consistent across common image file types. What's worse is that the code would be unportable. Ideally, the solution would rely simply on the file format internally and thus be portable.

If you know it's an image file, then ImageMagick's identify command will probably do what you need, especially with the -verbose switch. I think you get the same info from Magick::Image#inspect, if RMagick's an option for you.

···

On Jul 18, 2007, at 9:33 AM, Wayne E. Seguin wrote:

On Jul 18, 2007, at 10:25 , John Joyce wrote:

--
Alex

Well, looks like RMagick can do this for me.
For a minute I was starting to fear reading specs on formats, but perhaps not.

John Joyce

<snip>

The file command and bindings to it are OK, but results are not
consistent across common image file types. What's worse is that the
code would be unportable. Ideally, the solution would rely simply on
the file format internally and thus be portable.

How's this for a portable version? I based this on a 5 minute overview
of the Wikipedia content at Magic number (programming) - Wikipedia,
combined with some trial and error.

class File
   def self.image?(file)
      bmp?(file) || jpg?(file) || png?(file) || gif?(file)
   end

   def self.bmp?(file)
      IO.read(file, 3) == "BM6" && File.extname(file).downcase == '.bmp'
   end

   def self.jpg?(file)
      IO.read(file, 10) == "\377\330\377\340\000\020JFIF" && File.extname(file).downcase == '.jpg'
   end

   def self.png?(file)
      IO.read(file, 4) == "\211PNG" && File.extname(file).downcase == '.png'
   end

   def self.gif?(file)
      ['GIF87a', 'GIF89a'].include?(IO.read(file, 6)) && File.extname(file).downcase == '.gif'
   end
end

Regards,

Dan
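
For comparison, here is a table-driven sketch that relies on the bytes alone and ignores the extension. It is not the ptools implementation; the module name `ImageSniffer` and the method `type_of` are hypothetical. The signatures are the published ones: the full 8-byte PNG signature, `GIF87a` and `GIF89a`, `BM` for BMP (the next four bytes hold the file size, so requiring a fixed third byte such as `6` only matches some files), and `FF D8 FF` for JPEG. That last prefix accepts both JFIF files (`FF E0` follows) and the Exif files typical of digital cameras (`FF E1` follows), which a check for the full JFIF header would reject.

```ruby
# Hypothetical module: identify an image by its leading bytes only.
# The file name and extension are never consulted.
module ImageSniffer
  # [type, leading bytes] pairs, checked in order.
  SIGNATURES = [
    [:png,  "\x89PNG\r\n\x1A\n".b], # full 8-byte PNG signature
    [:gif,  "GIF87a".b],
    [:gif,  "GIF89a".b],
    [:jpeg, "\xFF\xD8\xFF".b],      # SOI marker, then the first segment marker (JFIF, Exif, ...)
    [:bmp,  "BM".b]                 # the following 4 bytes are the file size, so stop here
  ].freeze

  HEADER_LENGTH = SIGNATURES.map { |_, magic| magic.bytesize }.max

  # Returns :png, :gif, :jpeg, :bmp, or nil.
  def self.type_of(path)
    header = File.binread(path, HEADER_LENGTH) || "".b # guard against nil for an empty file
    match = SIGNATURES.find { |_, magic| header.start_with?(magic) }
    match && match.first
  end

  # Predicate wrapper, so the '?' method still returns true or false.
  def self.image?(path)
    !type_of(path).nil?
  end
end
```

Matching `BM` alone is deliberately loose; a stricter BMP check would also have to validate the header fields. Because `type_of` returns the detected type, a caller can still compare it with the extension when both checks are wanted.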

···

On Jul 18, 10:20 am, John Joyce <dangerwillrobinsondan...@gmail.com> wrote:

Cool, Dan, that looks pretty slick. I certainly couldn't have done that so quickly, but I knew what basically was needed. That looks like about what I was thinking of conceptually. Very nice and clean code! Nice touch with downcasing file extensions too! I hate that cameras all like to upcase filenames... (> <)!
That will probably run a lot lighter than RMagick methods, I'll try it out later. I've been fiddling with the RMagick API all afternoon. There's a lot of documentation but some of it is lacking in clear examples.
At this point I know how to scale images and keep the aspect ratio within a maximum new size, but I'd like to have the final output be square with matting on the sides for portrait orientation, matting on the top and bottom for landscape orientation, either one with the scaled image centered.
Will this require subsequent compositing after scaling? Or did I miss a method somewhere in the docs that does all of this at once?

John Joyce

···

I've not had time to test it out or kick the tires yet, but I've dubbed it 'The Daniel Berger Detector'

John Joyce

daniel_berger_detector.rb (1.46 KB)

···

On Jul 18, 2007, at 4:13 PM, Daniel Berger wrote:

<snip>

   def self.jpg?(file)
      IO.read(file, 10) == "\377\330\377\340\000\020JFIF" && File.extname(file).downcase == '.jpg'
   end

Ack! The proper abbreviation, and thus also the filename extension, is
'jpeg'. 'jpg' is an abomination introduced by the same people who introduced
the *shudder* '.htm' filename extension.

R.

···

2007/7/18, Daniel Berger <djberg96@gmail.com>:

John Joyce wrote:

Cool, Dan, that looks pretty slick. I certainly couldn't have done that so quickly, but I knew what basically was needed. That looks like about what I was thinking of conceptually. Very nice and clean code! Nice touch with downcasing file extensions too! I hate that cameras all like to upcase filenames... (> <)!
That will probably run a lot lighter than RMagick methods, I'll try it out later. I've been fiddling with the RMagick API all afternoon. There's a lot of documentation but some of it is lacking in clear examples.
At this point I know how to scale images and keep the aspect ratio within a maximum new size, but I'd like to have the final output be square with matting on the sides for portrait orientation, matting on the top and bottom for landscape orientation, either one with the scaled image centered.
Will this require subsequent compositing after scaling? Or did I miss a method somewhere in the docs that does all of this at once?

John Joyce

Yes, you'll need to composite the scaled image on top of the background of your desired size. See my article "Alpha Compositing - Part 1" [http://rmagick.rubyforge.org/src_over.html]. If you're making a lot of thumbnails you might also be interested in "Making Thumbnails with RMagick" [http://rmagick.rubyforge.org/resizing-methods.html], which compares the performance of all the RMagick resizing methods.

Also if you'll tell me which parts of the documentation are lacking in clear examples I'll see what I can do to fix it. You can always open a documentation bug in the RMagick bug tracker on RubyForge.
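
The fit-and-center arithmetic John asked about can be worked out before any RMagick call. A minimal sketch in plain Ruby (the method name `letterbox_geometry` is hypothetical, and no RMagick API is shown): scale by the longer side so the image fits inside a `side` x `side` square, then center it, which leaves matting on the left and right for portrait images and on the top and bottom for landscape ones. The returned `x`/`y` values are where the scaled image would be composited onto the background.

```ruby
# Hypothetical helper: size and position for "scale to fit, then center on a
# square matte". Returns the scaled dimensions and the top-left offset.
def letterbox_geometry(width, height, side)
  scale = side.to_f / [width, height].max # the longer edge becomes `side`
  new_width  = (width * scale).round
  new_height = (height * scale).round
  {
    width:  new_width,
    height: new_height,
    x: (side - new_width) / 2,  # > 0 for portrait: matte left and right
    y: (side - new_height) / 2  # > 0 for landscape: matte top and bottom
  }
end
```

For an 800 x 600 landscape photo and a 100-pixel square this yields a 100 x 75 image placed 12 pixels down; the odd leftover pixel ends up in the bottom matte. The scale is not capped, so images smaller than `side` are enlarged; taking `[scale, 1.0].min` would prevent that.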

···

--
RMagick OS X Installer [http://rubyforge.org/projects/rmagick/]
RMagick Hints & Tips [http://rubyforge.org/forum/forum.php?forum_id=1618]
RMagick Installation FAQ [http://rmagick.rubyforge.org/install-faq.html]

Be that as it may, you will see both extensions in practice. So, the
code should probably be refactored to be:

def self.jpeg?(file)
   IO.read(file, 10) == "\377\330\377\340\000\020JFIF" && ['.jpg', '.jpeg'].include?(File.extname(file).downcase)
end

alias jpg? jpeg?

Regards,

Dan

···

On Jul 19, 5:05 am, "Raf Coremans" <rra...@gmail.com> wrote:

2007/7/18, Daniel Berger <djber...@gmail.com>:
<snip>

> def self.jpg?(file)
> IO.read(file, 10) == "\377\330\377\340\000\020JFIF" &&
> File.extname(file).downcase == '.jpg'
> end

Ack! The proper abbreviation, and thus also the filename extension, is
'jpeg'. 'jpg' is an abomination introduced by the same people who introduced
the *shudder* '.htm' filename extension.

Regardless, .JPG is what you get from most digital cameras! Nothing is ever guaranteed forever...!
As for .htm, it is meaningless, as is .html.
All that matters is the .conf or .htaccess declaration and the MIME type(s).
HTML/XHTML is often served from .php, .pl, .rhtml, .py files, etc.

The point of all of this is the same as my OP: file extensions are meaningless. Only Windows and poorly written apps really rely on them.
A file is a file is a file. It's what's inside that matters (usually). Extensions are intended to let humans easily identify a file and to help extend the namespace. They also make some processing easier when looking for particular extensions (C files, for example, with .h and .c).
Anticipating particular extensions is fine, but checking should still occur.
Extensions do get deleted or munged, especially at the hands of users clicking on things in a desktop GUI.

John Joyce
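
John's point, anticipate the extension but still check the contents, can be expressed as two separate answers: what the bytes say the file is, and whether the case-insensitive extension agrees with that. A minimal sketch; `sniff`, `check_upload`, and the `EXTENSIONS` table are hypothetical names, and only JPEG, PNG, and GIF are covered:

```ruby
# Hypothetical table of the extensions expected for each detected type.
EXTENSIONS = {
  jpeg: %w[.jpg .jpeg],
  png:  %w[.png],
  gif:  %w[.gif]
}.freeze

# Decide what the file is from its leading bytes; nil if unrecognised.
def sniff(path)
  header = File.binread(path, 8) || "".b
  if header.start_with?("\xFF\xD8\xFF".b) then :jpeg
  elsif header.start_with?("\x89PNG\r\n\x1A\n".b) then :png
  elsif header.start_with?("GIF87a", "GIF89a") then :gif
  end
end

# Returns [detected_type, extension_agrees]. The content decides the type;
# the extension is only compared against it, never trusted on its own.
def check_upload(path)
  type = sniff(path)
  extension = File.extname(path).downcase # cameras often write .JPG
  [type, !type.nil? && EXTENSIONS.fetch(type).include?(extension)]
end
```

A file whose extension was deleted or munged still gets a type, just with `false` in the second slot, so the caller can decide whether to rename it or reject it.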

···

On Jul 19, 2007, at 6:05 AM, Raf Coremans wrote:

2007/7/18, Daniel Berger <djberg96@gmail.com>:
<snip>

   def self.jpg?(file)
      IO.read(file, 10) == "\377\330\377\340\000\020JFIF" &&
File.extname(file).downcase == '.jpg'
   end

Ack! The proper abbreviation, and thus also the filename extension, is
'jpeg'. 'jpg' is an abomination introduced by the same people who introduced
the *shudder* '.htm' filename extension.

R.

<snip>

I've not had time to test it out or kick the tires yet, but I've
dubbed it 'The Daniel Berger Detector'

Heh. :-)

Well, I've officially added it to ptools, release 1.1.5, which I just
put out. So, now you can do:

require 'ptools'

File.image?(file)

Regards,

Dan

···

On Jul 18, 9:46 pm, John Joyce <dangerwillrobinsondan...@gmail.com> wrote:

Also very cool. Thanks.
I will check that stuff out.
I guess it's just a bit much the first time looking at the API.
The main thing I found a bit vague was creating a new Image or ImageList instance.
ImageList responds basically as I would expect, but Image doesn't.
Initially I was trying:
img = Image.read('redzigzag.jpg')
Which apparently creates something similar to ImageList.
But when trying to use Image methods, I kept getting an error about ...private method ... called for ....[...]:array
(the ellipses are just leaving out specifics)
What I didn't realize I was missing was the [0] in:
img = Image.read('redzigzag.jpg')[0]
I'm still not quite sure how that is working as part of that statement really, but I know how to get it going anyway.
I can certainly see the convenience of using ImageList to batch a bunch of images, but I figured that Image was fine for working out the logic and flow of what I want to do first.

The only other truly confusing thing was lack of examples in many method definitions. This is just me, I can "get it" faster when I have a method definition and example with dummy data plugged in.

BTW, the RVG stuff looks pretty interesting. I didn't have time to really dig deep into that so much, just skimming.

Overall, I especially like the simplicity of reading and writing files with RMagick. I'm just glad I can do it with Ruby instead of PHP, because Ruby is much easier for me to get a clear picture in my mind of what I'm looking at.

···

Whoops, that should be:

class << self
   alias jpg? jpeg?
end

Regards,

Dan
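
The reason for the correction: in a class body, `alias` works on the class's instance methods, while a method defined with `def self.jpeg?` lives on the class's singleton class, so the alias has to be made inside `class << self`. A small self-contained demonstration (the `Detector` class is hypothetical):

```ruby
class Detector
  def self.jpeg?(extension)
    %w[.jpg .jpeg].include?(extension.downcase)
  end

  class << self
    alias jpg? jpeg? # inside the singleton class, jpeg? is visible
  end
end

# The uncorrected form: in the class body, `alias` looks for an instance
# method named jpeg?, finds none, and raises NameError.
uncorrected_error =
  begin
    Class.new do
      def self.jpeg?(extension)
        true
      end
      alias jpg? jpeg?
    end
    nil
  rescue NameError => e
    e
  end
```

After this, `Detector.jpg?('.JPG')` behaves exactly like `Detector.jpeg?('.JPG')`. An equivalent form that avoids reopening the singleton class is `singleton_class.send(:alias_method, :jpg?, :jpeg?)` inside the class body.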

···

On Jul 19, 7:06 am, Daniel Berger <djber...@gmail.com> wrote:

# require 'ptools'
# File.image?(file)

i like the image routines, but could it be made extendible? maybe a template where we can add file info/properties easily like..

cat /temp/image_template

bmp BM6
jpg,jpeg \377\330\377\340\000\020JFIF
png \211PNG
gif GIF89a
gif GIF97a
....

i've updated my ptools to 1.5 and am looking at ptools.rb. but i have a concern: are you sure you'd like to add those extra methods like .jpg? .png?, etc?
i find too many methods already in ruby. You have already image?, would it be ok if image? return the image type like "jpg" eg, and nil if it's not? like,

File.image?("test.jpg") => "jpg"

also, image? should not be extension dependent since i rename some files here =)

File.image?("test.jpg.renamed") => "jpg"

File.image?("justadatafile.data") => nil

kind regards -botp
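
A sketch of how botp's template idea could work outside ptools. Everything here is hypothetical: `parse_signatures` reads lines in the `name[,name...] <leading bytes>` format shown above, turning backslash-octal escapes such as `\377` into raw bytes, and `image_type` returns the first listed name for the matching signature, or nil. It never looks at the extension, which covers the `test.jpg.renamed` case. The table is only as good as its entries, so it should hold the published signatures (for example `GIF87a` and `GIF89a`).

```ruby
# Parse lines such as "jpg,jpeg \377\330\377" into [[names], bytes] pairs.
# Blank lines, '#' comments, and lines without a signature are skipped.
def parse_signatures(text)
  text.each_line.each_with_object([]) do |line, table|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    names, magic = line.split(/\s+/, 2)
    next if magic.nil?
    # Replace each \NNN octal escape with the single byte it names.
    bytes = magic.b.gsub(/\\([0-7]{1,3})/) { Integer($1, 8).chr }
    table << [names.split(','), bytes]
  end
end

# Return the first name listed for the matching signature (e.g. "jpg"),
# or nil. Only the file's contents are examined, never its extension.
def image_type(path, table)
  return nil if table.empty?
  longest = table.map { |_, magic| magic.bytesize }.max
  header = File.binread(path, longest) || "".b
  match = table.find { |_, magic| header.start_with?(magic) }
  match && match.first.first
end
```

Returning a name from a method without a trailing `?` keeps the usual Ruby convention that `?` methods answer true or false.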

···

From: Daniel Berger [mailto:djberg96@gmail.com]

Hey Mr. Hunter,
I'm using RMagick semi-successfully now in a dumb Rails app. It works like a charm with jpg files, but png files give me a "Corrupt image `path/to/image.png'" error, using the same code as with the jpg files:

     img = Image.read(image_to_alter)[0]

This is the line called out with the error.

Is it possible, since this was created with Photoshop CS2, that the png is somehow different from what RMagick expects to read? Or is there something I'm missing about the fact that it contains transparency?

Regards,
John Joyce

···

On Jul 18, 2007, at 7:28 PM, Tim Hunter wrote:

> John Joyce wrote:

Also very cool. Thanks.
I will check that stuff out.
I guess it's just a bit much the first time looking at the API.
The main thing I found a bit vague was creating a new Image or
ImageList instance.
ImageList responds basically as I would expect, but Image doesn't.
Initially I was trying:
img = Image.read('redzigzag.jpg')
Which apparently creates something similar to ImageList.
But when trying to use Image methods, I kept getting an error
about ...private method ... called for ....[...]:array
(the ellipses are just leaving out specifics)
What I didn't realize I was missing was the [0] in:
img = Image.read('redzigzag.jpg')[0]
I'm still not quite sure how that is working as part of that
statement really, but I know how to get it going anyway.

Because the file can contain multiple images (say, in the case of an
animated GIF or a multi-layer Photoshop image), the Image.read method
returns an array with an element for each image in the file. By adding
[0] you're simply saying "the first image in the array."

Perhaps I need to emphasize this more in the doc. I'll see what I can
do.

I can certainly see the convenience of using ImageList to batch a
bunch of images, but I figured that Image was fine for working out
the logic and flow of what I want to do first.

There's a lot of overlap between ImageList and Image. ImageList is
good if you're working with animations or layers, otherwise it doesn't
offer much. I hardly ever use it. I've often thought it would've been
smarter to just have one class, named Image but with the properties of
ImageList, but that's water under the dam.

The only other truly confusing thing was lack of examples in many
method definitions. This is just me, I can "get it" faster when I
have a method definition and example with dummy data plugged in.

So do I. In fact, there are some who say that RMagick already has too
many examples, especially when they're waiting for them to run during
the install. :-) I look at every method to see whether it really needs
its own example, or whether it's sufficiently similar to other methods
that someone with reasonable familiarity with RMagick can figure out
how to use it. Of course I can be wrong. Again, if you have a list of
methods that you think need an example, let me know.

BTW, the RVG stuff looks pretty interesting. I didn't have time to
really dig deep into that so much, just skimming.

Overall, I especially like the simplicity of reading and writing
files with RMagick. I'm just glad I can do it with Ruby instead of
PHP, because Ruby is much easier for me to get a clear picture in my
mind of what I'm looking at.

Thanks for taking the time to post your impressions. If you're really
interested in RMagick, there are a number of books that include
tutorials or recipes. I've posted their names on the RMagick home
page. Hal Fulton's _The_Ruby_Way_ is particularly thorough.

I hope you enjoy using RMagick!

···

On Jul 18, 9:47 pm, John Joyce <dangerwillrobinsondan...@gmail.com> wrote:

On Jul 18, 2007, at 7:28 PM, Tim Hunter wrote:

From: Daniel Berger [mailto:djber...@gmail.com]
# require 'ptools'
# File.image?(file)

i like the image routines, but could it be made extendible? maybe a template where we can add file info/properties easily like..

>cat /temp/image_template

bmp BM6
jpg,jpeg \377\330\377\340\000\020JFIF
png \211PNG
gif GIF89a
gif GIF97a
....

I'd rather not. Based on the information I read, those templates don't
change (for the file formats I support anyway). I'm not sure what the
point would be, and it would be more work that I want to avoid. :-)

i've updated my ptools to 1.5 and am looking at ptools.rb. but i have concern, are you sure you like to add those extra methods like .jpg? .png?, etc?

They're private.

i find too many methods already in ruby. You have already image?, would it be ok if image? return the image type like "jpg" eg, and nil if it's not? like,

File.image?("test.jpg") => "jpg"

But, the '?' indicates a boolean method. I'd rather not.

also, image? should not be extension dependent since i rename some files here =)

True, but the method I implemented is meant as a poor man's
replacement for filemagic, to deal with the more likely and common
cases. I want to keep it simple. If you want a more robust and
technically more accurate way to detect images, use filemagic
instead. :-)

Regards,

Dan

···

On Jul 19, 8:14 pm, Peña, Botp <b...@delmonte-phil.com> wrote: