FileTest.size? vs du -b

I’ve noticed something curious. I was throwing together a small script
when I noticed that FileTest.size? gives a different byte size than du -b on
the same file. Here’s the source:

require 'find'

def tree_size(file)
  result = 0

  Find.find(file) do |x|
    # FileTest.size? returns nil for empty files and missing paths,
    # so fall back to 0 to keep the sum from raising a TypeError
    r = FileTest.size?(x) || 0
    puts "#{r}\t#{x}"
    result += r
  end

  result
end

puts File.size(ARGV[0])
puts tree_size(ARGV[0])

And here’s the output I get:

$ ruby ~/text.rb test_dir/

4096
4096    test_dir/
5805035 test_dir//file 4 of 4.foo
7388160 test_dir//file 3 of 4.foo
7444001 test_dir//file 2 of 4.foo
7448001 test_dir//file 1 of 4.foo
28089293

Out of curiosity, I compared it to what du would return:

$ du -bc test_dir/*

7462912 test_dir/file 1 of 4.foo
7458816 test_dir/file 2 of 4.foo
7401472 test_dir/file 3 of 4.foo
5820416 test_dir/file 4 of 4.foo
28143616        total

Er… something doesn’t add up. BTW, when I verify the file sizes with
ls -al, they match what FileTest.size returns. Does anyone know what’s
going on here? Is du inaccurate?

Have a Nice Day,
-Curious.

Curious Person wrote:

Er… something doesn’t add up. BTW, when I verify the file sizes with
ls -al, they match what FileTest.size returns. Does anyone know what’s
going on here? Is du inaccurate?

‘du’ reports the disk usage, or how much space on the disk is allocated
to this file. That will almost always be larger than the actual file
size, depending on how the underlying file system works.

> Er… something doesn’t add up. BTW, when I verify the file sizes with
> ls -al, they match what FileTest.size returns. Does anyone know what’s
> going on here? Is du inaccurate?

du → ‘disk usage’

this is the amount of disk used by the file; it will always be a multiple of
the block size, whether or not you ask du to show it in bytes or whatever

FileTest.size reports the actual size of the file as reported by stat. this
is the actual length of the file.

so accurate depends on your viewpoint: if you are concerned about the total
size of files on disk then du is more accurate; if you simply want to know the
size of a file to, for example, read it into a malloc’d array, then stat
(FileTest.size) is the way to go. all you can really count on is that

du file >= stat.size file
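to see the two numbers side by side, here is a small sketch of my own (not
from the thread; it assumes a unix-like filesystem where File::Stat#blocks
counts 512-byte units, which POSIX specifies but not every platform honors):

```ruby
require 'tempfile'

stat_size = disk_usage = nil

Tempfile.create("sizes") do |f|
  f.write("x")   # a 1-byte file
  f.fsync        # force the filesystem to actually allocate blocks
  st = File.stat(f.path)
  stat_size  = st.size          # what FileTest.size / ls -l report
  disk_usage = st.blocks * 512  # roughly what du counts
end

puts "stat size:  #{stat_size} byte(s)"
puts "disk usage: #{disk_usage} bytes"
```

on a typical filesystem the second number comes out as one whole block
(often 4096 bytes) even though the file holds a single byte.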

-a

···

On Tue, 28 Jan 2003, Curious Person wrote:

====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

Wed, 29 Jan 2003 01:42:03 +0900, ahoward ahoward@fsl.noaa.gov writes:

all you can really count on is that

du file >= stat.size file

You can’t because on some filesystems files can have physical holes
in them which read as 0 and don’t take space (seek & write to create
such a file).
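for instance, this sketch (mine, not Marcin’s) does exactly that seek &
write to punch a ~1 MB hole into a temp file; on filesystems that support
sparse files the allocated blocks stay far below the stat size, so here du
would report *less* than FileTest.size:

```ruby
require 'tempfile'

size = allocated = nil

Tempfile.create("sparse") do |f|
  f.seek(1_000_000)  # jump past the end, leaving a hole
  f.write("x")       # one real byte after the hole
  f.fsync
  st = File.stat(f.path)
  size      = st.size          # 1_000_001: the hole counts toward the length
  allocated = st.blocks * 512  # only the blocks actually written
end

puts "stat size: #{size}, allocated: #{allocated}"
```

(on a filesystem without hole support the two numbers simply end up close
together again.)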

···


__("< Marcin Kowalczyk
__/ qrczak@knm.org.pl
^^ Blog of a decent man.

ahoward ahoward@fsl.noaa.gov wrote in
news:Pine.LNX.4.33.0301281641050.8727-100000@eli.fsl.noaa.gov:

> so accurate depends on your viewpoint: if you are concerned about the
> total size of files on disk then du is more accurate, if you simply
> want to know the size of a file to, for example, read it into a
> malloc’d array then stat (FileTest.size) is the way to go.

Well, what I eventually want to do is have a script that fills CDRs up to
capacity for mkisofs to burn (but not over). Therefore, I need a way to
measure size to ensure a good fit on the disc. I’m not sure which would be
better: blocks or bytes, though. Would FileTest.size be the most
appropriate method to use?

Thanks again,
-C

i’m pretty sure that summing FileTest.size for all files bound for CDR would
be the correct way to go, at least that’s how i programmed some backups here
at work. i don’t really know how the iso filesystem works regarding block
size (or if the concept even applies) but, as lyle pointed out, filesystems do
differ.

i think that

total_size = 0
files.each { |f| total_size += File.size(f) }  # f is a pathname

if total_size < CDR.size
  # burn
end

will work.

-a

···

On Tue, 28 Jan 2003, Curious Person wrote:


Curious Person wrote:

Well, what I eventually want to do is have a script that fills CDRs up to
capacity for mkisofs to burn (but not over). Therefore, I need a way to
measure size to ensure a good fit on the disc. I’m not sure which would be
better: blocks or bytes, though. Would FileTest.size be the most
appropriate method to use?

Measuring in blocks for the target filesystem would be the correct way.
The blocks given by the source filesystem can’t be trusted, since the
block size might be bigger (waste space) or smaller (might mess up the
total size).

So take the file size, as FileTest.size reports it, round upwards to the
nearest block size as defined by the target filesystem (ISO9660?) and
use that number to fill it to bursting point.
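The rounding described above could be sketched like this (ISO_BLOCK,
iso_size and iso_tree_size are my own names, not from any library; 2048
bytes is the ISO9660 logical block size, and mkisofs adds directory records
on top of the file data, so leave some slack):

```ruby
# ISO9660 stores file data in 2048-byte logical blocks
ISO_BLOCK = 2048

# round a file's byte size up to whole ISO9660 blocks
def iso_size(path)
  bytes = File.size(path)
  ((bytes + ISO_BLOCK - 1) / ISO_BLOCK) * ISO_BLOCK
end

# on-disc footprint of a list of files, ignoring directory records
def iso_tree_size(paths)
  paths.sum { |p| iso_size(p) }
end
```

A 100-byte file then counts as one full 2048-byte block toward the disc
capacity, which is what matters for not overflowing the CDR.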

berlios - Open Source Software News & Infos

Uhm, would this be a bad time to mention that this sounds like the
knapsack problem? :P

HTH

···


Kent Dahl | http://www.stud.ntnu.no/~kentda/
NTNU - graduate engineering - 5th year
Industrial economics and technological management
(engineering.discipline=Computer::Technology)

Kent Dahl kentda@stud.ntnu.no wrote in
news:3E36C62D.9096E98@stud.ntnu.no:

So take in the filesize, as FileTest.size reports, round upwards to
nearest block size as defined by the target filesystem (ISO9660?) and
use that number to fill it to bursting point.

berlios - Open Source Software News & Infos

Good idea regarding the blocks. Thank you for that very helpful link.

Uhm, would this be a bad time to mention that this sounds like the
knapsack problem? :P

Naturally, it’s going to be a (somewhat modified) knapsack, but first I
need to get the sizes of the directory trees so it knows how big everything
is [and in the best size for the target device (CD, DiskOnKey, Zip,
whatever)].
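For what it’s worth, a first-fit-decreasing pass is usually good enough for
that kind of knapsack. The sketch below is my own (`fit_to_discs` and the
[name, size] item format are assumptions, not from anyone’s post): it places
the largest items first and opens a new disc whenever an item fits nowhere.

```ruby
# Pack [name, size] pairs onto discs of the given byte capacity using
# first-fit decreasing. Oversized items still get their own (overfull)
# disc rather than being dropped silently.
def fit_to_discs(items, capacity)
  discs = []  # each disc: { free: bytes_left, items: [names] }
  items.sort_by { |_, size| -size }.each do |name, size|
    disc = discs.find { |d| d[:free] >= size }
    disc ||= discs.push({ free: capacity, items: [] }).last
    disc[:items] << name
    disc[:free] -= size
  end
  discs
end

packing = fit_to_discs([["a", 600], ["b", 300], ["c", 300], ["d", 100]], 650)
packing.each_with_index { |d, i| puts "disc #{i + 1}: #{d[:items].join(', ')}" }
```

First-fit decreasing isn’t optimal in general, but for backup-sized inputs
it gets within a disc or so of the best packing and is trivial to verify.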

Thanks to everybody for their very kind help!
-C