Database vs Filesystem

I am writing a very simple web-based file/asset manager. I was
initially thinking of simply creating a database with the info on each
file and keeping the files in the filesystem. However, I have begun to
wonder if having the files in a database would be faster (or slower).
Has anybody had any experience with this sort of thing? Would large
databases slow things down? Should I just keep stuff on the filesystem?
I have had some issues on the DB with encoding types. I probably
wouldn't want to base64 encode everything. ;)

Any suggestions would be appreciated.

Regards

Sven Schott

On Tue, 1 Jun 2004 09:46:22 +0900, Sven Schott sven_schott@compnow.com.au wrote:

Any suggestions would be appreciated.

Did you consider using PStore (a simple file-based DB that works like
a hash, included in Ruby)? What about Madeleine?

On Tue, Jun 01, 2004 at 09:46:22AM +0900, Sven Schott wrote:

I have had some issues on the DB with encoding types. I probably
wouldn't want to base64 encode everything. ;)

No, you would want to store the file in a BLOB column of your database.
There are no encodings defined on BLOB columns.

To get your files out of the database, you need a CGI script (or
something equivalent), which is certainly slower than direct file access
(especially when written in Ruby). But that depends on what you prefer,
flexibility or performance, and on the size of your files. For many "small"
files with attached meta-data, I'd prefer a database over the file
system.

For example, I stored the article images for an online shop inside a
Postgres database. I haven't measured the performance, but I can give
you hints on how to write the CGI that gets the file out of the DB.
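A minimal sketch of such a CGI, assuming a hypothetical "assets" table
(with "file_name", "mime_type", and bytea "data" columns) and the pg
gem's API; this is illustrative only, not the shop's actual code:

#!/usr/bin/env ruby
# Minimal CGI sketch: fetch a file stored in a Postgres bytea column
# and stream it to the client. Table/column names are hypothetical.
require 'cgi'
require 'pg'

cgi  = CGI.new
name = cgi['name']                      # e.g. asset.rb?name=logo.png

conn = PG.connect(dbname: 'shop')
res  = conn.exec_params(
  'SELECT mime_type, data FROM assets WHERE file_name = $1', [name])

if res.ntuples.zero?
  print cgi.header('status' => 'NOT_FOUND', 'type' => 'text/plain')
  print "no such file\n"
else
  data = conn.unescape_bytea(res.getvalue(0, 1))  # bytea comes back escaped
  $stdout.binmode                                 # emit raw bytes on all platforms
  print cgi.header('type' => res.getvalue(0, 0), 'length' => data.bytesize)
  print data
end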

Regards,

Michael


"Sven Schott" sven_schott@compnow.com.au wrote in message
news:17CE3692-B365-11D8-A071-000A9571732E@compnow.com.au…


Any suggestions would be appreciated.

Generally speaking, a database might be the better solution if you would
otherwise end up with thousands of files in a single directory; in that
scenario, file systems typically don't perform well. I'm not sure about
more recent file-system developments, since they are evolving towards
database functionality (transactions, support for many small objects,
i.e. files, and the like).

You could have a database table like [fileName VARCHAR(1024), fileData
BLOB] with an index on "fileName" for fast access. Maybe you need an
additional ordering column if your files are so big that they don't fit
in a single BLOB field of the database you are using. If you want to store
meta-data (file creation and modification time, size, etc.), you'll need
even more columns (and/or tables).
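A sketch of what that schema and the chunking could look like, using the
sqlite3 gem purely for illustration; the table layout, the 1 MB chunk
size, and the "mtime" meta-data column are assumptions:

# Store files in BLOB rows, split into ordered chunks when they are
# too big for a single field.
require 'sqlite3'

db = SQLite3::Database.new('assets.db')
db.execute <<-SQL
  CREATE TABLE IF NOT EXISTS files (
    fileName VARCHAR(1024),
    chunkNo  INTEGER,            -- ordering column for multi-chunk files
    fileData BLOB,
    mtime    INTEGER,            -- example meta-data column
    PRIMARY KEY (fileName, chunkNo)
  )
SQL

CHUNK = 1 << 20                  # 1 MB per row
File.open('backup.iso', 'rb') do |f|
  chunk_no = 0
  while (piece = f.read(CHUNK))
    db.execute('INSERT INTO files (fileName, chunkNo, fileData) VALUES (?, ?, ?)',
               ['backup.iso', chunk_no, SQLite3::Blob.new(piece)])
    chunk_no += 1
  end
end

# reassemble by reading the chunks back in chunkNo order
data = db.execute('SELECT fileData FROM files WHERE fileName = ? ORDER BY chunkNo',
                  ['backup.iso']).map(&:first).join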

Kind regards

robert

Sven Schott wrote:

Would large databases slow things down? Should I just keep stuff on the
filesystem?

Having done a gallery-like webpage both ways, using the FS and using a
DB (MySQL), I would say that using a DB to hold blobs/binaries is
slower, quite a bit slower, even with caching and MySQL's other
optimisations. I have kept doing it because this way I hold all the data
(metadata and binaries) in a single DB, but I plan on figuring out a way
to render the "site" to the FS and keep the metadata in a DB (maybe
using SQLite for the non-binary data).

The FS has 6000+ files and I didn't see any slowdown from the server;
trying to navigate those in Explorer, though… ouch. In my experience:

*[speed] Keep the binaries on the FS and the other data in a DB.
*[convenience] Keep it all in the DB (more, and easier, "control").

Having said that, I'm a hobbyist, so take this into account; I'm not a
pro here, this is just my limited experience. Besides, I read somewhere
that in such a case the DB would be the bottleneck when moving all the
binary data, the FS being more efficient. I use Win2k, Apache, eruby,
and MySQL with blobs for binaries.
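For what it's worth, a sketch of pushing binary data into a MySQL BLOB
column from Ruby. It assumes the mysql2 gem's prepared statements and
made-up table/column names (the same idea works with the older mysql
bindings and escape_string):

# Round-trip a binary file through a MySQL BLOB column.
require 'mysql2'

client = Mysql2::Client.new(host: 'localhost', username: 'assets',
                            password: 'secret', database: 'assets')

data = File.binread('photo.jpg')   # raw bytes, no base64 needed

# prepared statements send the blob as-is, no quoting or encoding games
stmt = client.prepare('INSERT INTO files (file_name, file_data) VALUES (?, ?)')
stmt.execute('photo.jpg', data)

row = client.prepare('SELECT file_data FROM files WHERE file_name = ?').
             execute('photo.jpg').first
File.binwrite('copy.jpg', row['file_data'])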

Best regards
Adartse

"Sven Schott" sven_schott@compnow.com.au wrote in message
news:17CE3692-B365-11D8-A071-000A9571732E@compnow.com.au…

I would suggest using a database and going with SQLite (www.sqlite.org).
Very fast, very easy to use. Single-file database.

Also, in the future you could easily migrate to another DB system if
need be.

Greg B.


On 01/06/2004, at 5:28 PM, gabriele renzi wrote:

Did you consider using PStore (a simple file-based DB that works like
a hash, included in Ruby)? What about Madeleine?

PStore is a bit too simple, as I need something that handles multiple
clients, and I don't really want to write a transaction engine (I'm
hardly qualified), so MySQL (or Postgres) would be my first choice. I
just really wanted to know whether the MySQL Ruby libraries would be
faster or slower than Ruby's File library. How well would Ruby handle
passing binary data to MySQL? How fast would it be? Should I just stick
to file operations?

Sven

P.S. I had a look at Madeleine, but I don't have the memory for it
(some of the files can be a few hundred MB, and when you have several
thousand files… :) )


Sven Schott wrote:

I just really wanted to know if using the MySQL Ruby libraries would be
faster or slower than the File Ruby library.

As an alternative to PStore, there is my FSDB library[1]. Like PStore,
it's pure Ruby. But instead of putting everything in one file, it makes
multiple files look like a database, of sorts. Also, it's thread- and
process-safe and has simple transactions. It allows multiple back-end
formats for objects: strings (binary or ASCII), marshal, yaml, etc.

I don't expect it will be as fast as a true database, but it is
convenient to be able to access your "database" as a normal file tree.

[1] FSDB
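A short sketch of what using FSDB might look like here, going by the
description above; paths map onto real files under the database
directory, and the block forms take the appropriate locks. The paths
and data are made up, and the exact format defaults should be checked
against the FSDB docs:

require 'fsdb'

db = FSDB::Database.new('/tmp/asset-db')

# store an object; the path becomes an actual file on disk
db['meta/0001'] = { 'name' => 'photo.jpg', 'size' => 123_456 }

# read under a shared lock
db.browse('meta/0001') { |meta| p meta }

# modify in place under an exclusive lock (a simple transaction)
db.edit('meta/0001') { |meta| meta['seen'] = true }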

Sven Schott wrote:

Should I just stick to file operations?

I’d be extremely surprised if any database could give you the data faster
than a filesystem can. Certainly not a DB server like MySQL.

SQLite would give you the data quite fast if it could find it quickly
(e.g. with good indexes), since it’s basically a file anyway. This would
be a good trade-off between performance and convenience, as the database
is a single file so you can take it anywhere.

If you go with the filesystem, be careful not to put too many files in one
directory. If each file is named something benign like

D0342345.dat
D0343351.dat

then a good strategy might be to arrange them like this:

D/03/42/34/5/D0342345.dat
D/03/43/35/1/D0343351.dat

That way, the database just remembers “D0343351” and the path to that file
becomes completely predictable and easy for the filesystem to resolve.

This technique applies to some filesystems more than others, and I haven't
used it myself, but I know someone who did, with thousands of user records,
"indexed" on user ID.
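A sketch of that layout in Ruby; the helper name and the 1-2-2-2-1
split widths are just illustrative:

# Fan a file id out into nested directories so no single directory
# grows too large, e.g. "D0343351" -> "assets/D/03/43/35/1/D0343351.dat".
require 'fileutils'

def asset_path(id, root = 'assets')
  parts = id.scan(/\A(\w)(\w{2})(\w{2})(\w{2})(\w)\z/).first
  raise ArgumentError, "unexpected id format: #{id.inspect}" unless parts
  File.join(root, *parts, "#{id}.dat")
end

path = asset_path('D0343351')   # => "assets/D/03/43/35/1/D0343351.dat"
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'wb') { |f| f.write('...file data...') }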

Cheers,
Gavin

Sven Schott sven_schott@compnow.com.au wrote:

P.S. I had a look at Madeleine, but I don't have the memory for it
(some of the files can be a few hundred MB, and when you have several
thousand files… :) )

In that case, keep the files as files. The next issue is: why not put
your "info" in a file instead of in a database?


William Park, Open Geometry Consulting, opengeometry@yahoo.ca
Slackware – because I can type.

i was just checking this out today, joel - very cool.

marshal ain't that slow for medium-sized data sets:

~ > ruby pstore_test.rb 8192
schema @ 0.0243198871612549
insert @ 0.0122055444226135
select @ 0.0250020936364308

~ > ruby sqlite_test.rb 8192
schema @ 0.0509798526763916
insert @ 0.0439234171062708
select @ 0.000174661865457892

~ > cat pstore_test.rb


# builtin

require 'pstore'

n = ARGV.shift.to_i

# connect

db = PStore.new 'pstore.db'

# schema

a = Time.now
db.transaction do
  db['table'] = []     # the empty-array literal was eaten by the list rendering
  db['pk_index'] = {}
end
b = Time.now
puts "schema @ #{ b.to_f - a.to_f }"

# insert

a = Time.now
n.times do |i|
  db.transaction do
    table = db['table']
    table << [i]       # the inserted tuple was lost in rendering; [i] is a guess
    db['pk_index'][i] = table.size - 1
    db['table'] = table
  end
end
b = Time.now
puts "insert @ #{ (b.to_f - a.to_f) / n }"

# select

a = Time.now
n.times do |i|
  magic = rand n
  table, pk_index = db.transaction { [db['table'], db['pk_index']] }
  tuple = table[pk_index[magic]]
end
b = Time.now
puts "select @ #{ (b.to_f - a.to_f) / n }"

~ > cat sqlite_test.rb

# raa

require 'sqlite'

n = ARGV.shift.to_i

# connect

db = SQLite::Database.new 'sqlite.db', 0

# schema

schema = <<-sql
  create table foo(bar, primary key (bar));
sql
a = Time.now
db.execute schema
b = Time.now
puts "schema @ #{ b.to_f - a.to_f }"

# insert

a = Time.now
n.times do |i|
  sql = <<-sql
    insert into foo values ('#{ i }');
  sql
  db.execute sql
end
b = Time.now
puts "insert @ #{ (b.to_f - a.to_f) / n }"

# select

a = Time.now
n.times do |i|
  magic = rand n
  sql = <<-sql
    select * from foo where bar = '#{ magic }'
  sql
  db.execute sql
end
b = Time.now
puts "select @ #{ (b.to_f - a.to_f) / n }"

i realize the test is neither accurate nor complete - but interesting
nonetheless.

-a

===============================================================================

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it; and a weed grows, even though we do
not love it. --Dogen
===============================================================================