[ANN] KirbyBase 2.2

I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby database management system that stores its data in plain-text files.

You can download the new version here:

Windows: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.zip
Linux/Unix: http://www.netpromi.com/files/KirbyBase_Ruby_2.2.tar.gz

You can find out more about KirbyBase at:

http://www.netpromi.com/kirbybase_ruby.html

I would like to thank Hugh Sasse for his bug fixes and code enhancements, and Emiel van de Larr for his bug fixes.

List of changes:

* By far the biggest change in this version is that I have completely
   redesigned the internal structure of the database code. Because the
   KirbyBase and KBTable classes were too tightly coupled, I have created
   a KBEngine class and moved all low-level I/O logic and locking logic
   to this class. This allowed me to restructure the KirbyBase class to
   remove all of the methods that should have been private, but couldn't be
   because of the coupling to KBTable. In addition, it has allowed me to
   take all of the low-level code that should not have been in the KBTable
   class and put it where it belongs, as part of the underlying engine. I
   feel that the design of KirbyBase is much cleaner now. No changes were
   made to the class interfaces, so you should not have to change any of
   your code.

* Changed str_to_date and str_to_datetime to use the Date.parse method.

* Changed #pack method so that it no longer reads the whole file into
   memory while packing it.

* Changed code so that special character sequences like &linefeed; can be
   part of input data without KirbyBase interpreting them as special
   characters.

Enjoy!

Jamey Cribbs
jcribbs@twmi.rr.com

* Jamey Cribbs wrote:

I would like to announce version 2.2 of KirbyBase, a simple, pure-Ruby
database management system that stores its data in plain-text files.

The idea of plain text files appealed to me a lot (I had been pondering
something similar myself, but couldn't have implemented it in such a
general fashion), so I decided to try it in my Usenet news statistics
script, on which I'm learning lots of Ruby techniques.

So for a start, I plugged KirbyBase in just as a cache - where before, I
was reading header data from a news server each time, in the new version
I save the raw data to a KirbyBase, add only recent messages, then read
the part of the data I want (by date) from the KirbyBase.

Unfortunately, it turned out to be no faster. I wonder if I'm doing
anything wrong. What I save in time waiting for the server, KirbyBase
seems to eat away in processing time (disk access is hardly worth
mentioning with my 6000 rows, 10KB of data). Is it true that you need a
lot of processing power to use it, and my PIII-500 (Win-2K/Cygwin) is
just not up to the task?

You said:

Right now, it performs pretty well on small databases

and even

It is fairly fast, comparing favorably to SQLite

Well, one reason to try it was that I had installation problems with
SQLite, so I can't compare directly, but now I wonder how it could ever
compete. One select for string equality on my 6000 rows takes half a
second or so, so I gave up on that completely.


--
Oliver C.
45n31, 73w34
Temperatur: 6.9°C (13 May 2005 10:00 AM EDT)

Oliver Cromm wrote:

So for a start, I plugged KirbyBase in just as a cache - where before, I
was reading header data from a news server each time, in the new version
I save the raw data to a KirbyBase, add only recent messages, then read
the part of the data I want (by date) from the KirbyBase.

This might be the source of the slowness. Is the field that you are reading by date defined as a Date field in the KirbyBase table? If it is, this is probably the problem. As I note in the manual, Ruby's Date/DateTime libraries are S-L-O-W! They really need to be rewritten as C libraries. Every time KirbyBase does a select on a Date field, it has to read in each record from the table's physical file and do a Date.new on the data. Like I said, this is slow!

Here is an alternative to try: define this field in the table as a String field instead of a Date field. Selects will still work pretty much the same way because, for example:

    "2005-05-25" > "2005-05-24"

and

    Date.new(2005, 5, 25) > Date.new(2005, 5, 24)

are both true. In other words, Strings formatted the way Dates look (zero-padded year-month-day) compare the same way.
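
To make the comparison concrete, here is a minimal sketch in plain Ruby (no KirbyBase involved). The sample values and variable names are made up for illustration. It also shows the one catch: the string trick relies on zero-padded fields in year-month-day order.

```ruby
require 'date'

# Hypothetical sample values; any zero-padded YYYY-MM-DD strings behave the same way.
earlier = "2005-05-24"
later   = "2005-05-25"

# String comparison is lexicographic. It matches chronological order here
# only because the fields are zero-padded and ordered year, month, day.
string_result = later > earlier

# The same comparison using Date objects.
date_result = Date.new(2005, 5, 25) > Date.new(2005, 5, 24)

# Counter-example: without zero padding the string order breaks.
# As strings "9" sorts after "1", although 9 May comes before 10 May.
unpadded_result = "2005-5-9" > "2005-5-10"

puts string_result, date_result, unpadded_result
```

So the String workaround is safe as long as every value written to the field uses the same padded format.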

Give this a try and see if you see a speed improvement. I have tried it and have seen dramatic improvements.

Let me know how it goes.

Jamey

Jamey Cribbs wrote:

Here is an alternative to try: define this field in the table as a String field instead of a Date field. Selects will still work pretty much the same way because, for example:

    "2005-05-25" > "2005-05-24"

and

    Date.new(2005, 5, 25) > Date.new(2005, 5, 24)

are both true. In other words, Strings formatted the way Dates look (zero-padded year-month-day) compare the same way.

Give this a try and see if you see a speed improvement. I have tried it and have seen dramatic improvements.

Let me know how it goes.

Why not use a Time object?

Jamey Cribbs wrote:

Oliver Cromm wrote:

So for a start, I plugged KirbyBase in just as a cache - where before, I
was reading header data from a news server each time, in the new version
I save the raw data to a KirbyBase, add only recent messages, then read
the part of the data I want (by date) from the KirbyBase.

This might be the source of the slowness. Is this field that you are
reading by date defined as a Date field in the KirbyBase table? [...]

Here is an alternative to try: define this field in the table as a
String field instead of a Date field. Selects will still work pretty
much the same way because, for example:

    "2005-05-25" > "2005-05-24"

I left the Date field as a string in the format in which I originally
receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
message, I use ParseDate. This is overhead for sure, but the point is
that it is the same thing I do for the non-caching version (receive a
specified number of Dates and decide which are within my limits).

But I'll go ahead and try a version where I parse at read-in time and
store the result, which would be a number (or two numbers, as I'd want
to keep the time zone separate).
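
A minimal sketch of that plan, assuming the standard library's Time.rfc2822 (from 'time') is acceptable in place of ParseDate. The helper name header_to_fields is hypothetical and not part of KirbyBase.

```ruby
require 'time'

# Parse an RFC 2822 date header once, at read-in time, and return two integers:
# seconds since the Unix epoch (UTC) and the sender's UTC offset in seconds.
# `header_to_fields` is a hypothetical helper, not a KirbyBase method.
def header_to_fields(header)
  t = Time.rfc2822(header)
  [t.to_i, t.utc_offset]
end

epoch, offset = header_to_fields("Wed, 18 May 2005 10:29:44 +0900")
puts epoch, offset
```

Both values can then go into Integer fields, so later selects compare plain numbers instead of parsing a date string for every record.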


--
WinErr 008: Erroneous error. Nothing is wrong.

gabriele renzi wrote:

Why not use a Time object?

I chose to have Date/DateTime be field types in KirbyBase, rather than Time, because Time can only store dates back to 1970.

Jamey

* Oliver Cromm wrote:

Jamey Cribbs wrote:

Oliver Cromm wrote:

So for a start, I plugged KirbyBase in just as a cache - where before, I
was reading header data from a news server each time, in the new version
I save the raw data to a KirbyBase, add only recent messages, then read
the part of the data I want (by date) from the KirbyBase.

This might be the source of the slowness. Is this field that you are
reading by date defined as a Date field in the KirbyBase table? [...]

Here is an alternative to try: define this field in the table as a
String field instead of a Date field. Selects will still work pretty
much the same way because, for example:

    "2005-05-25" > "2005-05-24"

I left the Date field as a string in the format in which I originally
receive it, e.g. "Wed, 18 May 2005 10:29:44 +0900". Then, for each
message, I use ParseDate. This is overhead for sure, but the point is
that it is the same thing I do for the non-caching version (receive a
specified number of Dates and decide which are within my limits).

But I'll go ahead and try a version where I parse at read-in time and
store the result, which would be a number (or two numbers, as I'd want
to keep the time zone separate).

I found some time now for further experiments, and stored time as an
integer. And yes, it is significantly faster this way, even slightly
faster than my first attempt to do the same with SQLite.

Times from some tests with similar, but not exactly equal, tasks, so read
with spoons of salt:
- reading data fresh from News server: 50s
- reading from KirbyBase with original format (rfc2822) Date field: 45s
- reading from KirbyBase with Date as Integer: 12s
- reading from SQLite with Date as Integer: 16s

I have to do quite a number of calculations on that field; for every
record selected (and in my simple experiments, that is nearly all of
them), I need to extract at least the day of the week and the day
number. But apparently, that doesn't take nearly as much time as a
KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
what is going on with the select, but I know how to circumvent the
problem.
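
For illustration, a sketch of the per-record work once the date is stored as an integer. The stored value and the range bounds are made up, and "day number" is assumed here to mean the day of the year.

```ruby
# Hypothetical stored value: epoch seconds for 2005-05-18 01:29:44 UTC.
stored = Time.utc(2005, 5, 18, 1, 29, 44).to_i

t = Time.at(stored).utc   # rebuild a Time object from the integer
weekday    = t.wday       # 0 = Sunday ... 6 = Saturday
day_number = t.yday       # day of the year, 1-based (an assumption about "day number")

# A date-range select then reduces to plain integer comparisons.
from = Time.utc(2005, 5, 1).to_i
to   = Time.utc(2005, 6, 1).to_i
in_range = (from...to).cover?(stored)

puts weekday, day_number, in_range
```

The select condition itself never builds a date object. Only the records that pass the integer test are converted.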


--
Oliver C.
45n31, 73w34
Temperatur: 14.9°C (25 May 2005 11:00 AM EDT)

Jamey Cribbs <jcribbs@twmi.rr.com> writes:

gabriele renzi wrote:

Why not use a Time object?

I chose to have Date/DateTime be field types in KirbyBase, rather than
Time, because Time can only store dates back to 1970.

ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]

irb(main):006:0> Time.at -1600000000
=> Sun Apr 20 12:33:20 CET 1919


--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Oliver Cromm wrote:

I found some time now for further experiments, and stored time as an
integer. And yes, it is significantly faster this way, even slightly
faster than my first attempt to do the same with SQLite.

Times from some tests with similar, but not exactly equal, tasks, so read
with spoons of salt:
- reading data fresh from News server: 50s
- reading from KirbyBase with original format (rfc2822) Date field: 45s
- reading from KirbyBase with Date as Integer: 12s
- reading from SQLite with Date as Integer: 16s

I have to do quite a number of calculations on that field; for every
record selected (and in my simple experiments, that is nearly all of
them), I need to extract at least the day of the week and the day
number. But apparently, that doesn't take nearly as much time as a
KirbyBase "select" based on ParseDate(aField). I'm not quite clear about
what is going on with the select, but I know how to circumvent the
problem.

If I remember my experiments correctly from when I first ported KirbyBase from Python to Ruby and noticed the significant speed difference when using Date/DateTime, my guess was that there isn't anything going on in #select that is causing the slowness. It is just that, in Ruby, creating a new Date/DateTime object is relatively slow compared to Python. My further guess as to why is that, in Python, the datetime library is written in C, while in Ruby, the Date/DateTime library is written in Ruby. How's that for exhaustive scientific analysis? :)

I could be totally wrong about this, but I am guessing that if the Date/DateTime library were rewritten in C, it would be significantly faster and you would likewise notice a marked speed improvement while using Date/DateTime fields in KirbyBase. Unfortunately, since I am not a C programmer, I can't actually do this to test my theory. Hence, my usual workaround is to define any date fields I need as String fields. It speeds things up and, for comparison purposes, things pretty much work the same way.
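
One rough way to check the guess on your own machine is to time both approaches outside KirbyBase. This is only a sketch: the 6000 rows are generated sample data, Date.parse stands in for whatever KirbyBase does internally, and the numbers will vary with the Ruby version and hardware.

```ruby
require 'benchmark'
require 'date'

# Hypothetical data: 6000 ISO-formatted date strings, roughly the row count
# mentioned earlier in the thread.
rows = Array.new(6000) { |i| (Date.new(2005, 1, 1) + (i % 365)).strftime("%Y-%m-%d") }
cutoff_string = "2005-05-24"
cutoff_date   = Date.new(2005, 5, 24)

by_date = by_string = nil

# Build a Date object for every row before comparing.
date_time = Benchmark.realtime do
  by_date = rows.select { |s| Date.parse(s) > cutoff_date }
end

# Compare the raw strings directly.
string_time = Benchmark.realtime do
  by_string = rows.select { |s| s > cutoff_string }
end

puts format("Date objects: %.4fs, strings: %.4fs", date_time, string_time)
```

Both selects return the same rows, so any difference in the two timings comes from object creation rather than from the comparison logic.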

Jamey

Confidentiality Notice: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and/or privileged information. If you are not the intended recipient(s), you are hereby notified that any dissemination, unauthorized review, use, disclosure or distribution of this email and any materials contained in any attachments is prohibited. If you receive this message in error, or are not the intended recipient(s), please immediately notify the sender by email and destroy all copies of the original message, including attachments.

Christian Neukirchen wrote:

Jamey Cribbs <jcribbs@twmi.rr.com> writes:

gabriele renzi wrote:

Why not use a Time object?

I chose to have Date/DateTime be field types in KirbyBase, rather than
Time, because Time can only store dates back to 1970.
   
ruby 1.8.2 (2004-12-25) [powerpc-darwin7.7.0]

irb(main):006:0> Time.at -1600000000
=> Sun Apr 20 12:33:20 CET 1919

When I tried this on my WindowsXP machine I got the following error:

irb(main):001:0> Time.at -1600000000
ArgumentError: time must be positive
        from (irb):1:in `at'
        from (irb):1
irb(main):002:0>

So, it does not let you use negative Times on XP. That's why I had to use Date/DateTime.
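
For comparison, a small sketch of reaching the same 1919 timestamp through DateTime arithmetic, which does not depend on the platform's Time range. The helper name epoch_to_datetime is hypothetical and the result is in UTC.

```ruby
require 'date'

# Convert epoch seconds (possibly negative) to a UTC DateTime without using Time.
# Adding a Rational to a DateTime adds that many days, so seconds are divided
# by the number of seconds in a day (86_400).
# `epoch_to_datetime` is a hypothetical helper for illustration.
def epoch_to_datetime(seconds)
  DateTime.new(1970, 1, 1) + Rational(seconds, 86_400)
end

dt = epoch_to_datetime(-1_600_000_000)
puts dt
```

This gives 20 April 1919, 11:33:20 UTC, which is the same instant as the CET value in the irb session quoted above.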

Jamey
