Opinions on library interface design?

I have a question about an interface design issue for Kansas that I wanted
to get some general input on.

Kansas, if you are unaware, is an object-relational mapping library. Result
sets from queries are returned as hash-like objects. However, be object
knows where it came from, and updating the object causes the underlying
database data to be updated transparently. This lets one manipulate the
data without worrying about writing SQL.

i.e.

···

-----
discontinued_products = ksdbh.select(:Products) {|p|
  p.idx.between(104,114)}

discontinued_products.each do |dp|
  dp.name = "#{dp.name} (DISCONTINUED)"
  dp.status = 'unavailable'
end
-----

This small example would select all products with an idx between 104 and 114
from the database. Then by iterating over the array and changing attributes
on the objects, we change the underlying database. Great. Okay, now for
the question.

There are times where one would like to use the query interface to select
data, but one does not want the result sets that are returned to be coupled
to the database because the data is going to be manipulated in some sort of
temporary way. Maybe we select an array of results and we're going to
iterate over them and display them on a web page, but we want to run the
array through some sort of filtering or modification first. The changes are
for display only and are not to be applied back to the original data in the
database.

I have two thoughts about this. First would be nice way of telling the
select() method to return it's data in a different format than the standard
format. Maybe via a seperate set of options to the library or via an
additional parameter passed to the method.

Second would be to have some method attached to the data objects returned by
Kansas that would have the object return its data in some other object
format. Maybe it could be as simple as having the object honor a to_h()
method or a to_a() method to return a representation of itself as a hash or
an array?

The first approach includes more magic. However, if I am doing a select on
10000 rows, and I want to retrieve them simply as hashes, it'd be nice to be
able to easily tell Kansas that for this query, return the result set as
hashes without having to return the result set as a db coupled object, then
iterate through all of them asking for hash representation of each.

So....what is a nice, elegant way to tell select() that I want the result
set back as hashes, or arrays (or arrayfields), or structs, or whatever,
instead of the default db coupled objects?

Thanks in advance for your ideas,

Kirk Haines

Kirk Haines wrote:

There are times where one would like to use the query interface to select data, but one does not want the result sets that are returned to be coupled to the database because the data is going to be manipulated in some sort of temporary way. Maybe we select an array of results and we're going to iterate over them and display them on a web page, but we want to run the array through some sort of filtering or modification first. The changes are for display only and are not to be applied back to the original data in the database.

I have two thoughts about this. First would be nice way of telling the select() method to return it's data in a different format than the standard format. Maybe via a seperate set of options to the library or via an additional parameter passed to the method.

Second would be to have some method attached to the data objects returned by Kansas that would have the object return its data in some other object format. Maybe it could be as simple as having the object honor a to_h() method or a to_a() method to return a representation of itself as a hash or an array?

The first solution seems nicer to me (although the second one doesn't
seem totally exclusive of that). You might add on a parameter or even
use a different method name (select! comes to mind, but it's probably
not appropriate).

For one thing, I can't help thinking that it's more efficient to return
"uncoupled" data except when you need it coupled.

But I am not a db guy, so my opinion is far from definitive.

Hal

Kansas, if you are unaware, is an object-relational mapping library. Result
sets from queries are returned as hash-like objects. However, be object
knows where it came from, and updating the object causes the underlying
database data to be updated transparently. This lets one manipulate the
data without worrying about writing SQL.

i was unaware - sounds super cool - i'll check it out today.

<snip op question>

So....what is a nice, elegant way to tell select() that I want the result
set back as hashes, or arrays (or arrayfields), or structs, or whatever,
instead of the default db coupled objects?

i like the idea of NOT using 'select' since, in essence these object have not
been 'selected' as our working set - we would only be looking at them. to me
that suggests a completely new method name, something like

   def browse criteria, as = Hash
     cb = block_given?
     accum =

     query(criteria) do |intermediate|
       res =
         case as
           when Hash
             intermediate.to_h
           when Array
             intermediate.to_a
           when Struct
             intermediate.to_su
           else
             as.new intermediate
         end

       cb ? yield res : accum << res
     end

     cb ? self : accum
   end

which provide some canned types to be returns and would provide user
defineable types to be returned (case - else).

thoughts...

-a

···

On Wed, 30 Jun 2004, Kirk Haines wrote:
--

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
A flower falls, even though we love it;
and a weed grows, even though we do not love it. --Dogen

===============================================================================

I think, at every row iteration, you should have a "accept" or "discard"
method. Changes probably shouldn't occur automatically without a call to
"accept", and row iterations without a call to "accept" should automatically
enact a "discard" action. That way, only when someone changes data and then
calls "accept" does the underlying data get called.

Doing it that way means that iterating over thousands of rows to build a web
page goes fast; no special calls are made just to read data, and changes are
automatically discarded. Reads are usually more common than writes, so
speeding up the read is probably the best idea.

Transforming rows into hashes at each iteration would be a burden to reads,
which are more common than writes. I would leave your code alone, and
present the data the way you are now, but don't write data changes directly
to the DB. Cache the changes and write to the DB only when "accept" is
called. Discard changes when "discard" is called or when the row is iterated
again.

  Sean O'Dell

···

On Wednesday 30 June 2004 07:57, Kirk Haines wrote:

I have two thoughts about this. First would be nice way of telling the
select() method to return it's data in a different format than the standard
format. Maybe via a seperate set of options to the library or via an
additional parameter passed to the method.

Second would be to have some method attached to the data objects returned
by Kansas that would have the object return its data in some other object
format. Maybe it could be as simple as having the object honor a to_h()
method or a to_a() method to return a representation of itself as a hash or
an array?

i was unaware - sounds super cool - i'll check it out today.

I have a couple of super stupid bugs in the current release at rubyforge.
I'll release a slight update shortly that fixes them. You can find some
docs at:

http://enigo.com/projects/kansas

It can be downloaded at:

http://rubyforge.org/projects/kansas

i like the idea of NOT using 'select' since, in essence these object
have not been 'selected' as our working set - we would only be
looking at them. to me that suggests a completely new method name,
something like

   def browse criteria, as = Hash
     cb = block_given?
     accum =

     query(criteria) do |intermediate|
       res =
         case as
           when Hash
             intermediate.to_h
           when Array
             intermediate.to_a
           when Struct
             intermediate.to_su
           else
             as.new intermediate
         end

       cb ? yield res : accum << res
     end

     cb ? self : accum
   end

which provide some canned types to be returns and would provide user
defineable types to be returned (case - else).

That's a good idea. I like the method name, too. I'm going to give this a
whirl in some code I'm writing now.

Thanks,

Kirk Haines

···

On Thu, 1 Jul 2004 01:02:52 +0900, Ara.T.Howard wrote

Hmmm. What you are describing, basically, is to not autocommit changes.
This is a configurable setting in Kansas. When autocommit is off, changes
to an object only get committed to the database if commit() is called. If
rollback() is called, the object is returned to the state it was in
originally.

That is a viable solution, but I think that I prefer the notion that when
you use Kansas to select() from the database, you are getting back data that
is bidirectionally mapped to the database whereas something like browse()
can be used to perform syntactically identical selects that, rather than
returning bidirectionally mapped data, return something that is more speed
and memory efficient for reading (such as plain old DBI::Row objects or
arrays or whatever), such as what Ara suggested.

This doesn't preclude using the transactional features to avoid accidental
writes to the database. That's actually one very good reason to make use of
the transaction features, IMHO.

Thanks,

Kirk Haines

···

On Thu, 1 Jul 2004 02:04:47 +0900, Sean O'Dell wrote

Transforming rows into hashes at each iteration would be a burden to
reads, which are more common than writes. I would leave your code
alone, and present the data the way you are now, but don't write
data changes directly to the DB. Cache the changes and write to the
DB only when "accept" is called. Discard changes when "discard" is
called or when the row is iterated again.

If you prefer a browse method which is analogous to select, except that it's
read-only, then I would make browse simply turn off autocommit and then call
select. That way you are always giving row data to the developers using the
same objects; they don't have to re-write any code to suddenly decide that
they need to make small changes to a row as they iterate through a table.
Your pattern here could be "select creates a return set and changes to fields
are written, browse creates the same return set but changes to fields are
ignored." You could also allow changes to the data set returned by browse to
be written back to the database with an explicit call.

If you want hash-like functionality for your rows for both select and browse,
do something like what Ara did with arrayfields. Write a special method that
accesses fields in a row by name, so you can say:
dataset.fields_by_name['id'] or dataset.fields[0] and if id is the first
field in the row, both return the same value. You could also allow setting
the field to a new value this way.

  Sean O'Dell

···

On Wednesday 30 June 2004 10:46, Kirk Haines wrote:

On Thu, 1 Jul 2004 02:04:47 +0900, Sean O'Dell wrote

> Transforming rows into hashes at each iteration would be a burden to
> reads, which are more common than writes. I would leave your code
> alone, and present the data the way you are now, but don't write
> data changes directly to the DB. Cache the changes and write to the
> DB only when "accept" is called. Discard changes when "discard" is
> called or when the row is iterated again.

Hmmm. What you are describing, basically, is to not autocommit changes.
This is a configurable setting in Kansas. When autocommit is off, changes
to an object only get committed to the database if commit() is called. If
rollback() is called, the object is returned to the state it was in
originally.

That is a viable solution, but I think that I prefer the notion that when
you use Kansas to select() from the database, you are getting back data
that is bidirectionally mapped to the database whereas something like
browse() can be used to perform syntactically identical selects that,
rather than returning bidirectionally mapped data, return something that is
more speed and memory efficient for reading (such as plain old DBI::Row
objects or arrays or whatever), such as what Ara suggested.

This doesn't preclude using the transactional features to avoid accidental
writes to the database. That's actually one very good reason to make use
of the transaction features, IMHO.