Refactoring transactional support within Puppet (long)

Hi all,

This is one of those "complicated" problems I mentioned a bit ago. I'm looking for help simplifying some of Puppet's internals, specifically the parts related to transactions and idempotency. Transactional support is important for the standard reasons, plus good reporting. Idempotency lets me apply a configuration and change only the bits that are out of compliance, so the same configuration can be applied, say, every half hour and it will just fix anything that has somehow gotten broken. That is both safer and faster.

Puppet's idempotency and transactions are currently closely related. Puppet is organized around a lot of high-level types like 'user', 'group', 'package', and 'service'. Each of these types has parameters that affect how they function (like 'loglevel' and 'recurse') and parameters that actually modify the system (like 'uid' and 'gid' on files). The complicated ones are those that modify the system.

Currently, each of these parameters is defined in a separate class, and that class has to define a 'retrieve' and a 'sync' method. 'retrieve' sets '@is' to be the current value (e.g., for UID on a file, it would do a stat and get the UID from the stat), and 'sync' basically takes the desired configuration (in '@should') and modifies the system so it matches. So, the types have hooks for idempotency.
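
To make that arrangement concrete, here is a minimal sketch of what one such parameter class looks like in spirit. It is not the actual Puppet code: the names (FAKE_SYSTEM, UidParam, insync?) are made up for illustration, and a plain hash stands in for the real filesystem.

```ruby
# A stand-in for the real system: path => attributes. Purely illustrative.
FAKE_SYSTEM = { "/etc/sudoers" => { :uid => 501 } }

# Hypothetical parameter class in the retrieve/sync style described above.
class UidParam
  attr_reader :is
  attr_accessor :should

  def initialize(path, should)
    @path = path
    @should = should
    @is = nil
  end

  # Record the current state of the system in @is.
  def retrieve
    @is = FAKE_SYSTEM[@path][:uid]
  end

  # True when the last retrieved value already matches the desired one.
  def insync?
    @is == @should
  end

  # Change the system so that it matches @should.
  def sync
    FAKE_SYSTEM[@path][:uid] = @should
  end
end

param = UidParam.new("/etc/sudoers", 0)
param.retrieve
param.sync unless param.insync?
```

Note that the caller has to remember to call 'retrieve' before asking whether anything is out of sync, which is part of what makes the '@is'/'@should' pair awkward.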

It's the transactions that actually provide idempotency. I create a transaction and pass it a list of type instances (e.g., a bunch of files, services, whatever). It steps through each instance, checks to see if the instance is out of sync, and syncs any parameters that are out of sync. This gives me great reporting, because the transaction can always log exactly what it's doing, and rollback is pretty easy because I can just switch '@is' and '@should' and sync again for most cases.
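
As a rough illustration of that loop and of the swap-based rollback, here is a toy version. The class names (OwnerParam, ToyTransaction) and the hash standing in for the system are assumptions for the sake of the example; the real transaction class does a good deal more.

```ruby
# Stand-in for the real system; purely illustrative.
SYSTEM_STATE = { "/etc/sudoers" => 501, "/etc/passwd" => 0 }

# Hypothetical parameter carrying the @is / @should pair.
class OwnerParam
  attr_accessor :is, :should
  def initialize(path, should)
    @path, @should = path, should
  end
  def retrieve; @is = SYSTEM_STATE[@path]; end
  def insync?;  @is == @should; end
  def sync;     SYSTEM_STATE[@path] = @should; end
  def to_s;     "owner of #{@path}"; end
end

# Hypothetical transaction: walks the parameters, fixes what is out of
# sync, and remembers what it changed so that it can undo it.
class ToyTransaction
  attr_reader :log

  def initialize(*params)
    @params = params
    @changed = []
    @log = []
  end

  def evaluate
    @params.each do |param|
      param.retrieve
      next if param.insync?
      @log << "#{param}: #{param.is} -> #{param.should}"
      param.sync
      @changed << param
    end
    @log
  end

  # Roll back by swapping @is and @should and syncing again.
  def rollback
    @changed.reverse.each do |param|
      param.is, param.should = param.should, param.is
      param.sync
    end
    @changed.clear
  end
end

trans = ToyTransaction.new(OwnerParam.new("/etc/sudoers", 0),
                           OwnerParam.new("/etc/passwd", 0))
trans.evaluate   # changes only /etc/sudoers
trans.rollback   # puts the old owner back
```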

There are two significant problems with this scenario: First, it's pretty annoying to have to maintain @is and @should separately in the parameters. It would be much, much better if 'retrieve' and 'sync' could just work like getter and setter methods, returning or accepting a value. Second, this system makes it pretty annoying for someone else to use the library. I want to get to the point where anyone can use the Puppet library to make changes to the system, but for that to happen, the library interface needs to be simple. I want something like this:

   sudoers = Puppet::Type.create(:type => :file, :path => "/etc/sudoers")
   sudoers.uid = 0 unless sudoers.uid == 0

Instead, you pretty much have to use a transaction to do any work right now:

   sudoers = Puppet::Type.create(:type => :file, :path => "/etc/sudoers")
   trans = Puppet::Transaction.new(sudoers)
   trans.evaluate

It's totally unclear what's going on there, and it's not exactly easy to use. I'd also like to make it simple for people to use transactions if they want, but I want it to be a good bit simpler:

   report = Puppet.transaction do
     sudoers = Puppet::Type.create(:type => :file, :path => "/etc/sudoers")
     sudoers.uid = 0 unless sudoers.uid == 0
   end

That way people could still get the logging and rollback that always come with transactions, but only if they wanted them and in a way that they can see what's happening. By the way, the objects often live much longer than the transactions -- I have a long-running daemon that instantiates the objects once and applies them all in a new transaction every half hour.

I think all of these problems (getting rid of '@is' and '@should', simplifying transactional use, and simplifying use of the objects) can have a single solution, but I don't know what it is. It could be something like objects somehow knowing whether they're running under a transaction, but I don't know how I'd do that without making transactions either a singleton (which I can't afford, because I know sometimes I'll need subtransactions) or very complex (e.g., creating a 'transaction' instance variable for every object, and then nil'ing that variable at the end of the transaction).
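
One shape this could take, sketched under heavy assumptions: keep a per-thread stack of open transactions, so an object can ask for the innermost one (if any) without transactions being a singleton and without every object carrying its own 'transaction' variable. All of the names below (Tx, TxStack, ToyFile) are made up, and none of this is existing Puppet code.

```ruby
# Hypothetical transaction that just records the changes made inside it.
class Tx
  attr_reader :changes, :parent
  def initialize(parent = nil)
    @parent = parent
    @changes = []
  end
  def record(desc)
    @changes << desc
  end
end

module TxStack
  # Per-thread stack of open transactions; the innermost one is last.
  def self.stack
    Thread.current[:tx_stack] ||= []
  end

  def self.current
    stack.last
  end

  # Open a (sub)transaction for the duration of the block and return it.
  def self.transaction
    tx = Tx.new(current)
    stack.push(tx)
    begin
      yield tx
    ensure
      stack.pop
    end
    tx
  end
end

# Hypothetical object with a plain setter that reports to the current
# transaction if there is one, and otherwise just does the work.
class ToyFile
  attr_reader :uid
  def initialize(path, uid)
    @path, @uid = path, uid
  end
  def uid=(value)
    tx = TxStack.current
    tx.record("#{@path}: uid #{@uid} -> #{value}") if tx
    @uid = value
  end
end

sudoers = ToyFile.new("/etc/sudoers", 501)
inner = nil
outer = TxStack.transaction do
  sudoers.uid = 0 unless sudoers.uid == 0
  inner = TxStack.transaction { sudoers.uid = 1 }   # subtransaction
end
sudoers.uid = 0   # outside any transaction: nothing recorded, still works
```

In this sketch each change is recorded only in the innermost transaction; whether a subtransaction should also report to its parent is a design question the sketch does not try to answer.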

Anyone have any ideas? Any recommendations for what you'd want this library interface to look like, either using transactions or not?

If you want to look at the code more closely, you can get it from svn at http://reductivelabs.com/svn/puppet/trunk, or in Trac at https://reductivelabs.com/cgi-bin/puppet.cgi/browser/trunk . The transaction class is relatively straightforward, and most of the types are simple enough to understand, although the Type baseclass is a bit long and messy for my tastes.

--
Luke Kanies
http://madstop.com | http://reductivelabs.com | 615-594-8199

1) make transactions re-entrant AND singleton

2) make __all__ operations take place in a transaction

eg

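   require 'monitor'

   def initialize
     ...
     # Monitor rather than Mutex: a Mutex cannot be re-locked by the thread
     # that already holds it, so the nested calls below would fail.
     @transaction_monitor = Monitor.new
     @in_transaction = false
     ...
   end

   def transaction
     @transaction_monitor.synchronize do
       if @in_transaction
         # already inside a transaction: just run the block
         yield
       else
         it = @in_transaction
         begin
           @in_transaction = true
           yield
         ensure
           @in_transaction = it
         end
       end
     end
   end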

   alias_method "t", "transaction"

...

   def foo() t{ @foo = 42 } end
   def bar() t{ @bar = 42 } end
   def foobar() t{ foo and bar } end

the state may have to be global/module-level for this to work, but you get the
idea. probably the easiest way is

   module Transaction
     # all transaction (global) state and methods
   end

   class C
     include Transaction
   end

etc.
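
A slightly fuller sketch of that module, assuming a Monitor for the re-entrant lock and a module-level depth counter in place of the per-object flag. The names LOCK, depth, in_transaction? and seen are only illustrative.

```ruby
require 'monitor'

module Transaction
  # Module-level state shared by every class that includes Transaction.
  # Monitor is used because, unlike Mutex, the same thread may re-enter it.
  LOCK = Monitor.new
  @depth = 0

  class << self
    attr_accessor :depth
  end

  def transaction
    LOCK.synchronize do
      Transaction.depth += 1
      begin
        yield
      ensure
        Transaction.depth -= 1
      end
    end
  end
  alias_method :t, :transaction

  def in_transaction?
    Transaction.depth > 0
  end
end

class C
  include Transaction
  attr_reader :seen

  def foo() t { @foo = 42 } end
  def bar() t { @bar = 42 } end
  # Nested call: foobar's transaction is re-entered by foo and bar.
  def foobar() t { @seen = in_transaction?; foo and bar } end
end

c = C.new
c.foobar
```

Because the lock is shared module-wide, only one thread at a time can be inside a transaction, which is the "singleton" part of the suggestion.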

2 cts.

-a

On Sat, 19 Aug 2006, Luke Kanies wrote:

Anyone have any ideas? Any recommendations for what you'd want this library interface to look like, either using transactions or not?

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

Hmm. If I do this, then at the very least I want to organize things so that the developer doesn't need to know about transactions; either I use intermediate methods that handle the fact that it's in a transaction, or I do some method-renaming so that the direct methods get replaced with methods that go through a transaction.

That's somewhat immaterial, though, I guess. Your basic point is that there should only ever be one transaction at a time, right? There'd be no concept of sub-transactions, but anything that got done in the middle of a transaction would automatically be included in the transaction.

I'm not convinced about always working within a transaction. It may not be reasonable to assume that every user of Puppet's library will want transactions, although I can't see that it harms anything to use them. I'd probably want things set up so that if there is a transaction, all work is done within it, and if there isn't one, no transaction is used.

So that gives me an idea of how to handle transactions essentially transparently (with some modification necessary to make it transparent to the developer, also), but I still need to figure out how to transparently handle the three-phase collect, compare, commit operations. It seems that some controlling process would need to do that, and currently my transaction is the controlling process, but with your recommendation the transaction moves completely into the background (which is probably where it belongs, largely).
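
To pin down what I mean by the three phases, here's a rough sketch that assumes the parameters have already become plain getters and setters. The names (ToyResource, apply, rollback) are hypothetical, and a hash stands in for the real system.

```ruby
# Hypothetical resource whose parameters are plain getters and setters;
# a hash stands in for the real system.
class ToyResource
  def initialize(state)
    @state = state
  end
  def get(name)
    @state[name]
  end
  def set(name, value)
    @state[name] = value
  end
end

# Hypothetical controller for the three phases. It returns a list of
# [name, old, new] triples, which is enough for reporting and rollback.
def apply(resource, desired)
  changes = []
  desired.each do |name, should|
    is = resource.get(name)          # collect
    next if is == should             # compare
    resource.set(name, should)       # commit
    changes << [name, is, should]
  end
  changes
end

# Undo a change list by writing the old values back, newest first.
def rollback(resource, changes)
  changes.reverse.each { |name, was, _| resource.set(name, was) }
end

sudoers = ToyResource.new({ :uid => 501, :gid => 0 })
report = apply(sudoers, { :uid => 0, :gid => 0 })   # only uid changes
```

Something still has to own that loop, but in this form it needs nothing from the parameters beyond a getter and a setter, so '@is' and '@should' become local variables in the controller.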

There's also a lot of logging, error handling, and event handling that currently take place in the transaction, so I would need to translate that into these behind-the-scenes transactions, but I wouldn't guess that would be too difficult.

Thanks.

On Aug 18, 2006, at 2:11 PM, ara.t.howard@noaa.gov wrote:

1) make transactions re-entrant AND singleton

2) make __all__ operations take place in a transaction

--
Luke Kanies
http://madstop.com | http://reductivelabs.com | 615-594-8199