Code organisation help request

Hello,

I've written some ruby code that creates Fortran95 code to define and manipulate derived types (aka structures) as well as perform "regular" (what I ambiguously refer to as "Binary") and netCDF I/O. Via a driver script, I provide a text file with the simple structure definition to my ruby code and out pop three modules (fortran95 modules, not ruby ones) that I can compile.

The way I do this currently is like so:

f=FDefMod.new
f.read(def_file) # Parse the file containing an f95 structure definition
f.createDefModule # Create the structure definition f95 module
f.createBinIOModule # Create the binary I/O f95 module
f.createNCIOModule # Create the netCDF I/O f95 module

This creates three compilable f95 modules. (It's all very cool, really :o)

The problem I'm facing is that my ruby class definition file, fdefmod.rb, has grown unmanageably large and I want to split the code up into more manageable pieces but I can't figure out how best to do it.

I would like all the methods used to create the f95 definition code (i.e. the public createDefModule and about 20 private methods) to reside in one file, the methods to create the f95 binary I/O in another, and similarly for the methods to create the f95 netCDF I/O code. E.g.

   fdefmod.rb : contains the FDefMod class definition and associated methods
   fdefmod_def.rb : contains the instance methods that create the f95 structure
                       definition module
   fdefmod_binio.rb: contains the instance methods that create the f95 binary I/O module
   fdefmod_ncio.rb : contains the instance methods that create the f95 netCDF I/O module

The last three files would be included (in the C sense) in the FDefMod class definition file. All of the methods in the three "functional categories" (def, binio, and ncio) are, of course, instance methods. Currently, the monolithic fdefmod.rb file contains everything and it's a nightmare to organise the code itself and associated unit tests -- to say nothing of adding new features.
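For reference, the closest Ruby analogue to a C-style include is probably reopening the class: each file says `class FDefMod ... end` again and adds its own methods, and fdefmod.rb requires the pieces. The following is a minimal single-file sketch, not the actual code. The method bodies, the `name` reader, and the generated module name `_Binary_IO` are assumptions made up for illustration, and each `class FDefMod` block stands in for one of the files listed above.

```ruby
# fdefmod.rb (core): state plus parsing. The body is a placeholder.
class FDefMod
  attr_reader :name

  def read(def_file)
    # The real parser is not shown in this thread; here we simply
    # pretend the file name gives the structure name.
    @name = File.basename(def_file, ".*")
    self
  end
end

# fdefmod_def.rb: reopening the class adds methods to the same class.
class FDefMod
  def createDefModule
    "MODULE #{name}_Define\nEND MODULE #{name}_Define\n"
  end
end

# fdefmod_binio.rb: same idea for the binary I/O methods.
class FDefMod
  def createBinIOModule
    "MODULE #{name}_Binary_IO\nEND MODULE #{name}_Binary_IO\n"
  end
end

f = FDefMod.new
f.read("MyStruct.def")
puts f.createDefModule
puts f.createBinIOModule
```

This keeps everything as instance methods of one class and only changes where the source lives, so it does nothing by itself to untangle shared state between the three categories.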

So: how to split up the code? Module mixins? Subclasses?

The way I see it, the above type of organisational structure, while logical, is opposite to how one would typically use modules to mixin methods since the methods in question aren't generic, they're very specific.

And inheritance doesn't seem right either, since the code components I'm trying to separate aren't really classes in their own right; they're just a bunch of methods operating on a class to create a particular format of output.

In addition, there are a number of class methods that are used by all three "functional categories" of code. These class methods are just little utilities for formatting the output f95 code, and they are all private to the FDefMod class.

I've been thinking myself in circles for a while now, so my explanation may be malformed. If so, I apologise.

Any ideas, suggestions, hints, etc much appreciated.

cheers,

paulv

--
Paul van Delst Ride lots.
CIMSS @ NOAA/NCEP/EMC Eddy Merckx

I would keep FDefMod as driver (i.e. the user instantiates it and invokes a method that does *all* the work). I'd then go on to create a class containing configuration information only (this is filled from the file; an instance of OpenStruct or Hash might be sufficient depending on your config info). Then I'd go on creating three classes for the three output types if there is enough different code to justify this. You could then connect those classes in any way that seems reasonable (i.e. have a common base class that contains shared code and state, create a module and include it in all three or other).

Your FDefMod.create_struct(file) will then instantiate those three classes while it goes along and those instances of your output generation classes will then create output files.
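A rough sketch of that layout might look as follows. All names other than FDefMod and create_struct (Config, Generator, DefGenerator, BinIOGenerator, NCIOGenerator, the `indent` helper, the generated module suffixes) are assumptions made up for illustration. A plain Struct stands in for the OpenStruct or Hash, and parsing the definition file is left out, so create_struct takes a ready-made config here instead of a file.

```ruby
# Configuration only; in real code this would be filled from the file.
Config = Struct.new(:name)

# Shared code and state for all output types (could equally be a module).
class Generator
  def initialize(config)
    @config = config
  end

  # Each subclass supplies a module-name suffix and a body.
  def generate
    mod = "#{@config.name}_#{suffix}"
    "MODULE #{mod}\n#{body}END MODULE #{mod}\n"
  end

  private

  # Small formatting utility shared by every generator.
  def indent(text, n = 2)
    text.lines.map { |l| " " * n + l }.join
  end
end

class DefGenerator < Generator
  private

  def suffix; "Define"; end
  def body;   indent("! type definition goes here\n"); end
end

class BinIOGenerator < Generator
  private

  def suffix; "Binary_IO"; end
  def body;   indent("! binary I/O procedures go here\n"); end
end

class NCIOGenerator < Generator
  private

  def suffix; "netCDF_IO"; end
  def body;   indent("! netCDF I/O procedures go here\n"); end
end

# Driver: one call does all the work, one sub-command per output type.
class FDefMod
  GENERATORS = [DefGenerator, BinIOGenerator, NCIOGenerator]

  def self.create_struct(config, generators = GENERATORS)
    generators.map { |klass| klass.new(config).generate }
  end
end

puts FDefMod.create_struct(Config.new("MyStruct"))
```

Passing a shorter list, e.g. `FDefMod.create_struct(config, [DefGenerator])`, would produce only the definition module.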

If you would want to classify this with a pattern name it's a bit like nested command object pattern, i.e. you have a major command and sub commands (i.e. one per output type).

Of course, this is said with the little we know about your code. There might be potential for further refactorings in the code you have (i.e. split up methods into several methods, identify common code / patterns and extract them into additional methods that are invoked in multiple places etc.). Btw, how many LOC are we talking about?

Kind regards

  robert

On 08.01.2007 16:50, Paul van Delst wrote:

[original post snipped]

Robert Klemme wrote:

[original post snipped]

I would keep FDefMod as driver (i.e. the user instantiates it and invokes a method that does *all* the work). I'd then go on to create a class containing configuration information only (this is filled from the file; an instance of OpenStruct or Hash might be sufficient depending on your config info). Then I'd go on creating three classes for the three output types if there is enough different code to justify this. You could then connect those classes in any way that seems reasonable (i.e. have a common base class that contains shared code and state, create a module and include it in all three or other).

O.k., I think I grok your meaning, but some implementation details still escape me (I'm exposing my ignorance of both OO and ruby below, so please bear with me.... and no laughing. Groans of disbelief, however, are acceptable :o). This is what I came up with:

In file basemodule.rb:

module BaseModule
   class BaseClass
     attr_accessor :name
     def initialize
       @name=""
     end
     def self.basem
       puts("This is class method basem")
     end
   end
end

In file derivedclass.rb

require 'basemodule'
class DerivedClass
   include BaseModule
   def self.output(obj)
     puts("The obj class : #{obj.class}")
     puts("The config info is: #{obj.config.inspect}")
   end
end

and in mainclass.rb

require 'basemodule'
require 'derivedclass'
class MainClass
   include BaseModule
   attr_accessor :config
   def initialize
     @config=BaseClass.new
   end
   def output
     DerivedClass.output(self)
   end
end

??

Now the above is very messy, so I obviously still need some guidance. One thing I find particularly odious is that to get the above to work, I needed to make the DerivedClass methods class methods rather than instance methods. For a suitable definition of "work", it does work, though:

lnx:scratch/ruby : irb --simple-prompt
>> require 'mainclass'
=> true
>> f=MainClass.new
=> #<MainClass:0x401081ec @config=#<BaseModule::BaseClass:0x401081c4 @name="">>
>> f.config.name="blah"
=> "blah"
>> f.output
The obj class : MainClass
The config info is: #<BaseModule::BaseClass:0x401081c4 @name="blah">
=> nil
>> MainClass::BaseClass::basem
This is class method basem
=> nil

Further enlightenment would be appreciated.

Your FDefMod.create_struct(file) will then instantiate those three classes while it goes along and those instances of your output generation classes will then create output files.

If you would want to classify this with a pattern name it's a bit like nested command object pattern, i.e. you have a major command and sub commands (i.e. one per output type).

That is what I was thinking to do. Sometimes, I just need the definition module, but not the I/O ones.

Of course, this is said with the little we know about your code. There might be potential for further refactorings in the code you have (i.e. split up methods into several methods, identify common code / patterns and extract them into additional methods that are invoked in multiple places etc.).

That is exactly my plan. For each of the so-called "derived classes", the same procedure can be applied again. For example, if my f95 structure is named "MyStruct", then the f95 definition module that is created, MyStruct_Define.f90, would contain the public procedures,
   Associated_MyStruct: Check if all the pointer components are associated
   Create_MyStruct : Allocate the pointer components of the structure
   Destroy_MyStruct : Deallocate the pointer components
   Assign_MyStruct : Deep copy the structure (a simple assignment just copies
                        the pointer references, not the actual data)
   Equal_MyStruct : Determine if two structures are equal
   Info_MyStruct : Print out info on the structure dimensions

The ruby methods to create each individual f95 procedure could be put into its own class with one public instance method and a bunch of private methods (for formatting particular things in the output procedure.) I could have a separate file (class?) that handles the creation of each of the above procedures in the f95 module. Same for the i/o stuff. It might make unit testing each part easier too.
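The "one class per generated procedure" idea could be sketched like this. The class name CreateProcedure, its constructor arguments, and the shape of the emitted Fortran (a bare fragment without declarations) are assumptions for illustration only, not the real generator.

```ruby
# One class per generated f95 procedure: a single public entry point
# (#generate) and private helpers that format the pieces.
class CreateProcedure
  def initialize(struct_name, components)
    @struct_name = struct_name
    @components = components # hypothetical: component name => dimension name
  end

  def generate
    [header, *allocations, footer].join("\n") + "\n"
  end

  private

  def proc_name
    "Create_#{@struct_name}"
  end

  def header
    "SUBROUTINE #{proc_name}(s, #{@components.values.uniq.join(', ')})"
  end

  def allocations
    @components.map { |comp, dim| "  ALLOCATE(s%#{comp}(#{dim}))" }
  end

  def footer
    "END SUBROUTINE #{proc_name}"
  end
end

puts CreateProcedure.new("MyStruct", { "x" => "n", "y" => "n", "z" => "m" }).generate
```

Because `generate` just returns a string, a unit test for each procedure class can compare that string against the expected Fortran without touching the file system.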

There are some methods that are used in several places to create parts of just the f95 definition module, just as there are common methods used to create parts of just the f95 I/O module; and then there are common methods used in formatting output for all three f95 modules.

As I added functionality, I've been refactoring a lot - that's another reason I want to split the code into smaller bits: it should allow for much easier identification of the common parts (at least, I think so.)

Btw, how many LOC are we talking about?

Hardly any. About 3K loc. In the future I think that can be reduced quite a bit since there is still a fair amount of boilerplate Fortran95 code in there that is just getting dumped via herefiles.

cheers,

paulv

Hi Paul

Paul van Delst wrote:

Hardly any. About 3K loc.

3,000 is hardly any?! Have I created a monster? :)

Regards,

--
Bil Kleb
http://fun3d.larc.nasa.gov

Robert Klemme wrote:

I would keep FDefMod as driver (i.e. the user instantiates it and invokes a method that does *all* the work). I'd then go on to create a class containing configuration information only (this is filled from the file; an instance of OpenStruct or Hash might be sufficient depending on your config info). Then I'd go on creating three classes for the three output types if there is enough different code to justify this. You could then connect those classes in any way that seems reasonable (i.e. have a common base class that contains shared code and state, create a module and include it in all three or other).

O.k., I think I grok your meaning, but some implementation details still escape me (I'm exposing my ignorance of both OO and ruby below, so please bear with me.... and no laughing. Groans of disbelief, however, are acceptable :o). This is what I came up with:

In file basemodule.rb:

module BaseModule
  class BaseClass
    attr_accessor :name
    def initialize
      @name=""
    end
    def self.basem
      puts("This is class method basem")
    end
  end
end

Why do you have a module in this file and not in the other files?

In file derivedclass.rb

require 'basemodule'
class DerivedClass
  include BaseModule
  def self.output(obj)
    puts("The obj class : #{obj.class}")
    puts("The config info is: #{obj.config.inspect}")
  end
end

and in mainclass.rb

require 'basemodule'
require 'derivedclass'
class MainClass
  include BaseModule
  attr_accessor :config
  def initialize
    @config=BaseClass.new
  end
  def output
    DerivedClass.output(self)
  end
end

??

Now the above is very messy, so I obviously still need some guidance.

Yes, you should start with proper class names. That makes things much easier.

One thing I find particularly odious is that to get the above to work, I needed to make the DerivedClass methods class methods rather than instance methods.

You probably got inheritance and module inclusion mixed up.

Further enlightenment would be appreciated.

class Base
   def foo() end
end

class Derived < Base
   # works: foo is inherited from the superclass Base
   def bar() foo() end
end

module Foo
   def forx() end
end

class Includer
   include Foo
   # works: forx is mixed in from the module Foo
   def barx() forx() end
end
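To tie this back to the BaseModule example quoted above: `include` mixes in a module's instance methods and makes its constants reachable, but a class nested inside the module does not become a superclass of the includer. A small sketch of the difference, with names (Helpers, Nested, Host) invented purely for illustration:

```ruby
module Helpers
  # An instance method of the module: this is what `include` mixes in.
  def shout(text)
    text.upcase + "!"
  end

  # A class nested in the module: `include` only makes the constant
  # reachable; its methods are NOT added to the including class.
  class Nested
    def hello
      "hello from Nested"
    end
  end
end

class Host
  include Helpers

  def greet
    shout("hi") # mixed-in instance method
  end

  def nested_greeting
    Nested.new.hello # constant found through the included module
  end
end

h = Host.new
puts h.greet                                   # HI!
puts h.nested_greeting                         # hello from Nested
puts h.respond_to?(:hello)                     # false
puts Host.ancestors.include?(Helpers)          # true
puts Host.ancestors.include?(Helpers::Nested)  # false
```

This is probably why the earlier DerivedClass could see BaseClass but gained no behaviour from it: including BaseModule only brought the BaseClass constant into scope.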

Your FDefMod.create_struct(file) will then instantiate those three classes while it goes along and those instances of your output generation classes will then create output files.

If you would want to classify this with a pattern name it's a bit like nested command object pattern, i.e. you have a major command and sub commands (i.e. one per output type).

That is what I was thinking to do. Sometimes, I just need the definition module, but not the I/O ones.

I've attached a file to describe what I mean. You can distribute that code to multiple files (you'll have to open and close the module once per file).

If you need to do some preprocessing common to all outputs, you might want to add a PreProcessor class that converts a Config into something else that stores the preprocessed state (and maybe also references the original config). You can then pass that on to individual generators.
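The PreProcessor idea could be sketched roughly as below. This is not the attached sample.rb; RawConfig, Prepared, PreProcessor#call, TypeDefGenerator and all field names are hypothetical, and the emitted Fortran is only a fragment.

```ruby
# Raw configuration as parsed from the definition file (fields assumed).
RawConfig = Struct.new(:name, :components)

# Preprocessed state shared by all generators; it keeps a reference
# to the original config as well.
Prepared = Struct.new(:config, :type_name, :max_width)

class PreProcessor
  # Converts a RawConfig into a Prepared object, computing values
  # that every generator would otherwise have to recompute.
  def call(config)
    Prepared.new(
      config,
      "#{config.name}_type",
      config.components.map(&:length).max || 0
    )
  end
end

# One of the individual generators; it receives the preprocessed state.
class TypeDefGenerator
  def initialize(prepared)
    @prepared = prepared
  end

  def generate
    lines = @prepared.config.components.map do |c|
      "  REAL, POINTER :: #{c.ljust(@prepared.max_width)} => NULL()"
    end
    ["TYPE :: #{@prepared.type_name}", *lines, "END TYPE #{@prepared.type_name}"].join("\n")
  end
end

prepared = PreProcessor.new.call(RawConfig.new("MyStruct", ["a", "beta"]))
puts TypeDefGenerator.new(prepared).generate
```

The same `prepared` object can then be handed to the binary and netCDF I/O generators, so the common work is done once.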

Of course, this is said with the little we know about your code. There might be potential for further refactorings in the code you have (i.e. split up methods into several methods, identify common code / patterns and extract them into additional methods that are invoked in multiple places etc.).

That is exactly my plan. For each of the so-called "derived classes", the same procedure can be applied again. For example, if my f95 structure is named "MyStruct", then the f95 definition module that is created, MyStruct_Define.f90, would contain the public procedures,
  Associated_MyStruct: Check if all the pointer components are associated
  Create_MyStruct : Allocate the pointer components of the structure
  Destroy_MyStruct : Deallocate the pointer components
  Assign_MyStruct : Deep copy the structure (a simple assignment just copies
                       the pointer references, not the actual data)
  Equal_MyStruct : Determine if two structures are equal
  Info_MyStruct : Print out info on the structure dimensions

Are you talking about Fortran or Ruby code here ^^^^? I was talking about Ruby code *only*.

The ruby methods to create each individual f95 procedure could be put into its own class with one public instance method and a bunch of private methods (for formatting particular things in the output procedure.) I could have a separate file (class?) that handles the creation of each of the above procedures in the f95 module. Same for the i/o stuff. It might make unit testing each part easier too.

There are some methods that are used in several places to create parts of just the f95 definition module, just as there are common methods used to create parts of just the f95 I/O module; and then there are common methods used in formatting output for all three f95 modules.

Are you doing this in two steps, i.e. create Fortran modules and then format and output them? Assuming yes, this does not seem to make much sense to me - you can create the Fortran code formatted right from the start, can't you?

As I added functionality, I've been refactoring a lot - that's another reason I want to split the code into smaller bits: it should allow for much easier identification of the common parts (at least, I think so.)

Yep.

Btw, how many LOC are we talking about?

Hardly any. About 3K loc. In the future I think that can be reduced quite a bit since there is still a fair amount of boilerplate Fortran95 code in there that is just getting dumped via herefiles.

Sounds good, i.e. still early enough that it's feasible and you can try out variants.

Kind regards

  robert

sample.rb (1.02 KB)

On 08.01.2007 20:28, Paul van Delst wrote:

On 08.01.2007 16:50, Paul van Delst wrote:

Sorry, I made things more complex than necessary. The driver class in the first example was pretty useless. I've changed that to also be more consistent with the command pattern. I've also played a bit further to give you a better idea what I was thinking. Hope it helps.

Kind regards

  robert

sample.rb (2 KB)

On 09.01.2007 22:19, Robert Klemme wrote:

I've attached a file to describe what I mean. You can distribute that code to multiple files (you'll have to open and close the module once per file).

If you need to do some preprocessing common to all outputs, you might want to add a PreProcessor class that converts a Config into something else that stores the preprocessed state (and maybe also references the original config). You can then pass that on to individual generators.

Robert Klemme wrote:

Robert Klemme wrote:

[stuff snipped]

One thing I find particularly odious is that to get the above to work, I needed to make the DerivedClass methods class methods rather than instance methods.

You probably got inheritance and module inclusion mixed up.

Actually, I don't think I understand either too well. :o(

Your FDefMod.create_struct(file) will then instantiate those three classes while it goes along and those instances of your output generation classes will then create output files.

If you would want to classify this with a pattern name it's a bit like nested command object pattern, i.e. you have a major command and sub commands (i.e. one per output type).

That is what I was thinking to do. Sometimes, I just need the definition module, but not the I/O ones.

I've attached a file to describe what I mean. You can distribute that code to multiple files (you'll have to open and close the module once per file).

Wow. Thank you very much. I have been bumbling about trying different things the last couple of days but your sample code makes everything quite clear. And, it does things the way I was thinking about for future enhancements (mostly documentation related) - two proverbials with one stone. Excellent.

If you need to do some preprocessing common to all outputs, you might want to add a PreProcessor class that converts a Config into something else that stores the preprocessed state (and maybe also references the original config). You can then pass that on to individual generators.

Of course, this is said with the little we know about your code. There might be potential for further refactorings in the code you have (i.e. split up methods into several methods, identify common code / patterns and extract them into additional methods that are invoked in multiple places etc.).

That is exactly my plan. For each of the so-called "derived classes", the same procedure can be applied again. For example, if my f95 structure is named "MyStruct", then the f95 definition module that is created, MyStruct_Define.f90, would contain the public procedures,
  Associated_MyStruct: Check if all the pointer components are associated
  Create_MyStruct : Allocate the pointer components of the structure
  Destroy_MyStruct : Deallocate the pointer components
  Assign_MyStruct : Deep copy the structure (a simple assignment just copies
                       the pointer references, not the actual data)
  Equal_MyStruct : Determine if two structures are equal
  Info_MyStruct : Print out info on the structure dimensions

Are you talking about Fortran or Ruby code here ^^^^? I was talking about Ruby code *only*.

Me too; although the way I constructed my post it may seem like I've conflated the two. The ruby code creates the Fortran code. The stuff immediately above refers to the Fortran95 code that the ruby code generates. Fortran95 (an 11-year-old standard) contains pointers, allocatables, and modules. Fortran2003 contains classes, inheritance, and C interop (although there are no complete f2003 compilers available yet).

The ruby methods to create each individual f95 procedure could be put into its own class with one public instance method and a bunch of private methods (for formatting particular things in the output procedure.) I could have a separate file (class?) that handles the creation of each of the above procedures in the f95 module. Same for the i/o stuff. It might make unit testing each part easier too.

There are some methods that are used in several places to create parts of just the f95 definition module, just as there are common methods used to create parts of just the f95 I/O module; and then there are common methods used in formatting output for all three f95 modules.

Are you doing this in two steps, i.e. create Fortran modules and then format and output them? Assuming yes, this does not seem to make much sense to me - you can create the Fortran code formatted right from the start, can't you?

The latter is what I am doing. I use string interpolation (I think that's the right term) within herefiles for the dynamic bits.
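A minimal sketch of that technique, i.e. string interpolation inside a here document ("herefile"), so the Fortran comes out formatted in one step. The helper name `destroy_procedure`, its arguments, and the emitted Fortran are assumptions for illustration. The `<<~` form shown here strips the common leading indentation and needs Ruby 2.3 or later; older Rubies would use `<<-` or `<<` instead.

```ruby
# Hypothetical generator for a Destroy_<name> procedure.
def destroy_procedure(struct_name, components)
  # Build the repeated, data-dependent lines first...
  deallocs = components.map { |c| "  DEALLOCATE(s%#{c})" }.join("\n")

  # ...then interpolate them into the boilerplate.
  <<~F95
    SUBROUTINE Destroy_#{struct_name}(s)
      TYPE(#{struct_name}_type), INTENT(INOUT) :: s
    #{deallocs}
    END SUBROUTINE Destroy_#{struct_name}
  F95
end

puts destroy_procedure("MyStruct", ["x", "y"])
```

Note that the indentation of interpolated text is not adjusted by `<<~`, which is why the DEALLOCATE lines carry their own two leading spaces.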

As I added functionality, I've been refactoring a lot - that's another reason I want to split the code into smaller bits: it should allow for much easier identification of the common parts (at least, I think so.)

Yep.

Btw, how many LOC are we talking about?

Hardly any. About 3K loc. In the future I think that can be reduced quite a bit since there is still a fair amount of boilerplate Fortran95 code in there that is just getting dumped via herefiles.

Sounds good, i.e. still early enough that it's feasible and you can try out variants.

Once again, thanks very much for the sample file. I'm going to be busy this weekend.... :o)

cheers,

paulv
