Building a data structure - Hashes of Hashes of Hashes of Arrays

Good morning everyone,

This week at work I had the challenge of parsing a specific file format
that contains IP ranges categorized by different Sites, Areas and
Regions.
Basically, I needed a script to load all this location information into a
data structure that would give me an easy way to obtain all the IPs of a
site, area or region for a later transformation.

Required data structure:
data[Region][Area][Site] -> IPs
       Hash Hash Hash Array
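For illustration, here is a hand-written literal of that target shape, using a few sites from the sample data further down. The `all_ips` helper is hypothetical (it is not part of the original script); it is just a sketch of how IPs could be collected at the region, area or site level once the structure exists.

```ruby
# Target shape: data[Region][Area][Site] -> Array of IP range strings
data = {
  "Asia" => {
    "Australia" => {
      "Altona" => ["192.168.1.192-192.168.1.255", "192.168.2.192-192.168.2.255"]
    },
    "Japan" => {
      "Tokyo vpn" => ["192.168.3.192-192.168.3.255"]
    }
  }
}

# Hypothetical helper: collect every IP range below any level of the tree.
# Hashes are walked recursively; an Array is a site's list of ranges.
def all_ips(node)
  node.is_a?(Hash) ? node.values.flat_map { |child| all_ips(child) } : node
end

p all_ips(data["Asia"])                         # whole region
p all_ips(data["Asia"]["Australia"])            # one area
p all_ips(data["Asia"]["Australia"]["Altona"])  # one site
```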

I would like to know whether the function “processLocations” could be
optimized, or whether there is a simpler way of building the desired data
structure.

Current working copy:

-------------------------------------------------------
require 'pp'

# Function that processes the content of the locations file and returns
# the following structure:
#
# data[Region][Area][Site] -> IPs
#     Hash    Hash  Hash      Array
#
def processLocations(lines)
  sites  = Hash.new { |h, k| h[k] = [] }                        # Hash of Arrays
  area   = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) } # Hash of Hashes
  region = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) } # Hash of Hashes

  lines.each do |line|
    next if line =~ /^#/ # skip comment lines (test `line`, not the whole `lines` array)

    # Process IPs range section, e.g. "Altona=192.168.1.192-192.168.1.255"
    # [\d.\-] means digits, dots and dashes; a '|' inside [...] is a literal pipe
    if line =~ /(.*)=([\d.\-]+)/
      sites[$1.chomp.capitalize] << $2
    end

    # Process area section, e.g. "Altona.area=Australia"
    if line =~ /(.*)\.area=(.*)/i
      site_name = $1.chomp.capitalize
      area_name = $2.chomp.capitalize
      if sites.has_key?(site_name)
        # `&&` short-circuits, so the check does not auto-create area[area_name]
        if area.has_key?(area_name) && area[area_name].has_key?(site_name)
          # The site already exists: append its IPs (concat, not <<, to avoid nesting Arrays)
          area[area_name][site_name].concat(sites[site_name])
        else
          # New site in this area
          area[area_name][site_name] = sites[site_name]
        end

        # Clean site hash
        sites = Hash.new { |h, k| h[k] = [] }
      end
    end

    # Process region section, e.g. "Australia.region=Asia"
    if line =~ /(.*)\.region=(.*)/i
      area_name   = $1.chomp.capitalize
      region_name = $2.chomp.capitalize
      # No temporary copy needed: the region just references the area's site hash
      region[region_name][area_name] = area[area_name] if area.has_key?(area_name)
    end
  end
  region
end

##############
# MAIN

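# DATA is already an open IO positioned just after __END__; File.open(DATA)
# would reopen the whole script file from its first line instead.
lines = DATA.readlines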
data = processLocations(lines)

puts "+data---------------------------------------------------------"
pp data

puts "+data['Asia']-------------------------------------------------"
pp data['Asia']

puts "+data['Asia']['Australia']------------------------------------"
pp data['Asia']['Australia']

puts "+data['Europe-middle east-africa']['France']['Paris']---------"
pp data['Europe-middle east-africa']['France']['Paris']

__END__
Alexandria (ALH)=192.168.6.0-192.168.6.127
Alexandria (ALH).area=Australia
Australia.region=Asia

Altona=192.168.1.192-192.168.1.255
Altona=192.168.2.192-192.168.2.255
Altona.area=Australia

TOKYO VPN=192.168.3.192-192.168.3.255
TOKYO VPN.area=JAPAN
JAPAN.region=Asia

Paris=192.168.4.192-192.168.4.255
Paris.area=France

Rennes=192.168.5.192-192.168.5.255
Rennes.area=France
France.region=EUROPE-MIDDLE EAST-AFRICA
-------------------------------------------------------

Example output:

# ruby ruby_help.rb
+data---------------------------------------------------------
{"Asia"=>
  {"Australia"=>
    {"Alexandria (alh)"=>["192.168.6.0-192.168.6.127"],
     "Altona"=>["192.168.1.192-192.168.1.255", "192.168.2.192-192.168.2.255"]},
   "Japan"=>{"Tokyo vpn"=>["192.168.3.192-192.168.3.255"]}},
"Europe-middle east-africa"=>
  {"France"=>
    {"Paris"=>["192.168.4.192-192.168.4.255"],
     "Rennes"=>["192.168.5.192-192.168.5.255"]}}}
+data['Asia']-------------------------------------------------
{"Australia"=>
  {"Alexandria (alh)"=>["192.168.6.0-192.168.6.127"],
   "Altona"=>["192.168.1.192-192.168.1.255", "192.168.2.192-192.168.2.255"]},
"Japan"=>{"Tokyo vpn"=>["192.168.3.192-192.168.3.255"]}}
+data['Asia']['Australia']------------------------------------
{"Alexandria (alh)"=>["192.168.6.0-192.168.6.127"],
"Altona"=>["192.168.1.192-192.168.1.255", "192.168.2.192-192.168.2.255"]}
+data['Europe-middle east-africa']['France']['Paris']---------
["192.168.4.192-192.168.4.255"]
-------------------------------------------------------
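One possible simplification, sketched under the assumption that the file only ever contains the three line shapes shown above (`Site=IPs`, `Site.area=Area`, `Area.region=Region`): record the three relationships as flat hashes first, then assemble the nested structure in a second pass. This also stops the result from depending on the order of the lines. `build_locations`, `SAMPLE` and the `'Unknown'` fallback region are hypothetical names chosen for this sketch only.

```ruby
require 'pp'

# Hypothetical two-pass alternative to processLocations.
# Pass 1 records three flat mappings; pass 2 nests them.
def build_locations(lines)
  site_ips    = Hash.new { |h, k| h[k] = [] } # site -> [IP ranges]
  site_area   = {}                             # site -> area
  area_region = {}                             # area -> region

  lines.each do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2)
    next if value.nil?                         # ignore lines without '='

    case key
    when /\A(.+)\.area\z/i   then site_area[$1.capitalize]   = value.capitalize
    when /\A(.+)\.region\z/i then area_region[$1.capitalize] = value.capitalize
    else                          site_ips[key.capitalize]   << value
    end
  end

  # Pass 2: data[Region][Area][Site] -> IPs
  data = Hash.new { |h, k| h[k] = {} }
  site_area.each do |site, area|
    region = area_region.fetch(area, 'Unknown') # assumption: unmapped areas go under 'Unknown'
    (data[region][area] ||= {})[site] = site_ips[site]
  end
  data.default_proc = nil                      # later lookups of missing keys return nil
  data
end

SAMPLE = <<~TXT
  Altona=192.168.1.192-192.168.1.255
  Altona=192.168.2.192-192.168.2.255
  Altona.area=Australia
  Australia.region=Asia
  Paris=192.168.4.192-192.168.4.255
  Paris.area=France
  France.region=EUROPE-MIDDLE EAST-AFRICA
TXT

pp build_locations(SAMPLE.lines)
```

Keeping the parsing and the nesting apart means each regex only has to recognise one kind of line, and there is no need to reset the site hash after every area line.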

Regards and thanks in advance for any suggestions,
Sebastian YEPES

--
Posted via http://www.ruby-forum.com/.

Hi there.

Your post is hard to read in full. Consider writing enough to explain the problem, but not everything.

You have to declare your variable in this way:

a = Hash.new{|h, k| h[k] = Hash.new(&h.default_proc)}

So, for example:

a = Hash.new{|h, k| h[k] = Hash.new(&h.default_proc)}
=> {}

b = [1,2,3,4,5]
=> [1, 2, 3, 4, 5]

a["2011"]["november"]["tuesday"] = b
=> [1, 2, 3, 4, 5]

puts a["2011"]["november"]["tuesday"]
1
2
3
4
5
=> nil

puts a
{"2011"=>{"november"=>{"tuesday"=>[1, 2, 3, 4, 5]}}}
=> nil
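One side effect of this recursive `default_proc` worth keeping in mind: merely *reading* a missing key also creates it, because the default block stores the new hash. A minimal sketch using only standard `Hash` methods, showing the difference between `[]` and `key?`/`fetch`:

```ruby
a = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
a["2011"]["november"]["tuesday"] = [1, 2, 3, 4, 5]

# key? and fetch do not call the default proc, so they never add keys
p a.key?("2012")         # false
p a.fetch("2012", nil)   # nil
p a.keys                 # ["2011"]

# A plain read of a missing key runs the default proc and stores a new hash
p a["2012"]["march"]     # {}
p a.keys                 # ["2011", "2012"]
```

So when checking whether a region or area exists, `key?` is the safer test.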

I prefer a more readable approach:

class NestedHash < Hash
  def initialize
    super { |h,k| h[k] = NestedHash.new }
  end
end

a = NestedHash.new
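A short usage sketch of `NestedHash` applied to the original question, assuming the same Region/Area/Site nesting. Because every missing key defaults to another `NestedHash`, the site-level Array has to be created explicitly before appending to it; `add_range` is a hypothetical helper, not part of either post.

```ruby
require 'pp'

class NestedHash < Hash
  def initialize
    super { |h, k| h[k] = NestedHash.new }
  end
end

# Hypothetical helper: append one IP range under region/area/site.
# `sites[site] ||= []` would not work here: a missing site key returns
# an empty NestedHash (truthy) instead of nil, so no Array is created.
def add_range(tree, region, area, site, range)
  sites = tree[region][area]
  sites[site] = [] unless sites.key?(site)
  sites[site] << range
end

locations = NestedHash.new
add_range(locations, "Asia", "Australia", "Altona", "192.168.1.192-192.168.1.255")
add_range(locations, "Asia", "Australia", "Altona", "192.168.2.192-192.168.2.255")
add_range(locations, "Asia", "Japan", "Tokyo vpn", "192.168.3.192-192.168.3.255")

pp locations
p locations["Asia"]["Australia"]["Altona"].size   # 2
```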

On Nov 8, 2011, at 13:48, Cassna Capriet wrote:

You have to declare your variable in this way:

a = Hash.new{|h, k| h[k] = Hash.new(&h.default_proc)}