Need Help with Split

I have a list of records that need to be split between the address and
the city. Here is some of the data:

</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT

If you notice, the address butts directly up against the city name, and
the only thing that is consistent is that the city always starts with a
capital letter (but can be more than one word).

If I could find a way to split where a lower case letter butts directly
against an Upper case letter, that might be a good start.

Example: StreetHartford => could I split between the lower case t and
the upper case H that are directly next to each other?

thanks

atomic

···

--
Posted via http://www.ruby-forum.com/.

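DATA.each{|s|
  # Work right to left: on the reversed line, lazily match up to the comma,
  # then on to the first capital that is followed by a letter or digit.
  # In the original order, that capital starts the city.
  city = s.reverse[ /^.*?,.*?[[:upper:]](?=[\d[:alpha:]])/m ].reverse
  # Everything before the city is the street address.
  street = s[0, s.size - city.size]
  puts street
  puts city
}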

__END__
</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT
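
For anyone puzzling over the reversal: matching lazily on the reversed line amounts to finding the last capital letter that directly follows a letter or digit and still comes before the comma. A rough equivalent without reversing might look like the sketch below. It assumes Ruby 1.9 or later (for lookbehind), and `split_record` is a made-up helper name, not something from the post above.

```ruby
# Hypothetical helper: split one record into [street, city].
# Assumes Ruby 1.9+ (lookbehind) and a city that starts with a capital
# letter glued to the preceding letter or digit.
def split_record(line)
  line = line.chomp
  # Last capital that follows a letter or digit and is followed, with no
  # comma in between, by the comma that ends the city name.
  i = line.rindex(/(?<=[[:alpha:]\d])[[:upper:]](?=[^,]*,)/)
  return [line, nil] unless i # no boundary found
  [line[0...i], line[i..-1]]
end

p split_record("</a>-25 Forest Street No. 6AStamford,-CT")
# prints ["</a>-25 Forest Street No. 6A", "Stamford,-CT"]
```

Because the search runs from the right, the H in "West Hartford" is skipped (a space precedes it) and the cut lands before "West".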

···

On Dec 14, 1:05 pm, "A. Mcbomb" <atomicmcb...@gmail.com> wrote:

If I could find a way to split where a lower case letter butts directly
against an Upper case letter, that might be a good start.

/tmp$ cat i.rb
s = <<END
  </a>-16 Bonner StreetHartford,-CT
  </a>-450 Main StreetHartford,-CT
  </a>-812 Farmington AvenueWest Hartford,-CT
  </a>-25 Forest Street No. 18Stamford,-CT
  </a>-25 Forest Street No. 6AStamford,-CT
  </a>-1450 Main StreetBridgeport,-CT
END

require 'pp'

pp s.scan(/-(.*?[a-zA-Z\d])([A-Z][a-z].*)/)

/tmp$ ruby i.rb
[["16 Bonner Street", "Hartford,-CT"],
 ["450 Main Street", "Hartford,-CT"],
 ["812 Farmington Avenue", "West Hartford,-CT"],
 ["25 Forest Street No. 18", "Stamford,-CT"],
 ["25 Forest Street No. 6A", "Stamford,-CT"],
 ["1450 Main Street", "Bridgeport,-CT"]]
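
The split the original post asks for can also be written directly with zero-width lookarounds, so that `split` cuts between two characters without consuming either. This is only a minimal sketch, assuming Ruby 1.9 or later for lookbehind. The boundary rule (a letter or digit, then a capital followed by a lower-case letter) is borrowed from the `scan` pattern above, and `records`, `boundary` and `pairs` are illustrative names.

```ruby
records = [
  "</a>-16 Bonner StreetHartford,-CT",
  "</a>-812 Farmington AvenueWest Hartford,-CT",
  "</a>-25 Forest Street No. 6AStamford,-CT",
]

# Zero-width boundary: a letter or digit on the left, a capitalised
# word on the right. The limit of 2 stops after the first cut.
boundary = /(?<=[a-zA-Z\d])(?=[A-Z][a-z])/

# Drop the leading "</a>-" debris, then cut each record once.
pairs = records.map { |r| r.sub(%r{\A</a>-}, "").split(boundary, 2) }
pairs.each { |street, city| puts "#{street} | #{city}" }
```

Since this cuts at the first such boundary from the left, a street name with an internal capital (a hypothetical "McKinley Road", say) would be split too early; the right-to-left approach earlier in the thread is safer for data like that.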

···

On Tue, Dec 14, 2010 at 2:05 PM, A. Mcbomb <atomicmcbomb@gmail.com> wrote:

I have a list of records that need to be split between the address and
the city. Here is some of the data:

If I could find a way to split where a lower case letter butts directly
against an Upper case letter, that might be a good start.