I have a list of records that need to be split between the address and
the city. Here is some of the data:
</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT
As you can see, the address butts directly up against the city name, and
the only thing that is consistent is that the city always starts with a
capital letter (but can be more than one word).
If I could find a way to split where a lowercase letter butts directly
against an uppercase letter, that might be a good start.
Example: StreetHartford => could I split between the lowercase t and
the uppercase H that are directly next to each other?
/tmp$ cat i.rb
s = <<END
</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT
END
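require 'pp'
# Group 1: the address, matched lazily after the leading "-", ending in a letter or digit.
# Group 2: the city, starting at the first capital that is followed by a lowercase letter.
pp s.scan(/-(.*?[a-zA-Z\d])([A-Z][a-z].*)/)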
/tmp$ ruby i.rb
[["16 Bonner Street", "Hartford,-CT"],
 ["450 Main Street", "Hartford,-CT"],
 ["812 Farmington Avenue", "West Hartford,-CT"],
 ["25 Forest Street No. 18", "Stamford,-CT"],
 ["25 Forest Street No. 6A", "Stamford,-CT"],
 ["1450 Main Street", "Bridgeport,-CT"]]
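If you would rather split each record than scan the whole string, the same idea can probably be written as a zero-width split. This is only a sketch: split_record and BOUNDARY are names made up for illustration, and it assumes every record starts with the "</a>-" debris and that the city is the first capital-plus-lowercase pair sitting directly against a letter or digit. It needs Ruby 1.9 or later for the lookbehind.

```ruby
# Hypothetical helper names; not part of the original script.
# Zero-width boundary: a letter or digit on the left, and a capital
# followed by a lowercase letter on the right. Nothing is consumed.
BOUNDARY = /(?<=[a-zA-Z\d])(?=[A-Z][a-z])/

def split_record(line)
  # Drop the leading "</a>-" (assumed present), then split once at the
  # first boundary so later capitals (e.g. "West Hartford") stay intact.
  line.strip.sub(%r{\A</a>-}, '').split(BOUNDARY, 2)
end

records = [
  '</a>-16 Bonner StreetHartford,-CT',
  '</a>-812 Farmington AvenueWest Hartford,-CT',
  '</a>-25 Forest Street No. 6AStamford,-CT'
]
records.each { |r| p split_record(r) }
```

One caveat: a street name with an internal capital, such as "McDonald Street", would be split too early by this pattern, so it is worth checking the results against the real data.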