I have a list of records that need to be split into the street address and
the city. Here is some of the data:
</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT
As you can see, the address butts directly up against the city name, and
the only consistent thing is that the city always starts with a
capital letter (though it can be more than one word).
If I could find a way to split where a lowercase letter butts directly
against an uppercase letter, that might be a good start.
For example, in StreetHartford, could I split between the lowercase t and
the uppercase H that sit directly next to each other?
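One quick way to test the lowercase-followed-by-uppercase idea is a zero-width split. This is a minimal sketch, assuming Ruby 1.9 or later (the `(?<=...)` lookbehind is not available in 1.8). `split_at_case_change` is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: split a record at the first place where a lowercase
# letter is immediately followed by an uppercase letter. The regexp matches
# an empty string at that boundary, so no characters are lost. A limit of 2
# means only the first boundary is used. Records with no such boundary come
# back as a one-element array.
def split_at_case_change(line)
  line.split(/(?<=[a-z])(?=[A-Z])/, 2)
end

records = [
  "</a>-16 Bonner StreetHartford,-CT",
  "</a>-450 Main StreetHartford,-CT",
  "</a>-812 Farmington AvenueWest Hartford,-CT",
  "</a>-25 Forest Street No. 18Stamford,-CT",
  "</a>-25 Forest Street No. 6AStamford,-CT",
  "</a>-1450 Main StreetBridgeport,-CT"
]

records.each { |record| p split_at_case_change(record) }
```

On the sample data this splits four of the six records, but not the "No. 18Stamford" and "No. 6AStamford" lines. In those, the city follows a digit or another capital rather than a lowercase letter. Covering them means accepting any letter or digit before the city's capital.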
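DATA.each_line do |s|
  # Work on the reversed line so the city (which ends the line) comes first.
  # Lazily match up to the comma, then up to the first uppercase letter that
  # is followed by a letter or digit; in the original order, that is the
  # city's first capital glued to the end of the street.
  city = s.reverse[ /^.*?,.*?[[:upper:]](?=[\d[:alpha:]])/m ].reverse
  # Everything before the city is the street.
  street = s[0, s.size - city.size]
  puts street
  puts city
end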
__END__
</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT
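For comparison, you can find the same boundary with `String#rindex` instead of reversing the string: the city's capital directly after a letter or digit, searched backwards from the comma. This sketch assumes the last comma on each line is the one before the state. `split_record` is a hypothetical name, and the lookbehind needs Ruby 1.9 or later.

```ruby
# Hypothetical helper: returns [street, city], or nil when the line has no
# comma or no letter/digit-to-capital boundary before it.
def split_record(line)
  line = line.chomp
  comma = line.rindex(",") or return nil
  # Last uppercase letter before the comma whose preceding character is a
  # letter or digit, e.g. the "H" in "StreetHartford" or the "W" in
  # "AvenueWest Hartford" (the "H" there follows a space, so it is skipped).
  i = line[0, comma].rindex(/(?<=[[:alnum:]])[[:upper:]]/) or return nil
  [line[0, i], line[i..-1]]
end

[
  "</a>-16 Bonner StreetHartford,-CT",
  "</a>-812 Farmington AvenueWest Hartford,-CT",
  "</a>-25 Forest Street No. 6AStamford,-CT"
].each do |record|
  street, city = split_record(record)
  puts street
  puts city
end
```

Returning nil makes records that do not fit the pattern easy to spot. In the reverse-and-slice version, such a record would raise a NoMethodError when `.reverse` is called on nil.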
···
On Dec 14, 1:05 pm, "A. Mcbomb" <atomicmcb...@gmail.com> wrote:
/tmp$ cat i.rb
s = <<END
</a>-16 Bonner StreetHartford,-CT
</a>-450 Main StreetHartford,-CT
</a>-812 Farmington AvenueWest Hartford,-CT
</a>-25 Forest Street No. 18Stamford,-CT
</a>-25 Forest Street No. 6AStamford,-CT
</a>-1450 Main StreetBridgeport,-CT
END
require 'pp'
pp s.scan(/-(.*?[a-zA-Z\d])([A-Z][a-z].*)/)
/tmp$ ruby i.rb
[["16 Bonner Street", "Hartford,-CT"],
["450 Main Street", "Hartford,-CT"],
["812 Farmington Avenue", "West Hartford,-CT"],
["25 Forest Street No. 18", "Stamford,-CT"],
["25 Forest Street No. 6A", "Stamford,-CT"],
["1450 Main Street", "Bridgeport,-CT"]]
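If the state should also come out as its own field, you can anchor the scan pattern above per line and give it named captures. This is a minimal sketch, assuming every record has the shape seen in the sample: `</a>-`, then the street and city, then `,-` and a two-letter state. `RECORD` and `parse_record` are hypothetical names, and named groups need Ruby 1.9 or later.

```ruby
# Hypothetical pattern for records shaped like "</a>-<street><City>,-<ST>".
# The street is matched lazily and must end in a letter or digit; the city
# must start with a capital followed by a lowercase letter, mirroring the
# [a-zA-Z\d] / [A-Z][a-z] boundary used with scan.
RECORD = /\A<\/a>-(?<street>.*?[[:alnum:]])(?<city>[[:upper:]][[:lower:]].*?),-(?<state>[A-Z]{2})\z/

# Returns a hash with :street, :city and :state, or nil if the line does not
# match the assumed shape.
def parse_record(line)
  m = RECORD.match(line.chomp) or return nil
  { street: m[:street], city: m[:city], state: m[:state] }
end

[
  "</a>-16 Bonner StreetHartford,-CT",
  "</a>-812 Farmington AvenueWest Hartford,-CT",
  "</a>-25 Forest Street No. 6AStamford,-CT"
].each { |line| p parse_record(line) }
```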
···
On Tue, Dec 14, 2010 at 2:05 PM, A. Mcbomb <atomicmcbomb@gmail.com> wrote: