Quick & Dirty Address Geocoding and Formatting with Google Maps API
Howdy. Cheers to all ya'll down there in internet land. I got in a conversation earlier today regarding geocoding addresses in data-sets - it's a pretty common thing, and I've done it numerous times for a WIDE variety of data-points (Bigfoot, Sex Offenders, Concert Venues, etc.), so I figured hell, I'll clean it up and offer it to the Google gods. Maybe someone will find it useful.
"Why aren't we flying? Because getting there is half the fun. You know that."
The script is pretty simple, but it usually gets the job done with minimal modification. I tend to use a small dimension table for geo lat / long data - because generally the coordinates of an address won't change, so there is no reason to RE-Geocode an address (in most cases).
In this example, I'm using a SQL Server connection - but the query usage are so basic that any other could be used just by loading a different module and changing the connection syntax a bit (SQLite, Oracle, MYSQL, etc).
You'll need to download and install this simple Google Maps API wrapper first.
import pymssql # or MySQLdb or etc.
# sql server example
db = pymssql.connect(host='localhost', user='sa', password='trolololo')
cursor = db.cursor()
geocoder = GoogleGeocoderClient(False)
# your source SQL in this example I'm expect at least 2 fields, the first a unique id for that address (for updating), the next a string of the address
cursor.execute("select 12341 as myid, '101 s presidents st baltimore MD USA' as address ") # the hotel I wrote this in
addy_queue = cursor.fetchall()
for curraddy in addy_queue:
addy_id = str(curraddy[0]) # your unique id
addy = str(curraddy[1]) # your location
print '****************************ADDY IN*********************************'
print ' (' + str(addy_id) + ') ' + addy
try:
result = geocoder.geocode(addy)
# result.is_success() # nice little boolean if you need it
googaddy = result.get_formatted_address().replace("'","")
coord = result.get_location()
latitude = coord[0]
longitude = coord[1]
location_type = result.get_location_type() # just for fun
print '****************************ADDY OUT*********************************'
print str(latitude) + ', ' + str(longitude) + ' (' + location_type + ')'
print googaddy # Google Maps formatted address
print ' '
# this is where you're updating the geo coords BACK to your data, or inserting them in a sep table
cursor.execute("delete from homes.dbo.my_geo where id = '" +str(addy_id)+"' ")
cursor.execute("insert into homes.dbo.my_geo (lat, long, id, clean_address) values ('"+str(latitude)+"', '"+str(longitude)+"', '" +str(addy_id)+"', '"+str(googaddy)+"' ) ")
db.commit()
except:
pass
# in case of error: do something (optional) I'm sure you're going to have some bad addys, just address them later, or flag them here
#time.sleep(.2) # just in case you dont want to hammer Google's servers, but I say - fuck 'em
Feel free to give me a shout!