10 December 2012 5 Comments

Quick & Dirty Address Geocoding and Formatting with Google Maps API

Why aren't we flying? Because getting there is half the fun. You know that.

Howdy. Cheers to all ya'll down there in internet land. I got in a conversation earlier today regarding geocoding addresses in data-sets - it's a pretty common thing, and I've done it numerous times for a WIDE variety of data-points (Bigfoot, Sex Offenders, Concert Venues, etc.), so I figured hell, I'll clean it up and offer it to the Google gods. Maybe someone will find it useful.

"Why aren't we flying? Because getting there is half the fun. You know that."

The script is pretty simple, but it usually gets the job done with minimal modification. I tend to use a small dimension table for geo lat / long data - because generally the coordinates of an address won't change, so there is no reason to RE-Geocode an address (in most cases).

In this example, I'm using a SQL Server connection - but the query usage are so basic that any other could be used just by loading a different module and changing the connection syntax a bit (SQLite, Oracle, MYSQL, etc).

You'll need to download and install this simple Google Maps API wrapper first.

from geocode.google import GoogleGeocoderClient
import pymssql # or MySQLdb or etc.

# sql server example
db = pymssql.connect(host='localhost', user='sa', password='trolololo')
cursor = db.cursor()

geocoder = GoogleGeocoderClient(False)


# your source SQL in this example I'm expect at least 2 fields, the first a unique id for that address (for updating), the next a string of the address
cursor.execute("select 12341 as myid, '101 s presidents st baltimore MD USA' as address ") # the hotel I wrote this in
addy_queue = cursor.fetchall()

for curraddy in addy_queue:

    addy_id = str(curraddy[0]) # your unique id
    addy = str(curraddy[1]) # your location

    print '****************************ADDY IN*********************************'
    print ' (' + str(addy_id) + ') ' + addy

    try:   
        result = geocoder.geocode(addy)
        # result.is_success() # nice little boolean if you need it

        googaddy = result.get_formatted_address().replace("'","")

        coord = result.get_location()
        latitude = coord[0]
        longitude = coord[1]
        location_type = result.get_location_type() # just for fun

        print '****************************ADDY OUT*********************************'
        print str(latitude) + ', ' + str(longitude) + ' (' + location_type + ')'
        print googaddy # Google Maps formatted address
        print ' '

        # this is where you're updating the geo coords BACK to your data, or inserting them in a sep table
        cursor.execute("delete from homes.dbo.my_geo where id = '" +str(addy_id)+"' ")
        cursor.execute("insert into homes.dbo.my_geo (lat, long, id, clean_address) values ('"+str(latitude)+"', '"+str(longitude)+"', '" +str(addy_id)+"', '"+str(googaddy)+"' ) ")
        db.commit()

    except:
        pass
        # in case of error: do something (optional) I'm sure you're going to have some bad addys, just address them later, or flag them here

    #time.sleep(.2) # just in case you dont want to hammer Google's servers, but I say - fuck 'em
FYI - Last I heard, the Google API limits geocoding to 2,500 requests within 24 hours, after that you'll just get error responses.

Feel free to give me a shout!