15 March 2012 4 Comments

Visualizing Bigfoot of the United States

A fun little Viz I did about Bigfoot sightings.


Check it out here.

I guess you could file this under "Too Much Time on my Hands", but hey, people said that about my Metallica shit too, and that was pure genius. :)

Basically, I scraped all the sighting info off of The Bigfoot Field Research Organization site with Python and the Scrapemark module - inserted it into a local SQL Server database, cleaned it up (good lord, what a f*cking mess of data in some places) and threw up a pretty nice looking Viz using Tableau Public.

I'm pretty happy with it - its quite interactive and hopefully someone finds it fun or interesting.

Not too shabby for a couple hours worth of work that was spawned from sitting on the couch watch a re-run of "Finding Bigfoot" on the Animal Planet - after a few beers - and laughing to myself,

"Hah, it'd be funny to map out some freaking Bigfoot sightings"

On another note - after doing my research and harvesting the info... I wish that they'd re-vamp their site and not only display this "data" in a more visual / analytical way - BUT, more importantly, they should be collecting this data in a better way - validating fields, trying to make inferences and not just allowing anything to fly. Maybe its because I'm a data guy - and a bit of a purist at times, but still. It would be cool.

Oh well.

I shouldn't stress over the data compilation of (possibly) mythical creatures. :)

     If anyone cares about the Python script I used to "extract" this data, check it out below.
     It is the DEFINITION of 'quick and dirty', so be warned - I take no responsibility for anything.
from time import clock
import time
clock()

import pprint, time
import urllib, os, string, sys
from scrapemark import scrape
import pymssql

mssql_db = pymssql.connect(host='localhost', user='sa', password='#####')
mssql_cursor = mssql_db.cursor()


mssql_cursor.execute("truncate table BigFootData.dbo.Sightings")
mssql_db.commit()


f = urllib.urlopen("http://www.bfro.net/gdb/")  ##http://www.bfro.net
html = f.read()
f.close()

feetmaster = scrape("""
{*
<td class="cs"><b><a href="{{ [index].statepage_link }}">{{ [index].statename }}</a></b></td>
*}
"""
,html)

pp = pprint.PrettyPrinter(indent=4)
#pp.pprint(feetmaster)

for item in feetmaster:
        for i in feetmaster[item]:

                print 'Now working on',
                state_name = i['statename']
                print state_name
                state_url = 'http://www.bfro.net'+i['statepage_link']

                time.sleep(1)

                try:
                        cf = urllib.urlopen(state_url)
                except:
                        time.sleep(30)
                        cf = urllib.urlopen(state_url)

                chtml = cf.read()
                cf.close()
               
                feetcounty = scrape("""
                                        {*
                                        <TD class=cs><B><a href="{{ [county].county_link }}">{{ [county].countyname }}</a></B></td>
                                        *}
                                        """
,chtml)

                for citem in feetcounty:
                        for c in feetcounty[citem]:

                                county_name = c['countyname']
                                county_url = 'http://www.bfro.net/GDB/'+c['county_link']
                                print '         '+c['countyname']
                                #print '         '+county_url

                                #time.sleep(1)

                                try:
                                        sf = urllib.urlopen(county_url)
                                except:
                                        time.sleep(30)
                                        sf = urllib.urlopen(county_url)

                                shtml = sf.read()
                                sf.close()

                                feetsighting = scrape("""
                                        {*<span class=reportcaption><B><a href="{{ [sighting].sighting_link }}">{{ [sighting].sighting_date }}</a></b>*}
                                        """
,shtml)

                                #pp.pprint(feetsighting)

                                for sitem in feetsighting:
                                        for s in feetsighting[sitem]:

                                               

                                                sighting_url = 'http://www.bfro.net/GDB/'+s['sighting_link']+'&PrinterFriendly=True'
                                                print '               '+s['sighting_link']

                                                try:
                                                        dsf = urllib.urlopen(sighting_url)
                                                except:
                                                        time.sleep(30)
                                                        dsf = urllib.urlopen(sighting_url)

                                                dshtml = dsf.read()
                                                dsf.close()

                                                dshtml = dshtml.decode("utf-8","ignore")
                                                dshtml = dshtml.replace("'","''").replace('&nbsp;',' ')

                                                feetdetailsighting = scrape("""
                                                        {* <span class=reportheader>Report # {{ [s].report_number }}</span>  <span class=reportclassification>({{ [s].class }})</span> *}
                                                        {* <BR> <span class=field>{{ [s].submitted }}</span> <hr size="5" noshade> *}
                                                        {* <hr size="5" noshade> <span class=field>{{ [s].title }}</span> <hr size="5" noshade> *}
                                                        {* <p><span class=field>YEAR:</span>{{ [s].year }}</p> *}
                                                        {* <p><span class=field>SEASON:</span>{{ [s].season }}</p> *}
                                                        {* <p><span class=field>MONTH:</span>{{ [s].month }}</p> *}
                                                        {* <p><span class=field>DATE:</span>{{ [s].date }}</p> *}
                                                        {* <p><span class=field>STATE:</span>{{ [s].state }}</p> *}
                                                        {* <p><span class=field>COUNTY:</span>{{ [s].county }}</p> *}
                                                        {* <p><span class=field>LOCATION DETAILS:</span>{{ [s].location_details }}</p> *}
                                                        {* <p><span class=field>NEAREST TOWN:</span>{{ [s].nearest_town }}</p> *}
                                                        {* <p><span class=field>NEAREST ROAD:</span>{{ [s].nearest_road }}</p> *}
                                                        {* <p><span class=field>OBSERVED:</span>{{ [s].observed|html }}</p> *}
                                                        {* <p><span class=field>ALSO NOTICED:</span>{{ [s].also_noticed }}</p> *}
                                                        {* <p><span class=field>OTHER WITNESSES:</span>{{ [s].other_witnesses }}</p> *}
                                                        {* <p><span class=field>OTHER STORIES:</span>{{ [s].other_stories }}</p> *}
                                                        {* <p><span class=field>TIME AND CONDITIONS:</span>{{ [s].time_and_conditions }}</p> *}
                                                        {* <p><span class=field>ENVIRONMENT:</span>{{ [s].environment }}</p> *}
                                                                        """
,dshtml)

                                                #pp.pprint(feetdetailsighting)   #

                                                for e in feetdetailsighting:
                                                        for t in feetdetailsighting[e]:

                                                                #print '               '+t['year']
                                                                #print '               '+t['month']

                                                                try:
                                                                        mssql_cursor.execute("""insert into BigFootData.dbo.Sightings
                                (bfroid, submitted, title, class, [month], [date], [year], season, state, county, locationdetails, nearesttown,
                                nearestroad, observed, alsonoticed, otherwitnesses, otherstories, timeandconditions, environment, URL)
                                                        values
                                (rtrim(ltrim(%s)), %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s
                )"""
,(t['report_number'], t['submitted'], t['title'], t['class'], t['month'], t['date'], t['year'], t['season'], t['state'], t['county'], t['location_details'], t['nearest_town'], t['nearest_road'], t['observed'], t['also_noticed'], t['other_witnesses'], t['other_stories'], t['time_and_conditions'], t['environment'],sighting_url))

                                                                        mssql_db.commit()

                                                                except:

                                                                        print 'ERROR'
                                                                        print sys.exc_info()[0]
                                                                        print sys.exc_info()[1]
                                                                        errorz = str(sys.exc_info()[0]) + ' - ' + str(sys.exc_info()[1])
                                                                        errorz = errorz.replace("'","''")
                                                                        mssql_cursor.execute("""insert into BigFootData.dbo.Sightings
                                                                                                (state, county, month, season, year, locationdetails, URL)
                                                                                                values (%s,%s,%s,%s,%s,'ERROR """
+errorz+"""',%s)""",(t['state'], t['county'], t['month'], t['season'], t['year'],sighting_url))
                                                                        mssql_db.commit()


end_script = clock()
mins = end_script / 60
print "Finished: %i minutes" % mins

It took about an hour to run, looping through all the pages and sub-pages until you are scraping the main detail page - I DID have some time.sleep(2) commands sprinkled in there to not overwhelm their website, but, as it turns out, their site is so slow it didn't really matter. Check out the Bigfoot Viz and enjoy!

STILL, nowhere near as cool as my Metallica visualizations, but hey, I'm wicked biased on that one. :)
  • Anon

    I ran into yout stuff and now lost sleep. Awesome work. You’re my idol.

    JC

  • Mike

    Outstanding!  I’m sharing on http://www.squatchwatch.com

    • http://ryrobes.com/ Ryan Robitaille

      Hey, Thanks Mike!

  • Pallas

    Fabulously epic data crunching! What would be interesting is to overlay this with native vegetation communities data, distance from human habitation, water resources, altitude, temperature records, rainfall, etc., etc. and beging to get a feel for the ecological envleope of b’foot. Also, does it tend to co-occur with any known species (I don’t live in US, but guessing that bird community data is very good indeed). Finally, how nicely does it correlate with black bear records?!