15 June 2012 1 Comment

Viz Updates – Metallica & Bigfoot refreshed! Plus, “The Data Box”…

Did a data refresh on two of my most popular visualizations last night - Metallica has played about 15 shows since I last ran the data-gathering routines, and Bigfoot has been reported another 40+ times! Holy cats!

In honor of these new Bigfoot sightings, I've modified the viz slightly - now, any time I refresh it, the newest sighting submissions will appear in green, in stark contrast to the older red sightings, just in case you wanted to know if the Squatch is trending UP or DOWN in your neck of the woods.
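For the curious, the green/red split boils down to comparing each sighting's submission date against the date of the previous refresh. Here's a minimal sketch of that idea - the function name, the date-based cutoff, and the sample dates are all made up for illustration, and the actual viz may well do it differently:

```python
from datetime import date

def color_for(submitted, last_refresh):
    """Green for sightings submitted after the previous refresh, red otherwise."""
    return "green" if submitted > last_refresh else "red"

def tag_sightings(sightings, last_refresh):
    """Attach a color to each (location, submitted_date) pair."""
    return [(location, submitted, color_for(submitted, last_refresh))
            for location, submitted in sightings]
```

Feed `tag_sightings` a list of `(location, date)` pairs plus the date of the last refresh, and use the third field of each result as the mark color.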

     Public Service Announcement: Washington State, look out! It's prime season!

     Another 15 shows in the books, and next weekend... Orion Fest in Atlantic City.

In other news, announcing... "DATA BOX"

Based on the feedback I've gotten on my many little "data harvest" scripts on this site (each of varying complexity and "hacky-ness"), I've decided to build a completely ready-to-run "turn-key" VM image of all my data harvesting scripts (now and in the future). You'll literally be able to download it, turn it on (using the free VMware Player app), customize your keywords (for Twitter and Facebook at least), and begin pulling sweet sweet data.

Very early shot of the dev version

Already have 4 streams up and some backend infrastructure done

I should have an early version released here next week, just trying to get all the pieces together and working decently first.

Initial "Data Harvests" Included

Twitter Public

Just like I talk about in this script here, except with many updates logic-wise. Plug a bunch of keywords into a config file or web interface (yet to be decided), and let your database fill up. Getcha Pull!
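To give a rough idea of the "keywords in, rows out" flow, here's a minimal sketch. Big caveats: it is not the actual Data Box code, it uses Python's built-in `sqlite3` as a stand-in for the MySQL database, and the table name, column names, and the simple case-insensitive substring match are all assumptions made for illustration. Fetching the posts from Twitter is left out entirely.

```python
import sqlite3

def matches(text, keywords):
    """Return the keywords found in text (case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]

def harvest(posts, keywords, conn):
    """Store posts that match at least one keyword; return how many new rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS harvested "
        "(post_id TEXT PRIMARY KEY, body TEXT, matched TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM harvested").fetchone()[0]
    for post_id, body in posts:
        hits = matches(body, keywords)
        if hits:
            # OR IGNORE skips posts already stored, so re-running a pull is harmless.
            conn.execute(
                "INSERT OR IGNORE INTO harvested VALUES (?, ?, ?)",
                (post_id, body, ",".join(hits)),
            )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM harvested").fetchone()[0]
    return after - before
```

Keying the table on the post ID is the important design choice here: the same post can show up in more than one pull, and you only want it stored once.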

Facebook Public

Same as Twitter, except Facebook updates. Be they statuses, links, videos, or whatever - if they match your keywords, they're coming to you.

MLB GameDay

An old favorite, pitch-by-pitch data feeds from MLB. Keep up with the latest 2012 games or go back a few years and harvest 2010. It's a LOT of data, all yours for the taking. Easy as pie.


[Historical] Pretty much every cumulative stat there is. Aggregated at the year level, active players and past players.


[Daily] Different spin: all current players' stats, timestamped as of "that day", so after a while you'll be able to see the trending of batting averages, ERAs, etc.
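The "as of that day" idea is basically one row per player per date, so the trend falls out of a simple ordered query. A minimal sketch follows, again using `sqlite3` as a stand-in for MySQL, with a made-up `daily_stats` table and only a batting average column. None of these names come from the real harvest scripts.

```python
import sqlite3
from datetime import date

def init(conn):
    """Create the (hypothetical) snapshot table: one row per player per day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_stats "
        "(player TEXT, as_of TEXT, batting_avg REAL, PRIMARY KEY (player, as_of))"
    )

def record_snapshot(conn, as_of, stats):
    """Save today's numbers; re-running for the same day overwrites that day's rows."""
    rows = [(player, as_of.isoformat(), avg) for player, avg in stats.items()]
    conn.executemany("INSERT OR REPLACE INTO daily_stats VALUES (?, ?, ?)", rows)
    conn.commit()

def trend(conn, player):
    """Return (date, batting_avg) pairs for one player, oldest first."""
    cur = conn.execute(
        "SELECT as_of, batting_avg FROM daily_stats WHERE player = ? ORDER BY as_of",
        (player,),
    )
    return cur.fetchall()
```

Call `record_snapshot` once a day with that day's numbers, and `trend` hands back a series ready for charting. Storing the date as ISO text (`YYYY-MM-DD`) keeps the sort order chronological.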


What else? Bigfoot sightings? UFO sightings? Video Games? Comic Books? Soccer? If it's out there, we can harvest and report on it. Suggestions?

We're in a reporting / analytics gold rush right now. Great tools are out there, people just need INTERESTING data. Let's make it happen.

Initial Features Planned:

  • Full data browsing / data export capability via web interface
  • High-level data dashboard and monitoring of pulled data
  • Query MySQL from your host machine
  • OData Connectivity to the data (for Tableau Public!)
  • "Updatability" ;) (so I can push NEW data streams, fixes / changes)
  • (possible) Ducksboard.com hooks (for sexier data monitoring)
  • (possible) Pentaho? (open-source reporting / analytics)

We shall see. I'm trying to make it a 'data harvest' box only - and leave the reporting up to you, but we shall see how it works out. Maybe a built-in reporting solution makes sense.

Feedback? Give it! I'm excited to get this off the ground and I'll have more updates soon.

  • Kovner

    Hey Ryan, any updates on the Harvest Box?