<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Using a simple Python script for End-to-End Data Transformation and ETL (Part 1)</title>
	<atom:link href="http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/feed/" rel="self" type="application/rss+xml" />
	<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/</link>
	<description>Site Fabricator, Data Welder, Heavy Metal Systems Integrator &#124; I.Make.Shit.Work.</description>
	<lastBuildDate>Wed, 18 Aug 2010 16:18:52 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: How to run your Python scripts as a Windows Service &#124; Ryan Robitaille</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-311</link>
		<dc:creator>How to run your Python scripts as a Windows Service &#124; Ryan Robitaille</dc:creator>
		<pubDate>Tue, 17 Aug 2010 23:53:34 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-311</guid>
		<description>[...] any Python duct-taper integrate-anything junkie like me has a need to schedule their things (in production) every once in a while. Usually this is [...]</description>
		<content:encoded><![CDATA[<p>[...] any Python duct-taper integrate-anything junkie like me has a need to schedule their things (in production) every once in a while. Usually this is [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Business Intelligence, Tools, Dirty Caveman Sex, Open-Source - Part 1</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-212</link>
		<dc:creator>Business Intelligence, Tools, Dirty Caveman Sex, Open-Source - Part 1</dc:creator>
		<pubDate>Tue, 10 Aug 2010 05:15:48 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-212</guid>
		<description>[...] done it several times before using some custom (and clever &#8211; if I do say so myself) PHP, Python, and a variety of databases &#8211; but that was after I already had an intimate familiarity with [...]</description>
		<content:encoded><![CDATA[<p>[...] done it several times before using some custom (and clever &#8211; if I do say so myself) PHP, Python, and a variety of databases &#8211; but that was after I already had an intimate familiarity with [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ry</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-130</link>
		<dc:creator>Ry</dc:creator>
		<pubDate>Tue, 04 May 2010 03:03:17 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-130</guid>
		<description>Hey Jorge, thanks for the comments!

I guess I&#039;ve just been spoiled over the past few years since I&#039;d create the datasource AND do the report afterward. :)

I&#039;ve also def warmed up to the traditional DW methods for a few larger projects recently. I guess sometimes we just get so used to doing everything &quot;one way&quot;, new (well, old) concepts seem like more effort than payoff.

Also, I LOVE how I can deploy some well-thought-out Analysis Services cubes to my end users, and then they can manipulate them as pivot tables in Excel on their own. It saves me the trouble of writing several specific reports off the same data... (you know, the &quot;classic&quot; way)

:)

Cheers!</description>
		<content:encoded><![CDATA[<p>Hey Jorge, thanks for the comments!</p>
<p>I guess I&#8217;ve just been spoiled over the past few years since I&#8217;d create the datasource AND do the report afterward. :)</p>
<p>I&#8217;ve also def warmed up to the traditional DW methods for a few larger projects recently. I guess sometimes we just get so used to doing everything &#8220;one way&#8221;, new (well, old) concepts seem like more effort than payoff.</p>
<p>Also, I LOVE how I can deploy some well-thought-out Analysis Services cubes to my end users, and then they can manipulate them as pivot tables in Excel on their own. It saves me the trouble of writing several specific reports off the same data&#8230; (you know, the &#8220;classic&#8221; way)</p>
<p>:)</p>
<p>Cheers!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jorge</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-129</link>
		<dc:creator>Jorge</dc:creator>
		<pubDate>Fri, 30 Apr 2010 18:36:10 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-129</guid>
		<description>I know this is an old post.

I come from the report developer end of the spectrum, and I found creating reports and writing SQL against a set of tables that follows Kimball&#039;s methods (a star schema) to be a positive experience!

What&#039;s great about it is that it is a universal structure. When I encounter a star schema, I instantly know which tables I need to join and can hit the ground running. With a table structure someone has invented, I always have to go to the inventor and ask how the tables are joined, what the granularity is, and so on.

I also found working with a cube to be a positive thing. A report that crunches and aggregates year-over-year sales data from more than 2,500 stores is extremely fast (milliseconds) compared to the same SQL.</description>
		<content:encoded><![CDATA[<p>I know this is an old post.</p>
<p>I come from the report developer end of the spectrum, and I found creating reports and writing SQL against a set of tables that follows Kimball&#8217;s methods (a star schema) to be a positive experience!</p>
<p>What&#8217;s great about it is that it is a universal structure. When I encounter a star schema, I instantly know which tables I need to join and can hit the ground running. With a table structure someone has invented, I always have to go to the inventor and ask how the tables are joined, what the granularity is, and so on.</p>
<p>I also found working with a cube to be a positive thing. A report that crunches and aggregates year-over-year sales data from more than 2,500 stores is extremely fast (milliseconds) compared to the same SQL.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Day</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-123</link>
		<dc:creator>Tim Day</dc:creator>
		<pubDate>Fri, 26 Feb 2010 06:32:45 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-123</guid>
		<description>Thanks a lot for the post. I used pyodbc to connect to an MS Access database and drag things into Oracle. I really enjoyed not having to touch Access. I wonder if there is a way to get the table and column names so that we can also build the required target tables automatically?</description>
		<content:encoded><![CDATA[<p>Thanks a lot for the post. I used pyodbc to connect to an MS Access database and drag things into Oracle. I really enjoyed not having to touch Access. I wonder if there is a way to get the table and column names so that we can also build the required target tables automatically?</p>
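<p>One possible approach to the question above, as a minimal sketch rather than a tested migration tool: collect (name, type, size) tuples for each source column and turn them into a CREATE TABLE statement. The <code>TYPE_MAP</code> entries, the <code>build_create_table</code> helper, and the <code>members</code> table are all hypothetical names made up for illustration. With pyodbc, the tuples could probably be read from <code>cursor.columns(table=...)</code>, but the type names a given ODBC driver reports would need checking.</p>

```python
# Hypothetical mapping from source type names to Oracle column types.
# The names on the left are illustrative; check what your ODBC driver reports.
TYPE_MAP = {
    "VARCHAR": "VARCHAR2({size})",
    "INTEGER": "NUMBER(10)",
    "DOUBLE": "NUMBER",
    "DATETIME": "DATE",
}


def build_create_table(table, columns):
    """Return a CREATE TABLE statement for (name, type_name, size) tuples."""
    parts = []
    for name, type_name, size in columns:
        # Fall back to a wide text column for types the map does not know.
        template = TYPE_MAP.get(type_name.upper(), "VARCHAR2(4000)")
        parts.append("{0} {1}".format(name, template.format(size=size)))
    return "CREATE TABLE {0} ({1})".format(table, ", ".join(parts))


# With pyodbc, the tuples could come from something like:
#   [(c.column_name, c.type_name, c.column_size)
#    for c in cursor.columns(table="members")]
example_columns = [
    ("id", "INTEGER", 10),
    ("name", "VARCHAR", 50),
    ("joined", "DATETIME", 19),
]
ddl = build_create_table("members", example_columns)
print(ddl)
```

<p>The generated statement would then be executed on the target connection before the rows are copied over. Identifier quoting and reserved words are not handled in this sketch.</p>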
]]></content:encoded>
	</item>
	<item>
		<title>By: Using XLWT and Python to export an Oracle dataset to Excel (Python Simple ETL Part 2) &#124; Ryan Robitaille</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-81</link>
		<dc:creator>Using XLWT and Python to export an Oracle dataset to Excel (Python Simple ETL Part 2) &#124; Ryan Robitaille</dc:creator>
		<pubDate>Tue, 08 Dec 2009 04:31:26 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-81</guid>
		<description>[...] are part of the standard Python distribution (I&#8217;m using 2.6 in this example), cx_Oracle I already discussed HERE, and xlwt can be found HERE (http://pypi.python.org/pypi/xlwt) grab the packages for your platform [...]</description>
		<content:encoded><![CDATA[<p>[...] are part of the standard Python distribution (I&#8217;m using 2.6 in this example), cx_Oracle I already discussed HERE, and xlwt can be found HERE (<a href="http://pypi.python.org/pypi/xlwt" rel="nofollow">http://pypi.python.org/pypi/xlwt</a>) grab the packages for your platform [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony Schmidt</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-77</link>
		<dc:creator>Tony Schmidt</dc:creator>
		<pubDate>Wed, 14 Oct 2009 15:24:28 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-77</guid>
		<description>Hi, Ry.  Thanks for your reply.

Yeah, I think you&#039;ve got the gist of my problem.

So, I had been thinking about using an ETL tool (such as Talend) - but some Python people had discouraged me from that, saying that ETL tools were for people who didn&#039;t know how to program, they came with their own product-specific learning curve, and they weren&#039;t necessarily more rapid than just writing your own scripts at the end of the day.

So what do you think of using an ETL tool vs. writing your own scripts (like you demo in this article)?

Currently I&#039;m leaning toward building what I&#039;m calling a &quot;pseudo-datawarehouse&quot;, forgetting about Inmon and Kimball, and not looking for a cross-vendor join solution.</description>
		<content:encoded><![CDATA[<p>Hi, Ry.  Thanks for your reply.</p>
<p>Yeah, I think you&#8217;ve got the gist of my problem.</p>
<p>So, I had been thinking about using an ETL tool (such as Talend) &#8211; but some Python people had discouraged me from that, saying that ETL tools were for people who didn&#8217;t know how to program, they came with their own product-specific learning curve, and they weren&#8217;t necessarily more rapid than just writing your own scripts at the end of the day.</p>
<p>So what do you think of using an ETL tool vs. writing your own scripts (like you demo in this article)?</p>
<p>Currently I&#8217;m leaning toward building what I&#8217;m calling a &#8220;pseudo-datawarehouse&#8221;, forgetting about Inmon and Kimball, and not looking for a cross-vendor join solution.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ry</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-76</link>
		<dc:creator>Ry</dc:creator>
		<pubDate>Wed, 30 Sep 2009 01:42:03 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-76</guid>
		<description>Hey Tony!

I guess it depends on what you&#039;re trying to do exactly. If your data is spread out all over the place, it probably should all run from a main location (all in the same DB) - and then be pushed back out to the old locations on a scheduled basis (if they even need to exist anymore).

I&#039;ve gone down the rabbit hole on building data warehouses (and datamarts) and data cubes to report on and all that fancy &quot;best practice&quot; stuff... but I honestly think that for a lot of implementations it&#039;s total overkill. That&#039;s just my humble opinion.

You could look at a more &quot;Access friendly&quot; ETL tool like SQL Server SSIS, but it costs a pretty penny. If you can find a desktop version of SQL Server 2000 somewhere, download and use that. Back then they included the whole ETL tool in that version, and I&#039;ve been using it like a &quot;data Swiss Army knife&quot; ever since.

Let me know if I understand the problem right. :)</description>
		<content:encoded><![CDATA[<p>Hey Tony!</p>
<p>I guess it depends on what you&#8217;re trying to do exactly. If your data is spread out all over the place, it probably should all run from a main location (all in the same DB) &#8211; and then be pushed back out to the old locations on a scheduled basis (if they even need to exist anymore).</p>
<p>I&#8217;ve gone down the rabbit hole on building data warehouses (and datamarts) and data cubes to report on and all that fancy &#8220;best practice&#8221; stuff&#8230; but I honestly think that for a lot of implementations it&#8217;s total overkill. That&#8217;s just my humble opinion.</p>
<p>You could look at a more &#8220;Access friendly&#8221; ETL tool like SQL Server SSIS, but it costs a pretty penny. If you can find a desktop version of SQL Server 2000 somewhere, download and use that. Back then they included the whole ETL tool in that version, and I&#8217;ve been using it like a &#8220;data Swiss Army knife&#8221; ever since.</p>
<p>Let me know if I understand the problem right. :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony Schmidt</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-75</link>
		<dc:creator>Tony Schmidt</dc:creator>
		<pubDate>Wed, 23 Sep 2009 17:58:26 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-75</guid>
		<description>Thanks for the post, Ry.

I&#039;ve been getting all worked up about data warehousing and ETL for my project, where I&#039;ve gotta do stuff like build an interface for creating new order and order_item entities that relate to members (MS Access), employees (MySQL) and products (flat file).

People have been telling me I need to read up on Inmon and Kimball and understand stuff like star schemas, data marts, EAV tables and so on.

Do you think all I need to do is write a bunch of python ETL scripts with event triggers (like &quot;create new order&quot; updates all the related tables in the warehouse)?

And why not bypass the ETL layer entirely and just perform cross-vendor joins on the application level?

Thanks in advance for any tips.</description>
		<content:encoded><![CDATA[<p>Thanks for the post, Ry.</p>
<p>I&#8217;ve been getting all worked up about data warehousing and ETL for my project, where I&#8217;ve gotta do stuff like build an interface for creating new order and order_item entities that relate to members (MS Access), employees (MySQL) and products (flat file).</p>
<p>People have been telling me I need to read up on Inmon and Kimball and understand stuff like star schemas, data marts, EAV tables and so on.</p>
<p>Do you think all I need to do is write a bunch of python ETL scripts with event triggers (like &#8220;create new order&#8221; updates all the related tables in the warehouse)?</p>
<p>And why not bypass the ETL layer entirely and just perform cross-vendor joins on the application level?</p>
<p>Thanks in advance for any tips.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ry</title>
		<link>http://ryrobes.com/featured-articles/using-a-simple-python-script-for-end-to-end-data-transformation-and-etl-part-1/comment-page-1/#comment-11</link>
		<dc:creator>Ry</dc:creator>
		<pubDate>Thu, 30 Jul 2009 02:14:41 +0000</pubDate>
		<guid isPermaLink="false">http://ryrobes.com/?p=91#comment-11</guid>
		<description>Hah, Thanks. I actually had a lot of modules in earlier (READ: really messy) versions of this particular script, and left &#039;string&#039; in for some romantic reason. :)

cursor.rowcount would be easier than the fetchone and var[0] printing - I might have to change that, since I&#039;m trying to make it as SIMPLE as possible. I guess my brain is hardwired to do a lot of things the &quot;hard way&quot;.

Thanks again, Speno!</description>
		<content:encoded><![CDATA[<p>Hah, Thanks. I actually had a lot of modules in earlier (READ: really messy) versions of this particular script, and left &#8216;string&#8217; in for some romantic reason. :)</p>
<p>cursor.rowcount would be easier than the fetchone and var[0] printing &#8211; I might have to change that, since I&#8217;m trying to make it as SIMPLE as possible. I guess my brain is hardwired to do a lot of things the &#8220;hard way&#8221;.</p>
<p>Thanks again, Speno!</p>
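<p>For anyone comparing the two approaches mentioned above, here is a minimal sketch. It assumes the standard-library <code>sqlite3</code> module as a stand-in for the Oracle connection, and the <code>staging</code> table is a hypothetical name. It reads the count once from <code>cursor.rowcount</code> after the insert and once with a <code>COUNT(*)</code> query plus <code>fetchone()[0]</code>. Note that <code>rowcount</code> is generally only meaningful after INSERT, UPDATE, or DELETE statements; several drivers report -1 after a SELECT.</p>

```python
import sqlite3

# In-memory database used purely as a stand-in for the real target database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE staging (id INTEGER, name TEXT)")

rows = [(1, "a"), (2, "b"), (3, "c")]
cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)

# The "easy way": the cursor already knows how many rows the insert touched.
inserted = cur.rowcount

# The "hard way": run a separate count query and unpack the single value.
cur.execute("SELECT COUNT(*) FROM staging")
counted = cur.fetchone()[0]

print(inserted, counted)
conn.close()
```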
]]></content:encoded>
	</item>
</channel>
</rss>
