Shane on January 18th, 2007

Loading your target incrementally offers a huge performance benefit over running full truncate/reloads, but there is a danger of missing inserts or updates in your source system. It can take hours or days to track down the source of these problems (if it can be done at all!) and problems are generally not found until [...]

Continue reading about Audit Your ETL With CHECKSUM

Shane on November 2nd, 2006

The holy grail for a former .NET web developer is combining the techniques of agile and test-driven development with ETL tools and data volumes. Mark Rittman shows that he is a brother-in-arms in my quest.

Continue reading about Testing Framework for ETL

Shane on November 2nd, 2006

This excellent piece in DMReview deliberates on some criteria for selecting (or justifying) an ETL tool. We are struggling with this right now. Our shop uses a combination of Informatica and SQL Server DTS for our ETL jobs. With Oracle Warehouse Builder 10gR2 and SQL Server Integration Services both released as viable alternatives, we need [...]

Continue reading about Why Do We Use Informatica?

Shane on October 12th, 2006

I went to a seminar on Oracle partitioning and data warehousing earlier this week called “Scaling to Infinity” by Tim Gorman. It gave me some new ideas, and reinforced some others, for where we should take our architecture at the U. Here are some notes that I will hopefully talk more about as I research [...]

Continue reading about Scaling to Infinity Seminar