Jan

Audit Your ETL With CHECKSUM

Loading your target incrementally offers a huge performance benefit over running full truncate-and-reload jobs, but it carries the risk of missing inserts, updates, or deletes made in your source system. Tracking down the cause of these gaps can take hours or days (if it can be done at all!), and they usually go unnoticed until the customer spots them.

So anything you can do to verify the target data against the source is good for your business. Simple row counts are one option, but an unreliable one. For example, if the target misses one inserted row and one deleted row, the two counts still match and the test passes.

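A good checksum technique is a lightweight alternative that can run asynchronously or after the load is complete. For each key, the query computes a row count and a checksum (the sum of the column lengths plus the numeric values); any key whose count or checksum differs between the two systems is out of sync. In this example, I query a SQL Server target and use ODBC, through a linked server called via OPENQUERY, to reach an Oracle source and return the list of keys that are not synchronized. The names key, char_col, num_col, target_table, source_table and source_server are placeholders, and the ... stands for the remaining columns you want to include.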

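```sql
SELECT T.[key], S.[key], T.checksum, T.num_rows, S.checksum, S.num_rows
FROM (SELECT [key], COUNT(*) AS num_rows,
             SUM(ISNULL(LEN(char_col), 0) + ISNULL(LEN(num_col), 0) + ISNULL(num_col, 0) + ...) AS checksum
      FROM target_table
      GROUP BY [key]) T
FULL OUTER JOIN OPENQUERY(source_server, '
      SELECT key, COUNT(*) num_rows,
             SUM(NVL(LENGTH(char_col), 0) + NVL(LENGTH(num_col), 0) + NVL(num_col, 0) + ...) checksum
      FROM source_table
      GROUP BY key') S
  ON T.[key] = S.[key]
-- The IS NULL tests keep keys found on only one side; NULL <> x is never true.
WHERE T.[key] IS NULL OR S.[key] IS NULL
   OR S.num_rows <> T.num_rows OR S.checksum <> T.checksum
```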
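To try the comparison without a linked server, here is a minimal sketch assuming SQL Server 2012 or later. The temp tables #target_table and #source_table are hypothetical stand-ins for the target table and the OPENQUERY result, and grp_key is a hypothetical key column (KEY is a reserved word in T-SQL). The data reproduces the rowcount pitfall: the source deleted key 3 and inserted key 4, and the target picked up neither change.

```sql
-- Hypothetical stand-ins for the target table and the source query result.
CREATE TABLE #target_table (grp_key INT, char_col VARCHAR(20), num_col INT);
CREATE TABLE #source_table (grp_key INT, char_col VARCHAR(20), num_col INT);

-- The target still holds key 3 (deleted in the source) and lacks key 4 (inserted in the source).
INSERT INTO #target_table VALUES (1, 'alpha', 10), (2, 'beta', 20), (3, 'gamma', 30);
INSERT INTO #source_table VALUES (1, 'alpha', 10), (2, 'beta', 20), (4, 'delta', 40);

-- Both sides report 3 rows, so a plain rowcount test passes.
SELECT (SELECT COUNT(*) FROM #target_table) AS target_rows,
       (SELECT COUNT(*) FROM #source_table) AS source_rows;

-- Per-key count/checksum diff; ISNULL makes NULL columns count as zero.
SELECT T.grp_key AS target_key, S.grp_key AS source_key,
       T.checksum AS target_checksum, T.num_rows AS target_rows,
       S.checksum AS source_checksum, S.num_rows AS source_rows
FROM (SELECT grp_key, COUNT(*) AS num_rows,
             SUM(ISNULL(LEN(char_col), 0) + ISNULL(LEN(num_col), 0) + ISNULL(num_col, 0)) AS checksum
      FROM #target_table GROUP BY grp_key) T
FULL OUTER JOIN
     (SELECT grp_key, COUNT(*) AS num_rows,
             SUM(ISNULL(LEN(char_col), 0) + ISNULL(LEN(num_col), 0) + ISNULL(num_col, 0)) AS checksum
      FROM #source_table GROUP BY grp_key) S
  ON T.grp_key = S.grp_key
-- Keep keys that exist on only one side as well as keys whose values differ.
WHERE T.grp_key IS NULL OR S.grp_key IS NULL
   OR S.num_rows <> T.num_rows OR S.checksum <> T.checksum;
```

The diff returns two rows: key 3 with no source match and key 4 with no target match, which is exactly what the matching row counts hid.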

Tags: ETL, Integration Services, Other

Posted by Shane
