Thu 2 Nov 2006
Michael Ault on SQL Tuning
Posted by Shane under Oracle
No Comments
I didn’t get a chance to catch Michael Ault’s presentation at the recent TCOUG, but I heard it was a good talk. Thank you Michael for putting the slides online.
Thu 2 Nov 2006
Posted by Shane under Oracle
No Comments
I didn’t get a chance to catch Michael Ault’s presentation at the recent TCOUG, but I heard it was a good talk. Thank you Michael for putting the slides online.
Thu 2 Nov 2006
Posted by Shane under Informatica, Oracle, SQL Server
No Comments
The holy grail of a former .NET web developer is how to combine the techniques of agile and test-driven development with ETL tools and data volumes. Mark Rittman shows that he is a brother-in-arms with my quest.
Thu 2 Nov 2006
Posted by Shane under Informatica, Oracle, SQL Server
1 Comment
This excellent piece in DMReview deliberates on some criteria for selecting (or justifying) an ETL tool. We are struggling with this right now. Our shop uses a combination of Informatica and SQL Server DTS for our ETL jobs. With Oracle Warehouse Builder 10gR2 and SQL Server Integration Services both released as viable alternatives, we need to justify our continued six-figure annual investment in Informatica. Is it worth it?
Thu 12 Oct 2006
Posted by Shane under Oracle
No Comments
I went to a seminar on Oracle partitioning and data warehousing earlier this week call “Scaling to Infinity” by Tim Gorman. It gave me some new ideas, and reinforced some others, for where we should take our architecture at the U.
Here are some notes that I will hopefully talk more about as I research them.
TRUNC(date_posted) = '10/12/06'date_posted >= '10/12/06'
AND date_posted < '10/13/05'Tue 27 Jun 2006
Posted by Shane under Informatica, Oracle
[2] Comments
I got a question about PowerExchange throughput the other day. When we were first evaluating PowerExchange as a solution, this was a primary concern of ours. We devoted hundreds of hours designing, implementing, and refining a good test of our throughput. In the end, we over-tested. The throughputs that we dealt with in the test were so much more than our production systems handled, even during peak load, that the actual system handles its load with great ease. Of course, sometimes that’s the point of load testing: to determine theoretical and not actual limits.
Our physical environment
We have a source (PeopleSoft) database running on Oracle 9i/Solaris. A target (ODS) database running on Oracle 9i/Solaris. And we have PowerExchange running on the same Windows 2003 box as PowerCenter. In mid-July, we’ll be upgrading both source and target databases to 10g.
Our data environment
Our current change data capture mappings captures 10 source tables and writes to 30 targets. 3 for each source table. 6 of these tables are considered “wide” tables. These are also the tables with the most DML activity in our PeopleSoft system. One has 23 columns totalling 2.2 kb/row. Another has 148 columns totaling 1.9 kb/row. You get the picture. Lots of data per row. We are capturing every byte in our ODS, plus some extra LogMiner columns like cap_tm.
We don’t have a high, sustained throughput in our system. Our high throughput periods are when batch cycles complete. This is usually nights/weekends when source/ODS latency is not as much of a concern.
Our PowerExchange settings
After discussing our environment with Informatica, we’ve come up with a couple tuning parameters for our system. These are tuned for lower latency, not higher throughput.
So, how much throughput?
This varies by table, of course, because the wider tables have lower throughput than the narrower ones. We have bursts of throughput of up to 4700 rows per second. But that number doesn’t really tell you much. 4700 rows of a 23 byte table isn’t much data. We’re not sure if that number is actually updates of our 23 byte table. Actually, I should look into that.
The clearer measure of throughput is data volume or bytes/sec. Not rows/sec.
The most frequent batch job we see happens to update our widest source table. We regularly see throughputs of up to 1200 rows per second in the mornings when this job commits. That’s 2.6 MB/s. Not bad. Our “standard” throughput after batch windows is between 500-1000 rows per second. Which is still between 1-2MB/s.
In summary, our biggest concern in using PowerExchange turned out to work very well. We were able to get by tuning very few variables (just the source connection in the Workflow Manager) without having to look into source/target tuning.
I invite your comments via email scolin@pobox.com or here.
Wed 24 May 2006
Posted by Shane under Life, Oracle, Other
No Comments
I scan a handful of financial blogs pretty regularly. I also scan a handful of technical blogs relating to SQL Server and Oracle. It isn’t very often that a blog from one of these worlds links to a blog in the other. But here’s JLP from AllFinancialMatters linking to Don Burleson of oracle-dba.com fame. I guess we’re all just geeks.
Wed 12 Apr 2006
Posted by Shane under Oracle, SQL Tricks
No Comments
Since the PeopleSoft table naming convention is to use as many underscores as possible, I’ve run into many snafus since the underscore is a special character in SQL. Clark (via Linda) showed us some ways around this.
select count(*)
from dba_tables
where table_name like '%\_UM' escape '\'
COUNT(*)
----------
113
1 row selected
This query, below, would pick up table names like PS_EXT_ACAD_SUM.
select count(*)
from dba_tables
where table_name like '%_UM'
COUNT(*)
----------
132
1 row selected