Oracle


Some food for thought when working with NULL values.


select count(*)
from ps_dwsa_prog_dtl
COUNT(*)
----------
657662
1 row selected

select count(*)
from ps_dwsa_prog_dtl
where acad_sub_plan ='13'
COUNT(*)
----------
13
1 row selected

select count(*)
from ps_dwsa_prog_dtl
where acad_sub_plan <>‘13′
COUNT(*)
———-
148559
1 row selected

select count(*)
from ps_dwsa_prog_dtl
where acad_sub_plan is null
COUNT(*)
———-
509090
1 row selected

Actually, I’m not at the office, but I am doing what I would consider “work” today. I’m at the TDWI World Conference in Orlando.

Today I attended Stephen Brobst’s class on “Designing a High Performance Data Warehouse”. While the course came highly recommended, I was somewhat skeptical going in that, being software agnostic, it would be too theoretical to really put to use. But I was astounded by how much content was in this course.

We reviewed:

  1. Join Strategies
  2. Indexing Strategies
  3. Database Parallelism
  4. Partitioning Strategies
  5. OLAP & Aggregation

There was a whole lot of stuff in between. I’m going to need to bone up on employing partitions and parallelism both in the databases (Oracle and SQL Server) and in our ETL tools (Informatica and eventually SSIS). There is a lot that can be done architecturally that could help our processes scale. He had a particularly good illustration Amdahl’s Law, demonstrating the benefits of parallelism, that I will need to draw out and put up here. The point being that many processes need to be run sequentially, but you get a huge boost out of finding ones that can run in parallel and doing so.

I will hopefully have more details later, but I need to run to a session that Informatica is putting on. I will probably be coding tonight. Put some of this creative energy to work.

Starting with release 10g, Oracle is phasing out Long fields. I have two words about this: Thank God! Long fields are an ETL nightmare. You can’t use any string functions on them. The OLEDB/ODBC drivers can’t read them.

If you’re a PeopleSoft developer and you’re thinking about creating a LONG field, ask yourself if it’s really worth the headache you’re creating for everybody down the road. Is VARCHAR2(4000) adequate? Is that one in a thousand case really important enough to justify all the extra time it will take extract this data out in the future? If the answer is yes, get with the times and use CLOB instead.

Incidentally, Microsoft has followed the leader and is depreciating text and ntext data fields in SQL 2005. They have replaced them with VARCHAR(MAX).

« Previous Page