Joe,
You got it. The star schema for a true data warehouse usually includes both aggregates AND atomic data, so users can drill down or up, or just look at the aggregate data and build reports/screens/dashboards from there. And, as you know, the ETL (Extract-Transform-Load) process is where you build aggregates, clean data, make it usable/reportable, etc.
Most folks/sites have to walk before they can run, unless you have a good budget. To this end, they go through a staged approach, starting with an Operational Data Store (ODS) that, at least, is tuned for reporting. You'll also start to see what needs to happen in your ETL programs/tools to transform data so it's more customer/user oriented.
I'll also say that a DW/DM/ODS won't replace all reporting from your production database, but it can get close. This is usually because production will "report" immediate data via the OLTP application. For simplicity's sake, things like invoicing would probably be better done from the native OLTP app, using batch.
I have seen and done the "big bang"/full-blown/"start to build and go to finish" data warehouse/data mart with no ODS, but not, so far, in i5 shops. It tends to be found in large companies using IBM mainframes and/or large *ix boxes. That's not to say it couldn't be done with the i5; the problem is in the budget, not the box/OS.
Have you found that, and if so, how are you addressing it? Do you do exact mirroring, or do you sync data on a regular basis? Do you pre-calculate data for performance? <<
In this particular site's situation, we're going with the staged/slow approach. We do "syncing" and some transforming using IBM's DataPropagator (they already had it). It seems to work well and is not as expensive as many products out there.
Data not being in "real time" is not a problem since, as you point out, most users don't NEED real time for most things anyway. I will say, however, with DP I could make it near real time but have chosen to do some of the syncs every 30 minutes and some once per week. So far, this seems to be more than they have ever had before so they are happy...for now.
Data mirroring, as with MIMIX or the like, would not work, as the site needs to have ALL (for now) 12 years of history PLUS frequent updates of what's happening in production; hence the ability to do current or historical reporting from the DW. Data mirroring would wipe out the history in the DW.
Some pre-calculating is done, and more will be done over time as the business learns and I move the design from ODS to a dimensional model (star schema). Yes, aggregates do help performance tremendously. But you know what also helps... as you might expect, indexes; plain ole radix-tree indexes. Running your queries through Visual Explain will give you great insight into what the DBA AND the query writer can do to tune for OLAP rather than OLTP.
Hope that helps,
Dave
Joe Pluta <joepluta@xxxxxxxxxxxxxxxxx> 5/12/2008 13:54 >>>
Dave Odom wrote:
What about doing a Data Warehouse/Data Mart/Operational Data Store in another LPAR, if possible, have the users create their own reports with IBM DataQuant and DB2 Query Manager, or some such tools for interactive and batch reporting, and get yourself out of the report-generation business as much as possible? That's what I'm doing here.
This is the great benefit of Data Warehousing, to my mind, Dave. The
biggest problem has always been that often the data in the database
isn't perfectly suited for reporting; even in well designed databases,
data elements aren't always present in a reportable way.
Also, rather than constantly calculating values, it's sometimes better
from a performance standpoint to pre-summarize data. It may mean that
the data being reported is not entirely real time, but many reports are
completely acceptable on data hours old, or even longer.
Have you found that, and if so, how are you addressing it? Do you do
exact mirroring, or do you sync data on a regular basis? Do you
pre-calculate data for performance?
Joe