|
The Information
Technology Architects |
Too Much Power Could be a Hazard to Database ReportingRevealing a Step that should be Overlooked! We recently got involved in a complex data warehouse project with a diverse list of relatively complex reports. Everything through the design, development, and user-acceptance testing went on schedule. However, when the system was moved to production, nothing worked! What could have gone wrong? We re-examined all the necessary ingredients and they all seemed to be there:
Figure 1 shows a simplified view of the design. So what's the problem? In a nutshell, the problem is that "Aggregation" has been combined with "Reporting" in the same step. Most reporting programs today come with elaborate and sophisticated capabilities that allow all kinds of aggregations and summary functions to be embedded in them, and that's the trap that should be avoided at all costs! When aggregations and summary functions are delayed until the reporting time, they work fine for small sizes of sample data (which is exactly what had happened in this case). However, as soon as they hit volumes of tens or hundreds of thousands of records, they bring down the entire reporting process to a virtual halt. Your test report will deceive you and run in a few seconds, then the user signs off the system and gives you the go-ahead to go live, but the production aggregation of high volume of data may take hours and hours to run. That's what the problem is. So, what can be done? Let's look at Figure 2 for a possible solution: In this case, the "Aggregation" process is not delayed until the reporting time; it's done up-front during the OLTP process. Right? Wrong! This is also a bad solution, because OLTP is not a good place for aggregations either. During OLTP, when multitude of online users are pounding on the keyboard to enter the transactional data, there's no time for aggregations to take place. So if we shouldn't do aggregation at the beginning or at the end, when is a good time? Let's checkout Figure 3: In Figure 3, a 3rd step is added for aggregations, between OLTP and reporting. This way, OLTP will maintain it's performance speed for the online users and the reporting will also be quick since all the aggregation have already been done before reporting starts. The important thing to remember for the aggregation process, is not just where it fits in the sequence of events -- which is between OLTP and reporting -- but also "when" it should take place. Typically, aggregations are done in a batch mode during off hours (e.g. evenings or weekends). Therefore, depending on the operational schedule of your OLTP, your "aggregated" data may be in the same database as your OLTP or in a different one. If your OLTP is down during evenings and weekends, then you can use the same database for the aggregations. Otherwise, in a 24-7 environment where online transactions are being entered around the clock, then you just take a momentary break to switch your OLTP backup volumes and use the previous volume to load the OLTP data into a 2nd database, in order to the aggregation process to run. Depending on the size of your database, another step could also be added between aggregations and reporting, namely, the "Warehousing" step: There's no magic rule as to when the model in Figure 3 is the best solution and when you should resort to Figure 4. Just as a guideline for planning purposes, if you anticipate to have more than a million records sent to the reporting step, you should consider Figure 4. For less than a million records, Figure 3 should work out fine.
|
Copyright © 2008 Pointer Corporation
|