Pointer Corporation

The Information Technology Architects

Project Management and Database Solutions ...
From the Desktop to the World

Too Much Power Could be a Hazard to Database Reporting

Revealing a Step that should be Overlooked!


We recently got involved in a complex data warehouse project with a diverse list of relatively complex reports.  Everything through the design, development, and user-acceptance testing went on schedule.  However, when the system was moved to production, nothing worked!

What could have gone wrong?  We re-examined all the necessary ingredients and they all seemed to be there:

 

  • The On-Line Transaction Process (OLTP) was loading data into a fully normalized database;
  • Necessary aggregate numbers (such as year-to-date amounts) were being calculated;
  • Based on a sample load of about 100 records, all the report figures seemed to add-up perfectly.

Figure 1 shows a simplified view of the design.

So what's the problem?

In a nutshell, the problem is that "Aggregation" has been combined with "Reporting" in the same step.

Most reporting programs today come with elaborate and sophisticated capabilities that allow all kinds of aggregations and summary functions to be embedded in them, and that's the trap that should be avoided at all costs!

When aggregations and summary functions are delayed until the reporting time, they work fine for small sizes of sample data (which is exactly what had happened in this case).  However, as soon as they hit volumes of tens or hundreds of thousands of records, they bring down the entire reporting process to a virtual halt.  Your test report will deceive you and run in a few seconds, then the user signs off the system and gives you the go-ahead to go live, but the production aggregation of high volume of data may take hours and hours to run.

That's what the problem is.

So, what can be done?  Let's look at Figure 2 for a possible solution:

In this case, the "Aggregation" process is not delayed until the reporting time; it's done up-front during the OLTP process.  Right?

Wrong!  This is also a bad solution, because OLTP is not a good place for aggregations either.  During OLTP, when multitude of online users are pounding on the keyboard to enter the transactional data, there's no time for aggregations to take place.

So if we shouldn't do aggregation at the beginning or at the end, when is a good time?

Let's checkout Figure 3:

In Figure 3, a 3rd step is added for aggregations, between OLTP and reporting.  This way, OLTP will maintain it's performance speed for the online users and the reporting will also be quick since all the aggregation have already been done before reporting starts.

The important thing to remember for the aggregation process, is not just where it fits in the sequence of events -- which is between OLTP and reporting -- but also "when" it should take place.  Typically, aggregations are done in a batch mode during off hours (e.g. evenings or weekends).

Therefore, depending on the operational schedule of your OLTP, your "aggregated" data may be in the same database as your OLTP or in a different one.  If your OLTP is down during evenings and weekends, then you can use the same database for the aggregations.  Otherwise, in a 24-7 environment where online transactions are being entered around the clock, then you just take a momentary break to switch your OLTP backup volumes and use the previous volume to load the OLTP data into a 2nd database, in order to the aggregation process to run.

Depending on the size of your database, another step could also be added between aggregations and reporting, namely, the "Warehousing" step:

There's no magic rule as to when the model in Figure 3 is the best solution and when you should resort to Figure 4.  Just as a guideline for planning purposes, if you anticipate to have more than a million records sent to the reporting step, you should consider Figure 4.  For less than a million records, Figure 3 should work out fine.

 
Ben Aminnia, president of Pointer Corporation is a database architect with over 20 years of experience.  He's also the president of Los Angeles SQL Server Professionals Group www.sql.la and a board member of Los Angeles .NET Developers Group www.ladotnet.org.     
 

Copyright © 2008 Pointer Corporation