Saturday, January 2, 2010

Why use a data warehouse and OLAP tools when there is data duplication?

The data warehouse used by OLAP tools holds large quantities of integrated, usually summarised, time-stamped historical data. Rather than being updated in place, data is added to the warehouse at regular intervals, forming an enterprise-wide, integrated repository that supports data mining. Updates to the various transactional systems are added incrementally, so the warehouse accurately reflects reality as of the date of the last extract.
In an OLTP system, by contrast, the data has to be absolutely accurate because it records operational transactions, and the system has to respond in a timely fashion. Non-current data in OLTP systems is also archived to reduce storage needs and improve performance. Because the data in the warehouse is time-stamped, the business reality on a given date can be analysed for strategic purposes without worrying about the performance impact on the transactional systems. The historical data can span a number of years, which facilitates trend analysis and the search for correlations.
OLAP tools allow business users to slice and dice data, discover anomalies and drill down to the root causes. For example, a decline in a brand’s performance could be correlated with a competitor’s new launch, a decline in the advertising expenditure supporting the brand, or even a changing economic climate. Powerful data mining tools can apply statistical analysis, artificial intelligence, neural networks and machine learning to unearth unexpected correlations and anomalies. Such an analysis could not be done in a transactional system, as it would not have access to competitor information or macroeconomic data. In addition, the star schema typically used in a data warehouse is optimised for analytical processing and may hold aggregates. Thus the duplicated data in the warehouse supports a different business objective from the one expected of an OLTP system.
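To make the star schema and the slice, dice and drill-down operations a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, dim_date, fact_sales), the brands and the revenue figures are all invented for illustration; a real warehouse would be far larger and would normally run on a dedicated analytical database.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, revenue REAL);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "BrandA"), (2, "BrandB")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2008, 4), (2, 2009, 1), (3, 2009, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 2, 90.0), (1, 3, 60.0),
                  (2, 1, 50.0), (2, 2, 55.0), (2, 3, 70.0)])


def revenue_by_year(conn):
    """Summary view: total revenue per brand per year."""
    return conn.execute("""
        SELECT p.brand, d.year, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d    ON d.date_id = f.date_id
        GROUP BY p.brand, d.year
        ORDER BY p.brand, d.year
    """).fetchall()


def drill_down(conn, brand, year):
    """Slice to one brand and year, then drill down to quarterly revenue."""
    return conn.execute("""
        SELECT d.quarter, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d    ON d.date_id = f.date_id
        WHERE p.brand = ? AND d.year = ?
        GROUP BY d.quarter
        ORDER BY d.quarter
    """, (brand, year)).fetchall()


print(revenue_by_year(conn))
print(drill_down(conn, "BrandA", 2009))
```

The first query gives the summarised picture, and the second narrows it to a single brand and year to show where within the year the revenue fell. Neither query touches a transactional system.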
The governance structure around the warehouse ensures that the incrementally added data is accurate as of the date of the last extract from the transactional systems. Apart from the data extraction overhead, the OLAP system does not impact the OLTP system, yet it allows the wider business objective of data analysis to be achieved. This makes the investment in a data warehouse worthwhile, despite the seeming duplication of data.
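The idea of incremental, time-stamped loads can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the function names (load_extract, as_of), the customer keys and the values are hypothetical, and a real warehouse would do this with ETL tooling and database tables rather than an in-memory list.

```python
from datetime import date

# Append-only store: every row carries the date of the extract that produced it.
warehouse = []


def load_extract(warehouse, extract_date, rows):
    """Append one extract; earlier rows are never updated or deleted."""
    for key, value in rows.items():
        warehouse.append({"extract_date": extract_date, "key": key, "value": value})


def as_of(warehouse, when):
    """Return the latest known value per key as of the given date."""
    state = {}
    for row in sorted(warehouse, key=lambda r: r["extract_date"]):
        if row["extract_date"] <= when:
            state[row["key"]] = row["value"]
    return state


# Two monthly extracts from a hypothetical transactional system.
load_extract(warehouse, date(2009, 11, 30), {"cust-1": "Silver", "cust-2": "Gold"})
load_extract(warehouse, date(2009, 12, 31), {"cust-1": "Gold"})

print(as_of(warehouse, date(2009, 12, 1)))
print(as_of(warehouse, date(2009, 12, 31)))
```

Because nothing is overwritten, the state on any past date can be reconstructed, which is the property that makes the duplicated data useful for analysis.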
