Loading the Data in a Data Warehouse
Once the data has been extracted from the source systems, it is typically loaded into a temporary data store so that it can be cleaned up and made consistent. The consistency checks involved can be quite complex, and they often identify issues that arise only when data from a number of sources is integrated. In addition, as data changes over time, errors become apparent that previously went unnoticed because the day-to-day discrepancies were too small to detect. For example, if 50 customers no longer appear in the customer details database from one week to the next, we would expect to find 50 customer events representing the cancellation of those subscriptions.
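The week-on-week check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name reconcile_cancellations, the use of plain customer ID lists, and the sample data are all hypothetical, and a real check would read from the temporary data store rather than from in-memory lists.

```python
def reconcile_cancellations(previous_ids, current_ids, cancellation_event_ids):
    """Compare customers that vanished week on week with cancellation events.

    All three arguments are iterables of customer IDs (a hypothetical,
    simplified stand-in for the staging tables).
    """
    # Customers present last week but absent this week.
    disappeared = set(previous_ids) - set(current_ids)
    cancelled = set(cancellation_event_ids)
    return {
        # Customers that vanished with no cancellation event to explain it.
        "missing_events": sorted(disappeared - cancelled),
        # Cancellation events with no matching disappearance.
        "unexpected_events": sorted(cancelled - disappeared),
    }


# Hypothetical sample data: customers 2 and 3 vanish, but only
# customer 2 has a cancellation event.
last_week = [1, 2, 3, 4]
this_week = [1, 4]
events = [2]

report = reconcile_cancellations(last_week, this_week, events)
print(report)  # {'missing_events': [3], 'unexpected_events': []}
```

An empty report in both directions suggests the two sources agree for that week; any entry is a candidate inconsistency to investigate before the data is moved on from the temporary store.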
If these cancellation events do not appear in the customer events area of the data warehouse, the user can quite rightly regard this as a significant inconsistency. In practice, the likelihood of this error occurring can be quite high, because the two source systems have significantly overlapping data sets. The data warehouse is probably the first place where consistency issues between the two separate systems become apparent. Put another way, if two source systems have overlapping data sets, the effort required to clean them both up will be much higher than twice the effort it takes to clean one up. In addition, the load process must be capable of running fully automatically: that is, it must have the intelligence to report errors in the load and move on, and/or to request human intervention. Care should be taken when designing the load process to ensure that error recovery is an integral part of the design.
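One way to picture a load that reports errors and moves on, while still asking for human intervention when too much goes wrong, is the following minimal Python sketch. Everything in it is an assumption made for illustration: the names LoadResult, load_rows and validate_customer, the rule that a missing customer_id is an error, and the idea of using a rejection ratio as the trigger for review are not taken from any particular warehouse product.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs
    needs_review: bool = False  # True when a human should look at the run


def load_rows(rows, validate, max_reject_ratio=0.1):
    """Load a list of rows, recording bad rows instead of aborting the run.

    max_reject_ratio is a hypothetical threshold: if a larger share of
    rows is rejected, the run is flagged for human intervention.
    """
    result = LoadResult()
    for row in rows:
        try:
            validate(row)
        except ValueError as exc:
            # Report the error and move on to the next row.
            result.rejected.append((row, str(exc)))
            continue
        result.loaded.append(row)
    if rows and len(result.rejected) / len(rows) > max_reject_ratio:
        result.needs_review = True
    return result


def validate_customer(row):
    # Hypothetical rule: every row must carry a customer_id.
    if row.get("customer_id") is None:
        raise ValueError("missing customer_id")


rows = [{"customer_id": 1}, {"customer_id": None}, {"customer_id": 3}]
result = load_rows(rows, validate_customer, max_reject_ratio=0.5)
print(len(result.loaded), len(result.rejected), result.needs_review)
```

The point of the design is that error handling sits inside the load loop itself: rejected rows are kept together with the reason for rejection, so they can be corrected and reloaded later without rerunning the whole load.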
