An operational data store (ODS) is an architectural component of a data warehouse that is used for immediate reporting with current operational data. An ODS contains lightly transformed and lightly integrated operational data with a short time window. It is used for real time and near real time reporting.
Unlike data marts, an ODS is not refreshed from the data warehouse history tables. Rather it is directly loaded from operational data, staging area, or incoming files. It can optionally serve as a data source for the data warehouse.
Click on any image to see full size.
An ODS must be frequently refreshed so that it contains very current data. An ODS can be updated daily, hourly, or even immediately after transactions on operational data.
Transformation and Integration
The update frequency and currency of the data in the ODS is directly related to the amount of transformation and integration that is performed on the data. Choices are made by the development team based on how long various transformation processes will take to complete vs. how current the data must be for reporting. For example, if real time reports or dashboards require data within minutes or even seconds after data events, it may not be possible to do time-consuming transformation or integration processing. Also, dimensions and other reference data in the data warehouse may not be as current as new operational data.
Some degree of transformation and/or integration is usually required for reports. Bill Inman defines five ODS classes. The classes represent different levels of ease and speed of refresh vs. the degree of integration and transformation. For example, a Class I ODS would simply consist of direct replication of operational data (no transformation), where a Class V ODS would consist of highly integrated and aggregated data (highly transformed). A Class I ODS would be the quickest and simplest to refresh, while a Class V ODS would involve the most complex, time-consuming processing.
One option for refreshing an ODS with very current data is to use the transaction logs of the operational data to update data replicated in the ODS. Database systems use transaction logging to record all updates, inserts, and deletes to tables. Transaction logs are normally used for rolling back invalid transactions or for applying changes to data that were not completed due to a system failure. However, if the tables are replicated in the ODS, the transaction logs can also be used to refresh them. The replicated tables could then be used as staging for reporting tables. In SQL Server and in Oracle using transaction logs to update replicated data is called Change Data Capture (CDC). Those products provide system stored procedures and other tools to assist in applying transaction logging to replicated data.
 (SQL Server) or materialized views (Oracle). Indexed views and materialized views are similar to regular views except that they provide high performance when the view is queried. These high-performance views created in the ODS schema point at operational data. Views can provide a relatively simple way to keep an ODS very current. Views cannot be used when it is necessary to access data on remote servers.
When an ODS needs to be updated less frequently, conventional ETL processes can be used. For example, if an ODS is only updated once daily, it may be feasible to export operational data to files which can then be quickly loaded into the ODS.
An ODS can contain its own staging tables as well as transformed tables for reporting. It is common for an ODS to utilize separate staging tables from the rest of the data warehouse because the ODS is refreshed by separate processes than the data warehouse.
Reporting tables are limited by the fact that there may not be as full integration and transformation as there is for data warehouse tables. This is because time consuming ETL processing may not allow for fast access to very current data. Because of this, dimensional modeling (star schema) is not always possible.
The main purpose of an ODS is to provide reporting and querying on very current operational data. Reports can only be created on very short time windows of data. To query history, the data warehouse and application-specific data marts must be used instead of the ODS.
Reports may also be affected by the limited transformation and integration of data in an ODS. For example, data may not contain the surrogate keys necessary for joins on dimensions in the data warehouse. Data from multiple sources may not be completely integrated into consistent structures and attribute values.
An ODS can provide access to current operational data for reporting. An ODS is loaded directly from operational data and not from the data warehouse history tables. An ODS only contains a short time window of data. If history is required, the ODS can be a data source for the data warehouse.
An ODS must balance the frequency of refresh, the degree of transformation and integration, and how current data in the ODS must be. A number of refresh options should be considered based on application requirements.
An ODS can be a database or a schema within a database that contains both staging as well as reporting tables. There may be limitations on the reporting tables compared to data mart tables due to the fact that there is less transformation and integration of operational data.