<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-518173685570935396</id><updated>2012-02-16T02:52:17.143-08:00</updated><category term='pig'/><category term='cloud services'/><category term='MPP'/><category term='unstructured database'/><category term='appliance'/><category term='cloud database'/><category term='SQL Server'/><category term='google file system'/><category term='column-based'/><category term='data warehouse'/><category term='column based'/><category term='Oracle'/><category term='Business Intelligence'/><category term='big data'/><category term='parallel processing'/><category term='emerging technology'/><category term='massively parallel processing'/><category term='map reduce'/><category term='Hadoop'/><category term='BI'/><category term='database appliance'/><category term='hive'/><category term='column-oriented'/><category term='data warehouse appliance'/><category term='column oriented'/><category term='database'/><title type='text'>Clarity in Data Warehousing</title><subtitle type='html'>From the point of view of Randy Grenier</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>13</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-2860687963155616454</id><published>2011-11-20T05:56:00.000-08:00</published><updated>2011-11-20T06:07:37.068-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cloud services'/><category scheme='http://www.blogger.com/atom/ns#' term='cloud database'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Server'/><category scheme='http://www.blogger.com/atom/ns#' term='emerging technology'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><title type='text'>Rapidly Emerging Technology Series: Cloud Databases</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;span style="line-height: 115%;"&gt;The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.&amp;nbsp; This posting discusses cloud databases.&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A growing trend in database systems is storing data in the cloud. The cloud metaphor is taken from the flowchart symbol of a cloud that represents data movement over the Internet (hence not visible to the flowchart….in the cloud).&amp;nbsp; Companies including Microsoft, Oracle, IBM and Amazon offer cloud storage services on vast server and storage complexes.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Advantages of storing data with a cloud service is that they will perform all administrative services, replicate data on various severs for instantly available backups, and automatically scale growing databases and applications.&amp;nbsp; These services could result in considerable cost savings for some organizations.&amp;nbsp; Database development can be done without a large commitment to hardware purchases and administrative staff, and purchase and hiring decisions can be postponed for later assessment.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;For example, Microsoft offers a product called &lt;i&gt;SQL Azure&lt;/i&gt; which is a cloud-based SQL Server service.&amp;nbsp; SQL Azure makes it possible to store and access SQL Server data on cloud-based servers as well as integrate cloud and local data in a hybrid database.&amp;nbsp; SQL Azure databases can contain SQL Server tables, indexes, stored procedures, and other common database objects.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Cloud servers can be &lt;i&gt;public&lt;/i&gt; or &lt;i&gt;private&lt;/i&gt;.&amp;nbsp; A public cloud is a service purchased from a cloud services vendor.&amp;nbsp; A private cloud is set up by an organization with its own servers and storage.&amp;nbsp; A public cloud has the advantage of very sophisticated administrative and scaling capabilities that would be very complicated and expensive for smaller organizations to set up themselves.&amp;nbsp; A private cloud may be used when regulatory requirements are needed above and beyond the security provided by public clouds.&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A &lt;i&gt;cloud hybrid database&lt;/i&gt; is one where data is stored on both cloud and local servers. This may be done for a number of reasons including replication of critical data on separate servers, separating processing of various data sources, providing separate access to various data marts, and other data distribution and access requirements.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-2860687963155616454?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/2860687963155616454/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_8686.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/2860687963155616454'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/2860687963155616454'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_8686.html' title='Rapidly Emerging Technology Series: Cloud Databases'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-5430982363866715081</id><published>2011-11-20T05:51:00.000-08:00</published><updated>2011-11-20T06:07:54.324-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Business Intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='BI'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Server'/><category scheme='http://www.blogger.com/atom/ns#' term='emerging technology'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><title type='text'>Rapidly Emerging Technology Series: Self-Service BI</title><content type='html'>&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.&amp;nbsp; This posting discusses self-service BI.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A very rapidly evolving area of information access is called &lt;i&gt;Self-service Business Intelligence&lt;/i&gt; or &lt;i&gt;Self-service BI&lt;/i&gt;.&amp;nbsp; Business users want more flexible access to information than standard reports can provide and not have to rely on analysts who can write code.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;For many years, companies like Cognos and MicroStrategies have provided BI tools that work on top of relational databases.&amp;nbsp; Typically, the database team would set up a star schema data mart, and BI developers would create a browser-based front end would allow easy grouping and aggregation (slice-and-dice) access to the data with sums and counts.&amp;nbsp; BI products also provide tools to build dashboards that give very current views of data—typically in an operational data store (ODS).&amp;nbsp; Dashboards can also include data visualization in the form of graphs and gauges.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;However, these tools are evolving to include much more sophisticated access for users.&amp;nbsp; Unlike older BI tools that sat on top of a pre-defined data mart, newer self-service BI products provide access to multiple data sources including enterprise databases as well as publicly accessible databases and subscription databases on the web.&amp;nbsp; The BI tool allows for integration of these diverse data sources at the user level for complex reports and graphs.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Most analyses involve more than single queries.&amp;nbsp; There is usually a discovery process which leads to more questions and subsequently more analyses.&amp;nbsp; Newer BI products support this discovery process.&amp;nbsp; Self-service BI allows an analysis to be built upon by easily changing variables, adding data sources, re-use of results, and so on.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Newer BI tools are being developed that also allow mobile access from a variety of devices. Users will increasingly need to access and visualize data from their smart phones and tablets.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Microsoft has invested heavily in Self-service BI.&amp;nbsp; Their SQL Server product contains tools that allow tight integration with Excel and all of its analytical and graphing capabilities as well as with SharePoint to promote various types of live data access throughout an organization.&amp;nbsp; SQL Server 2012 which will be released in coming months has a tool called PowerView which is a very flexible and powerful BI front end.&amp;nbsp; It supports integration of diverse data sources at the user level, self-service alerts, sophisticated graphing, document authoring with live data updates, and many other analytical tools. &lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;(A pre-release version of SQL Server 2012 is available for free download at &lt;a href="http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx&lt;/a&gt;.)&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-5430982363866715081?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/5430982363866715081/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series-self.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/5430982363866715081'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/5430982363866715081'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series-self.html' title='Rapidly Emerging Technology Series: Self-Service BI'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-7209688196159582723</id><published>2011-11-20T05:47:00.000-08:00</published><updated>2011-11-21T04:04:58.762-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='google file system'/><category scheme='http://www.blogger.com/atom/ns#' term='map reduce'/><category scheme='http://www.blogger.com/atom/ns#' term='hive'/><category scheme='http://www.blogger.com/atom/ns#' term='pig'/><category scheme='http://www.blogger.com/atom/ns#' term='unstructured database'/><category scheme='http://www.blogger.com/atom/ns#' term='big data'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Server'/><category scheme='http://www.blogger.com/atom/ns#' term='emerging technology'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><category scheme='http://www.blogger.com/atom/ns#' term='Hadoop'/><title type='text'>Rapidly Emerging Technology Series: Unstructured Databases</title><content type='html'>&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.&amp;nbsp; This posting discusses unstructured databases.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;When data grows to many terabytes, exabytes, and zettabytes, it can no longer be managed or queried using traditional relational databases.&amp;nbsp; Examples are the massive amounts of data managed by companies like Google or Facebook.&amp;nbsp; Examples of data that can occupy massive amounts of storage in health care are genomics data and data from electronic patient monitoring systems.&amp;nbsp; Because of the massive amounts of data in these databases, they cannot be structured according to relational database models and are hence called &lt;i&gt;unstructured databases&lt;/i&gt;.&amp;nbsp; The term &lt;i&gt;big data&lt;/i&gt; is frequently used to describe unstructured databases.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A pioneer in developing new ways of managing big data was Google.&amp;nbsp; They created a framework for distributing and processing huge amounts of data over many nodes including a storage system called &lt;i&gt;Google File System&lt;/i&gt;, and a language called &lt;i&gt;MapReduce&lt;/i&gt; for querying distributed data.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A popular big data solution based on Google’s is called &lt;i&gt;Hadoop&lt;/i&gt;.&amp;nbsp; Versions of Hadoop have been adapted by Yahoo, Facebook, and Amazon.&amp;nbsp; Hadoop is being incorporated into commercial relational database products, and it is freely available as open source software.&amp;nbsp; Both Microsoft and Oracle have announced Hadoop integration with their newly released relational database products.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;When large data providers Yahoo and Facebook implemented Hadoop, they added higher level programming tools than MapReduce called &lt;i&gt;Hive&lt;/i&gt; (Facebook) and &lt;i&gt;Pig&lt;/i&gt; (Yahoo).&amp;nbsp; These languages mimic SQL but have the ability to distribute queries on distributed nodes.&amp;nbsp; Because big data databases use non-relational queries, they are frequently referred to as &lt;i&gt;NoSQL databases&lt;/i&gt;.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Unstructured databases are very good at quickly storing and accessing extremely large amounts of data.&amp;nbsp; However, they do not have the more complete functionality of relational database management systems.&amp;nbsp; For example, Hadoop does not support complex data models, complex analytical queries, referential integrity, and other RDBMS capabilities.&amp;nbsp; NoSQL databases only provide very simple data organization and simple text-based querying.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-7209688196159582723?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/7209688196159582723/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_3592.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/7209688196159582723'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/7209688196159582723'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_3592.html' title='Rapidly Emerging Technology Series: Unstructured Databases'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-4045224087724186449</id><published>2011-11-20T05:44:00.000-08:00</published><updated>2011-11-20T06:08:35.368-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='parallel processing'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Server'/><category scheme='http://www.blogger.com/atom/ns#' term='MPP'/><category scheme='http://www.blogger.com/atom/ns#' term='emerging technology'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><category scheme='http://www.blogger.com/atom/ns#' term='massively parallel processing'/><title type='text'>Rapidly Emerging Technology Series: Massively Parallel Processing</title><content type='html'>&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.&amp;nbsp; This posting discusses massively parallel processing.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;i&gt;Parallel processing&lt;/i&gt; distributes a process into multiple &lt;i&gt;threads&lt;/i&gt; so that they can be performed simultaneously.&amp;nbsp; In database systems, parallel processing allows queries to be performed on data distributed in different locations at the same time and then combine the results.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Database systems with &lt;i&gt;massively parallel processing&lt;/i&gt; &lt;i&gt;(MPP)&lt;/i&gt; can distribute processes across hundreds of &lt;i&gt;nodes&lt;/i&gt;.&amp;nbsp; A &lt;i&gt;node&lt;/i&gt; is a separate server with its own software and storage.&amp;nbsp; A very large table could be distributed across the nodes so that a query can process against all of the nodes at the same time.&amp;nbsp; The results are then combined and returned to the requestor.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;MPP systems have a controller node which does the work of distributing data and processes. &amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A developer could create DDL for a table without consideration of how the data would be distributed across the system, and the controller would automatically allocate storage on each of the nodes for this table.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;When inserting data, the system has the intelligence to hash the data and create a balanced distribution.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;When querying the data, the controller instantly converts the query into code that would run against all of the nodes simultaneously.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;MPP databases only make sense for data warehousing and OLAP.&amp;nbsp; There would be no performance gains from the MPP when performing OLTP operations such as inserting or updating individual records or querying for small sets of records.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-4045224087724186449?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/4045224087724186449/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_7159.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/4045224087724186449'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/4045224087724186449'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_7159.html' title='Rapidly Emerging Technology Series: Massively Parallel Processing'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-8975485943054454038</id><published>2011-11-20T05:40:00.000-08:00</published><updated>2011-11-20T06:09:36.386-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='column based'/><category scheme='http://www.blogger.com/atom/ns#' term='column-based'/><category scheme='http://www.blogger.com/atom/ns#' term='column-oriented'/><category scheme='http://www.blogger.com/atom/ns#' term='column oriented'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Server'/><category scheme='http://www.blogger.com/atom/ns#' term='emerging technology'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><title type='text'>Rapidly Emerging Technology Series: Column-Oriented Databases</title><content type='html'>&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.&amp;nbsp; This posting discusses column-oriented databases.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Data in a typical relational database is organized in tables by row.&amp;nbsp; The row paradigm is used for physical storage as well as the logical organization of data.&amp;nbsp; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;i&gt;Column-Oriented&lt;/i&gt; or &lt;i&gt;column-based&lt;/i&gt; databases physically organize data by column while still presenting data as rows.&amp;nbsp; Organizing data by column makes it self-indexing since each column becomes the equivalent of an index key.&amp;nbsp; Each column has table and row identifiers so that columns can be combined to produce rows of data in a table.&amp;nbsp; Any column can be used in the WHERE clause of a query, the data elements can quickly be found in that column, and columns with the same row identifiers are assembled for the query results.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Microsoft SQL Server 2012 (to be released in 2012) has a new feature called Columnstore Indexes which provide column-oriented storage for tables.&amp;nbsp; Microsoft claims that these can speed up queries up to 10 times.&amp;nbsp; A major advantage of the new SQL Server Columnstore Indexes is that you can index all columns in a large fact table.&amp;nbsp; This enables very fast performance when joining foreign keys in very large fact tables to primary keys in dimensions for star joins.&amp;nbsp; With B-tree indexes, there are penalties for indexing too many columns.&amp;nbsp; In a typical fact table, most columns other than the primary key are foreign keys to dimension tables, and Columnstore indexes will allow all of those columns to be indexed using column-oriented storage.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Oracle 11g supports column-oriented storage of data but only on its proprietary Exadata storage servers or Exadata appliances.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Informative links for information about column-oriented Databases:&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;http://en.wikipedia.org/wiki/Column-oriented_DBMS&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;a href="http://nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;http://nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-8975485943054454038?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/8975485943054454038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_20.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/8975485943054454038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/8975485943054454038'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series_20.html' title='Rapidly Emerging Technology Series: Column-Oriented Databases'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-7543539077453662992</id><published>2011-11-20T05:36:00.000-08:00</published><updated>2011-11-20T06:10:51.604-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='parallel processing'/><category scheme='http://www.blogger.com/atom/ns#' term='appliance'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse appliance'/><category scheme='http://www.blogger.com/atom/ns#' term='Oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL Server'/><category scheme='http://www.blogger.com/atom/ns#' term='emerging technology'/><category scheme='http://www.blogger.com/atom/ns#' term='database appliance'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><title type='text'>Rapidly Emerging Technology Series: Database Appliances</title><content type='html'>&lt;div class="MsoNoSpacing"&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.&amp;nbsp; This posting discusses database appliances.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A &lt;/span&gt;&lt;i style="font-family: Arial, Helvetica, sans-serif;"&gt;database appliance&lt;/i&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt; is an integrated, preconfigured package of RDBMS software and hardware.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Most major database vendors including Microsoft, Oracle, IBM, and TeraData package and sell database appliances.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;i style="font-family: Arial, Helvetica, sans-serif;"&gt;Data warehouse appliances&lt;/i&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt; are the biggest selling database appliances.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Database systems utilize memory, I/O, processing cores, and storage for database processes, and they need to formulate execution plans that will utilize these resources efficiently. &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Hardware configurations for database performance—particularly data warehousing—are not necessarily the same as configurations for other purposes.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;In fact, sometimes database performance isn’t even considered when purchasing and configuring hardware.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;In those situations, even the most experienced DBA's and systems administrators aren't always able to optimize systems to get satisfactory performance.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;A database appliance is a pre-configured hardware and software solution.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Most database appliances are designed for specialized applications such as OLTP or data warehousing. The servers, storage, OS, and RDBMS software are integrated and optimized for performance.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Some database appliances utilize parallel processing to distribute workloads across server nodes. Multi-node systems can be &lt;/span&gt;&lt;i style="font-family: Arial, Helvetica, sans-serif;"&gt;share-everything&lt;/i&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt; allowing multiple servers to share storage, or &lt;/span&gt;&lt;i style="font-family: Arial, Helvetica, sans-serif;"&gt;share-nothing&lt;/i&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt; where each server has its own storage.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Share-everything systems tend to be more expensive yet allow the same data to be accessed by several servers. Share-nothing systems can distribute data from the same tables across multiple nodes so that queries can be processed in parallel.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; A s&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;hare-nothing system is useful for querying very large fact tables in a data warehouse.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; line-height: 18px;"&gt;Database appliances generally do not scale well outside of the initial configuration.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; line-height: 18px;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; line-height: 18px;"&gt;For example, you generally don’t add storage to a database appliance.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; line-height: 18px;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; line-height: 18px;"&gt;Data warehouse appliances are available to support from about 5 terabytes to 100’s of terabytes of data.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;span style="line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Database appliances can also be very costly.&amp;nbsp; In many situations, it may be possible to get satisfactory database performance with much less expensive hardware purchases.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Microsoft:&lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/appliances.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/appliances.aspx&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Oracle:&amp;nbsp;&lt;a href="http://www.oracle.com/us/products/database/database-appliance/index.html"&gt;http://www.oracle.com/us/products/database/database-appliance/index.html&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;IBM/Netezza: &lt;a href="http://www.netezza.com/data-warehouse-appliance-products/index.aspx"&gt;http://www.netezza.com/data-warehouse-appliance-products/index.aspx&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Teradata: &lt;a href="http://www.teradata.com/data-appliance-data-warehouse/"&gt;http://www.teradata.com/data-appliance-data-warehouse/&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNoSpacing"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Wikipedia article on Data Warehouse Appliances: &lt;span style="line-height: 115%;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Data_warehouse_appliance"&gt;http://en.wikipedia.org/wiki/Data_warehouse_appliance&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-7543539077453662992?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/7543539077453662992/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/7543539077453662992'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/7543539077453662992'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/11/rapidly-emerging-technology-series.html' title='Rapidly Emerging Technology Series: Database Appliances'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-2518305389196545812</id><published>2011-09-19T04:26:00.000-07:00</published><updated>2011-09-19T04:27:07.584-07:00</updated><title type='text'>The Dimensional Model</title><content type='html'>&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;The&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;dimensional model&lt;/span&gt;&lt;/i&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is a way of organizing data to facilitate queries and analyses.&amp;nbsp; When implemented correctly, queries are simplified and errors are less likely.&amp;nbsp; Dimensional modeling has very specific requirements without a lot of room for ambiguity.&amp;nbsp; The&lt;/span&gt;&amp;nbsp;&lt;u&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;Data Warehouse Toolkit&lt;/span&gt;&lt;/u&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;by Ralph Kimball is a definitive primer on dimensional modeling. This posting attempts to give a brief overview.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&amp;nbsp;A dimensional model consists of&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;fact&lt;/span&gt;&amp;nbsp;&lt;/i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;tables and&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;dimension&lt;/span&gt;&lt;/i&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;tables in what is called a&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;star schema&lt;/span&gt;&lt;/i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;.&amp;nbsp; The star metaphor is derived from the typical graphic of a fact table in the center surrounded by multiple dimensions like points on a star.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-8AgiZFI9fCc/Taxzkkp8RCI/AAAAAAAAAH8/2OjGwD1hKyU/s1600/star.GIF" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="143" src="http://2.bp.blogspot.com/-8AgiZFI9fCc/Taxzkkp8RCI/AAAAAAAAAH8/2OjGwD1hKyU/s320/star.GIF" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div align="center" class="MsoNormal" style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="background-color: white;"&gt;The fact table typically represents a transaction, measure or event while dimension tables contain descriptive attributes for the fact.&amp;nbsp; For example, a prescription dispensing could exist in a fact table with the cost of the prescription being the actual “fact,” while dimensions could contain descriptive information about the patient, the pharmacy, the health plan, and the prescriber.&amp;nbsp; Selection and groupings can be done on any attribute of the dimension.&amp;nbsp; The patient dimension could contain DOB for selecting or grouping on age, gender for selecting or grouping by sex and zip code for selecting or grouping by location.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;This is not unlike common joins of transaction data with reference data to obtain descriptive attributes for the transactions.&amp;nbsp; However, dimensions can contain descriptions at multiple levels or &lt;i&gt;hierarchies&lt;/i&gt; that would have been normalized in transaction processing.&amp;nbsp; For example, a drug dimension table can contain several levels of drug classifications, each of which can be used for selection or grouping in a query.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-top: 10.0pt; mso-outline-level: 2; mso-pagination: widow-orphan lines-together; page-break-after: avoid;"&gt;&lt;b&gt;&lt;span style="color: #4f81bd; font-family: Cambria; font-size: 13pt;"&gt;Fact Tables&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;A&amp;nbsp;&lt;i&gt;fact&lt;/i&gt;&amp;nbsp;is an entity, event, transaction, or measure that can be represented as a single row in a relational database table.&amp;nbsp; It has a primary key consisting of a single column which uniquely identifies the fact.&amp;nbsp; A fact table is always&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Third_normal_form"&gt;&lt;span style="color: #29aae1; text-decoration: none;"&gt;Third Normal Form (3NF)&lt;/span&gt;&lt;/a&gt;.&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-size: 13.5pt;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;The other columns of a fact table can contain one of three types of values:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-top: 0in;" type="disc"&gt;&lt;li class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;A single descriptive value which is      intransitively functionally dependent on the primary key.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;A single-column foreign key which is      intransitively functionally dependent on the primary key.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;li class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;A numeric value containing the actual measure or      fact.&amp;nbsp;&amp;nbsp; For example, the money amount of a transaction or the      cost of product.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;The phrase&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;intransitively functionally dependent&lt;/span&gt;&lt;/i&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is from relational database theory.&amp;nbsp; It means that the column can only contain one possible value which is determined by the primary key and is not dependent on the values of any other columns.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;An example of a fact is a record of a patient visit to a doctor’s office.&amp;nbsp; (This is a simplified example used to explain dimensional modeling and is not likely to be complete enough for a real world application.)&amp;nbsp; The visit fact table is named FactVisit.&amp;nbsp;&amp;nbsp;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;FactVisit (&lt;u&gt;FactVisitID&lt;/u&gt;, VisitDateID, VisitMinutes, ClinicianID, PatientID, ProblemID, CopayAmount, BillAmount)&lt;/span&gt;&lt;/b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;FactVisitID&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is the primary key and uniquely identifies the patient visit.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;VisitDateID&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is a foreign key referencing the primary key in a date dimension.&amp;nbsp; The date dimension contains various date attributes such as quarter, fiscal year, and so on.&amp;nbsp; (Date dimensions are explained later.)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;VisitMinutes&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is a value containing the number of minutes of the patient visit. It is a descriptive attribute of the fact.&amp;nbsp; It is not a foreign key to a dimension table because it is simply a value with no additional attributes of its own.&amp;nbsp; A descriptive column with no additional attributes that doesn’t reference a dimension table is called a&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;degenerative dimension&lt;/span&gt;&lt;/i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;ClinicianID&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is a foreign key referencing the primary key in a clinician dimension containing attributes of doctors and nurse practitioners at the practice.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;PatientID&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is a foreign key referencing the primary key in a patient dimension containing attributes of patients.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;ProblemID&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is a foreign key that references the primary key in a problem dimension which contains various problem/diagnosis attributes.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;CopayAmount&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;and&lt;/span&gt;&amp;nbsp;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;BillAmount&lt;/span&gt;&lt;/b&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;are the facts.&amp;nbsp; These are the patient’s copay amount and the amount billed to the patient’s insurance respectively.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;Note that each of the attribute columns is one of the three types of attributes previously described.&amp;nbsp; Again, it is important that there are no transitive dependencies.&amp;nbsp; For example, if there was a ProblemID and a DiagnosisID, then there may be a dependency between these two columns because a specific problem as defined by the practice may be assigned one or more diagnosis codes.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-top: 10.0pt;"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #4f81bd; font-family: Cambria; font-size: 13pt;"&gt;Dimensions&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;Dimensions are values that have multiple descriptive attributes.&amp;nbsp; An easy to understand example is a&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;date dimension&lt;/span&gt;&lt;/i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;.&amp;nbsp; For any given date, there are multiple attributes such as Year, Month, Day of Week, Fiscal Year, Quarter, Weekend vs. Weekday, whether the date is a holiday, and so on.&amp;nbsp; For example:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;DimDate (DateID, Date, Year, Month, Fiscal Year, Quarter, BusinessDay)&lt;/span&gt;&lt;/b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;FactVisit could be joined to DimDate to obtain attributes of the date.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New';"&gt;SELECT d.Year, d.Month, f.PatientID, COUNT(*) AS Visits&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New';"&gt;FROM factVisits f JOIN dimDate d ON d.DateID = f.VisitDateID&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New';"&gt;GROUP BY d.Year, d.Month, f.PatientID&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #333333;"&gt;This query returns the number of patient visits by year, month, and patient. Notice that we are not grouping on the actual date, but on the Year and Month attributes of the date in the dimension table.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #333333;"&gt;A dimension is&lt;/span&gt;&lt;span style="color: #333333;"&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Second_normal_form"&gt;&lt;span style="color: #29aae1; text-decoration: none;"&gt;Second Normal Form (2NF)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;because transitive dependencies are allowed.&amp;nbsp; For example, the Quarter is transitively dependent on Month (i.e. the Month can be used to determine the Quarter).&amp;nbsp; If the data modeler chooses to further normalize a dimension table (i.e. to 3NF), then the star schema becomes a&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;snowflake schema&lt;/span&gt;&lt;/i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;.&amp;nbsp; Kimball recommends “resisting the urge to snowflake” because of the unnecessary complexity and additional performance issues.&amp;nbsp; The fact that dimensions are not further normalized makes it simple to group and aggregate which is what dimensional modeling is all about.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #333333;"&gt;Note that the DimDate dimension table can be used by any other fact table that needs to reference date attributes.&amp;nbsp; A dimension that can be used by multiple facts in multiple star schemas is called a&lt;/span&gt;&lt;span style="color: #333333;"&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;conformed dimension&lt;/span&gt;&lt;/i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-top: 10.0pt;"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #4f81bd; font-family: Cambria; font-size: 13pt;"&gt;Primary and Foreign Key Relationships&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;It is very important that any foreign key in a fact joins on the primary key in a dimension to return exactly one row from the dimension. Otherwise there would be duplication and/or erroneous sums and counts caused by the join.&amp;nbsp;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;When assigning the dimension key as a foreign key in a fact, all details of the dimension must be correct for that individual fact record.&amp;nbsp; To demonstrate this requirement we will use a dimension named DimProblem.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;DimProblem (&lt;u&gt;ProblemID&lt;/u&gt;, ICD9, Snomed, ShortDescription, LongDescription)&lt;/span&gt;&lt;/b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;DimProblem&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; mso-padding-alt: 0in 0in 0in 0in;"&gt;&lt;tbody&gt;&lt;tr&gt;   &lt;td style="border: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;ProblemID&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-left-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 81.0pt;" valign="top" width="108"&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;ICD9&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-left-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 63.0pt;" valign="top" width="84"&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;Snomed&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-left-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 1.5in;" valign="top" width="144"&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;ShortDescription&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-left-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 153.9pt;" valign="top" width="205"&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;LongDescription&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;tr&gt;   &lt;td style="border-top: none; border: solid windowtext 1.0pt; mso-border-top-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"&gt;&lt;div class="MsoNormal"&gt;100223&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 81.0pt;" valign="top" width="108"&gt;&lt;div class="MsoNormal"&gt;401.0&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 63.0pt;" valign="top" width="84"&gt;&lt;div class="MsoNormal"&gt;10020781&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 1.5in;" valign="top" width="144"&gt;&lt;div class="MsoNormal"&gt;Hypertension&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 153.9pt;" valign="top" width="205"&gt;&lt;div class="MsoNormal"&gt;Hypertension, malignant&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;tr&gt;   &lt;td style="border-top: none; border: solid windowtext 1.0pt; mso-border-top-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"&gt;&lt;div class="MsoNormal"&gt;100224&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 81.0pt;" valign="top" width="108"&gt;&lt;div class="MsoNormal"&gt;401.1&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 63.0pt;" valign="top" width="84"&gt;&lt;div class="MsoNormal"&gt;10015489&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 1.5in;" valign="top" width="144"&gt;&lt;div class="MsoNormal"&gt;Hypertension&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 153.9pt;" valign="top" width="205"&gt;&lt;div class="MsoNormal"&gt;Hypertension, benign&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;tr&gt;   &lt;td style="border-top: none; border: solid windowtext 1.0pt; mso-border-top-alt: solid windowtext 1.0pt; padding: 0in 5.4pt 0in 5.4pt; width: 72.9pt;" valign="top" width="97"&gt;&lt;div class="MsoNormal"&gt;100225&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 81.0pt;" valign="top" width="108"&gt;&lt;div class="MsoNormal"&gt;401.9&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 63.0pt;" valign="top" width="84"&gt;&lt;div class="MsoNormal"&gt;10015489&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 1.5in;" valign="top" width="144"&gt;&lt;div class="MsoNormal"&gt;Hypertension&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; padding: 0in 5.4pt 0in 5.4pt; width: 153.9pt;" valign="top" width="205"&gt;&lt;div class="MsoNormal"&gt;Hypertension, Unspecified&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;ICD9 and Snomed are standardized coding systems for medical diagnoses. If a patient visit is for hypertension, then there could be multiple ICD9 codes and Snomed codes for the problem associated with the visit.&amp;nbsp; To simply know that the problem is hypertension for a given record in a fact table is not enough.&amp;nbsp; The other details such as specific ICD9 and Snomed codes must also be known before inserting the fact record so that the correct ProblemID will be used.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #4f81bd; font-family: Cambria; font-size: 13pt;"&gt;Many-to-One Relationships&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;In the above example there are multiple ICD9 and Snomed values for one ShortDescription. A data modeler with a relational database modeling background may be tempted to normalize or snowflake the dimension to handle the many-to-one relationships.&amp;nbsp; However, a correctly designed star schema requires that a record in the DimProblem table contains not one record for a given problem description, but rather one record for the unique combination of all columns.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;In this case, facts that are associated with all of the relevant ICD9 or Snomed values can be selected or grouped using the ShortDescription.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;SELECT f.ClinicianID, f.PatientID, COUNT(*) AS BPVisits&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;FROM FactVisit f JOIN DimProblem d ON f.ProblemID = d.ProblemID&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;WHERE d.ShortDescription = ‘Hypertension’&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;GROUP BY f.ClinicianID, f.PatientID&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;In the above query, all ICD9 and Snomed codes with ShortDescription = ‘Hypertension’ will be selected.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;Another common design error is to create the dimension so that there would have to be further join conditions on the dimension record.&amp;nbsp; For example, if the dimension key in a fact table references more than one record in the dimension table, then an additional qualifier in the WHERE clause would have to be added to a query to make sure that the right dimension row gets used.&amp;nbsp; As a rule, there shouldn’t be any additional qualifiers for a dimension other than the primary-foreign key relationship with the fact.&amp;nbsp; A given fact record should never be able to join on multiple records in a given dimension.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-top: 10.0pt;"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #4f81bd; font-family: Cambria; font-size: 13pt;"&gt;Star Joins&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;A&lt;/span&gt;&amp;nbsp;&lt;i&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;star join&lt;/span&gt;&lt;/i&gt;&amp;nbsp;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;is simply a set of joins of the foreign keys in the fact table with primary keys in the dimensions in order to select or group on the various attributes in the dimensions.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;A full star join which utilizes the full star schema could look like this.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;SELECT dc.ClinicianName,&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;dpt.PatientName,&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;dpr.ShortDescription AS Problem,&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(fv.Copay) AS TotalCopays,&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;SUM(fv.BillAmount) As TotalBilled&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;FROM FactVisit fv&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;JOIN DimClinicians dc ON dc.ClinicianID = fv.ClinicianID&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;JOIN DimPatients dpt ON dpt.PatientID = fv.PatientID&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;JOIN DimProblems dpr ON dpr.ProblemID = fv.ProblemID&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;WHERE dpr.ShortDescription = ‘Hypertension’&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; font-family: 'Courier New';"&gt;GROUP BY dc.ClinicName, dpt.PatientName, dpr.ShortDescription&lt;/span&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;Developers and architects must be cautious of performance issues that could result from star joins on multiple dimensions.&amp;nbsp; This is especially true if a dimension is very large.&amp;nbsp; For example, in the above query there could be several million rows in DimPatient joining tens of millions of rows in FactVisits.&amp;nbsp; Since star joins are typically on upwards of 4 or 5 tables, the performance issues increase significantly.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-top: 10.0pt;"&gt;&lt;b&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; color: #4f81bd; font-family: Cambria; font-size: 13pt;"&gt;Non-Dimensional Tables&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;When Kimball described the dimensional model, it was before the way many modern enterprise data warehouses are generally architected today.&amp;nbsp; The star schemas in his examples that he called “data warehouses” would be called “data marts by many data warehouse architects” &amp;nbsp;&amp;nbsp;A data warehouse typically contains large denormalized history table and reference data that are the data sources for data marts, and these tables are rarely dimensional model.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;Data in Operational Data Stores (ODS) also may not be in the dimensional model because the time it takes to build or update a star schema may not allow near-real-time data requirements of an ODS.&amp;nbsp; Other non-dimensional tables can include data used in statistical analyses (e.g. SAS data sets) that requires different structures.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;br /&gt;Typically, dimensional modeling and star schema are only used in data marts that support specific applications. &amp;nbsp;The dimensional tables and non-dimensional table types can be organized and easily identified using separate schemas and consistent naming conventions.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-2518305389196545812?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/2518305389196545812/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/09/dimensional-model.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/2518305389196545812'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/2518305389196545812'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/09/dimensional-model.html' title='The Dimensional Model'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-8AgiZFI9fCc/Taxzkkp8RCI/AAAAAAAAAH8/2OjGwD1hKyU/s72-c/star.GIF' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-5594027137993173586</id><published>2011-04-18T10:21:00.000-07:00</published><updated>2011-08-22T05:54:02.389-07:00</updated><title type='text'></title><content type='html'>&lt;h1&gt;Slowly Changing Dimensions&lt;/h1&gt;&lt;div class="MsoNormal"&gt;Slowly changing dimensions in dimensional modeling are commonly misunderstood.&amp;nbsp; Chapter 6 of &lt;span class="BookTitle"&gt;&lt;span style="font-family: Calibri;"&gt;The Data Warehouse Toolkit&lt;/span&gt;&lt;/span&gt; by Ralph Kimball describes slowly changing dimensions in detail.&amp;nbsp; This posting attempts to provide a summary explanation.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;A &lt;i style="mso-bidi-font-style: normal;"&gt;slowly changing dimension&lt;/i&gt; is a dimension for which any of its attributes (column values) can change over time.&amp;nbsp; The attributes change infrequently.&amp;nbsp; Since the whole purpose of a dimension table is to aggregate on its attributes, there must be a systematic way of handling the changes.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;For example, suppose there is a patient dimension with health plan information columns for each patient.&amp;nbsp; If a patient changes health plans, then the values for the health plan related columns will change.&amp;nbsp; For example:&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="text-indent: .5in;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&amp;nbsp;Medical Record #&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Plan Name&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Plan ID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Effective Date &lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Before&amp;nbsp;&amp;nbsp;&amp;nbsp;5552223333&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Tufts Health Plan&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2233221122&amp;nbsp;&amp;nbsp; &amp;nbsp; 1/1/2005&lt;/div&gt;&lt;div class="MsoNormal"&gt;After&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;5552223333&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BCBSMA&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4442227777&amp;nbsp;&amp;nbsp; &amp;nbsp; 1/1/2009&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;To demonstrate the different approaches to making changes to slowly changing dimensions, we will create an example table DimPatient.&amp;nbsp; The primary key is PatientID, and the plan name, plan ID, and effective date are the slowly changing attributes. (Of course, in a real-world patient dimension there would probably be many more attributes.)&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;DimPatient (&lt;u&gt;PatientID&lt;/u&gt;, MRN, PlanName, PlanID, EffDate)&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Ralph Kimball proposes three types of slowly changing dimensions where the definition of each type is based on how the data is updated:&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Type I&lt;/b&gt;:&amp;nbsp; Simply update the changed attributes.&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Type II&lt;/b&gt;: Add a new dimension record with a new primary key and the new attribute values.&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Type III&lt;/b&gt;: Create separate columns for the original and current values (e.g. OriginalPlanID, CurrentPlanID)&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;There are obvious advantages and disadvantages for each of Kimball’s types of slowly changing dimensions.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Type I Slowly Changing Dimension&lt;/h3&gt;&lt;div class="MsoNormal"&gt;The advantage of this type of slowly changing dimension is that it is the easiest to update and maintain.&amp;nbsp; For any given change, there is only one record that needs to be updated and that is identified by the primary key.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New'; font-size: 10pt;"&gt;UPDATE DimPatient &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New'; font-size: 10pt;"&gt;SET PlanName = ‘BCBSMA’, PlanID = ‘4442227777’, EffDate = ‘1/1/2009’&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New'; font-size: 10pt;"&gt;WHERE PatientID = 111222333&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The main disadvantage of a Type I slowly changing dimension is that history is lost.&amp;nbsp; Suppose there is a query to get amounts charged to insurance companies by year.&amp;nbsp; To demonstrate this we will include example tables FactVisits and DimDate.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;strong&gt;&lt;span style="font-family: Calibri;"&gt;FactVisit (&lt;u&gt;FactVisitID&lt;/u&gt;, VisitDateID, VisitMinutes, ClinicianID, PatientID, ProblemID, CopayAmount, BillAmount)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;DimDate (DateID, Year, Month, Fiscal Year, Quarter, BusinessDay)&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;A star join uses DimPatient to see which health plan was billed by year.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: 'Courier New';"&gt;SELECT dd.Year, dp.HealthPlan, SUM(fv.BillAmount)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;FROM FactVisit fv&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;JOIN DimDate dd ON dd.DateID = fv.VisitDateID&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;JOIN DimPatient dp ON dp.PatientID = fv.PatientID&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;GROUP BY dd.Year, dp.HealthPlan&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Because a Type I slowly changing dimension looses history, all amounts billed for our example patient will show up billed to ‘BCBSMA’ even though amounts billed prior to 2009 were billed to another plan.&amp;nbsp; For this reason, Type I slowly changing dimensions are rarely useful in a dimensional model.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Type II Slowly Changing Dimension&lt;/h3&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;This type of slowly changing dimension simply adds a new record with a new primary key.&amp;nbsp; All things considered, this is the most useful type of slowly changing dimension because the history is automatically partitioned, and nothing else needs to be done when creating star joins.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;If the DimPatient dimension is implemented as a Type II slowly changing dimension, another record would simply be added when a patient’s health plan changes.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: none; mso-border-alt: solid windowtext .5pt; mso-border-insideh: .5pt solid windowtext; mso-border-insidev: .5pt solid windowtext; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 480;"&gt;&lt;tbody&gt;&lt;tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;"&gt;   &lt;td style="border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;PatientID&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;MRN&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;PlanName&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;PlanID&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.8pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;EffDate&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;tr style="mso-yfti-irow: 1;"&gt;   &lt;td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;1000222&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;5552223333&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Tufts Health Plan&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;2233221122&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.8pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;1/1/2005&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;tr style="mso-yfti-irow: 2; mso-yfti-lastrow: yes;"&gt;   &lt;td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;2000444&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;5552223333&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;BCBSMA&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.75pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;4442227777&lt;/div&gt;&lt;/td&gt;   &lt;td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 95.8pt;" valign="top" width="128"&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;1/1/2009&lt;/div&gt;&lt;/td&gt;  &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Note that the new record has a different value in the primary key column PatientID.&amp;nbsp; The MRN is used to identify the patient and is unchanged.&amp;nbsp; The “slowly changing” attributes PlanName and PlanID contain the new values.&amp;nbsp; The effective date is for reference purposes only and does not need to be included in the join conditions.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;In the previous query, the billing amounts for the example patient will show up billed to ‘Tufts Health Plan’ for all charges prior to 2009, and to ‘BCBSMA’ for all charges since the beginning of 2009.&amp;nbsp; In Kimball’s words:&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="Quote"&gt;The Type Two slowly changing dimension automatically partitions history and an application must not be required to place any time constraints on effective dates in the dimension.&lt;a href="file://sfa28/qdwmart$/RandyGrenier/Misc/Slowly_Changing_Dimensions.doc#_ftn1" name="_ftnref1" style="mso-footnote-id: ftn1;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri;"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;span style="color: black; font-family: Calibri; font-size: 12pt;"&gt;[1]&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="Quote"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;One with entity-relationship modeling background may protest that we are creating two records for the same entity with different primary keys (i.e. two PatientID’s).&amp;nbsp; However, for dimension tables the primary key uniquely identifies the &lt;u&gt;combination of all attributes&lt;/u&gt; in the row.&amp;nbsp; The patient in this case is actually identified by the Medical Record Number (MRN) which will contain the same value in both records.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Type III Slowly Changing Dimension&lt;/h3&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;This type would contain “original” and “current” columns for the changed attribute.&amp;nbsp; In our example, we would add the following columns to our dimension:&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; tab-stops: 106.5pt;"&gt;OriginalPlanName&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; tab-stops: 106.5pt;"&gt;CurrentPlanName&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; tab-stops: 106.5pt;"&gt;OriginalPlanID&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; tab-stops: 106.5pt;"&gt;CurrentPlanID&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Although this may be satisfactory for some simple data mart applications, it is a rarely useful type of slowly changing dimension with several disadvantages.&amp;nbsp; &lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;A Type III slowly changing dimension does not partition by time.&amp;nbsp; We couldn’t use effective date columns in queries because we would be adding additional qualifiers on the dimension to be used in a star join which is not correct design.&amp;nbsp; Even if we did use effective dates,&amp;nbsp; that would only give us the starting point for the current value.&amp;nbsp; &lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Also, adding more and more columns to the dimension could potentially get out of control.&amp;nbsp; Keep in mind that the whole purpose of dimensional modeling is to simplify queries.&amp;nbsp; Analysts and ETL developers would have to consider each changing attribute of each slowly changing dimension for every query.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Another disadvantage of a Type III slowly changing dimension is that it only allows for a single change to an attribute.&amp;nbsp; Suppose in our example that a patient changed health plans three or four times over the time window of the data.&amp;nbsp; A Type III slowly changing dimension would not work.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3&gt;Using Changing Dimension Attributes in WHERE Clauses&lt;/h3&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Using changing attributes of slowly changing dimensions in a WHERE clause of a star join has the risk of either loosing data we want to include or including data that we don’t want to include.&amp;nbsp; When we query on a previous value of a dimension attribute, we loose new data, and when we query on a newer value, we loose previous data.&amp;nbsp; &lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;For example, a dimension named DimDiagnosis contains diagnosis codes and descriptions.&amp;nbsp; The descriptions are slowly changing attributes. &lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;DimDiagnosis (&lt;u&gt;DiagnosisID&lt;/u&gt;, ICD9Code, Description)&lt;o:p&gt;&lt;/o:p&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Previous Values&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;DiagnosisID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ICD9Code&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Description&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;10000222&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 401.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hypertension Benign&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&amp;nbsp;New Values&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;DiagnosisID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ICD9Code&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Description&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;10000244&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 401.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hypertension, benign&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Note that the Description value has changed slightly.&amp;nbsp; If a WHERE clause condition on this dimension contains previous description, it will lose data for the new description.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;WHERE dimDiagnosis.Description = ‘Hypertension Benign’&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;One way to address this problem is to include a column with a stable value.&amp;nbsp; For example, a column could be added to the DimDiagnosis dimension named DimDescription which would contain a value that doesn’t change when the Description column changes.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Previous Values&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;DiagnosisID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ICD9Code&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Description&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DimDescription&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;10000222&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 401.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hypertension Benign&amp;nbsp; HYPERTENSION BENIGN&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&amp;nbsp;New Values&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;DiagnosisID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ICD9Code&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Description&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DimDescription&lt;o:p&gt;&lt;/o:p&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;10000244&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 401.1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hypertension, benign&amp;nbsp; HYPERTENSION BENIGN&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Then the where clause of a query could use the unchanging attribute DimDescription&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;WHERE dimDiagnosis.DimDescription = ‘HYPERTENSION BENIGN’&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;The changed attributes can still be used in groupings and aggregations.&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;SELECT d.Description, f.VisitDate, COUNT(*)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;FROM FactVisits v &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;JOIN DimDiagnosis d ON d.DiagnosisID = f.DiagnosisID&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;WHERE dimDiagnosis.DimDescription = ‘HYPERTENSION BENIGN’&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;span style="font-family: 'Courier New';"&gt;GROUP BY d.Description, f.VisitDate&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Note that the changed Description value is used in the grouping while the unchanged DimDescription column is used in the WHERE clause.&amp;nbsp; &lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="tab-stops: 106.5pt;"&gt;Restraint should be used when adding additional columns to any dimension so as to avoid adding unnecessary complexity complicating queries rather than simplifying them.&lt;/div&gt;&lt;div style="mso-element: footnote-list;"&gt;&lt;br /&gt;&lt;hr align="left" size="1" width="33%" /&gt;&lt;div id="ftn1" style="mso-element: footnote;"&gt;&lt;div class="MsoFootnoteText"&gt;&lt;a href="file://sfa28/qdwmart$/RandyGrenier/Misc/Slowly_Changing_Dimensions.doc#_ftnref1" name="_ftn1" style="mso-footnote-id: ftn1;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri;"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri; font-size: 10pt;"&gt;[1]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri;"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri; font-size: 10pt;"&gt;[1]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; Kimball, Ralph, &lt;u&gt;The Data Warehouse Toolkit&lt;/u&gt;. &lt;st1:place w:st="on"&gt;&lt;st1:state w:st="on"&gt;New York&lt;/st1:state&gt;&lt;/st1:place&gt;: John Wiley &amp;amp; Sons, Inc., 1996, p. 103&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-5594027137993173586?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/5594027137993173586/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/04/slowly-changing-dimensions-slowly.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/5594027137993173586'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/5594027137993173586'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/04/slowly-changing-dimensions-slowly.html' title=''/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-8923212938966153896</id><published>2011-03-08T12:03:00.000-08:00</published><updated>2011-03-12T10:19:21.073-08:00</updated><title type='text'>Partitioning Data Warehouse Data</title><content type='html'>&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="font-family: Arial;"&gt;Introduction&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;W.H. Inman describes &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;i&gt;partitioning &lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;in data warehousing as "the breakup of data into separate physical units that can be handled independently." &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;a href="file:///C:/Documents%20and%20Settings/rg738/My%20Documents/Personal/Blog_partitioning.doc#_ftn1" name="_ftnref1" style="mso-footnote-id: ftn1;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Arial; font-size: 12pt;"&gt;[1]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; For very large data sets, rows of data of the same type are separated into subsets called &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;i&gt;partitions&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;.&amp;nbsp; Partitions provide advantages for both database administration and for query performance.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="font-family: Arial;"&gt;Partitioning and Database Administration&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;Database administration tasks such as moving, replicating, exporting, and archiving data can be greatly facilitated by partitioning.&amp;nbsp; An obvious fact is that it is easier to work with several small units than one massive unit.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;For example, if a data warehouse contains a rolling 5-year window of sales order data, each month the the oldest partition (older than current month minus 5 years) can be archived and dropped without affecting the other partitions.&amp;nbsp; No massive deletes.&amp;nbsp; No index rebuilds.&amp;nbsp; No regeneration of system statistics. Similarly, new partitions can be created and new incoming data can be inserted into those partitions without affecting the indexing or the system statistics of the older partitions.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Other partitioning features that can facilitate database administration include the ability to easily relocate partitions to different file systems and disks.&amp;nbsp; For example, Oracle allows a DBA to easily move data between partitions and individual tables and between different tablespaces.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="font-family: Arial;"&gt;Partitioning and Query Performance&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;For data warehouse developers and users, partitioning is very important for improving the performance of queries and ETL against very large data sets.&amp;nbsp; Queries can be run against separate partitions, and in modern RDBMS systems such as Oracle and SQL Server, intelligence is built-in so that queries against very large data sets can select only the relevant partitions.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Before partitioning technology was introduced in Oracle 8, data warehouses frequently broke very large data sets into separate tables.&amp;nbsp; For example, instead of a very large SalesOrders table containing several years of data, smaller tables each containing one month of data could be created. &amp;nbsp;Tables could be named SalesOrders199901, SalesOrders199902, SalesOrders199903, and so on, with the YYYYMM portion of the name indicating which month of data the table contains. &amp;nbsp;Queries could be run against each monthly table (i.e. partition) for the relevant date ranges, and the separate results could be gathered with UNION operators. &amp;nbsp;For example:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders199901 WHERE RegionCode = 5572&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&lt;st1:place w:st="on"&gt;&lt;span style="font-family: Arial;"&gt;UNION&lt;/span&gt;&lt;/st1:place&gt;&lt;span style="font-family: Arial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders199902 WHERE&amp;nbsp; RegionCode = 5572&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&lt;st1:place w:st="on"&gt;&lt;span style="font-family: Arial;"&gt;UNION&lt;/span&gt;&lt;/st1:place&gt;&lt;span style="font-family: Arial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders199903 WHERE&amp;nbsp; RegionCode = 5572 ;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Alternatively, each result could be inserted into a temporary table, and the temporary tables can then be gathered with&amp;nbsp; UNION operators.&amp;nbsp; That may reduce system memory management requirements and further improve performance.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Either way, results would be returned much more quickly by querying only the relevant months rather than querying a very large table containing multiple years of data.&amp;nbsp; This performance advantage would be realized in complex real world queries and over many partitions as well.&amp;nbsp; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Now that Oracle Enterprise Edition includes partitioning capabilities, single tables can now be created containing multiple partitions.&amp;nbsp; For example, instead of using separate monthly sales order tables as in the example above, the data could be put into one large partitioned table named SalesOrder.&amp;nbsp; The SalesOrder table could contain multiple partitions such as P199901, P199902, P199903, and so on based on the OrderDate column. Partition P199901 would contain only data where OrderDate is in January, 1999, P199902 would contain February 1999 data, and so on.&amp;nbsp; If a query against the SalesOrder table contains OrderDate in the WHERE clause, only the relevant partitions would be queried.&amp;nbsp; Oracle calls this &lt;i&gt;partition pruning&lt;/i&gt;.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders WHERE OrderDate BETWEEN '01/01/1999' AND '03/31/1999'&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;For the above query, only the P199901, P199902, and P19993 partitions would be queried.&amp;nbsp; Without partition pruning, many years of data may have to be scanned to find data with the appropriate date range.&amp;nbsp; For queries that return large numbers of rows in the results, partition pruning provides much better query performance than B-tree indexes.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Frequently, the column value on which a table is partitioned is not useful in the query.&amp;nbsp; Perhaps an entire 5 years of data needs to be queried, or perhaps there are other selection criteria in the WHERE clause that have nothing to do with the column used for partitioning.&amp;nbsp; In these cases, better performance can be gained by writing the query against each partition.&amp;nbsp; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders PARTITION P199901 WHERE RegionCode = 5572 AND OrderType = 'V' &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&lt;st1:place w:st="on"&gt;&lt;span style="font-family: Arial;"&gt;UNION&lt;/span&gt;&lt;/st1:place&gt;&lt;span style="font-family: Arial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders PARTITION P199902 WHERE RegionCode = 5572 AND OrderType = 'V' &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&lt;st1:place w:st="on"&gt;&lt;span style="font-family: Arial;"&gt;UNION&lt;/span&gt;&lt;/st1:place&gt;&lt;span style="font-family: Arial;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT * FROM SalesOrders PARTITION P199903 WHERE RegionCode = 5572 AND OrderType = 'V' &lt;/span&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;Notice that this query is very much like writing against separate tables (i.e. SalesOrders199901, SalesOrders199102, and SalesOrders199903).&amp;nbsp; One of the main advantages of the partitioning feature in an RDBMS is to simplify queries, but this example demonstrates that that doesn't always turn out to be the case.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;In addition to partition pruning, performance is sometimes improved by the use of &lt;i&gt;parallel processing&lt;/i&gt;.&amp;nbsp; Queries written to use parallelism can query multiple partitions at the same time.&amp;nbsp; Even better performance can be gained by parallel processes against partitions stored on separate disks.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="font-family: Arial;"&gt;Partitioning with Oracle&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;Oracle has rich &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b32024/partition.htm"&gt;partitioning &lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;features including multiple partition types (range, list, hash), automatic partition pruning, and &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;i&gt;subpartitioning&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;. &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="font-family: Arial;"&gt;&lt;br /&gt;S&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;ubpartitioning is simply the partitioning of partitions. &amp;nbsp;A subpartition can be created using a different column than the main partition. &amp;nbsp;For example, the SalesOrder table which is partitioned on OrderDate can contain a subpartition on RegionCode which would further improve performance on the example query above.&lt;/span&gt;&lt;span style="font-family: Arial;"&gt;&lt;br /&gt;&lt;br /&gt;&amp;nbsp;Partitions are &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_7002.htm#i2215406"&gt;created &lt;/a&gt;using the CREATE TABLE syntax when creating a table.&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;Partitions can be &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#i2085640"&gt;altered&lt;/a&gt;, &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#i2131064"&gt;dropped&lt;/a&gt;&amp;nbsp;, &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#BABEDEJE"&gt;renamed&lt;/a&gt;, &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#i2131210"&gt;truncated&lt;/a&gt;, or &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#i2131032"&gt;moved&amp;nbsp;&lt;/a&gt;using the ALTER TABLE syntax. Data can be &lt;a href="http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_3001.htm#i2131250"&gt;exchanged &lt;/a&gt;between partitions and other tables also using the ALTER TABLE syntax.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="font-family: Arial;"&gt;Partitioning with SQL Server&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;SQL Server implements partitioning differently.&amp;nbsp; Initially, SQL Server implemented partitioning as &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;i&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/aa933141(v=sql.80).aspx"&gt;partitioned views&lt;/a&gt;&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;.&amp;nbsp; This method utilized separate tables for each partition, but those tables are brought together in a view with UNION operations. First you create the tables (i.e. partitions) with the CREATE TABLE syntax.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;CREATE TABLE SalesOrders199901&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;(OrderID int, CustomerID int, PartID int, OrderDate datetime, OrderAmt money,&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONSTRAINT CK_OrderDate199901 CHECK (OrderDate BETWEEN '01/01/1999' AND '01/31/1999') ,&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONSTRAINT PK_SalesOrders19991 PRIMARY KEY (OrderID, OrderDate) &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;CREATE TABLE SalesOrders199902&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;(OrderID int, CustomerID int, PartID int, OrderDate datetime, OrderAmt money,&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONSTRAINT CK_OrderDate199902 CHECK (OrderDate BETWEEN '02/01/1999' AND '02/28/1999') ,&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONSTRAINT PK_SalesOrders19992 PRIMARY KEY (OrderID, OrderDate) &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;CREATE TABLE SalesOrders199903&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;(OrderID int, CustomerID int, PartID int, OrderDate datetime, OrderAmt money,&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONSTRAINT CK_OrderDate199903 CHECK (OrderDate BETWEEN '03/01/1999' AND '03/31/1999') ,&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CONSTRAINT PK_SalesOrders19993 PRIMARY KEY (OrderID, OrderDate) &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="color: black; font-family: Arial;"&gt;Then the partitioned view is created on the range tables using CREATE VIEW.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;CREATE VIEW SalesOrders WITH SCHEMABINDING AS&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT OrderID, CustomerID, PartID, OrderDate, OrderAmt FROM SalesOrders199901&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;UNION ALL&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT OrderID, CustomerID, PartID, OrderDate, OrderAmt FROM SalesOrders199902&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;UNION ALL&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span style="color: black; font-family: Arial;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;SELECT OrderID, CustomerID, PartID, OrderDate, OrderAmt FROM SalesOrders199903&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span style="color: black; font-family: Arial;"&gt;The view becomes an &lt;i&gt;updateable view&lt;/i&gt;.&amp;nbsp; Inserted data will automatically go into the correct partitions, and queries using OrderDate in the WHERE clause will automatically select only the relevant partitions.&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;br /&gt;In SQL Server 2005, real &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/ms345146(v=sql.90).aspx"&gt;table partitioning&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt; was introduced, but it is awkward and more difficult to implement than partitioning in Oracle.&amp;nbsp; Table partitioning in SQL Server is done in three steps.&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; mso-layout-grid-align: none; mso-list: l0 level1 lfo1; mso-pagination: none; text-autospace: none; text-indent: -.5in;"&gt;&lt;span style="font-family: Symbol;"&gt;·&lt;span style="font: normal normal normal 7pt/normal 'Times New Roman';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: Arial;"&gt;Create a &lt;i&gt;partition function &lt;/i&gt;which defines the range or list values for each partition (&lt;a href="http://msdn.microsoft.com/en-us/library/ms187802.aspx"&gt;CREATE PARTITION FUNCTION&lt;/a&gt;).&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; mso-layout-grid-align: none; mso-list: l0 level1 lfo1; mso-pagination: none; text-autospace: none; text-indent: -.5in;"&gt;&lt;span style="font-family: Symbol;"&gt;·&lt;span style="font: normal normal normal 7pt/normal 'Times New Roman';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: Arial;"&gt;Create a &lt;i&gt;partition scheme &lt;/i&gt;which maps partition to file groups on disk (&lt;a href="http://msdn.microsoft.com/en-us/library/ms179854.aspx"&gt;CREATE PARTITION SCHEME&lt;/a&gt;).&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-left: .5in; mso-layout-grid-align: none; mso-list: l0 level1 lfo1; mso-pagination: none; text-autospace: none; text-indent: -.5in;"&gt;&lt;span style="font-family: Symbol;"&gt;·&lt;span style="font: normal normal normal 7pt/normal 'Times New Roman';"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: Arial;"&gt;Create a table that specifies the partition scheme and which columns are used in the partitioning (&lt;a href="http://msdn.microsoft.com/en-us/library/aa258255(v=sql.80).aspx"&gt;CREATE TABLE&lt;/a&gt;).&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;br /&gt;The complexity of implementing table partitioning in SQL Server is unfortunate and probably deters widespread use.&amp;nbsp; Partitioned views are currently easier to implement and manage in SQL Server.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;b&gt;&lt;span style="font-family: Arial;"&gt;Summary&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;Partitioning data into smaller sets of rows is useful if not essential for large data warehouses and analytical databases.&amp;nbsp; Partitioning can simplify location and movement of data for database administrators.&amp;nbsp; It provides significant performance gains for queries and ETL processes especially when strategically placing partitions across file systems and when used in parallel processing.&amp;nbsp; Oracle provides more extensive partitioning capabilities which are relatively easiy to implement.&amp;nbsp; Partitioning in SQL Server has less features and is more difficult to implement.&amp;nbsp; However, partitioned views are easy to implement in SQL Server and can provide most of the advantages of partitioning.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="mso-layout-grid-align: none; mso-pagination: none; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="mso-element: footnote-list;"&gt;&lt;hr align="left" size="1" width="33%" /&gt;&lt;div id="ftn1" style="mso-element: footnote;"&gt;&lt;div class="MsoFootnoteText"&gt;&lt;a href="file:///C:/Documents%20and%20Settings/rg738/My%20Documents/Personal/Blog_partitioning.doc#_ftnref1" name="_ftn1" style="mso-footnote-id: ftn1;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: 'Times New Roman'; font-size: 10pt;"&gt;[1]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt; W.H. Inman, &lt;i style="mso-bidi-font-style: normal;"&gt;Building the Data Warehouse, Second Edition&lt;/i&gt; (&lt;st1:state w:st="on"&gt;&lt;st1:place w:st="on"&gt;New York&lt;/st1:place&gt;&lt;/st1:state&gt;: John Wiley &amp;amp; Sons, Inc., 1996), 55.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-8923212938966153896?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/8923212938966153896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/03/partitioning-data-warehouse-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/8923212938966153896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/8923212938966153896'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/03/partitioning-data-warehouse-data.html' title='Partitioning Data Warehouse Data'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-3392402306161140892</id><published>2011-02-05T11:13:00.000-08:00</published><updated>2011-03-12T10:22:43.664-08:00</updated><title type='text'>Operational Data Stores (ODS)</title><content type='html'>&lt;h2&gt;Introduction&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;An &lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;operational data store (ODS)&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt; is an architectural component of a data warehouse that is used for immediate reporting with current operational data.&amp;nbsp; An ODS contains lightly transformed and lightly integrated operational data with a short time window.&amp;nbsp; It is used for real time and near real time reporting.&lt;/span&gt;&lt;/h2&gt;&lt;div class="MsoNormal"&gt;Unlike data marts, an ODS is not refreshed from the data warehouse history tables.&amp;nbsp; Rather it is directly loaded from operational data, staging area, or incoming files.&amp;nbsp; It can optionally serve as a data source for the data warehouse.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;&lt;i&gt;Click on any image to see full size.&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_RROpblKT0Fw/TU2hTvkkVlI/AAAAAAAAAGs/N3HjmjMQt_A/s1600/ODS1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://3.bp.blogspot.com/_RROpblKT0Fw/TU2hTvkkVlI/AAAAAAAAAGs/N3HjmjMQt_A/s320/ODS1.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;An ODS must be frequently refreshed so that it contains very current data.&amp;nbsp; An ODS can be updated daily, hourly, or even immediately after transactions on operational data.&lt;/div&gt;&lt;h2&gt;Transformation and Integration&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;The update frequency and currency of the data in the ODS is directly related to the amount of transformation and integration that is performed on the data.&amp;nbsp; Choices are made by the development team based on how long various transformation processes will take to complete vs. how current the data must be for reporting.&amp;nbsp; For example, if real time reports or dashboards require data within minutes or even seconds after data events, it may not be possible to do time-consuming transformation or integration processing.&amp;nbsp; Also, dimensions and other reference data in the data warehouse may not be as current as new operational data.&amp;nbsp;&lt;/span&gt;&lt;/h2&gt;&lt;div class="MsoNormal"&gt;Some degree of transformation and/or integration is usually required for reports.&amp;nbsp; Bill Inman defines five ODS classes&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftn1" name="_ftnref1" style="mso-footnote-id: ftn1;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 12pt;"&gt;[1]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;.&amp;nbsp; The classes represent different levels of ease and speed of refresh vs. the degree of integration and transformation.&amp;nbsp; For example, a Class I ODS would simply consist of direct replication of operational data (no transformation), where a Class V ODS would consist of highly integrated and aggregated data (highly transformed).&amp;nbsp; A Class I ODS would be the quickest and simplest to refresh, while a Class V ODS would involve the most complex, time-consuming processing.&lt;/div&gt;&lt;h2&gt;Refresh Options&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;One option for refreshing an ODS with very current data is to use the &lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;transaction logs&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt; of the operational data to update data replicated in the ODS.&amp;nbsp; Database systems use transaction logging to record all updates, inserts, and deletes to tables. Transaction logs are normally used for rolling back invalid transactions or for applying changes to data that were not completed due to a system failure.&amp;nbsp; However, if the tables are replicated in the ODS, the transaction logs can also be used to refresh them.&amp;nbsp; The replicated tables could then be used as staging for reporting tables.&amp;nbsp; In SQL Server and in Oracle using transaction logs to update replicated data is called &lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;Change Data Capture (CDC).&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftn2" name="_ftnref2" style="mso-footnote-id: ftn2;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 12pt;"&gt;[2]&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;&amp;nbsp; Those products provide system stored procedures and other tools to assist in applying transaction logging to replicated data.&lt;/span&gt;&lt;/h2&gt;&lt;div class="MsoNormal"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_RROpblKT0Fw/TU2iRj_RmEI/AAAAAAAAAGw/LYyYk4MNKMk/s1600/ODS2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="198" src="http://1.bp.blogspot.com/_RROpblKT0Fw/TU2iRj_RmEI/AAAAAAAAAGw/LYyYk4MNKMk/s320/ODS2.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Another option for keeping current data in an ODS is the use of &lt;i style="mso-bidi-font-style: normal;"&gt;indexed views&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftn3" name="_ftnref3" style="mso-footnote-id: ftn3;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 12pt;"&gt;[3]&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/i&gt; (SQL Server) or &lt;i style="mso-bidi-font-style: normal;"&gt;materialized views&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftn4" name="_ftnref4" style="mso-footnote-id: ftn4;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 12pt;"&gt;[4]&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;/i&gt; (Oracle).&amp;nbsp; Indexed views and materialized views are similar to regular views except that they provide high performance when the view is queried.&amp;nbsp; These high-performance views created in the ODS schema point at operational data.&amp;nbsp; Views can provide a relatively simple way to keep an ODS very current.&amp;nbsp; Views cannot be used when it is necessary to access data on remote servers.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_RROpblKT0Fw/TU2iwfJTgKI/AAAAAAAAAG0/gsfh99J6OF0/s1600/ODS3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="254" src="http://3.bp.blogspot.com/_RROpblKT0Fw/TU2iwfJTgKI/AAAAAAAAAG0/gsfh99J6OF0/s320/ODS3.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;File system mirroring&lt;/i&gt; works at a file system level and uses mirrored data to refresh the ODS.&amp;nbsp; For example, EMC’s proprietary &lt;i style="mso-bidi-font-style: normal;"&gt;Business Control Volume (BCV)&lt;/i&gt; disconnects the mirror so that it contains a snapshot of data at a given point in time.&amp;nbsp; After the snapshot has been used to update the ODS, it is reconnected to the source data and brought back into synch by the system.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_RROpblKT0Fw/TU2jGUJZt3I/AAAAAAAAAG4/qJ3IuWD0lNA/s1600/ODS4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://4.bp.blogspot.com/_RROpblKT0Fw/TU2jGUJZt3I/AAAAAAAAAG4/qJ3IuWD0lNA/s320/ODS4.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;Update triggers&lt;/i&gt; can be created on operational data tables to write to the ODS whenever the data is updated. Triggers are infrequently used for ODS refresh because they require modification of the operational data to add the trigger code.&amp;nbsp; Also, because triggers can affect performance, they are not feasible when there is high volume transaction processing.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;When an ODS needs to be updated less frequently, conventional ETL processes can be used.&amp;nbsp; For example, if an ODS is only updated once daily, it may be feasible to export operational data to files which can then be quickly loaded into the ODS.&amp;nbsp;&lt;/div&gt;&lt;h2&gt;ODS Tables&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;An ODS can contain its own staging tables as well as transformed tables for reporting.&amp;nbsp; It is common for an ODS to utilize separate staging tables from the rest of the data warehouse because the ODS is refreshed by separate processes than the data warehouse.&lt;/span&gt;&lt;/h2&gt;&lt;div class="MsoNormal"&gt;Reporting tables are limited by the fact that there may not be as full integration and transformation as there is for data warehouse tables.&amp;nbsp; This is because time consuming ETL processing may not allow for fast access to very current data.&amp;nbsp; Because of this, dimensional modeling (star schema) is not always possible.&lt;/div&gt;&lt;h2&gt;Reporting&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;The main purpose of an ODS is to provide reporting and querying on very current operational data.&amp;nbsp; Reports can only be created on very short time windows of data.&amp;nbsp; To query history, the data warehouse and application-specific data marts must be used instead of the ODS.&lt;/span&gt;&lt;/h2&gt;&lt;div class="MsoNormal"&gt;Reports may also be affected by the limited transformation and integration of data in an ODS.&amp;nbsp; For example, data may not contain the surrogate keys necessary for joins on dimensions in the data warehouse.&amp;nbsp; Data from multiple sources may not be completely integrated into consistent structures and attribute values.&lt;/div&gt;&lt;h2&gt;Summary&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: small; font-weight: normal;"&gt;An ODS can provide access to current operational data for reporting.&amp;nbsp; An ODS is loaded directly from operational data and not from the data warehouse history tables.&amp;nbsp; An ODS only contains a short time window of data.&amp;nbsp; If history is required, the ODS can be a data source for the data warehouse.&lt;/span&gt;&lt;/h2&gt;&lt;div class="MsoNormal"&gt;An ODS must balance the frequency of refresh, the degree of transformation and integration, and how current data in the ODS must be.&amp;nbsp; A number of refresh options should be considered based on application requirements.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;An ODS can be a database or a schema within a database that contains both staging as well as reporting tables.&amp;nbsp; There may be limitations on the reporting tables compared to data mart tables due to the fact that there is less transformation and integration of operational data.&lt;/div&gt;&lt;div style="mso-element: footnote-list;"&gt;&lt;br /&gt;&lt;hr align="left" size="1" width="33%" /&gt;&lt;div id="ftn1" style="mso-element: footnote;"&gt;&lt;div class="MsoFootnoteText"&gt;&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftnref1" name="_ftn1" style="mso-footnote-id: ftn1;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 10pt;"&gt;[1]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt; &lt;a href="http://www.information-management.com/issues/20000101/1749-1.html"&gt;http://www.information-management.com/issues/20000101/1749-1.html&lt;/a&gt; &lt;/div&gt;&lt;/div&gt;&lt;div id="ftn2" style="mso-element: footnote;"&gt;&lt;div class="MsoFootnoteText"&gt;&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftnref2" name="_ftn2" style="mso-footnote-id: ftn2;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 10pt;"&gt;[2]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt; &lt;a href="http://msdn.microsoft.com/en-us/library/cc645937.aspx"&gt;http://msdn.microsoft.com/en-us/library/cc645937.aspx&lt;/a&gt; &lt;/div&gt;&lt;/div&gt;&lt;div id="ftn3" style="mso-element: footnote;"&gt;&lt;div class="MsoFootnoteText"&gt;&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftnref3" name="_ftn3" style="mso-footnote-id: ftn3;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 10pt;"&gt;[3]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt; &lt;a href="http://technet.microsoft.com/en-us/library/cc917715.aspx"&gt;http://technet.microsoft.com/en-us/library/cc917715.aspx&lt;/a&gt; &lt;/div&gt;&lt;/div&gt;&lt;div id="ftn4" style="mso-element: footnote;"&gt;&lt;div class="MsoFootnoteText"&gt;&lt;a href="file:///C:/Users/Randy/Documents/Partners/Operational%20Data%20Stores2.docx#_ftnref4" name="_ftn4" style="mso-footnote-id: ftn4;" title=""&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span class="MsoFootnoteReference"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 10pt;"&gt;[4]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Materialized_view"&gt;http://en.wikipedia.org/wiki/Materialized_view&lt;/a&gt; &lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-3392402306161140892?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/3392402306161140892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2011/02/operational-data-stores-ods.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/3392402306161140892'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/3392402306161140892'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2011/02/operational-data-stores-ods.html' title='Operational Data Stores (ODS)'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_RROpblKT0Fw/TU2hTvkkVlI/AAAAAAAAAGs/N3HjmjMQt_A/s72-c/ODS1.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-7201168289750873958</id><published>2010-07-22T18:02:00.000-07:00</published><updated>2011-03-12T10:29:01.032-08:00</updated><title type='text'>Modeling the Data Warehouse</title><content type='html'>&lt;div class="MsoNormal"&gt;If Ralph Kimball were to publish a new revision of his book &lt;i style="mso-bidi-font-style: normal;"&gt;The Data Warehouse Toolkit&lt;/i&gt;, he should globally replace the words “data warehouse” with “data mart.”&amp;nbsp; When he wrote this book in the early 90’s, the nomenclature we use today wasn’t fully clarified.&amp;nbsp; This has caused a lot of confusion in the modeling of data warehouses. &lt;o:p&gt;&lt;/o:p&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Kimball advocated the use of &lt;i style="mso-bidi-font-style: normal;"&gt;dimensional modeling&lt;/i&gt; which utilizes mostly &lt;i style="mso-bidi-font-style: normal;"&gt;star schemas&lt;/i&gt; for use in analytical applications.&amp;nbsp; To put it simply, star schema supports groupings on various &lt;i style="mso-bidi-font-style: normal;"&gt;dimensions&lt;/i&gt; for summarization and reporting.&amp;nbsp;&amp;nbsp;For example, a fact table could contain a transaction date while a date dimension table would contain various attributes for that date including calendar year, fiscal year, quarter, day-of-week, or whatever.&amp;nbsp; Queries could join on this dimension to summarize on these various date attributes.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The problem is that in modern data warehouses, we typically don’t directly query the main data warehouse.&amp;nbsp; Rather we query various data marts that are derived from the data warehouse.&amp;nbsp;&amp;nbsp; In most cases, it makes absolutely no sense to use star schema when modeling a data warehouse. &amp;nbsp;When data can consist upwards of &lt;o:p&gt;a billion records, joining on multiple dimension tables is usually not feasible.&lt;/o:p&gt;&lt;br /&gt;&lt;br /&gt;There is no "one size fits all" that is appropriate for any given data warehouse. &amp;nbsp;However, partitioned tables with partitioning and subpartitioning on the columns most frequently used for loading data warehouses usually makes sense. &amp;nbsp;Data warehouse tables can also be denormalized to avoid the performance issues of joins on very large tables when loading data marts.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Of course, modeling a data warehouse involves a lot including various best practices and business-specific requirements.&amp;nbsp; &amp;nbsp;Following are some of the main considerations when modeling a data warehouse:&lt;o:p&gt;&lt;/o:p&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Data should contain detail&lt;/b&gt;.&amp;nbsp; Summarization can be done at the data mart level where appropriate.&amp;nbsp; You can never “un-summarize” data, so the main data repository should contain as much detail as possible.&lt;/li&gt;&lt;li&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;The data warehouse should be modeled on a time basis&lt;/b&gt; in most cases.&amp;nbsp; Data is &lt;i style="mso-bidi-font-style: normal;"&gt;non-volatile&lt;/i&gt; and reflects transactions that have happened at a point in time.&amp;nbsp; There may be adjustment transaction records included, but there are not adjustments to data per say.&lt;/li&gt;&lt;li&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;The data warehouse should contain as much history as possible&lt;/b&gt;.&amp;nbsp; A large time window allows for more analytical possibilities such as trend analyses and historical reporting.&lt;/li&gt;&lt;li&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;The data warehouse should be inclusive—not exclusive&lt;/b&gt;.&amp;nbsp; There should be no integrity constraints, and records should not be excluded because of bad or missing values.&amp;nbsp;&amp;nbsp; Record exclusion can be done at the data mart level.&amp;nbsp; There may be records that have missing or bad values in some columns that can still be useful in some unforeseen analysis or application.&amp;nbsp; There are even cases where analyses can be skewed because of up-front data exclusion.&lt;/li&gt;&lt;li&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;The data warehouse must be designed for performance&lt;/b&gt;.&amp;nbsp; Populating data marts or creating other output from the data warehouse could involve millions or even billions of records.&amp;nbsp; &lt;i style="mso-bidi-font-style: normal;"&gt;B-tree indexes &lt;/i&gt;are typically not used since they are only useful for selecting small subsets of records.&amp;nbsp; Partitioning is crucial and should be based on record selection in queries—not necessarily how the business logically organizes data.&amp;nbsp; Limited use of pre-joins (i.e. denormalization) is acceptable in a data warehouse. &amp;nbsp;Data is non-volatile, so data anomalies caused by updates are not a problem.&amp;nbsp; The use of pre-joins helps to avoid joins on extremely large tables when populating data marts.&amp;nbsp;&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;div class="MsoListParagraphCxSpLast" style="margin-left: 0in; mso-add-space: auto;"&gt;Data marts which are created by extracting data from the main data warehouse comprise the data that is utilized by users and applications.&amp;nbsp; These can be designed to meet business requirements.&amp;nbsp; They can use exclusion criteria and summarize data.&amp;nbsp;&amp;nbsp; It is entirely appropriate to build data marts using dimensional modeling and star schema.&amp;nbsp; However, keep in mind that although star schema is extremely powerful and useful, it is not appropriate for all applications and analyses.&amp;nbsp; Always work closely with users and stakeholders when designing data marts.&lt;o:p&gt;&lt;/o:p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-7201168289750873958?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/7201168289750873958/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2010/07/if-ralph-kimball-were-to-publish-new.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/7201168289750873958'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/7201168289750873958'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2010/07/if-ralph-kimball-were-to-publish-new.html' title='Modeling the Data Warehouse'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-6275492548885413935</id><published>2010-07-15T11:19:00.000-07:00</published><updated>2011-03-12T10:32:28.867-08:00</updated><title type='text'>SQL vs. Visual Query Tools</title><content type='html'>Since the early 90’s there have been many attempts to create graphical database query tools. &amp;nbsp;I myself have designed and built both desktop and web-based query tools. &amp;nbsp;I’ve also seen attempts by others including large software companies to develop query tools, data manipulation applications (e.g. ETL and BI products), forms, dashboards, and so on. &amp;nbsp;Interestingly, there still seems to be no substitute for an old command line query tool: &lt;i&gt;Structured Query Language (SQL)&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Of course I understand the value of visual interfaces, but they are not a panacea. &amp;nbsp;For example, take the following query:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;SELECT column1, column2, column3, SUM(column4)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;FROM table1&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;WHERE column1 BETWEEN ’01-JAN-10’ AND ’30-JUN-10’&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;AND column2 IS NOT NULL&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;AND column4 &amp;gt; 10000&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;GROUP BY column1, column2, column3 ;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The above SQL command takes a few seconds to create. &amp;nbsp;It is understandable by someone without technical skills (perhaps with some minimal explanation). &amp;nbsp;It is straightforward. &amp;nbsp;It is simple.&lt;br /&gt;&lt;br /&gt;Now compose the same query with a typical visual query tool:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Find the correct table in a dropdown list of tables.&lt;/li&gt;&lt;li&gt;Find column1, column2, column3, and column4 and drag them to some box on the screen.&lt;/li&gt;&lt;li&gt;Double-click on column1 to open up a dialog. &amp;nbsp;Select something that says “range” or “selection criteria” or “column values” or whatever which opens up another dialog.&lt;/li&gt;&lt;li&gt;In the new dialog select “range” and select start and end values (’01-JAN-10’ and &amp;nbsp;’30-JUN-10’) from a list or manually type in the values.&lt;/li&gt;&lt;li&gt;Save and close the open dialogs.&lt;/li&gt;&lt;li&gt;Open another dialog on column2.&lt;/li&gt;&lt;li&gt;Use the dialog to figure out how to exclude NULLS. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Save and close the dialog.&lt;/li&gt;&lt;li&gt;Open another dialog for column4.&lt;/li&gt;&lt;li&gt;Figure out how to only select values greater than 10,000.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Save and close the dialog.&lt;/li&gt;&lt;li&gt;Find a button or link in the GUI that says something like “groupings” or “aggregations” and open the appropriate dialog.&lt;/li&gt;&lt;li&gt;Create groupings on column1, column2, and column3. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Save and close the dialog.&lt;/li&gt;&lt;li&gt;Find something in the GUI that says something like “functions” or “aggregate functions”.&lt;/li&gt;&lt;li&gt;Add a SUM( ) function to column 4. &amp;nbsp;&lt;/li&gt;&lt;li&gt;Save and close the dialog.&lt;/li&gt;&lt;li&gt;Press Submit.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Was it really easier? &amp;nbsp;Keep in mind that each vendor’s product has its own GUI—there is no standard. &amp;nbsp;Also keep in mind that many vendors of these products also sell training which suggests that perhaps their GUI’s are not instantly intuitive.&lt;br /&gt;&lt;br /&gt;SQL is the standard language of relational databases. &amp;nbsp;With small variations, it is ubiquitous. &amp;nbsp;It is easy to hire people with SQL skills. &amp;nbsp;It is easy for non-programmers to learn. &amp;nbsp;SQL can get somewhat complex for complicated queries, but that is also true for any query tool graphical or otherwise.&lt;br /&gt;&lt;br /&gt;Again, I understand the value of visual interfaces. &amp;nbsp;There are many useful applications that allow a user to access information with a few clicks. &amp;nbsp;There are wonderful dashboards, graphing tools, and so on. &amp;nbsp;But whenever serious data analysis or complex ETL is required, an enterprise shouldn’t disallow the most flexible and in many cases the simplest tool for querying data.&lt;br /&gt;&lt;br /&gt;Change isn’t always progress. &amp;nbsp;What works well tends to return.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-6275492548885413935?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/6275492548885413935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2010/07/sql-vs-visual-query-tools.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/6275492548885413935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/6275492548885413935'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2010/07/sql-vs-visual-query-tools.html' title='SQL vs. Visual Query Tools'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-518173685570935396.post-8471173745618115175</id><published>2010-06-19T19:17:00.000-07:00</published><updated>2010-07-22T12:14:59.751-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BI'/><category scheme='http://www.blogger.com/atom/ns#' term='database'/><category scheme='http://www.blogger.com/atom/ns#' term='data warehouse'/><title type='text'>Data Warehousing Tools</title><content type='html'>IT applications such as data warehousing and business intelligence (BI) have become very  popular in recent years. New acronyms including ODS, ETL, DSS, OLAP, and others are frequently thrown around as of late. &amp;nbsp;A mystique is created around these buzzwords and businesses are  attracted to the latest technology trends. Consequently, there is an army of vendors trying to sell products to  those businesses. &lt;br /&gt;&lt;br /&gt;Unfortunately, these vendors are&amp;nbsp;persuading&amp;nbsp;many that things like OLAP and BI are products—not applications, that an ODS  is something you buy rather than build, and that ETL is a tool that you purchase  rather than a process. It is not necessarily a bad thing that there are products  to help build these processes and applications, but it must be  understood that products are not a substitute for knowledge. All of these  processes require the stakeholders to understand the source data, what kinds of  analyses they want to do, and what is the fastest, most economical, and most  sensible way to accomplish that. There is not and never will be something you  can buy to replace understanding. If a product can facilitate analytical  business requirements then it may be worth purchasing. But upon closer  examination, this is often far from true. &lt;br /&gt;&lt;br /&gt;For example, imagine a hypothetical health care insurer that wants to examine  how they can reduce medical expenses for their members. At a certain level of  management it is decided that a BI product will be purchased to  provide analytical tools. Vendors selling BI products are called in to make  their presentations. One or two technical folks are asked to attend, but they  are under political pressure to agree with the bosses. Professional  sales people present well-prepared (though unrealistic) demos on sample data  sets that show clever GUI’s, well-formatted reports, and eye-pleasing trend graphics and dashboards. Sometimes incentives are distributed to the attendees. One vendor  wins the purchase. &lt;br /&gt;&lt;br /&gt;Now for the reality check:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Licensing the product is costly.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The product is not a standalone product and must run on top of an existing  RDBMS. In other words, there are no savings from another license being replaced.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Additional costs include product training for staff, ramp-up time for staff  to become proficient enough to build meaningful processes, product support from  the vendor, and mandatory product upgrades.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;In addition to the work of building processes with the BI tool,  considerable work has to be done up front in the RDBMS to prepare data to make  it useable and available to the BI tool.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The new BI product requires data in a dimensional model (typically star  schema). The staff doesn’t really understand dimensional modeling and requires  further education (in addition to product training). If dimensional modeling is  not well understood, even minor errors in data structures can cause inaccurate  analyses and reporting.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The source data (in this example health care claims data) is complex and  doesn’t fit well into a dimensional model.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;BI tools are typically designed for groupings and aggregations (a/k/a  slice-and-dice applications or &lt;i&gt;cubes&lt;/i&gt;). More complex analyses and studies cannot  be completed using the BI tool.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;People start fudging to make deadlines and meet management expectations.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;QC finds considerable anomalies in reports and output from the BI tools.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Problems are hidden and phony status reports about the implementation are presented up the management chain.&lt;/li&gt;&lt;li&gt;Staff members become fed up with the needless stress and long hours and  seek employment elsewhere.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Yet more costs and delays.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;After considerable expense and time, the original goal of purchasing the BI  tool is not accomplished.&lt;/li&gt;&lt;/ul&gt;All of this may seem an exaggeration, but similar scenarios are being played  out in many businesses and organizations in all industry sectors. Some who read  this may have already experienced the pain of similar situations. &amp;nbsp;These problems can be avoided with some basic understanding.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Understand your business requirements.&lt;/b&gt; Although not all analytical  requirements can be foreseen, key stakeholders should have a very good sense of  the types of applications they will be pursuing. Many real-world analytical  questions should be considered up front. Fancy presentations by  sales persons may look good to those making purchase decisions, but they may be  irrelevant to the actual analytical needs. Key business users such as analysts, statisticians, accountants, and others involved in research and informatics need to  be involved in decision making concerning BI.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Understand your data.&lt;/b&gt; Sometimes the source data just does not contain the  information to fulfill certain analyses and queries. There is nothing magical  about data. A thorough understanding of the content, population of variables,  accuracy, and other attributes are necessary before any intelligence can be  derived from it. There is no product or tool that can make something out of data  that it is not. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;Know your RDMBS.&lt;/b&gt; Most mainstream database systems such as Oracle or SQL Server  already have powerful tools for manipulating data. In fact, that’s exactly what  an RDBMS does. Because it’s called “relational” doesn’t mean it is limited to  normalized relational structures. Star schemas, snowflake schemas, cubes, and  all sorts of data marts can be created with any mainstream database product.  Database systems already contain powerful languages that are capable of any ETL. (In fact  expensive ETL tools are only front-ends that generate code in the native RDBMS). All mainstream  database systems contain powerful SQL extensions to support virtually any type  of query, analysis, or report. For very large data sets (e.g. billions of  records) most add-on tools cannot utilize special features of the RDBMS to provide reasonable performance. &amp;nbsp;Consult with your developers and DBA's before making purchase decisions.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Consider minimalist solutions.&lt;/b&gt; Various tools always add considerable complexity  and frequently add unnecessary convolutions. Huge amounts of money and time can be  invested in something that could have been accomplished much more easily. For  example, a major BI implementation can sometimes be replaced with a few  straightforward data marts, a little programming, and simple query tools. Less  money. Less time. Faster results. More flexible. More scalable. More accurate  analyses and reports. Sometimes the shortest distance between two points is a  straight line.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/518173685570935396-8471173745618115175?l=randygrenier.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://randygrenier.blogspot.com/feeds/8471173745618115175/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://randygrenier.blogspot.com/2010/06/it-systems-such-as-data-warehousing.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/8471173745618115175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/518173685570935396/posts/default/8471173745618115175'/><link rel='alternate' type='text/html' href='http://randygrenier.blogspot.com/2010/06/it-systems-such-as-data-warehousing.html' title='Data Warehousing Tools'/><author><name>randy</name><uri>http://www.blogger.com/profile/16081467123447818194</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='32' src='http://1.bp.blogspot.com/-VXd8O5v9Rzk/TX-HeOOnNvI/AAAAAAAAAHU/0vzIwK6Fxis/s220/mugshot1.jpg'/></author><thr:total>0</thr:total></entry></feed>
