The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals. This posting discusses column-oriented databases.
Data in a typical relational database is organized in tables by row. The row paradigm is used for physical storage as well as the logical organization of data.
Column-Oriented or column-based databases physically organize data by column while still presenting data as rows. Organizing data by column makes it self-indexing since each column becomes the equivalent of an index key. Each column has table and row identifiers so that columns can be combined to produce rows of data in a table. Any column can be used in the WHERE clause of a query, the data elements can quickly be found in that column, and columns with the same row identifiers are assembled for the query results.
Microsoft SQL Server 2012 (to be released in 2012) has a new feature called Columnstore Indexes which provide column-oriented storage for tables. Microsoft claims that these can speed up queries up to 10 times. A major advantage of the new SQL Server Columnstore Indexes is that you can index all columns in a large fact table. This enables very fast performance when joining foreign keys in very large fact tables to primary keys in dimensions for star joins. With B-tree indexes, there are penalties for indexing too many columns. In a typical fact table, most columns other than the primary key are foreign keys to dimension tables, and Columnstore indexes will allow all of those columns to be indexed using column-oriented storage.
Oracle 11g supports column-oriented storage of data but only on its proprietary Exadata storage servers or Exadata appliances.
Informative links for information about column-oriented Databases:
Amazon Redshift is a cloud-based Data Warehouse Service that allows you to store and process your data in a scalable and flexible manner. You can store your data in the form of tables, and then use SQL queries to run queries against
ReplyDelete