June 19, 2010

Data Warehousing Tools

IT applications such as data warehousing and business intelligence (BI) have become very popular in recent years. New acronyms including ODS, ETL, DSS, OLAP, and others are frequently thrown around as of late.  A mystique is created around these buzzwords and businesses are attracted to the latest technology trends. Consequently, there is an army of vendors trying to sell products to those businesses.

Unfortunately, these vendors are persuading many that things like OLAP and BI are products—not applications, that an ODS is something you buy rather than build, and that ETL is a tool that you purchase rather than a process. It is not necessarily a bad thing that there are products to help build these processes and applications, but it must be understood that products are not a substitute for knowledge. All of these processes require the stakeholders to understand the source data, what kinds of analyses they want to do, and what is the fastest, most economical, and most sensible way to accomplish that. There is not and never will be something you can buy to replace understanding. If a product can facilitate analytical business requirements then it may be worth purchasing. But upon closer examination, this is often far from true.

For example, imagine a hypothetical health care insurer that wants to examine how they can reduce medical expenses for their members. At a certain level of management it is decided that a BI product will be purchased to provide analytical tools. Vendors selling BI products are called in to make their presentations. One or two technical folks are asked to attend, but they are under political pressure to agree with the bosses. Professional sales people present well-prepared (though unrealistic) demos on sample data sets that show clever GUI’s, well-formatted reports, and eye-pleasing trend graphics and dashboards. Sometimes incentives are distributed to the attendees. One vendor wins the purchase.

Now for the reality check:
  • Licensing the product is costly.
  • The product is not a standalone product and must run on top of an existing RDBMS. In other words, there are no savings from another license being replaced.
  • Additional costs include product training for staff, ramp-up time for staff to become proficient enough to build meaningful processes, product support from the vendor, and mandatory product upgrades.
  • In addition to the work of building processes with the BI tool, considerable work has to be done up front in the RDBMS to prepare data to make it useable and available to the BI tool.
  • The new BI product requires data in a dimensional model (typically star schema). The staff doesn’t really understand dimensional modeling and requires further education (in addition to product training). If dimensional modeling is not well understood, even minor errors in data structures can cause inaccurate analyses and reporting.
  • The source data (in this example health care claims data) is complex and doesn’t fit well into a dimensional model.
  • BI tools are typically designed for groupings and aggregations (a/k/a slice-and-dice applications or cubes). More complex analyses and studies cannot be completed using the BI tool.
  • People start fudging to make deadlines and meet management expectations.
  • QC finds considerable anomalies in reports and output from the BI tools.
  • Problems are hidden and phony status reports about the implementation are presented up the management chain.
  • Staff members become fed up with the needless stress and long hours and seek employment elsewhere.
  • Yet more costs and delays.
  • After considerable expense and time, the original goal of purchasing the BI tool is not accomplished.
All of this may seem an exaggeration, but similar scenarios are being played out in many businesses and organizations in all industry sectors. Some who read this may have already experienced the pain of similar situations.  These problems can be avoided with some basic understanding.

Understand your business requirements. Although not all analytical requirements can be foreseen, key stakeholders should have a very good sense of the types of applications they will be pursuing. Many real-world analytical questions should be considered up front. Fancy presentations by sales persons may look good to those making purchase decisions, but they may be irrelevant to the actual analytical needs. Key business users such as analysts, statisticians, accountants, and others involved in research and informatics need to be involved in decision making concerning BI.

Understand your data. Sometimes the source data just does not contain the information to fulfill certain analyses and queries. There is nothing magical about data. A thorough understanding of the content, population of variables, accuracy, and other attributes are necessary before any intelligence can be derived from it. There is no product or tool that can make something out of data that it is not.

Know your RDMBS. Most mainstream database systems such as Oracle or SQL Server already have powerful tools for manipulating data. In fact, that’s exactly what an RDBMS does. Because it’s called “relational” doesn’t mean it is limited to normalized relational structures. Star schemas, snowflake schemas, cubes, and all sorts of data marts can be created with any mainstream database product. Database systems already contain powerful languages that are capable of any ETL. (In fact expensive ETL tools are only front-ends that generate code in the native RDBMS). All mainstream database systems contain powerful SQL extensions to support virtually any type of query, analysis, or report. For very large data sets (e.g. billions of records) most add-on tools cannot utilize special features of the RDBMS to provide reasonable performance.  Consult with your developers and DBA's before making purchase decisions.

Consider minimalist solutions. Various tools always add considerable complexity and frequently add unnecessary convolutions. Huge amounts of money and time can be invested in something that could have been accomplished much more easily. For example, a major BI implementation can sometimes be replaced with a few straightforward data marts, a little programming, and simple query tools. Less money. Less time. Faster results. More flexible. More scalable. More accurate analyses and reports. Sometimes the shortest distance between two points is a straight line.

No comments:

Post a Comment