Showing posts with label BI. Show all posts
Showing posts with label BI. Show all posts

November 20, 2011

Rapidly Emerging Technology Series: Self-Service BI

The Rapidly Emerging Technology Series highlights current technologies that are relevant to data warehouse professionals.  This posting discusses self-service BI.

A very rapidly evolving area of information access is called Self-service Business Intelligence or Self-service BI.  Business users want more flexible access to information than standard reports can provide and not have to rely on analysts who can write code.

For many years, companies like Cognos and MicroStrategies have provided BI tools that work on top of relational databases.  Typically, the database team would set up a star schema data mart, and BI developers would create a browser-based front end would allow easy grouping and aggregation (slice-and-dice) access to the data with sums and counts.  BI products also provide tools to build dashboards that give very current views of data—typically in an operational data store (ODS).  Dashboards can also include data visualization in the form of graphs and gauges.

However, these tools are evolving to include much more sophisticated access for users.  Unlike older BI tools that sat on top of a pre-defined data mart, newer self-service BI products provide access to multiple data sources including enterprise databases as well as publicly accessible databases and subscription databases on the web.  The BI tool allows for integration of these diverse data sources at the user level for complex reports and graphs.

Most analyses involve more than single queries.  There is usually a discovery process which leads to more questions and subsequently more analyses.  Newer BI products support this discovery process.  Self-service BI allows an analysis to be built upon by easily changing variables, adding data sources, re-use of results, and so on.

Newer BI tools are being developed that also allow mobile access from a variety of devices. Users will increasingly need to access and visualize data from their smart phones and tablets.

Microsoft has invested heavily in Self-service BI.  Their SQL Server product contains tools that allow tight integration with Excel and all of its analytical and graphing capabilities as well as with SharePoint to promote various types of live data access throughout an organization.  SQL Server 2012 which will be released in coming months has a tool called PowerView which is a very flexible and powerful BI front end.  It supports integration of diverse data sources at the user level, self-service alerts, sophisticated graphing, document authoring with live data updates, and many other analytical tools.

(A pre-release version of SQL Server 2012 is available for free download at http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx.)

June 19, 2010

Data Warehousing Tools

IT applications such as data warehousing and business intelligence (BI) have become very popular in recent years. New acronyms including ODS, ETL, DSS, OLAP, and others are frequently thrown around as of late.  A mystique is created around these buzzwords and businesses are attracted to the latest technology trends. Consequently, there is an army of vendors trying to sell products to those businesses.

Unfortunately, these vendors are persuading many that things like OLAP and BI are products—not applications, that an ODS is something you buy rather than build, and that ETL is a tool that you purchase rather than a process. It is not necessarily a bad thing that there are products to help build these processes and applications, but it must be understood that products are not a substitute for knowledge. All of these processes require the stakeholders to understand the source data, what kinds of analyses they want to do, and what is the fastest, most economical, and most sensible way to accomplish that. There is not and never will be something you can buy to replace understanding. If a product can facilitate analytical business requirements then it may be worth purchasing. But upon closer examination, this is often far from true.

For example, imagine a hypothetical health care insurer that wants to examine how they can reduce medical expenses for their members. At a certain level of management it is decided that a BI product will be purchased to provide analytical tools. Vendors selling BI products are called in to make their presentations. One or two technical folks are asked to attend, but they are under political pressure to agree with the bosses. Professional sales people present well-prepared (though unrealistic) demos on sample data sets that show clever GUI’s, well-formatted reports, and eye-pleasing trend graphics and dashboards. Sometimes incentives are distributed to the attendees. One vendor wins the purchase.

Now for the reality check:
  • Licensing the product is costly.
  • The product is not a standalone product and must run on top of an existing RDBMS. In other words, there are no savings from another license being replaced.
  • Additional costs include product training for staff, ramp-up time for staff to become proficient enough to build meaningful processes, product support from the vendor, and mandatory product upgrades.
  • In addition to the work of building processes with the BI tool, considerable work has to be done up front in the RDBMS to prepare data to make it useable and available to the BI tool.
  • The new BI product requires data in a dimensional model (typically star schema). The staff doesn’t really understand dimensional modeling and requires further education (in addition to product training). If dimensional modeling is not well understood, even minor errors in data structures can cause inaccurate analyses and reporting.
  • The source data (in this example health care claims data) is complex and doesn’t fit well into a dimensional model.
  • BI tools are typically designed for groupings and aggregations (a/k/a slice-and-dice applications or cubes). More complex analyses and studies cannot be completed using the BI tool.
  • People start fudging to make deadlines and meet management expectations.
  • QC finds considerable anomalies in reports and output from the BI tools.
  • Problems are hidden and phony status reports about the implementation are presented up the management chain.
  • Staff members become fed up with the needless stress and long hours and seek employment elsewhere.
  • Yet more costs and delays.
  • After considerable expense and time, the original goal of purchasing the BI tool is not accomplished.
All of this may seem an exaggeration, but similar scenarios are being played out in many businesses and organizations in all industry sectors. Some who read this may have already experienced the pain of similar situations.  These problems can be avoided with some basic understanding.

Understand your business requirements. Although not all analytical requirements can be foreseen, key stakeholders should have a very good sense of the types of applications they will be pursuing. Many real-world analytical questions should be considered up front. Fancy presentations by sales persons may look good to those making purchase decisions, but they may be irrelevant to the actual analytical needs. Key business users such as analysts, statisticians, accountants, and others involved in research and informatics need to be involved in decision making concerning BI.

Understand your data. Sometimes the source data just does not contain the information to fulfill certain analyses and queries. There is nothing magical about data. A thorough understanding of the content, population of variables, accuracy, and other attributes are necessary before any intelligence can be derived from it. There is no product or tool that can make something out of data that it is not.

Know your RDMBS. Most mainstream database systems such as Oracle or SQL Server already have powerful tools for manipulating data. In fact, that’s exactly what an RDBMS does. Because it’s called “relational” doesn’t mean it is limited to normalized relational structures. Star schemas, snowflake schemas, cubes, and all sorts of data marts can be created with any mainstream database product. Database systems already contain powerful languages that are capable of any ETL. (In fact expensive ETL tools are only front-ends that generate code in the native RDBMS). All mainstream database systems contain powerful SQL extensions to support virtually any type of query, analysis, or report. For very large data sets (e.g. billions of records) most add-on tools cannot utilize special features of the RDBMS to provide reasonable performance.  Consult with your developers and DBA's before making purchase decisions.

Consider minimalist solutions. Various tools always add considerable complexity and frequently add unnecessary convolutions. Huge amounts of money and time can be invested in something that could have been accomplished much more easily. For example, a major BI implementation can sometimes be replaced with a few straightforward data marts, a little programming, and simple query tools. Less money. Less time. Faster results. More flexible. More scalable. More accurate analyses and reports. Sometimes the shortest distance between two points is a straight line.