Slowly Changing Dimensions
Slowly changing dimensions in dimensional modeling are commonly misunderstood. Chapter 6 of The Data Warehouse Toolkit by Ralph Kimball describes slowly changing dimensions in detail. This posting attempts to provide a summary explanation.
A slowly changing dimension is a dimension for which any of its attributes (column values) can change over time. The attributes change infrequently. Since the whole purpose of a dimension table is to aggregate on its attributes, there must be a systematic way of handling the changes.
For example, suppose there is a patient dimension with health plan information columns for each patient. If a patient changes health plans, then the values for the health plan related columns will change. For example:
Medical Record # Plan Name Plan ID Effective Date
Before 5552223333 Tufts Health Plan 2233221122 1/1/2005
After 5552223333 BCBSMA 4442227777 1/1/2009
To demonstrate the different approaches to making changes to slowly changing dimensions, we will create an example table DimPatient. The primary key is PatientID, and the plan name, plan ID, and effective date are the slowly changing attributes. (Of course, in a real-world patient dimension there would probably be many more attributes.)
DimPatient (PatientID, MRN, PlanName, PlanID, EffDate)
Ralph Kimball proposes three types of slowly changing dimensions where the definition of each type is based on how the data is updated:
Type I: Simply update the changed attributes.
Type II: Add a new dimension record with a new primary key and the new attribute values.
Type III: Create separate columns for the original and current values (e.g. OriginalPlanID, CurrentPlanID)
There are obvious advantages and disadvantages for each of Kimball’s types of slowly changing dimensions.
Type I Slowly Changing Dimension
The advantage of this type of slowly changing dimension is that it is the easiest to update and maintain. For any given change, there is only one record that needs to be updated and that is identified by the primary key.
UPDATE DimPatient
SET PlanName = ‘BCBSMA’, PlanID = ‘4442227777’, EffDate = ‘1/1/2009’
WHERE PatientID = 111222333
The main disadvantage of a Type I slowly changing dimension is that history is lost. Suppose there is a query to get amounts charged to insurance companies by year. To demonstrate this we will include example tables FactVisits and DimDate.
FactVisit (FactVisitID, VisitDateID, VisitMinutes, ClinicianID, PatientID, ProblemID, CopayAmount, BillAmount)
DimDate (DateID, Year, Month, Fiscal Year, Quarter, BusinessDay)
A star join uses DimPatient to see which health plan was billed by year.
SELECT dd.Year, dp.HealthPlan, SUM(fv.BillAmount)
FROM FactVisit fv
JOIN DimDate dd ON dd.DateID = fv.VisitDateID
JOIN DimPatient dp ON dp.PatientID = fv.PatientID
GROUP BY dd.Year, dp.HealthPlan
Because a Type I slowly changing dimension looses history, all amounts billed for our example patient will show up billed to ‘BCBSMA’ even though amounts billed prior to 2009 were billed to another plan. For this reason, Type I slowly changing dimensions are rarely useful in a dimensional model.
Type II Slowly Changing Dimension
This type of slowly changing dimension simply adds a new record with a new primary key. All things considered, this is the most useful type of slowly changing dimension because the history is automatically partitioned, and nothing else needs to be done when creating star joins.
If the DimPatient dimension is implemented as a Type II slowly changing dimension, another record would simply be added when a patient’s health plan changes.
PatientID | MRN | PlanName | PlanID | EffDate |
1000222 | 5552223333 | Tufts Health Plan | 2233221122 | 1/1/2005 |
2000444 | 5552223333 | BCBSMA | 4442227777 | 1/1/2009 |
Note that the new record has a different value in the primary key column PatientID. The MRN is used to identify the patient and is unchanged. The “slowly changing” attributes PlanName and PlanID contain the new values. The effective date is for reference purposes only and does not need to be included in the join conditions.
In the previous query, the billing amounts for the example patient will show up billed to ‘Tufts Health Plan’ for all charges prior to 2009, and to ‘BCBSMA’ for all charges since the beginning of 2009. In Kimball’s words:
The Type Two slowly changing dimension automatically partitions history and an application must not be required to place any time constraints on effective dates in the dimension.[1]
One with entity-relationship modeling background may protest that we are creating two records for the same entity with different primary keys (i.e. two PatientID’s). However, for dimension tables the primary key uniquely identifies the combination of all attributes in the row. The patient in this case is actually identified by the Medical Record Number (MRN) which will contain the same value in both records.
Type III Slowly Changing Dimension
This type would contain “original” and “current” columns for the changed attribute. In our example, we would add the following columns to our dimension:
OriginalPlanName
CurrentPlanName
OriginalPlanID
CurrentPlanID
Although this may be satisfactory for some simple data mart applications, it is a rarely useful type of slowly changing dimension with several disadvantages.
A Type III slowly changing dimension does not partition by time. We couldn’t use effective date columns in queries because we would be adding additional qualifiers on the dimension to be used in a star join which is not correct design. Even if we did use effective dates, that would only give us the starting point for the current value.
Also, adding more and more columns to the dimension could potentially get out of control. Keep in mind that the whole purpose of dimensional modeling is to simplify queries. Analysts and ETL developers would have to consider each changing attribute of each slowly changing dimension for every query.
Another disadvantage of a Type III slowly changing dimension is that it only allows for a single change to an attribute. Suppose in our example that a patient changed health plans three or four times over the time window of the data. A Type III slowly changing dimension would not work.
Using Changing Dimension Attributes in WHERE Clauses
Using changing attributes of slowly changing dimensions in a WHERE clause of a star join has the risk of either loosing data we want to include or including data that we don’t want to include. When we query on a previous value of a dimension attribute, we loose new data, and when we query on a newer value, we loose previous data.
For example, a dimension named DimDiagnosis contains diagnosis codes and descriptions. The descriptions are slowly changing attributes.
DimDiagnosis (DiagnosisID, ICD9Code, Description)
Previous Values
DiagnosisID ICD9Code Description
10000222 401.1 Hypertension Benign
New Values
DiagnosisID ICD9Code Description
10000244 401.1 Hypertension, benign
Note that the Description value has changed slightly. If a WHERE clause condition on this dimension contains previous description, it will lose data for the new description.
WHERE dimDiagnosis.Description = ‘Hypertension Benign’
One way to address this problem is to include a column with a stable value. For example, a column could be added to the DimDiagnosis dimension named DimDescription which would contain a value that doesn’t change when the Description column changes.
Previous Values
DiagnosisID ICD9Code Description DimDescription
10000222 401.1 Hypertension Benign HYPERTENSION BENIGN
New Values
DiagnosisID ICD9Code Description DimDescription
10000244 401.1 Hypertension, benign HYPERTENSION BENIGN
Then the where clause of a query could use the unchanging attribute DimDescription
WHERE dimDiagnosis.DimDescription = ‘HYPERTENSION BENIGN’
The changed attributes can still be used in groupings and aggregations.
SELECT d.Description, f.VisitDate, COUNT(*)
FROM FactVisits v
JOIN DimDiagnosis d ON d.DiagnosisID = f.DiagnosisID
WHERE dimDiagnosis.DimDescription = ‘HYPERTENSION BENIGN’
GROUP BY d.Description, f.VisitDate
Note that the changed Description value is used in the grouping while the unchanged DimDescription column is used in the WHERE clause.
Restraint should be used when adding additional columns to any dimension so as to avoid adding unnecessary complexity complicating queries rather than simplifying them.
Hey, would you mind if I share your blog with my twitter group? There’s a lot of folks that I think would enjoy your content. Please let me know. Thank you.
ReplyDeleteAWS Training in Chennai | Best AWS Training in Chennai | AWS Training Course in Chennai
Data Science Training in Chennai | Best Data Science Training in Chennai | Data Science Course in Chennai
No.1 Python Training in Chennai | Best Python Training in Chennai | Python Course in Chennai
RPA Course Training in Chennai | Best RPA Training in Chennai
No.1 RPA Training in Chennai | Best RPA Training in Chennai | RPA Course in Chennai
No.1 Digital Marketing Training in Chennai | Best Digital Marketing Training in Chennai
Hey, would you mind if I share your blog with my twitter group? There’s a lot of folks that I think would enjoy your content. Please let me know. Thank you.
ReplyDeleteAWS Training in Chennai | Best AWS Training in Chennai | AWS Training Course in Chennai
Data Science Training in Chennai | Best Data Science Training in Chennai | Data Science Course in Chennai
No.1 Python Training in Chennai | Best Python Training in Chennai | Python Course in Chennai
RPA Course Training in Chennai | Best RPA Training in Chennai
No.1 RPA Training in Chennai | Best RPA Training in Chennai | RPA Course in Chennai
No.1 Digital Marketing Training in Chennai | Best Digital Marketing Training in Chennai
Well done! Your explanation way is very attractive for me and I like more tips from your great post. Surely I will share your great post for my friends and this is very useful for freshers. Keeping the great work!
ReplyDeleteOracle Training in Chennai
Oracle Training institute in chennai
Social Media Marketing Courses in Chennai
Tableau Training in Chennai
Primavera Training in Chennai
Unix Training in Chennai
Oracle DBA Training in Chennai
Power BI Training in Chennai
Oracle Training in Chennai
Oracle Training institute in chennai
It’s great to come across a blog every once in a while that isn’t the same out of date rehashed material. Fantastic read.
ReplyDeleteData science Course Training in Chennai |Best Data Science Training Institute in Chennai
matlab training chennai | Matlab course in chennai
Your very own commitment to getting the message throughout came to be rather powerful and have consistently enabled employees just like me to arrive at their desired goals.
ReplyDeleteAws training chennai | AWS course in chennai
Rpa training in chennai | RPA training course chennai
Nice information, valuable and excellent design, as share good stuff with good ideas and concepts, lots of great information and inspiration, both of which I need, thanks to offer such a helpful information here.
ReplyDeleteOracle Training | Online Course | Certification in chennai | Oracle Training | Online Course | Certification in bangalore | Oracle Training | Online Course | Certification in hyderabad | Oracle Training | Online Course | Certification in pune | Oracle Training | Online Course | Certification in coimbatore
This is a great inspiring article.I am pretty much pleased with your good work. You put really very helpful information. Keep it up. Keep blogging. Looking to reading your next post..
ReplyDeleteDevOps Training in Chennai
DevOps Course in Chennai
Incredible article. I'm managing a portion of these issues too..
ReplyDeletebest interiors
ReplyDeleteA persuading conversation is worth remark. I do accept that you ought to distribute more on this topic, it probably won't be an untouchable issue however typically individuals don't discuss such issues. To the following! Much obliged!!
best interiors
This comment has been removed by the author.
ReplyDeleteyes i am fully decided on amid this text and that i simply indulgent pronounce that this article is deeply best and pretty informative article.i will make hermetically sealed to be studying your blog extra. You made a fine lessening but I can't seasoned occurring but surprise, what kind of the including together facet? !!!!!!thank you!!!!!!.. Fifa 19 Crack
ReplyDeletethank you later than more for all of the knowledge you distribute,suitable pronounce. i was fantastically vivacious inside the article, it is pretty inspiring I ought to agree with on. Crack Download I as soon as touring you web site by means of now I always stumble upon appealing articles later this one.awesome task, I greatly admire that.Do preserve sharing! Regards, Norton Internet Security Product Key Generator
ReplyDelete