Sample Dimension Table
Dimensions that are not junk dimensions and implemented as physical tables can be implemented by this script.
Note that the primary key is of data type BIGINT. The INT data type maxes out at 2,147,483,647. It turns out, in extensive load testing, you can blow clean through that number. There are remedies for that. You can reset the value using DBCC CHECKIDENT, but why take chances?
Nuke it from orbit. It's the only way to be sure. And by that I mean use BIGINT for contrived keys. It taps out at 9,223,372,036,854,775,807 at the cost of four more bytes of hard drive space.
Column Name | Data Type | Function |
CreatedBy | nvarchar(50) | Who created the record. Any kind of record creation process should include the code: CreatedBy = SYSTEM_USER |
CreatedOn | datetime | When the record was created. Any kind of record creation process should include the code: CreatedOn= CURRENT_TIMESTAMP |
UpdatedBy | nvarchar(50) | Who updated the record. If there are any update statements ran against the table, this column should be populated as part of that process. This is especially important when engineers have to make manual adjustments to data. Your SQL UPDATE statement should include: UpdatedBy = SYSTEM_USER |
UpdatedOn | datetime | When the record was updated. If there are any update statements ran against the table, this column should be populated as part of that process. This is especially important when engineers have to make manual adjustments to data. Your SQL UPDATE statement should include: UpdatedOn = CURRENT_TIMESTAMP |
SourceSystem | nvarchar(100) | The source system where the data came from. This value is populated from a column of the same name in the staging table. |
SourceSystemKey | nvarchar(100) | The primary key of the record from the source system. Strictly speaking, this does not have to be the primary key of a source table. It does not even have to be an atomic value. It can be concatenated value. The important thing is that the value uniquely identify the row, and that row can be tracked back to a record in the source system. |
EffectiveFrom | datetime | If the processing of the dimension requires that history be kept, this column indicates when the row began to represent the most recent version of the data. This column usually shares a value with CreatedOn and does not suffer from the challenges found when trying to update the value of EffectiveTo. |
EffectiveTo | datetime | If the processing of the dimension requires that history be kept, this column indicates when the row ceased to represent the most recent version of the data. If IsMostRecentRecord = 1, then the value of EffectiveTo is the highest date in the date dimension. If IsMostRecentRecord = 0, then the value of this column is set to the day it stopped being true. This has some challenges in the case where you have an intraday change. It happens. You will need to use your creativity to solve your particular use case. My solution was to begin retiring records by decrementing a minute amount of time. That is the reason the data type is datatime and not just date. I will illustrate this in the section on loading a Type II SCD. |
IsMostRecentRecord | bit | If the processing of the dimension requires that history be kept, this column indicates whether or not that row represents the most up to date version of that data. |
Last updated