Sample Junk Dimension Table

Junk dimensions are collections of low cardinality values. The table represents all possible combinations of those values or, at least, all possible combinations of those values that have been imported so far.

The junk dimension template is almost identical to the dimension table template with one key difference. There is an additional column called RowHash which is a computed column that consist of a hash of all the the non-key, non-audit columns.

USE YourEDW
 
DROP TABLE IF EXISTS [dw].[DimYourDimensionName]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


CREATE TABLE [dw].[DimYourDimensionName](
[YourDimensionNameCK] [bigint] IDENTITY(1,1) NOT NULL,
--your columns.
[CreatedBy] [nvarchar](50) NULL,
[CreatedOn] [datetime] NULL,
[UpdatedBy] [nvarchar](50) NULL,
[UpdatedOn] [datetime] NULL,
[SourceSystem] [nvarchar](100) NULL,
[SourceSystemKey] [nvarchar](100) NULL,
[EffectiveFrom] [datetime] NULL,
[EffectiveTo] [datetime] NULL,
[IsMostRecentRecord] [bit] NULL,
[RowHash]  AS (CONVERT([binary](16),hashbytes('MD5',concat(
CONVERT([nvarchar](35),Column1,0),
CONVERT([nvarchar](35),Column2,0),
CONVERT([nvarchar](35),Column3,0),
CONVERT([nvarchar](35),Column4,0),
CONVERT([nvarchar](35),Column5,0),
CONVERT([nvarchar](35),Column6,0))),0)) PERSISTED,
 CONSTRAINT [PK_YourDimensionName] PRIMARY KEY CLUSTERED 
(
[YourDimensionNameCK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END

--index is optional
DROP INDEX IF EXISTS [NCIDX_FactYourFactTableName_RowHash] ON [dw].[FactYourFactTableName]
GO


CREATE NONCLUSTERED INDEX [NCIDX_FactYourFactTableName_RowHash] ON [dw].[FactYourFactTableName]
(
	[RowHash] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

Column Name

Data Type

Function

CreatedBy

nvarchar(50)

Who created the record. Any kind of record creation process should include the code:

CreatedBy = SYSTEM_USER

CreatedOn

datetime

When the record was created.

Any kind of record creation process should include the code:

CreatedOn= CURRENT_TIMESTAMP

UpdatedBy

nvarchar(50)

Who updated the record.

If there are any update statements ran against the table, this column should be populated as part of that process. This is especially important when engineers have to make manual adjustments to data. Your SQL UPDATE statement should include:

UpdatedBy = SYSTEM_USER

UpdatedOn

datetime

When the record was updated. If there are any update statements ran against the table, this column should be populated as part of that process. This is especially important when engineers have to make manual adjustments to data. Your SQL UPDATE statement should include:

UpdatedOn = CURRENT_TIMESTAMP

SourceSystem

nvarchar(100)

The source system where the data came from. This value is populated from a column of the same name in the staging table.

SourceSystemKey

nvarchar(100)

The primary key of the record from the source system. Strictly speaking, this does not have to be the primary key of a source table. It does not even have to be an atomic value. It can be concatenated value. The important thing is that the value uniquely identify the row, and that row can be tracked back to a record in the source system.

EffectiveFrom

datetime

If the processing of the dimension requires that history be kept, this column indicates when the row began to represent the most recent version of the data.

This column usually shares a value with CreatedOn and does not suffer from the challenges found when trying to update the value of EffectiveTo.

EffectiveTo

datetime

If the processing of the dimension requires that history be kept, this column indicates when the row ceased to represent the most recent version of the data.

If IsMostRecentRecord = 1, then the value of EffectiveTo is the highest date in the date dimension. If IsMostRecentRecord = 0, then the value of this column is set to the day it stopped being true. This has some challenges in the case where you have an intraday change. It happens. You will need to use your creativity to solve your particular use case. My solution was to begin retiring records by decrementing a minute amount of time. That is the reason the data type is datatime and not just date. I will illustrate this in the section on loading a Type II SCD.

IsMostRecentRecord

bit

If the processing of the dimension requires that history be kept, this column indicates whether or not that row represents the most up to date version of that data.

RowHash

binary(16)

A computed value of those columns used to compute the uniqueness of a row. This value is used to determine INSERT or UPDATE operations on the table.

Last updated