Documents
Data Warehouse ETL Framework
Data Warehouse ETL Framework
  • Introduction To The ETL Framework
  • Framework Overview
    • Framework Description
    • Why You Need An ETL Framework
      • What is the ETL Framework?
      • Framework Objectives
      • Can I use my own resources to implement The Framework?
    • Download The Framework
    • What's In The Box
    • v1.0 Release
    • Training Resources For Engineers
    • Getting Help
  • General Principles
    • Never Finished
    • Overall Engineering Philosophy
    • Self Service Analytics As The Goal
    • Strict Adherence To Globally Accepted Practice
    • Data Flow Design
    • Effective DataOps
    • Code First Philosophy
    • Melding Kimball And Inmon
    • Master Data Management
    • Open Architecture
    • Decoupled Systems
    • Independent Identically Executing Processes
    • Robust Processes
    • Self-Diagnostics
    • Data Provenance
    • Adverse Reporting
    • Error Recovery
    • Archrival And Recovery
    • Documentation As A Necessity
  • DataOps For The Uninitiated
    • DataOps Initiation
    • What Is Agile Software Development?
    • What Is DevOps?
    • What Is Statistical Process Control?
    • You Take All That And You Wind Up With...
    • The DataOps Manifesto
    • Want To Learn More About DataOps?
  • ETL Developer's Field Guide
    • The Plain English Explanation Of What You Are About To Build
    • Data Warehouse User Profiles
    • Using Data Source Names
    • Why You Should Not Use SSIS
      • SSIS Is Not Forward Compatible
      • SSIS Is Not Very Performant
      • Python Development Is Faster Than SSIS
      • SSIS Is Difficult To Deploy To Production
      • SSIS Is Difficult To Maintain
    • Python Software Engineering Considerations
    • Enterprise Data Ecosystem
      • EDW's Place In Enterprise Architecture
      • EDW ETL Overview
    • ETL Environment Databases
    • Data Acquisition Paradigms
      • Demand Pull
      • Supply Push
    • The Common Model
    • The Semantic Layer
    • Database Object Naming Conventions
      • Columns
      • Indexes
      • Primary Keys
      • Schemas
      • Stored Procedures
      • Tables
      • User Defined Functions
    • ETL Reference Architecture
    • Table Anatomy
      • Sample Dimension Table
      • Sample Fact Table
      • Sample Indexed View
      • Sample Junk Dimension Table
      • Sample Master Data Management Table
      • Sample Stage Table
    • Master Data Management
      • MDM With Exact Match On Source System Key
      • Implementation Details
    • Slowly Changing Dimension Processing
      • General Process
      • Creating Empty Records
      • SCD Processing Types
      • Implementing SCDs As Temporal Tables
    • Handling Calculated Values
    • Warehouse Load Commandments
    • Code Style Guide
      • SSIS
      • Transact SQL
      • Python
      • C#
    • The Role Of Schemas
      • Standard Schemas And Their Definitions
      • Schemas As A Security Device
      • Schemas As A DB Object Differentiator
    • Feedback And Control Systems
      • Having An Engineering Mindset
      • The Mechanics Of Checking For Unprocessed Records
      • Time Series Analysis 101
      • Passive Monitoring System Design Theory
      • Passive Monitoring System Implementation
    • Source Control
    • Loading Historical Data
    • Security Access Model
    • Wrapping Up
  • ETL Environment Set Up
    • Install Anaconda
    • Create Initial Source Control Folders
    • Create Primary Databases And Schemas
    • Customize And Run Deployment Scripts
    • Install Python Packages
    • Create Data Source Name
    • Create The SSIS Catalogue
    • Create The File I/O Directory Structure
    • Create Global Environment In The Integration Services Catalog
    • Create The SQL Server Aliases
    • Create The BIAnalytics Proxy Account
    • Create The Opt Folder
    • Configure Email Alerts
  • Standard SSIS ETL Development Package
    • Starting A New Process
    • Tracking Package Variables
    • Remove Things You Do Not Need
    • Creating New Configuration Settings
  • Sample Script Guide
    • SQL
      • Finalize And Audit Scripts
        • 18 CREATE PROC usp_RecordRowCounts
        • 19 CREATE PROC usp_MarkRecordsAsProcessed
        • 20 CREATE PROC usp_CheckForUnprocessedRecords
      • Monitoring Scripts
        • 21 CREATE PROC usp_DisplayTablesNotLoading
        • 22 CREATE PROC usp_LoadTableLoadReportingTable
        • 23 CREATE PROC usp_TableLoadMonitoring
      • Table Object Sample Scripts
        • sample type I dimension table
        • sample type II dimension table
        • sample fact table
        • sample indexed view
        • sample junk dimension table
        • sample master data management table
        • sample stage table
      • Data Processing Sample Scripts
        • Data Pull
        • batch processing
        • bulk insert
        • data cleansing
        • fact table load
        • remove dups
        • type I dimension processing
        • type II dimension processing
      • Helper Scripts
        • Create Database
        • add rowhash
        • change collation
        • configuration insert sample script
        • documentation block
        • populate fact table with fake data
        • proc execution scripts
        • show all columns in database
        • troubleshooting
        • util
        • start an agent job from T-SQL
    • Python
      • Building Blocks
        • CombineCSVsIntoOneFile
        • ConvertCommaToPipeDelimitedFile
        • ConvertExcelToCSV
        • LoopingOverFilesInADirectory
        • Process Zip File
        • QueryDatabaseAndWriteSmallFile
        • QueryDatabaseAndWriteLargeFile
        • SendEmail
        • StringMatching
        • YAMLConfigImport
      • Full Solutions
        • DownloadMoveAndStoreDataCSV
        • LoadLargeCSVsIntoDataWarehouseStagingTables
        • MoveAndStoreDataExcel
        • ReloadFromArchive
      • Jupyter Notebooks
        • Passive Monitoring System Design Theory
    • PySpark
      • ConnectToSpark
      • LoadCSV
      • ExampleDataProcessing
    • Windows Batch
  • A Methodology To Rapidly Convert OLTP Databases to OLAP Solutions
    • Step 1: Find The Nouns
    • Step 2: Find The Stuff We Want To Do Math On
    • Step 3: Analyze Relationships
  • Data Model Creation Tool
    • Record Count
    • Sample Data
    • Create Stage Table
    • Char Length Analysis
    • Column Notes
    • Column Cleansing Notes
    • Source To Target Mapping
    • Dimension List
    • Fact Table Creation Helper
    • Foreign Key Creation
    • Process Fact Script Helper
    • View Creation Helper
    • Date Role Play View Helper
    • Data Model View Creation Helper
    • List Population Values
  • Performance Monitoring
    • The Daily Chore
    • Diagnostic Tools
      • Using The Built-In Stored Procs
      • Using The Built-In Views
    • Logging Database Diagram
  • Data Warehouse Troubleshooting Guide
    • Generalized Troubleshooting Steps
  • Business Analytics Capability Maturity Model
    • What Is A Business Analytics Capability Maturity Model?
    • Level 0. Operational Reporting
    • Level 1. Rapid Delivery
    • Level 2. Self Service
    • Level 3. Central Repository
    • Level 4. Open Data 1
    • Level 5. Open Data 2
    • Level 6. Feedback 1
    • Level 7. Data Archaeology
    • Level 8. Crystal Ball
    • Level 9. Feedback 2
    • Level 10. Oil Rig
    • Level 11. The Singularity
  • Appendices
    • Appendix A. What is Medium Data®?
    • Appendix B. The Benefits Of Using Python And T-SQL Over SSIS For ETL
Powered by GitBook
On this page
  • Examples And Counter Examples
  • Don't Do These Things Specifically
  1. ETL Developer's Field Guide
  2. Code Style Guide

Transact SQL

Follow these 10 guidelines for writing clean easy to read SQL.

Dos

  1. Limit each line of code to a single idea.

  2. Left justify your code. Every character of every line should start at column 1 in the editor.

  3. Capitalize SQL keywords and operators.

  4. If you have a really long line of code, let it run off the page. Who cares? That’s what the scroll bar is for.

  5. If you have a lot of columns in your select statement, give each column its own row.

  6. ”Drop if exists” should be a part of every script that creates a permanent database object.

Don'ts

  1. DO NOT write code with leading commas.

  2. DO NOT line up column names.

  3. DO NOT name views with a leading v_.

  4. DO NOT create procs that call other procs, views, or functions, more than one level deep.

Examples And Counter Examples

Rule 1. Limit each line of code to a single idea.

Do NOT combine ideas on a single line.

USE AdventureWorks2016

--good
SELECT pod.PurchaseOrderID, pod.PurchaseOrderDetailID
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

--bad
SELECT pod.PurchaseOrderID, pod.PurchaseOrderDetailID
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod ON poh.PurchaseOrderID = pod.PurchaseOrderID

--if you have more than one join condition, where does the rest of it go?
--good
SELECT *
FROM tableOne t1
JOIN tableTwo t2
ON t1.KeyValueOne = t2.KeyValueOne
AND t1.KeyValueTwo = t2.KeyvalueTwo

--bad
SELECT *
FROM tableOne t1
JOIN tableTwo t2 ON t1.KeyValueOne = t2.KeyValueOne
AND t1.KeyValueTwo = t2.KeyvalueTwo

--even worse
--bad
SELECT *
FROM tableOne t1
JOIN tableTwo t2 ON t1.KeyValueOne = t2.KeyValueOne AND t1.KeyValueTwo = t2.KeyvalueTwo

Rule 2. Left justify your code.

Every character of every line should start at column 1 in the editor. Go back and look! This entire class follows that standard.

USE demo

--good
DECLARE @genders TABLE(gender_id INT, gender VARCHAR(20))
DECLARE @toys TABLE(toy_id INT, gender_id INT, toy_name VARCHAR(20))

INSERT INTO @genders(gender_id, gender)
SELECT 1, 'boy'
UNION ALL
SELECT 2, 'girl'
UNION ALL
SELECT 3, 'both'

INSERT INTO @toys(toy_id, gender_id, toy_name)
SELECT 1, 1, 'GI JOE'
UNION ALL
SELECT 2,2, 'My Little Pony'


SELECT g.*,t.toy_name
FROM @genders g
LEFT OUTER JOIN @toys t
ON g.gender_id = t.gender_id 
AND t.toy_name NOT IN ('My Little Pony') 

--bad
select first      as employee_first_name,
       surname    as employee_last_name,
       title,
       case 
           when employment = 1 then 'FT' 
           when employment = 2 then 'PT'
           else 'T' 
       end        as employment_status,
       'Y'        as is_valid,
       'HR'       as employee_source

--good
select 
first as employee_first_name,
surname as employee_last_name,
title,
case 
when employment = 1 then 'FT' 
when employment = 2 then 'PT'
else 'T' 
end as employment_status,
'Y' as is_valid,
'HR' as employee_source

Rule three. Capitalize SQL reserved words and operators.

USE demo

--bad
select 
first as employee_first_name,
surname as employee_last_name,
title,
case 
when employment = 1 then 'FT' 
when employment = 2 then 'PT'
else 'T' 
end as employment_status,
'Y' as is_valid,
'HR' as employee_source


--good
SELECT 
first AS employee_first_name,
surname AS employee_last_name,
title,
CASE
WHEN employment = 1 THEN 'FT' 
WHEN employment = 2 THEN 'PT'
ELSE 'T' 
END AS employment_status,
'Y' AS is_valid,
'HR' AS employee_source

Rule 4. If you have a really long line of code, let it run off the page.

Note: This is a contrived example. As you can see, it CLEARLY violates standard 5.

USE AdventureWorks2016

--bad
SELECT pod.PurchaseOrderID, pod.PurchaseOrderDetailID, pod.DueDate, 
                                                       pod.OrderQty, pod.ProductID, pod.UnitPrice, pod.LineTotal, pod.ReceivedQty, 
													   pod.RejectedQty, pod.StockedQty, pod.ModifiedDate, poh.PurchaseOrderID, 
													   poh.RevisionNumber, poh.Status, poh.EmployeeID, poh.VendorID, poh.ShipMethodID, 
													   poh.OrderDate, poh.ShipDate, poh.SubTotal, poh.TaxAmt, poh.Freight, poh.TotalDue, 
													   poh.ModifiedDate
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

--good
SELECT pod.PurchaseOrderID, pod.PurchaseOrderDetailID, pod.DueDate, pod.OrderQty, pod.ProductID, pod.UnitPrice, pod.LineTotal, pod.ReceivedQty, pod.RejectedQty, pod.StockedQty, pod.ModifiedDate, poh.PurchaseOrderID, poh.RevisionNumber, poh.Status, poh.EmployeeID, poh.VendorID, poh.ShipMethodID, poh.OrderDate, poh.ShipDate, poh.SubTotal, poh.TaxAmt, poh.Freight, poh.TotalDue, poh.ModifiedDate
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

Rule 5. If you have a lot of columns in your select statement, give each column its own row.

USE AdventureWorks2016

--bad
SELECT pod.PurchaseOrderID, pod.PurchaseOrderDetailID, pod.DueDate, 
                                                       pod.OrderQty, pod.ProductID, pod.UnitPrice, pod.LineTotal, pod.ReceivedQty, 
													   pod.RejectedQty, pod.StockedQty, pod.ModifiedDate, poh.PurchaseOrderID, 
													   poh.RevisionNumber, poh.Status, poh.EmployeeID, poh.VendorID, poh.ShipMethodID, 
													   poh.OrderDate, poh.ShipDate, poh.SubTotal, poh.TaxAmt, poh.Freight, poh.TotalDue, 
													   poh.ModifiedDate
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

--good
SELECT 
pod.PurchaseOrderID, 
pod.PurchaseOrderDetailID, 
pod.DueDate, 
pod.OrderQty, 
pod.ProductID, 
pod.UnitPrice, 
pod.LineTotal, 
pod.ReceivedQty, 
pod.RejectedQty, 
pod.StockedQty, 
pod.ModifiedDate, 
poh.PurchaseOrderID, 
poh.RevisionNumber, 
poh.Status, 
poh.EmployeeID, 
poh.VendorID, 
poh.ShipMethodID, 
poh.OrderDate, 
poh.ShipDate, 
poh.SubTotal, 
poh.TaxAmt, 
poh.Freight, 
poh.TotalDue, 
poh.ModifiedDate
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

Rule 6. ”Drop if exists” should be a part of every script that creates a permanent database object.

USE AdventureWorks2016

DROP PROCEDURE IF EXISTS usp_NoCountExample
GO

CREATE PROCEDURE usp_NoCountExample

AS
BEGIN

SET NOCOUNT ON;

SELECT *
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesOrderDetail sod
ON soh.SalesOrderID = sod.SalesOrderID
WHERE 1 = 1

SET NOCOUNT OFF;

END;
GO

Don't Do These Things Specifically

Don't 1. DO NOT write code with leading commas.

Who does that?! (Plenty of people.)

USE AdventureWorks2016

--bad
SELECT 
pod.PurchaseOrderID
,pod.PurchaseOrderDetailID
,pod.DueDate 
,pod.OrderQty 
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

--good
SELECT 
pod.PurchaseOrderIDpod.PurchaseOrderDetailID,
pod.DueDate,
pod.OrderQty 
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

--NO! Go straight to jail. Do not pass go! Do not collect 100 dollars!
SELECT 
      pod.PurchaseOrderID
     ,pod.PurchaseOrderDetailID
     ,pod.DueDate 
     ,pod.OrderQty 
FROM Purchasing.PurchaseOrderHeader poh
JOIN Purchasing.PurchaseOrderDetail pod
ON poh.PurchaseOrderID = pod.PurchaseOrderID

Don't 2. DO NOT line up column names.

Seems cool until you have a 1000 line proc and you change a line and now you have to change 9999 other lines so everything stays lined up!

USE demo

--bad
select first employee_first_name,
       surname employee_last_name,
       title,
       case when employment = 1 then 'FT' 
       when employment = 2 then 'PT' 
       else 'T' end as employment_status,
       'Y' as is_valid,
       'HR' employee_source


--good
select 
first as employee_first_name,
surname as employee_last_name,
title,
case 
when employment = 1 then 'FT' 
when employment = 2 then 'PT'
else 'T' 
end as employment_status,
'Y' as is_valid,
'HR' as employee_source
PreviousSSISNextPython

Last updated 3 years ago