Documents
Data Warehouse ETL Framework
Data Warehouse ETL Framework
  • Introduction To The ETL Framework
  • Framework Overview
    • Framework Description
    • Why You Need An ETL Framework
      • What is the ETL Framework?
      • Framework Objectives
      • Can I use my own resources to implement The Framework?
    • Download The Framework
    • What's In The Box
    • v1.0 Release
    • Training Resources For Engineers
    • Getting Help
  • General Principles
    • Never Finished
    • Overall Engineering Philosophy
    • Self Service Analytics As The Goal
    • Strict Adherence To Globally Accepted Practice
    • Data Flow Design
    • Effective DataOps
    • Code First Philosophy
    • Melding Kimball And Inmon
    • Master Data Management
    • Open Architecture
    • Decoupled Systems
    • Independent Identically Executing Processes
    • Robust Processes
    • Self-Diagnostics
    • Data Provenance
    • Adverse Reporting
    • Error Recovery
    • Archrival And Recovery
    • Documentation As A Necessity
  • DataOps For The Uninitiated
    • DataOps Initiation
    • What Is Agile Software Development?
    • What Is DevOps?
    • What Is Statistical Process Control?
    • You Take All That And You Wind Up With...
    • The DataOps Manifesto
    • Want To Learn More About DataOps?
  • ETL Developer's Field Guide
    • The Plain English Explanation Of What You Are About To Build
    • Data Warehouse User Profiles
    • Using Data Source Names
    • Why You Should Not Use SSIS
      • SSIS Is Not Forward Compatible
      • SSIS Is Not Very Performant
      • Python Development Is Faster Than SSIS
      • SSIS Is Difficult To Deploy To Production
      • SSIS Is Difficult To Maintain
    • Python Software Engineering Considerations
    • Enterprise Data Ecosystem
      • EDW's Place In Enterprise Architecture
      • EDW ETL Overview
    • ETL Environment Databases
    • Data Acquisition Paradigms
      • Demand Pull
      • Supply Push
    • The Common Model
    • The Semantic Layer
    • Database Object Naming Conventions
      • Columns
      • Indexes
      • Primary Keys
      • Schemas
      • Stored Procedures
      • Tables
      • User Defined Functions
    • ETL Reference Architecture
    • Table Anatomy
      • Sample Dimension Table
      • Sample Fact Table
      • Sample Indexed View
      • Sample Junk Dimension Table
      • Sample Master Data Management Table
      • Sample Stage Table
    • Master Data Management
      • MDM With Exact Match On Source System Key
      • Implementation Details
    • Slowly Changing Dimension Processing
      • General Process
      • Creating Empty Records
      • SCD Processing Types
      • Implementing SCDs As Temporal Tables
    • Handling Calculated Values
    • Warehouse Load Commandments
    • Code Style Guide
      • SSIS
      • Transact SQL
      • Python
      • C#
    • The Role Of Schemas
      • Standard Schemas And Their Definitions
      • Schemas As A Security Device
      • Schemas As A DB Object Differentiator
    • Feedback And Control Systems
      • Having An Engineering Mindset
      • The Mechanics Of Checking For Unprocessed Records
      • Time Series Analysis 101
      • Passive Monitoring System Design Theory
      • Passive Monitoring System Implementation
    • Source Control
    • Loading Historical Data
    • Security Access Model
    • Wrapping Up
  • ETL Environment Set Up
    • Install Anaconda
    • Create Initial Source Control Folders
    • Create Primary Databases And Schemas
    • Customize And Run Deployment Scripts
    • Install Python Packages
    • Create Data Source Name
    • Create The SSIS Catalogue
    • Create The File I/O Directory Structure
    • Create Global Environment In The Integration Services Catalog
    • Create The SQL Server Aliases
    • Create The BIAnalytics Proxy Account
    • Create The Opt Folder
    • Configure Email Alerts
  • Standard SSIS ETL Development Package
    • Starting A New Process
    • Tracking Package Variables
    • Remove Things You Do Not Need
    • Creating New Configuration Settings
  • Sample Script Guide
    • SQL
      • Finalize And Audit Scripts
        • 18 CREATE PROC usp_RecordRowCounts
        • 19 CREATE PROC usp_MarkRecordsAsProcessed
        • 20 CREATE PROC usp_CheckForUnprocessedRecords
      • Monitoring Scripts
        • 21 CREATE PROC usp_DisplayTablesNotLoading
        • 22 CREATE PROC usp_LoadTableLoadReportingTable
        • 23 CREATE PROC usp_TableLoadMonitoring
      • Table Object Sample Scripts
        • sample type I dimension table
        • sample type II dimension table
        • sample fact table
        • sample indexed view
        • sample junk dimension table
        • sample master data management table
        • sample stage table
      • Data Processing Sample Scripts
        • Data Pull
        • batch processing
        • bulk insert
        • data cleansing
        • fact table load
        • remove dups
        • type I dimension processing
        • type II dimension processing
      • Helper Scripts
        • Create Database
        • add rowhash
        • change collation
        • configuration insert sample script
        • documentation block
        • populate fact table with fake data
        • proc execution scripts
        • show all columns in database
        • troubleshooting
        • util
        • start an agent job from T-SQL
    • Python
      • Building Blocks
        • CombineCSVsIntoOneFile
        • ConvertCommaToPipeDelimitedFile
        • ConvertExcelToCSV
        • LoopingOverFilesInADirectory
        • Process Zip File
        • QueryDatabaseAndWriteSmallFile
        • QueryDatabaseAndWriteLargeFile
        • SendEmail
        • StringMatching
        • YAMLConfigImport
      • Full Solutions
        • DownloadMoveAndStoreDataCSV
        • LoadLargeCSVsIntoDataWarehouseStagingTables
        • MoveAndStoreDataExcel
        • ReloadFromArchive
      • Jupyter Notebooks
        • Passive Monitoring System Design Theory
    • PySpark
      • ConnectToSpark
      • LoadCSV
      • ExampleDataProcessing
    • Windows Batch
  • A Methodology To Rapidly Convert OLTP Databases to OLAP Solutions
    • Step 1: Find The Nouns
    • Step 2: Find The Stuff We Want To Do Math On
    • Step 3: Analyze Relationships
  • Data Model Creation Tool
    • Record Count
    • Sample Data
    • Create Stage Table
    • Char Length Analysis
    • Column Notes
    • Column Cleansing Notes
    • Source To Target Mapping
    • Dimension List
    • Fact Table Creation Helper
    • Foreign Key Creation
    • Process Fact Script Helper
    • View Creation Helper
    • Date Role Play View Helper
    • Data Model View Creation Helper
    • List Population Values
  • Performance Monitoring
    • The Daily Chore
    • Diagnostic Tools
      • Using The Built-In Stored Procs
      • Using The Built-In Views
    • Logging Database Diagram
  • Data Warehouse Troubleshooting Guide
    • Generalized Troubleshooting Steps
  • Business Analytics Capability Maturity Model
    • What Is A Business Analytics Capability Maturity Model?
    • Level 0. Operational Reporting
    • Level 1. Rapid Delivery
    • Level 2. Self Service
    • Level 3. Central Repository
    • Level 4. Open Data 1
    • Level 5. Open Data 2
    • Level 6. Feedback 1
    • Level 7. Data Archaeology
    • Level 8. Crystal Ball
    • Level 9. Feedback 2
    • Level 10. Oil Rig
    • Level 11. The Singularity
  • Appendices
    • Appendix A. What is Medium Data®?
    • Appendix B. The Benefits Of Using Python And T-SQL Over SSIS For ETL
Powered by GitBook
On this page
  1. ETL Developer's Field Guide
  2. Why You Should Not Use SSIS

SSIS Is Difficult To Maintain

When I design ETL processes, I think about high performance fighter aircraft. Sound weird? What do F-16s and ETL processes have in common? Maintenance requirements.

If you get a close look at many modern fighter aircraft, you will notice that the skin of the aircraft is covered in what are clearly panels. You can tell a panel by the outline of the screws you have to undo to remove the panel. These panels are maintenance access hatches. They allow easy access to aircraft systems.

On some older aircraft, in order to maintain some systems, you have to remove other systems that are in the way. In particular, I am thinking of the T-38 where you have to remove the engines to perform certain maintenance functions. The engine does not need fixed, but you have to take it out anyway because the broken system is buried in the fuselage and not near a maintenance access hatch.

I build my ETL processes with a lot of maintenance hatches. When things break, I do not want to have to remove the engine to change a hydraulic line. I just want to pop open a hatch, swap out the bad part, put the hatch back, and start generating sorties.

SSIS is a hot mess. A lot of the things I have talked about are difficult to impossible to implement with SSIS. Trying to maintain existing processes is a nightmare.

When something breaks in SSIS, you have to open up the package in the IDE which can be a nightmare all its own. Tracking down the error in the system logs, makes you feel like Nicolas Cage’s character in National Treasure, and that is assuming you get good error information at all.

SSIS has a system of event handlers. These are actions that take place when certain events occur. The interface for managing event handlers is not well implemented. Trying to find an event handler, well, let us just say that I have solved a Rubik’s Cube faster.

Like I mentioned above, SSIS allows junior engineers to create anti-patterns. A better way to put that is, SSIS allows junior engineers to create unnecessarily labyrinthine packages. They will have task all over the place like some digital Gordian Knot. It will make the wires on your home entertainment center look like a paragon of organization. They will have packages calling other packages. They will use SSIS as an orchestration tool which just unnecessarily adds another layer of orchestration. They will write a huge monolithic package that does everything instead of breaking things down into atomic task that are identical independently executing processes.

You can and should build your SSIS packages to call stored procedures. However, if a stored procedure is busted, you cannot just push the fix. Sometimes, alterations to stored procs require that you add columns to the output query (the last select statement in a proc). Any change to the columns of the output query result in new metadata. SSIS reads the metadata from stored procs. If the metadata that it is reading, does not match what it expects, SSIS will error out.

In order to fix it, you have to open up SSIS, go to the data flow tab, open up either the source or the destination task and refresh the metadata. Then you have to redeploy the package. And I ask you. Why? Why do all that when you can just fix the proc, go through change control, change your connection in SSMS, hit F5, and call it a day?

I have already talked about code deployment, but that topic also overlaps with maintenance issues. Getting SSIS packages promoted through your various environments is just harder than it needs to be.

Even if you have a senior SSIS engineer writing packages and that person knows all the tips, tricks, and tactics for writing relatively high performing packages, it still will not matter. SSIS just does not lend itself well to the concept of having many maintenance hatches that allow quick access to subsystems.

Do I sound a little nutty? That is because I have PTSD from dealing with poorly developed SSIS packages. You would not believe what I have dealt with over the years.

I cannot stress enough. Clean, high performing warehouse ETL. No SSIS. Use T-SQL and Python.

PreviousSSIS Is Difficult To Deploy To ProductionNextPython Software Engineering Considerations

Last updated 4 years ago