Posts
All blog posts and articles.
Recent Posts
- Migrating your DABs CI/CD from Terraform to the Direct Engine
  Declarative Automation Bundles (formerly Databricks Asset Bundles, though everyone still says DABs) have been using Terraform under the hood since day one. T...
- No more blob crawling: table storage metrics in plain SQL
  A while back I wrote up a whole process for figuring out how much storage your Unity Catalog managed tables are actually using. It involved the Azure Blob SD...
- Getting a handle on the blobs behind Unity Catalog
  Update: As of DBR 18.0, Databricks has a built-in SQL command for this: ANALYZE TABLE ... COMPUTE STORAGE METRICS. It gives you active, vacuumable, and ti...
- Creating Reusable Databricks Asset Bundle (DAB) Templates
  As teams adopt Databricks Asset Bundles (DABs) for managing projects, ensuring consistency and accelerating setup becomes crucial. Instead of copying and pas...
- Using Deterministic Primary Keys Instead of Identities 🔑
  Traditional data warehousing often relies on auto-incrementing IDs as primary keys, leading to dependencies that can hinder data freshness, particularly in s...
- From Batch to Streaming: Supercharging Data Freshness on Azure Databricks
  We traded in our clunky, nightly batch processing for a sleek, real-time streaming data machine on Azure Databricks. By cleverly combining a hybrid approach ...