As teams adopt Databricks Asset Bundles (DABs) for managing projects, ensuring consistency and accelerating setup becomes crucial. Instead of copying and pasting configurations, you can create DAB Templates. These templates allow you to define a standard project structure and configuration, while prompting the user for specific details (like project names, workspace URLs, or the project’s purpose) during initialization using a structured schema.
This guide demonstrates how to build a DAB template following the official Databricks conventions, using a JSON schema for prompts and a nested template structure.
Download Example Template Files:
You can find the example template files used in this post (including the schema, bundle config, job resource, and placeholder notebook) in the following directory on GitHub:
Understanding the Official DAB Template Structure
The official convention for Databricks Asset Bundle templates involves a specific structure:
- `databricks_template_schema.json` (Root Level): This JSON file defines the input variables (prompts) presented to the user during `databricks bundle init`. It uses JSON Schema properties to define each prompt’s name, description, type, default value, etc.
- `template/` Directory (Root Level): This directory contains the actual template files that will be copied and processed to create the new bundle project.
- `template/{{.project_name}}/` Subdirectory: Inside the `template` directory, a subdirectory named using the Go template variable `{{.project_name}}` (or another variable defined in the schema) is required. This subdirectory contains the core template files:
  - `databricks.yml.tmpl`: The template for the main bundle configuration file.
  - `resources/`: A subdirectory typically containing resource definition templates (e.g., `{{.project_name}}_job.yml.tmpl`).
  - `src/`: A subdirectory typically containing source code like notebooks or Python files.
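The layout described above can be scaffolded with a few lines of Python. This is only a sketch for getting an empty skeleton in place: the helper name `scaffold_template` and the folder name `my-official-dab-template` are made up for illustration, and it assumes the nested directory is literally named `{{.project_name}}`.

```python
import tempfile
from pathlib import Path


def scaffold_template(root: Path) -> list[Path]:
    """Create empty files matching the template layout (hypothetical helper)."""
    # Assumption: the nested directory uses the project_name variable.
    project_dir = root / "template" / "{{.project_name}}"
    files = [
        root / "databricks_template_schema.json",
        project_dir / "databricks.yml.tmpl",
        project_dir / "resources" / "{{.project_name}}_job.yml.tmpl",
        project_dir / "src" / "placeholder_notebook.py",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return files


# Build the skeleton in a throwaway directory and list what was created.
demo_root = Path(tempfile.mkdtemp()) / "my-official-dab-template"
created = scaffold_template(demo_root)
for path in created:
    print(path.relative_to(demo_root))
```

The files are created empty; their contents are covered in the sections that follow.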
Key Differences from Older Methods:
- Prompts are defined in `databricks_template_schema.json`, not within a `template:` block in `databricks.yml.tmpl`.
- Input variables defined in the JSON schema are accessed within `.tmpl` files using the syntax `{{.variable_name}}` (e.g., `{{.project_name}}`), without the `input_` prefix.
- The template files reside within the nested `template/{{.project_name}}/` structure.
The `.tmpl` Extension Convention:
Using the `.tmpl` extension for files containing Go template syntax (`{{ ... }}`) is a helpful convention for clarity. The `bundle init` command processes these files and saves the output without the `.tmpl` extension in the generated project.
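To make the renaming concrete, here is a rough Python model of how a template filename maps to its generated name. It is not the CLI's implementation: the function `output_name` is hypothetical, and the regular expression only handles simple `{{.name}}` placeholders, not the full Go template language.

```python
import re

# Matches simple Go-template field references such as {{.project_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def output_name(template_name: str, values: dict[str, str]) -> str:
    """Resolve placeholders in a filename and drop a trailing .tmpl (sketch)."""
    resolved = PLACEHOLDER.sub(lambda m: values[m.group(1)], template_name)
    return resolved.removesuffix(".tmpl")


values = {"project_name": "sales_etl"}  # example answer, not from the post
print(output_name("databricks.yml.tmpl", values))             # databricks.yml
print(output_name("{{.project_name}}_job.yml.tmpl", values))  # sales_etl_job.yml
print(output_name("placeholder_notebook.py", values))         # placeholder_notebook.py
```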
Example Official DAB Template
Template Project Structure:
my-official-dab-template/
├── databricks_template_schema.json   # Defines prompts
└── template/
    └── {{.project_name}}/            # Core template files reside here
        ├── databricks.yml.tmpl
        ├── resources/
        │   └── {{.project_name}}_job.yml.tmpl   # Resource template using project name
        └── src/
            └── placeholder_notebook.py
Template Schema (`databricks_template_schema.json`)
{
  "properties": {
    "project_name": {
      "type": "string",
      "description": "Enter a short, descriptive name for this project (used in names, paths)",
      "default": "my_databricks_project"
    },
    "bundle_purpose": {
      "type": "string",
      "description": "Select the primary purpose (e.g., ETL, Analytics, ML Training, ML Inference)",
      "default": "Analytics"
    },
    "databricks_host_prod": {
      "type": "string",
      "description": "Enter the Databricks workspace URL for the PROD environment"
    },
    "workspace_root_path": {
      "type": "string",
      "description": "Enter the root path in the Databricks workspace for deployment (e.g., /Shared/Bundles)",
      "default": "/Shared/Bundles"
    },
    "team_permission_group": {
      "type": "string",
      "description": "Enter the Databricks group name for team MANAGE permissions (optional, leave blank if none)",
      "default": "data_engineering"
    }
  },
  "required": [
    "project_name",
    "bundle_purpose",
    "databricks_host_prod"
  ]
}
Explanation:
- Uses standard JSON Schema format.
- Defines properties for `project_name`, `bundle_purpose`, etc., with types, descriptions, and defaults.
- The `required` array lists variables that must be provided by the user.
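One way to read the schema is as a set of defaults plus a list of values that must end up non-empty. The Python sketch below models that reading on a trimmed copy of the schema. The function `resolve_answers` and the example host URL are illustrative assumptions, and the CLI's real prompt handling may differ in detail.

```python
import json

# Trimmed copy of the schema from this post (descriptions omitted for brevity).
SCHEMA_JSON = """
{
  "properties": {
    "project_name": {"type": "string", "default": "my_databricks_project"},
    "bundle_purpose": {"type": "string", "default": "Analytics"},
    "databricks_host_prod": {"type": "string"},
    "workspace_root_path": {"type": "string", "default": "/Shared/Bundles"}
  },
  "required": ["project_name", "bundle_purpose", "databricks_host_prod"]
}
"""


def resolve_answers(schema: dict, answers: dict) -> dict:
    """Merge user answers with schema defaults and check required keys (sketch)."""
    resolved = {}
    for name, spec in schema["properties"].items():
        if name in answers:
            resolved[name] = answers[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
    # A required variable with no answer and no default is reported as missing.
    missing = [name for name in schema.get("required", []) if not resolved.get(name)]
    if missing:
        raise ValueError(f"missing required values: {', '.join(missing)}")
    return resolved


schema = json.loads(SCHEMA_JSON)
# Hypothetical answer: only the variable without a default is supplied.
print(resolve_answers(schema, {"databricks_host_prod": "https://example.cloud.databricks.com"}))
```

Here `databricks_host_prod` is the only variable without a default, so it is the only one the user cannot skip.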
Bundle Configuration Template (`template/{{.project_name}}/databricks.yml.tmpl`)
This file is now simpler, as prompts are externalized. Note the variable access syntax, `{{.variable_name}}`.
# Located inside template/{{.project_name}}/databricks.yml.tmpl
bundle:
  # Bundle name is derived from the directory name created by init
  name: {{.project_name}}
# Define deployment targets using the input variables
targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
    # host and root_path are workspace settings, so they sit under "workspace"
    workspace:
      host: "{{.databricks_host_prod}}"
      root_path: "{{.workspace_root_path}}/{{.project_name}}/prod"
# Include resource definition files
# Path is relative to this databricks.yml.tmpl file
include:
  - resources/*.yml
# Define permissions for the bundle artifacts using input variables
permissions:
  - level: CAN_MANAGE
    group_name: "{{.team_permission_group}}"
  - level: CAN_VIEW
    group_name: users
Explanation:
- Variables are accessed directly, e.g., `{{.project_name}}`, `{{.databricks_host_prod}}`.
- The `include` path is relative to this file within the `template/` structure.
Example Job Resource Template (`template/{{.project_name}}/resources/{{.project_name}}_job.yml.tmpl`)
This job definition template uses the variables and follows the naming convention.
# Located inside template/{{.project_name}}/resources/{{.project_name}}_job.yml.tmpl
resources:
  jobs:
    # Resource key can also use variables if needed, though static is often simpler
    {{.project_name}}_job:
      name: "[{{.project_name}}] Basic Job (${bundle.target})"
      tags:
        project: "{{.project_name}}"
        purpose: "{{.bundle_purpose}}"
        source: "DAB Template"
      tasks:
        - task_key: run_notebook
          notebook_task:
            # Path is relative to this resource file, so ../src points at the bundle's src/ directory
            notebook_path: ../src/placeholder_notebook.py
      permissions:
        - level: CAN_MANAGE
          group_name: "{{.team_permission_group}}"
        - level: CAN_VIEW
          group_name: users
Explanation:
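- The filename itself (`{{.project_name}}_job.yml.tmpl`) uses template syntax.
- The resource key `{{.project_name}}_job:` also uses the variable.
- Variable access is `{{.variable_name}}`.
- The `notebook_path` is relative to the generated resource file in `resources/`, which is why it starts with `../` to reach the `src/` directory next to the generated `databricks.yml` file.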
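The job template mixes two placeholder syntaxes that are resolved at different times: Go template expressions in double curly braces are filled in once by `bundle init`, while `${bundle.target}` is a bundle substitution that stays in the generated file and is resolved later by the CLI for the selected target. The Python sketch below imitates only the init-time step with a simplified regular expression. The helper `render_init_time` and the sample value are hypothetical, and the real engine is Go's `text/template`.

```python
import re

# Simplified stand-in for Go template field access such as {{.project_name}}.
INIT_PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_init_time(text: str, values: dict[str, str]) -> str:
    """Substitute {{.name}} placeholders and leave ${...} references untouched."""
    return INIT_PLACEHOLDER.sub(lambda m: values[m.group(1)], text)


line = 'name: "[{{.project_name}}] Basic Job (${bundle.target})"'
rendered = render_init_time(line, {"project_name": "sales_etl"})
print(rendered)  # name: "[sales_etl] Basic Job (${bundle.target})"
```

Keeping the target reference in `${...}` form is what lets a single generated file serve both the `dev` and `prod` targets.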
Placeholder Notebook (`template/{{.project_name}}/src/placeholder_notebook.py`)
This file remains unchanged as it contains no template variables.
# Databricks notebook source
# Located inside template/{{.project_name}}/src/placeholder_notebook.py
print("This is a placeholder notebook generated from the DAB template.")
# TODO: Replace this with actual project logic.
# Example: Accessing bundle variables (if passed via job parameters)
# dbutils.widgets.text("project_name", "default_project")
# project_name = dbutils.widgets.get("project_name")
# print(f"Running notebook for project: {project_name}")
# Example: Getting the purpose tag (if passed as a parameter)
# dbutils.widgets.text("purpose", "default_purpose")
# purpose = dbutils.widgets.get("purpose")
# print(f"Notebook purpose: {purpose}")
Using the Template
To create a new project based on this official template structure:
- Navigate: Open your terminal where you want the new project.
- Initialize: Run `databricks bundle init <template_source>`.
- Answer Prompts: Provide values based on the descriptions defined in `databricks_template_schema.json`.
- Project Generated: A new directory named after the `project_name` you entered is created. Inside, the files from the template’s `template/{{.project_name}}/` directory are copied, processed (substituting `{{ ... }}` values), and renamed (e.g., `.tmpl` removed, filenames with variables resolved).
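For automated setups, the same steps can be driven without interactive prompts. The sketch below writes the answers to a JSON file and assembles the command line. It assumes the `--config-file` and `--output-dir` flags of `databricks bundle init` (worth confirming with `databricks bundle init --help` for your CLI version). The helper name, answer values, and paths are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def build_init_command(template_source: str, config_file: Path, output_dir: Path) -> list[str]:
    """Assemble a non-interactive `bundle init` invocation (hypothetical helper)."""
    return [
        "databricks", "bundle", "init", template_source,
        "--config-file", str(config_file),  # JSON file with answers to the prompts
        "--output-dir", str(output_dir),    # where the generated project is written
    ]


# Example answers keyed by the variable names from databricks_template_schema.json.
answers = {
    "project_name": "sales_etl",
    "bundle_purpose": "ETL",
    "databricks_host_prod": "https://example.cloud.databricks.com",
}

work_dir = Path(tempfile.mkdtemp())
config_file = work_dir / "answers.json"
config_file.write_text(json.dumps(answers, indent=2))

command = build_init_command("./my-official-dab-template", config_file, work_dir / "out")
print(" ".join(command))

# To actually run it (requires the Databricks CLI on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Variables left out of the answers file, such as `workspace_root_path` here, would be expected to fall back to their schema defaults.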
Conclusion
Following the official Databricks Asset Bundle template structure using `databricks_template_schema.json` provides a robust and standardized way to create reusable project starters. While slightly more complex initially, it clearly separates prompt definitions from configuration logic, leading to cleaner and more maintainable templates, especially as they grow in complexity.
References
Here are links to the official documentation for the key concepts discussed in this post:
- Databricks Asset Bundles Documentation
- Databricks Asset Bundles overview - Main documentation for DABs.
- Develop a Databricks Asset Bundle template - Official guide on creating templates, including the `databricks_template_schema.json` structure.
- Databricks Asset Bundles configuration - Details on `databricks.yml` settings.
- Go Templates
- Go `text/template` Package Documentation - Official documentation for the Go template language used by Databricks Asset Bundles for variable substitution (`{{ ... }}`).
- Go