Here are some of the key projects I’ve worked on:
Real-time Data Platform Migration
Transitioning core data platform from nightly batch to real-time streaming
Role: Senior Data Platforms Developer / Lead | Dates: Jan 2025 - Present | Status: Ongoing
Technologies: Azure Databricks, Delta Live Tables (DLT), Spark Structured Streaming, Kafka, Azure Event Hubs, Redis, Python, Terraform, Azure DevOps
Leading the technical design and implementation for migrating core components of the enterprise data platform from a nightly batch refresh model to real-time ingestion and processing. This involves designing and building streaming pipelines, integrating with message brokers (Kafka, Azure Event Hubs), leveraging Redis as a low-latency caching layer, and defining infrastructure as code with Terraform. Also contributing to the Data Platform Governance committee overseeing these changes.
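As a rough illustration of the streaming upsert pattern this migration centers on, the sketch below keeps only the newest version of each record in a key-value cache. In the real platform the source is Kafka/Event Hubs consumed via Spark Structured Streaming and the cache is Redis; the event shape and function name here are hypothetical stand-ins.

```python
def apply_events(cache: dict, events: list) -> dict:
    """Upsert events into a key-value cache, keeping only the newest
    version per key (a plain dict stands in for Redis here)."""
    for event in events:
        key, version = event["key"], event["version"]
        current = cache.get(key)
        # Ignore stale or out-of-order events so a late arrival
        # cannot overwrite fresher state.
        if current is None or version > current["version"]:
            cache[key] = {"payload": event["payload"], "version": version}
    return cache
```

With at-least-once delivery from a broker, the version check also makes replays idempotent, which is what lets a streaming consumer safely reprocess messages after a failure.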
Data Platform Modernization
Led modernization effort improving ETL performance and data availability
Role: Technical Lead | Dates: 2022 - 2024 | Status: Completed
Technologies: Azure Databricks, Apache Spark, PySpark, Python, SQL, Delta Lake, Azure DevOps, Git, CI/CD, Databricks Asset Bundles
Provided technical leadership for a team of 12 engineers on a critical initiative to modernize the enterprise data platform. Identified significant deficiencies in reporting availability caused by legacy ETL performance and quantified the business impact. Developed and presented the comprehensive solution design, ROI analysis, budget forecast (a multi-million dollar investment), and implementation plan to executive leadership, securing buy-in and funding. Key activities included:
- Converting ~600 legacy code packages (1k-20k LOC each) to modern PySpark/Databricks equivalents.
- Establishing CI/CD pipelines in Azure DevOps using Databricks Asset Bundles (DABs) for automated testing and deployment.
- Creating standardized Databricks Asset Bundle templates to ensure consistency and accelerate project setup across the team.
- Onboarding and training over 100 developers on the new platform, tools, and best practices within the first year.
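One slice of the automated testing behind the conversion work above can be sketched as a parity check: run a legacy package and its converted PySpark equivalent on the same input and diff the results by key. The function name and record shape below are hypothetical; in CI this would run against Databricks job outputs rather than in-memory lists.

```python
def reconcile(legacy_rows, modern_rows, key):
    """Compare two result sets on a shared key and report rows that
    are missing, unexpected, or changed after conversion."""
    legacy = {row[key]: row for row in legacy_rows}
    modern = {row[key]: row for row in modern_rows}
    return {
        "missing": sorted(legacy.keys() - modern.keys()),  # dropped by the new job
        "extra": sorted(modern.keys() - legacy.keys()),    # produced only by the new job
        "changed": sorted(k for k in legacy.keys() & modern.keys()
                          if legacy[k] != modern[k]),      # same key, different values
    }
```

A check like this gives each of the ~600 converted packages a concrete pass/fail gate before cutover, independent of how the legacy code was structured.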
Outcomes:
- Reduced nightly ETL processing time from 8 hours to 2 hours (4x improvement).
- Improved critical reporting SLA compliance from 5% to 99%+.
- Enabled broad platform adoption through the successful onboarding and training of 100+ developers within the first year.
- Recipient of the 2024 PacificSource Enterprise Innovation Award for this work.
Provider Master Data Management (MDM)
Implemented MDM solution for complex healthcare provider data
Role: Software Developer II | Dates: Approx. 2020 - 2021 | Status: Completed
Technologies: SQL Server, T-SQL, SSIS (or other ETL tool), Data Modeling
Designed and implemented a Master Data Management (MDM) solution specifically for healthcare provider data. This involved analyzing and resolving complex contracting arrangements and business relationships within the data modeling process to create a single source of truth for provider information. Developed ETL processes to integrate and cleanse data from various source systems.
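The survivorship step at the heart of an MDM build like this can be sketched as follows: merge per-source provider records into one golden record, with more trusted sources winning field by field. The source names, precedence order, and function name are illustrative assumptions; the production implementation here was T-SQL on SQL Server, not Python.

```python
def build_golden_record(records, precedence):
    """Merge per-source provider records into a single golden record.

    `records` maps source system name -> partial record (dict);
    `precedence` lists sources from most to least trusted. For each
    field, the value from the most trusted source that has it wins.
    """
    golden = {}
    # Walk from least to most trusted so trusted sources overwrite.
    for source in reversed(precedence):
        for field, value in records.get(source, {}).items():
            if value is not None:
                golden[field] = value
    return golden
```

Making the precedence explicit per field is what turns conflicting source feeds into a defensible single source of truth, since every surviving value can be traced back to a ranked source.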