Part 2: Open Source Metadata-Driven Frameworks for Microsoft Fabric, which one to use…

Exploring three ‘Open Source Metadata Frameworks’

In my previous blog, I introduced the concept of metadata-driven data loading in Microsoft Fabric – a powerful approach to streamline and standardize data ingestion and transformation (A Metadata Driven Framework for MS Fabric: What is it and why do you want it? – Powerdobs). The key conclusion is that a metadata-driven approach offers many advantages and is definitely worth considering when implementing a new data platform. However, implement it smartly!

 

To implement a metadata-driven approach ‘smartly’, various tools can make your life easier. In this follow-up blog post, I’ll share my findings from testing three promising solutions, each offering a unique take on metadata-driven architecture.

 

Why Use an Open Source Metadata Framework?

While custom-built solutions can work for specific clients or data sources, they often become complex and harder to maintain over time. Open source frameworks can help here, as they offer:

  • Standardization across environments, teams and customers
  • Lower setup and maintenance costs
  • Community-driven improvements
  • Built-in support for logging, lineage, and data quality (depending on framework)

 

Overview of Explored Frameworks

Framework          | Layers Supported       | Metadata Storage             | Deployment Complexity
FMD Framework      | Bronze, Silver         | Fabric SQL DB                | Easy (Fabric CLI)
Fabric Accelerator | Bronze, Silver, Gold   | Azure SQL DB (Fabric coming) | Moderate (IaC + config)
AquaShack          | Landing, Base, Curated | JSON in notebooks            | Simple (Notebook only)

 

All three open source solutions are available on GitHub; direct links are provided in the first column.

 

1. FMD Framework (by Erwin Kreuk)

A Fabric-native solution with deployment via Fabric CLI. This Fabric Metadata-Driven Framework is a scalable, extensible solution for managing, integrating, and governing data using a metadata-driven approach on Fabric SQL Database.

The latest version improves deployment by using Fabric CLI and now exposes readable source code in GitHub, making updates and version tracking easier than in previous versions.

 

Highlights:

  • Uses central config workspace and separate workspaces for code and data per environment
  • Supports landing, bronze and silver layers
  • Metadata and logging via Fabric SQL DB
  • Hashing of key columns and hash column for change detection
  • Silver layer is always historical (row versioning with end-dating)
  • Demo CSV included for quick testing
  • Cleaning and data quality features are optional
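
To make the hashing and end-dating bullets above more concrete, here is a minimal sketch in plain Python. It is not the FMD Framework's actual code: the function names and the columns key_hash, change_hash, valid_from and valid_to are assumptions chosen for illustration, and the framework itself works on Fabric tables rather than Python lists.

```python
import hashlib
from datetime import datetime, timezone


def row_hash(row: dict, columns: list[str]) -> str:
    """SHA-256 over the selected columns (simple '|' join, None becomes '')."""
    joined = "|".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def apply_changes(silver: list[dict], incoming: list[dict],
                  key_cols: list[str], value_cols: list[str],
                  load_ts: datetime) -> list[dict]:
    """Return a new silver table: end-date changed rows, append new versions."""
    result = [dict(r) for r in silver]  # copy, so the input stays untouched
    # The open version of a key is the row whose valid_to is still None.
    open_rows = {r["key_hash"]: r for r in result if r["valid_to"] is None}
    for row in incoming:
        key_hash = row_hash(row, key_cols)       # identifies the business key
        change_hash = row_hash(row, value_cols)  # detects changed attributes
        existing = open_rows.get(key_hash)
        if existing is not None and existing["change_hash"] == change_hash:
            continue  # same key, same content: nothing to do
        if existing is not None:
            existing["valid_to"] = load_ts  # end-date the previous version
        new_row = {**row, "key_hash": key_hash, "change_hash": change_hash,
                   "valid_from": load_ts, "valid_to": None}
        result.append(new_row)
        open_rows[key_hash] = new_row
    return result


# Hypothetical example data: one customer whose city changes between loads.
t1 = datetime(2025, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2025, 1, 2, tzinfo=timezone.utc)
silver = apply_changes([], [{"id": 1, "city": "Utrecht"}], ["id"], ["city"], t1)
silver = apply_changes(silver, [{"id": 1, "city": "Zwolle"}], ["id"], ["city"], t2)
```

After the second load, the silver list holds two versions of the same key: the old row is closed with valid_to set to the load timestamp and the new row is left open. The sketch ignores deletes and uses a naive separator when building the hash input, so treat it as an illustration of the idea only.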

 

Limitations:

  • No gold layer
  • Orchestration logic in Fabric Data Pipelines can be somewhat complex and has a bit of a learning curve

 

2. Fabric Accelerator (based on ELT Framework by Benny Austin)

The Fabric Accelerator is a robust implementation based on the Extract Load Transform (ELT) Framework, which supports multiple platforms including Synapse and Databricks. The Fabric variant is the newest implementation of this ELT framework and is still evolving. Although the full framework uses more non-Fabric components, the Fabric Accelerator can be set up with these components at a minimum:

  • Fabric Capacity
  • Newest version: Fabric SQL DB for controlDB (I tested an older version that uses Azure SQL DB)
  • Optional for test data: WideWorldImporters DB in Azure SQL DB

 

Highlights:

  • Supports bronze, silver, and gold layers
  • Uses Azure SQL DB for metadata
  • Infrastructure as Code setup
  • Includes streaming (hot path for monitoring) and batch (gold path) processing
  • Real-time dashboards via Eventhouse and KQL

 

Limitations:

  • Deployment is complex and error-prone because GUIDs have to be replaced manually in multiple stages
  • Still uses legacy Invoke Pipeline activities

 

3. AquaShack (by Christian Henrik Reich)

This GitHub solution is a “pico example” of a metadata-driven Lakehouse in Fabric: a lightweight, notebook-driven framework ideal for quick prototyping or educational use, and a nice, simple example of a notebook-only solution. The layers are called Landing, Base and Curated.

 

Highlights:

  • All logic in notebooks, no pipelines
  • Uses JSON metadata for orchestration
  • Simple cleaning steps and DAG-based execution
  • Example data included
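
The combination of JSON metadata and DAG-based execution mentioned above can be sketched roughly as follows. This is not AquaShack's actual metadata format or code: the task names, the depends_on field and the run_task callback are assumptions for illustration, and the ordering relies on Python's standard graphlib module (Python 3.9 or later).

```python
import json
from graphlib import TopologicalSorter

# Hypothetical metadata: each task lists the tasks it depends on.
METADATA = json.loads("""
[
  {"name": "landing_to_base_customers", "depends_on": []},
  {"name": "landing_to_base_orders", "depends_on": []},
  {"name": "base_to_curated_sales",
   "depends_on": ["landing_to_base_customers", "landing_to_base_orders"]}
]
""")


def execution_order(tasks: list[dict]) -> list[str]:
    """Return task names so that every task comes after its dependencies."""
    graph = {t["name"]: set(t["depends_on"]) for t in tasks}
    # static_order() raises graphlib.CycleError if the metadata has a cycle.
    return list(TopologicalSorter(graph).static_order())


def run_all(tasks: list[dict], run_task) -> None:
    """Call run_task(name) for each task in dependency order."""
    for name in execution_order(tasks):
        run_task(name)


# run_task stands in for whatever actually executes a notebook or step.
run_all(METADATA, lambda name: print(f"running {name}"))
```

The point of the design is that adding a new table means adding a metadata entry, not writing new orchestration code. A real notebook-only framework would also need the logging and restart handling that the limitations below mention.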

 

Limitations:

  • No ingestion layer
  • No logging or restart options
  • No gold layer or surrogate key support

 

Conclusion

Each framework offers a different balance of simplicity, flexibility, and completeness. For Fabric-only environments, the FMD Framework and AquaShack are great starting points. For more advanced setups with gold layer support and real-time monitoring, Fabric Accelerator is a strong candidate—though it requires more effort to deploy and its Fabric implementation is still evolving.

The FMD Framework stands out for its automated deployment via Fabric CLI and its clean separation of code and data across workspaces. This design allows data analysts to access data without touching code and supports better branching strategies for development—similar to how Azure Data Factory operates.

All three frameworks are worth exploring for their approaches to metadata handling, data cleansing, and quality assurance. However, it’s essential to evaluate which solution best fits your specific needs and thoroughly test it before deploying in production. If you encounter issues or have suggestions for improvement, please share them on the respective GitHub repositories. Better yet, consider contributing directly to the codebase.

If you know of other open-source solutions for metadata-driven pipelines in Fabric, I’d love to hear about them. Let’s keep building better data platforms—together.

 

 

 

 

Ernst Wolf