Part 2: Open Source Metadata-Driven Frameworks for Microsoft Fabric, which one to use...

12 Aug

Exploring three ‘Open Source Metadata Frameworks’

In my previous blog, I introduced the concept of metadata-driven data loading in Microsoft Fabric – a powerful approach to streamline and standardize data ingestion and transformation (A Metadata Driven Framework for MS Fabric: What is it and why do you want it? – Powerdobs). The key conclusion is that a metadata-driven approach offers many advantages and is definitely worth considering when implementing a new data platform. However, implement it smartly!

When implementing a metadata-driven approach ‘smartly’, various tools can make your life easier. In this follow-up blog post, I’ll share my findings from testing three promising solutions, each offering a unique take on metadata-driven architecture.

Why Use an Open Source Metadata Framework?

While custom-built solutions can work for specific clients or data sources, they often become complex and hard to maintain over time. Open source frameworks can help here, as they offer:

Standardization across environments, teams and customers
Lower setup and maintenance costs
Community-driven improvements
Built-in support for logging, lineage, and data quality (depending on framework)

Overview of Explored Frameworks

Framework	Layers Supported	Metadata Storage	Deployment Complexity
FMD Framework	Bronze, Silver	Fabric SQL DB	Easy (Fabric CLI)
Fabric Accelerator	Bronze, Silver, Gold	Azure SQL DB (Fabric coming)	Moderate (IaC + config)
AquaShack	Landing, Base, Curated	JSON in notebooks	Simple (Notebook only)

All three open source solutions are available on GitHub; direct links are provided in the first column.

1. FMD Framework (by Erwin Kreuk)

The Fabric Metadata-Driven (FMD) Framework is a Fabric-native solution that is deployed via the Fabric CLI. It is a scalable, extensible solution for managing, integrating, and governing data, using a metadata-driven approach built on Fabric SQL Database.

The latest version improves deployment by using the Fabric CLI and now exposes readable source code on GitHub, which makes updates and version tracking easier than in previous versions.

Highlights:

Uses central config workspace and separate workspaces for code and data per environment
Supports landing, bronze and silver layers
Metadata and logging via Fabric SQL DB
Hashing of key columns and hash column for change detection
Silver layer is always historical (row versioning with end-dating)
Demo CSV included for quick testing
Cleaning and data quality features are optional
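
To make the hashing and end-dating idea more concrete, here is a minimal Python sketch of the general technique. It is not the FMD Framework's actual code: the function names (row_hash, apply_batch), the column names (key_hash, change_hash, valid_from, valid_to) and the use of SHA-256 are all illustrative assumptions, and a plain list of dictionaries stands in for a silver table.

```python
import hashlib
from datetime import datetime, timezone


def row_hash(row: dict, columns: list) -> str:
    """Hash the chosen columns of a row into one hex digest."""
    joined = "|".join(str(row.get(c)) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def apply_batch(silver: list, batch: list, key_cols: list,
                value_cols: list, load_time: datetime) -> list:
    """Merge a batch into a historical table (hypothetical layout)."""
    # Index the currently open versions (no end date yet) by key hash.
    current = {r["key_hash"]: r for r in silver if r["valid_to"] is None}
    for row in batch:
        key_hash = row_hash(row, key_cols)
        change_hash = row_hash(row, value_cols)
        existing = current.get(key_hash)
        if existing is not None and existing["change_hash"] == change_hash:
            continue  # same key, same content: nothing to do
        if existing is not None:
            existing["valid_to"] = load_time  # end-date the old version
        new_version = {
            **row,
            "key_hash": key_hash,
            "change_hash": change_hash,
            "valid_from": load_time,
            "valid_to": None,  # open-ended: this is the current version
        }
        silver.append(new_version)
        current[key_hash] = new_version
    return silver


silver = []
t1 = datetime(2025, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2025, 1, 2, tzinfo=timezone.utc)
apply_batch(silver, [{"id": 1, "city": "Utrecht"}], ["id"], ["city"], t1)
apply_batch(silver, [{"id": 1, "city": "Zwolle"}], ["id"], ["city"], t2)
```

After the second load, the table holds two versions of the same key: the first is end-dated at the second load time and the second is still open. Comparing one hash per row avoids comparing every column individually, which is the main reason for keeping a separate change-detection hash next to the key hash.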

Limitations:

No gold layer
Orchestration logic in Fabric Data Pipelines can be somewhat complex and has a bit of a learning curve

2. Fabric Accelerator (based on ELT Framework by Benny Austin)

The Fabric Accelerator is a robust implementation based on the Extract Load Transform (ELT) Framework, which supports multiple platforms including Synapse and Databricks. The Fabric variant is the newest implementation of this ELT framework and is still evolving. Although the full framework uses additional non-Fabric components, the Fabric Accelerator can be set up with at least these components:

Fabric Capacity
Newest version: Fabric SQL DB for the control DB (I tested an older version that uses Azure SQL DB)
Optional for test data: WideWorldImporters DB in Azure SQL DB

Highlights:

Supports bronze, silver, and gold layers
Uses Azure SQL DB for metadata
Infrastructure as Code setup
Includes streaming (hot path for monitoring) and batch (gold path) processing
Real-time dashboards via Eventhouse and KQL

Limitations:

Deployment is complex and error-prone because GUIDs have to be replaced manually in multiple stages
Still uses legacy Invoke Pipeline activities
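
One way to make manual GUID replacement less error-prone is to script it. The sketch below is not part of the Fabric Accelerator; replace_guids and the mapping it takes are hypothetical names for illustration. It swaps known GUIDs in a piece of text and reports any GUID it finds that has no mapping, so that forgotten replacements show up before deployment.

```python
import re

# Matches the usual 8-4-4-4-12 hexadecimal GUID notation.
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def replace_guids(text: str, mapping: dict) -> tuple:
    """Swap known GUIDs and collect the ones that have no mapping."""
    lookup = {old.lower(): new for old, new in mapping.items()}
    unmapped = set()

    def swap(match):
        guid = match.group(0).lower()
        if guid in lookup:
            return lookup[guid]
        unmapped.add(guid)  # leave it untouched, but report it
        return match.group(0)

    return GUID_PATTERN.sub(swap, text), unmapped
```

The function returns the rewritten text together with the set of unmapped GUIDs. Running it over each exported definition file and failing when that set is not empty would give a simple check on the deployment.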

3. AquaShack (by Christian Henrik Reich)

This GitHub solution is a “pico example” of a metadata-driven Lakehouse in Fabric, with layers called Landing, Base and Curated. It is a lightweight, notebook-driven framework, ideal for quick prototyping or educational use, and a nice, simple example of a notebook-only solution.

Highlights:

All logic in notebooks, no pipelines
Uses JSON metadata for orchestration
Simple cleaning steps and DAG-based execution
Example data included
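
The combination of JSON metadata and DAG-based execution can be sketched in a few lines of Python. This is an illustration of the idea only, not AquaShack's actual metadata format: the task and depends_on fields, the task names and the runner callable are assumptions made for this example. In a Fabric notebook, runner would be whatever call starts another notebook.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical metadata: each task lists the tasks it depends on.
METADATA = json.loads("""
[
  {"task": "landing_to_base_customers", "depends_on": []},
  {"task": "landing_to_base_orders", "depends_on": []},
  {"task": "base_to_curated_sales",
   "depends_on": ["landing_to_base_customers", "landing_to_base_orders"]}
]
""")


def execution_order(metadata: list) -> list:
    """Return the task names so that dependencies always come first."""
    graph = {item["task"]: set(item["depends_on"]) for item in metadata}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


def run_all(metadata: list, runner) -> None:
    """Call runner(task_name) for every task in dependency order."""
    for task in execution_order(metadata):
        runner(task)


run_all(METADATA, print)
```

With this metadata, both Base loads run before the Curated step. Adding a new table then means adding a JSON entry rather than changing the orchestration code.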

Limitations:

No ingestion layer
No logging or restart options
No gold layer or surrogate key support

Conclusion

Each framework offers a different balance of simplicity, flexibility, and completeness. For Fabric-only environments, the FMD Framework and AquaShack are great starting points. For more advanced setups with gold layer support and real-time monitoring, Fabric Accelerator is a strong candidate—though it requires more effort to deploy and its Fabric implementation is still evolving.

The FMD Framework stands out for its automated deployment via Fabric CLI and its clean separation of code and data across workspaces. This design allows data analysts to access data without touching code and supports better branching strategies for development—similar to how Azure Data Factory operates.

All three frameworks are worth exploring for their approaches to metadata handling, data cleansing, and quality assurance. However, it’s essential to evaluate which solution best fits your specific needs and thoroughly test it before deploying in production. If you encounter issues or have suggestions for improvement, please share them on the respective GitHub repositories. Better yet, consider contributing directly to the codebase.

If you know of other open-source solutions for metadata-driven pipelines in Fabric, I’d love to hear about them. Let’s keep building better data platforms—together.

Tags:

AquaShack,AzureData,DataEngineering,DataPlatform,DataQuality,ELT,FabricAccelerator,FMDFramework,lakehouse,MetadataDriven,MicrosoftFabric,OpenSource
