r/MicrosoftFabric

The writeHeavy default is quietly hurting Direct Lake performance in a lot of Gold workspaces

Ran into the same issue in three separate Fabric engagements over the last few months, so writing it up in case it saves someone time.

All newly created Fabric workspaces default to the writeHeavy Spark resource profile. This is correct for Bronze and ingestion workloads.

The problem: writeHeavy disables V-Order by default, and a lot of teams never change the profile on their Gold workspace.

What that costs you, per Microsoft Learn's cross-workload optimization guide:

  • Direct Lake cold-cache queries: 40 to 60 percent slower without V-Order.
  • SQL analytics endpoint and Warehouse: roughly 10 percent slower reads.
  • Spark: no read impact either way.

The fix is one line of config per environment. Set spark.fabric.resourceProfile to readHeavyForPBI on Gold workspaces, readHeavyForSpark on Silver if it's read-heavy, leave writeHeavy on Bronze.
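
The per-layer mapping above can be sketched as a tiny lookup, applied per notebook session; the durable fix is setting the same spark.fabric.resourceProfile property in the workspace Environment's Spark settings. A minimal sketch (the layer-to-profile mapping is the recommendation from this post, not an official API):

```python
# Minimal sketch: map each medallion layer to the Spark resource profile
# recommended above, then apply it per session in a Fabric notebook.
PROFILE_BY_LAYER = {
    "bronze": "writeHeavy",         # keep the default for ingestion
    "silver": "readHeavyForSpark",  # if the layer is mostly read
    "gold": "readHeavyForPBI",      # re-enables V-Order for Direct Lake
}

def profile_for(layer: str) -> str:
    """Return the recommended Spark resource profile for a medallion layer."""
    return PROFILE_BY_LAYER[layer.lower()]

# In a Fabric notebook session:
#   spark.conf.set("spark.fabric.resourceProfile", profile_for("gold"))
```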

A few other things that surprised me while digging into this:

  • Optimize Write, Auto Compact, and Low Shuffle Merge are all enabled by default in Fabric's Spark runtime. A lot of "optimization advice" on the internet is telling you to re-enable things that are already on.
  • Liquid Clustering is the recommended approach for new tables, but ALTER TABLE ... CLUSTER BY on an existing unpartitioned table requires Delta Lake 3.3. Fabric Runtime 1.3 is on Delta 3.2. So you can create new clustered tables today, but retrofitting existing tables requires a migration.
  • Runtime 2.0 (Spark 4.0, Delta 4.0) is Experimental Public Preview. Delta 4.0 features like type widening and variant type only work in Spark notebooks. If the table is read by Direct Lake, SQL endpoint, or Warehouse, those features break interoperability. Microsoft's own guidance is to stay on Runtime 1.3 for production.
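
To make the Liquid Clustering point concrete, here is a hedged sketch; the table name and columns are invented for illustration, and the retrofit path shown is a rewrite-and-swap rather than an in-place ALTER:

```python
# Creating a NEW clustered table works on Fabric Runtime 1.3 (Delta 3.2).
create_new_clustered = """
CREATE TABLE sales_gold (
    order_id   BIGINT,
    order_date DATE,
    region     STRING
)
CLUSTER BY (order_date, region)
"""

# ALTER TABLE ... CLUSTER BY on an existing unpartitioned table needs
# Delta 3.3, so retrofitting today means rewriting into a new clustered
# table and swapping it in.
retrofit_via_rewrite = """
CREATE TABLE sales_gold_clustered
CLUSTER BY (order_date, region)
AS SELECT * FROM sales_gold
"""

# In a Fabric notebook: spark.sql(create_new_clustered)
```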

I wrote up the full decision framework (five levers, per-layer profile mapping, Runtime 2.0 caveats) here if it's useful.

Curious what others have seen. Has anyone actually measured the V-Order difference on a real Direct Lake model, or is the 40 to 60 percent number holding up in practice?

psistla.com
u/alternative-cryptid — 5 hours ago

Compare notebooks in two Workspaces

Is there any reliable native way, or a third-party tool, to compare notebooks between two workspaces: checking whether every notebook in one workspace also exists in the other and, if it does, whether the code blocks match?

We are not using deployment pipelines, so that's not an option. I'm thinking of a Python library or some other tool.
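
One DIY route: list each workspace's notebooks via the Fabric REST API, fetch each definition, hash the decoded source, and diff the two inventories. The fetching side is sketched in comments (endpoints per the Fabric REST docs as I understand them; verify before relying on them). The comparison logic below is plain Python:

```python
import base64
import hashlib

# Fetching (sketch, implement with `requests` + an AAD bearer token):
#   GET  https://api.fabric.microsoft.com/v1/workspaces/{wsId}/notebooks
#   POST .../workspaces/{wsId}/notebooks/{itemId}/getDefinition
# getDefinition returns the source as base64-encoded payload parts.

def code_hash(b64_payload: str) -> str:
    """Stable fingerprint of a notebook's decoded source."""
    return hashlib.sha256(base64.b64decode(b64_payload)).hexdigest()

def compare_workspaces(ws_a: dict, ws_b: dict) -> dict:
    """Each arg maps notebook display name -> code hash (from code_hash)."""
    common = set(ws_a) & set(ws_b)
    return {
        "missing_in_b": sorted(set(ws_a) - set(ws_b)),
        "missing_in_a": sorted(set(ws_b) - set(ws_a)),
        "content_differs": sorted(n for n in common if ws_a[n] != ws_b[n]),
    }
```

The hash comparison catches any byte-level difference; if you only care about code cells, parse the decoded notebook first and hash just the cell sources.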

reddit.com
u/rabinjais789 — 3 hours ago

Power BI license required to work in Fabric?

I have a case where we want to use Fabric without Power BI. We have a trial started, but when my Fabric developers try to create a workspace on the trial capacity, they are prompted to start a Power BI trial. They already have Fabric (Free) licenses.

Is a Power BI license required to work in Fabric with notebooks, lakehouses etc (no PBI models)?

reddit.com
u/Mr_Mozart — 13 hours ago

Direct Lake Throttling?

I have a single semantic model in my F2 capacity that seems to be consuming quite a bit of CU resources and throttling. I'm in the process of stripping it down to improve performance, but wondering if there is a set of strategies to systematically vet a semantic model and set it up for Direct Lake success? I currently process everything in notebooks and store in lakehouses with a final stored procedure to write to a Gold Layer Warehouse.

https://preview.redd.it/met0jqslclwg1.png?width=975&format=png&auto=webp&s=132db410bb9ca0eddd7a8e01efc61b62cf1abdb0

https://preview.redd.it/jgukranlclwg1.png?width=975&format=png&auto=webp&s=9cca3d04ef1462075293b81f7b2c3749288df9ff

https://preview.redd.it/7fm35anlclwg1.png?width=960&format=png&auto=webp&s=61fdc4f011c7099b0d1748a61ebd31e5aa9fb85d

reddit.com
u/wjwilson206 — 2 hours ago

Something broken with running a Notebook under Workspace Identity... or is there very excessive CU overhead?

How much CU overhead is there when running a Notebook with a Workspace Identity?

We have lots of available capacity. We have different users and developers starting new sessions and running pipelines and complex scripts. We stopped all of that for this test.

I can run a test Pipeline that calls a Notebook.

Once I use a connection object with Workspace Identity auth for the notebook and set up the permissions, we get:

Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Exception, Error value - Failed to create Livy session for executing notebook. Error: {"code":"BadRequest","subCode":0,"message":"Encountered internal error while calling TokenProvider to get obo token. The return code is BadRequest, and no error details was provided."

I do not believe this is a capacity issue. If it is, how much additional overhead would running the Notebook through a connection use? I can't imagine this would increase it. Does it? By how much?

Otherwise -- there is something broken in connections + Workspace Identity.

reddit.com
u/Personal-Quote5226 — 2 hours ago

Notebook (Python/PySpark); get user or security context of running notebook

Is there a standard approach to get the security context (a guid that can be used to identify the account, user / account name) of the notebook execution?

I want to write that security context somewhere, so I want to know the standard way to retrieve it, bearing in mind that the expected security context differs depending on how the notebook is run. Either way, I want to capture it.
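
The calls usually cited for this are mssparkutils.env.getUserId() / getUserName(), which ship in the Synapse/Fabric runtimes (verify on your runtime version); a pipeline-triggered or service-principal run surfaces that principal instead of an interactive user. A hedged sketch that just normalizes whatever context you retrieve into an auditable record:

```python
from datetime import datetime, timezone

def audit_record(principal_id: str, principal_name: str, run_mode: str) -> dict:
    """Shape the captured security context for writing to an audit table."""
    return {
        "principal_id": principal_id,      # GUID: user oid OR service principal oid
        "principal_name": principal_name,  # UPN or app display name
        "run_mode": run_mode,              # e.g. "interactive", "pipeline"
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
    }

# In the notebook (assumption: mssparkutils is available on your runtime):
#   from notebookutils import mssparkutils
#   rec = audit_record(mssparkutils.env.getUserId(),
#                      mssparkutils.env.getUserName(), "interactive")
```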

reddit.com
u/Personal-Quote5226 — 4 hours ago

How are you handling the 8x/day refresh limit for the Fabric Capacity Metrics app on a Pro workspace?

We're currently running the out of the box Fabric Capacity Metrics app on a Pro workspace (shared capacity), which limits us to 8 refreshes per day. Since we already have F128 capacities available, can we assign this workspace to our F128 capacity to unlock higher refresh limits, or are there other alternatives to increase the refresh frequency?

reddit.com
u/Bulky_Combination884 — 7 hours ago

fabric-cicd v1.0.0 is here - a major milestone with breaking changes and other important updates!

The fabric-cicd library just hit v1.0.0 — a huge milestone for the project! This release brings some significant changes, so if you're already using the library, please read through the breaking changes before upgrading. Let's break it all down.

Breaking Changes — Read Before Upgrading!

This release introduces some important breaking changes that require action on your part:

  • Explicit token credential is now required. The default credential fallback has been removed. You must now explicitly pass a token credential — no more implicit authentication as a safety net.
  • Implicit authentication in Fabric Notebook has been removed. Explicit token credential is also required when running fabric-cicd from the Fabric Notebook context.
  • FabricWorkspace and deploy_with_config now require keyword-only arguments. Update your function calls to use explicit keyword arguments.
  • Identity logging has been removed, along with the disable_print_identity feature flag. If you set that flag, remove it; it no longer exists.

These changes make authentication more intentional and secure. Please review the updated documentation for more information on supported authentication here: Getting Started - fabric-cicd.
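
For anyone migrating, a hedged sketch of the v1.0 shape (parameter names follow the fabric-cicd docs as I understand them; the GUIDs and paths are placeholders; confirm against the v1.0.0 release notes):

```python
from azure.identity import ClientSecretCredential
from fabric_cicd import FabricWorkspace, publish_all_items

# v1.0: no implicit credential fallback; pass a token credential explicitly.
credential = ClientSecretCredential(
    tenant_id="<tenant-guid>",
    client_id="<app-guid>",
    client_secret="<secret>",
)

# v1.0: keyword-only arguments are now required on FabricWorkspace.
workspace = FabricWorkspace(
    workspace_id="<workspace-guid>",
    repository_directory="<local-repo-path>",
    item_type_in_scope=["Notebook", "DataPipeline"],
    token_credential=credential,
)
publish_all_items(workspace)
```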

New Item Support

  • Ontology item type is now supported. Ontology items are new to the Fabric ecosystem, and you can now deploy them as part of your CI/CD pipelines.

New Functionality

  • Better deployment transparency: deploy_with_config now gives clearer feedback on success and failure scenarios, making it much easier to debug pipelines.
  • Unpublish operations now collect API responses: API response collection has been extended to unpublish operations, giving you better visibility across the full deployment lifecycle.
  • New get_changed_items() utility function: Detect Fabric items changed via git diff and use them for selective deployment — a great way to speed up your pipelines by only deploying what actually changed. Special thanks to u/vipulb91 for this awesome contribution!

Bug Fixes

Quite a few important fixes landed in this release:

  • Prevent unintended GUID replacements in Variable Library item files during publish.
  • Fix YAML content check to properly reject non-YAML files (like Notebooks) during key_value_replace parameterization, preventing file corruption.
  • Fix Notebook deployment failures caused by non-deterministic ordering of definition files in the API payload (Convert py to ipynb failed with unexpected error).
  • Parameter file is now ignored when not explicitly defined in the config file — no more surprise behavior.
  • New enable_hard_delete feature flag to bypass the workspace recycle bin during unpublish operations.
  • Addition of a timeout for long-running operation polling to prevent pipelines from hanging indefinitely.

Get Started

Check out the full release notes here: https://github.com/microsoft/fabric-cicd/releases/tag/v1.0.0

To upgrade:

pip install fabric-cicd==1.0.0

reddit.com
u/fabshire25 — 24 hours ago

The dbt-fabricspark Lakehouse adapter now comes with a ridiculous amount of production-grade test coverage

There was a bit of a reputation that the dbt-fabricspark adapter is... janky.

My team has run it in production since January 2026, and many, many developers use it locally as a critical lifeline of our Data Engineering backbone.

So to protect our bottoms, I put up this giant PR that regression-tests every nook and cranny of the adapter's surface area. Every PR must pass these tests against a live local devcontainer with Spark + Livy, plus 2 Fabric Lakehouses, before merging to main.

Hopefully this builds more community confidence in the robustness of the adapter. IMO it's pretty rock solid nowadays.

If you find any bugs, please let us know in the GitHub Issue 🙂

feat: Containerized and parallelized tests with 669% faster CI by mdrakiburrahman · Pull Request #87 · microsoft/dbt-fabricspark

u/raki_rahman — 19 hours ago

Stuck: how to parameterize the Lakehouse connection in a semantic model?

Through a deployment pipeline, I'm promoting the Dev semantic model to the UAT workspace, but after deployment the data source setting is not swapping to the UAT warehouse (screenshot below). Data source rules are also disabled in the pipeline deployment. I tried parameterizing expression.tmdl with the code below, and it's not working. ChatGPT first said Fabric won't support changing expression.tmdl, then said to use the deployment pipeline and the M code would automatically swap from the Dev to the UAT warehouse connection, but that's not happening. How do I promote the semantic model?

https://preview.redd.it/lzi70zsxclwg1.png?width=2322&format=png&auto=webp&s=256aa2d0c153674095f5dfa031bb0a3b8808045c

expression Environment = "DEV" meta [IsParameterQuery=true, IsParameterQueryRequired=true, Type=type text, List={"DEV","UAT","PROD"}, DefaultValue="DEV"]

	lineageTag: ebbd659f-278f-45d5-a23d-53537661e1d1

	annotation PBI_ResultType = Text

expression 'DirectLake - fab_core_gld_dwh' =
	let
		Env = Environment,
		WorkspaceId = (
			if Env = "DEV" then "yyyyyyy-16e7-yyyy-a5fd-yyyyyyy"
			else if Env = "UAT" then "zzzzz-e834-4029-a3e4-cccccccc"
			else if Env = "PROD" then "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
			else error "Invalid Environment"
		),
		LakehouseId = (
			if Env = "DEV" then "xxxxxxxx-a9dc-44f1-b01b-xxxxxx"
			else if Env = "UAT" then "yyyyyy-934f-tttttt-9b5b-zzzzzz"
			else if Env = "PROD" then "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
			else error "Invalid Environment"
		),
		OneLakeURL = "https://onelake.dfs.fabric.microsoft.com/" & WorkspaceId & "/" & LakehouseId,
		Source = AzureStorage.DataLake(OneLakeURL, [HierarchicalNavigation = true])
	in
		Source

	annotation PBI_IncludeFutureArtifacts = False
reddit.com
u/efor007 — 1 hour ago

Lakehouse data staging

Hello Fabric community,

what do you recommend as best practice? I have my work items like notebooks, pipelines, etc. in one workspace per stage (dev, test, prod), so three workspaces in total.

Additionally, I don't want my lakehouses in the same workspace as the other items, so I store the lakehouses and their data in a separate workspace. So my question is: do you recommend one workspace for all lakehouses (Bronze, Silver, Gold), with every stage (dev, test, prod) pointing to this central lakehouse workspace, or is it better to also separate the data into dev, test, and prod?

reddit.com
u/paul_1907 — 9 hours ago

Designing the data infrastructure for my org - looking for feedback

I’m currently working as a data/analytics engineer at a small/mid sized manufacturing company where I’ve more or less been tasked with building out our data platform from scratch. Also I’m the first data hire so I literally have no one to turn to in the company. This /r has been my guidance for everything lol. I’m also someone who does not have a lot of experience in DE. So I’m learning everything and then implementing it.

We’re still pretty early in our data maturity, with a lot of siloed systems (CRM, ERP, Sharepoint lists, etc.), so the goal has been to create something scalable but not overly complex.

Right now, I’ve set things up in Microsoft Fabric using a medallion-style architecture:

- Bronze (central lakehouse): raw data ingested from various APIs with minimal transformation

- Silver (central processing layer): cleaned and standardized using a config-driven pipeline

- Gold (department-level warehouses): business-ready tables in separate workspaces for different teams (Sales, Ops, etc.)

On top of that, I’m using workspace isolation so each department has its own workspace for reporting and access control, while keeping bronze and silver in a central workspace.

A lot of this is still evolving (e.g., handling schema changes, thinking about incremental loads vs overwrites, optimizing compute usage in Fabric, etc.), and I’m trying to strike a balance between doing things “right” and not over-engineering too early.

Curious to hear from others who’ve built something similar:

- Does this architecture make sense at this stage?

- Anything you’d strongly recommend changing early before it becomes painful later?

- How would you approach scaling this (especially around governance, costs, and team autonomy)?

Appreciate any thoughts or critiques.

reddit.com
u/The_curious_one9790 — 22 hours ago

Analyst, Data Management

Hi everyone

We have a 6-month contract opening for an Analyst (Data Management) in my team. This role is open only to candidates currently based in Canada.

Looking for someone with 2+ years experience in data analytics/engineering, data quality/governance, and tools like SQL, Python, Apache Spark and Microsoft Fabric/Power BI.

If you or someone you know might be interested, DM me, I can share more details.

reddit.com
u/data-navigator — 1 day ago

Deep dive into OneLake Security in Microsoft Fabric

I did a 1-hour deep dive with u/taylorsamy to figure out what actually works today… and what doesn’t.

Quick summary:

  • SQL endpoint & Power BI -> works really well
  • Spark notebooks -> not so much, especially with CLS and RLS

We saw pretty inconsistent behavior with Spark when granular security is applied.

Curious if others have run into the same issues with Spark + CLS/RLS?

Watch the full deep dive here:
https://youtu.be/4a5X6NwJSZ0

u/aleks1ck — 1 day ago
🔥 Hot ▲ 90 r/MicrosoftFabric+1 crossposts

Power BI + AI: Are we moving beyond dashboards toward conversational analytics?

I’ve been experimenting with combining AI tools (specifically Claude) with Power BI semantic models, and it’s starting to feel like a meaningful shift in how analytics might be consumed.

Traditionally, the workflow looks something like this:

  • Build reports and dashboards
  • Add visuals to answer anticipated questions
  • Users click, filter, and explore

That model works well, but it’s always limited by what was pre-built into the report.

What’s interesting now is the ability to connect an LLM directly to a semantic model and let users ask questions in natural language. Instead of navigating visuals, they can just ask things like:

  • Why did June outperform March?
  • What are the main drivers behind this trend?
  • How do two entities compare and what should we do about it?

The responses aren’t just filtered views of existing visuals either. The model can:

  • Pull in dimensions not surfaced in the report
  • Provide summaries across multiple metrics
  • Offer explanations and even recommendations

It starts to feel less like “report consumption” and more like interacting with the data itself.

A few things that stood out to me while testing this:

  • The semantic model becomes way more important than the report layer
  • You can’t realistically pre-build all the insights users might want
  • AI fills the gap between high-level dashboards and deep analysis
  • There’s real potential for scaling analytics to less technical users

That said, there are still some open questions:

  • How do you manage trust and validation of AI-generated insights?
  • What does governance look like when users can query anything?
  • Does this reduce the need for complex report design, or just shift it?
  • How do organizations handle access and security at scale?

Curious how others are thinking about this.

Are you exploring AI + Power BI integrations yet?
Do you see this replacing parts of the dashboard experience, or just augmenting it?

youtu.be
u/PowerBIBro — 3 days ago

Workspace Item level permission restriction ?

For our Power BI developer team, we want to grant only publish permission on the reporting workspace. Publishing requires the Contributor role, but with Contributor permission they can also create other items (data pipelines, etc.). We want to restrict item creation. For example: they should only be able to publish reports, and everything else should be restricted. Is that possible?

reddit.com
u/efor007 — 1 day ago

Passed the DP-700 Exam after implementing a lakehouse medallion architecture for an organization.

Feels good to have the certification for credibility.

Followed the learning path on Microsoft learn.

Practiced a lot of mock tests available online. (Certiace, skillcerpro, etc., as I wanted to crack it on the first attempt)

Generated a lot of mock tests using Claude.

reddit.com
u/dracarysmafu — 1 day ago
🔥 Hot ▲ 274 r/MicrosoftFabric+2 crossposts

How do I explain that SQL Server should not be used as a code repository?

This week my BI Developer colleague proudly showed me a new Power BI report that he'd vibe-coded. Here's how it works:

  1. Write a SQL query that selects the data needed for the report, concatenates it into one massive row, then formats that row as a JavaScript array.
  2. Write your custom report as an HTML web page, complete with styles and JS functions.
  3. Put the whole web-page code into one large string. Put the JS array containing your data from step 1 into that code string, so that you now have a JS variable containing all of your raw data hardcoded into your HTML.
  4. You now have a large string of HTML + JS that contains your custom report, complete with data! Sadly the string exceeds the length of VARCHAR(MAX), so you'll need to chop it up and insert each chunk into a table. Now all you need to do is set the table as a data source in PBI, re-join the rows into one long string, and voilà! A custom Power BI visual in 4 simple steps!

I'm fairly new to the data engineering role (transitioned from software dev) but this is insane, right? My colleague has very strong SQL skills but isn't really a programmer, so I'm guessing this is a case of 'when all you have is a hammer, everything looks like a nail'.

I don't even know how to begin trying to explain the problems with this approach to my colleague, or what to suggest as an alternative (maybe just make a custom visual using the dev tools provided by PBI?). I don't want to come off sounding condescending but I have to say something before this becomes our standard way of creating custom reports.

reddit.com
u/Firestone78 — 4 days ago

Variable library item reference - preview status

Hey there,

does anyone know anything about the development status of item references in variable libraries?

I learned my lesson regarding preview items the hard way in the past, but I can't help myself, I like this feature.

  • you can hover over the item references to see what workspace it is referencing. When using guids you have to look them up to see where they point to
  • you don't have to look up guids when setting up variables, simply search for the file in the file picker
  • previously I needed 2 variables for 1 reference (workspace guid & item guid). Now I can reference both properties through the same variable

All in all a great feature and imo better than static guids in every way

Now I am wondering

  • are there known issues?
  • when will there be support for item properties like an SQL endpoint guid?
  • according to the docs, Pipelines are not supported for these variable types, but they do in fact work. When will there be official support?
  • on a scale of 1 to 10, how dumb am I for considering to use a preview feature in production?
u/p-mndl — 1 day ago