r/MicrosoftFabric

The writeHeavy default is quietly hurting Direct Lake performance in a lot of Gold workspaces

Ran into the same issue in three separate Fabric engagements over the last few months, so writing it up in case it saves someone time.

All newly created Fabric workspaces default to the writeHeavy Spark resource profile. This is correct for Bronze and ingestion workloads.

The problem: writeHeavy disables V-Order by default, and a lot of teams never change the profile on their Gold workspace.

What that costs you, per Microsoft Learn's cross-workload optimization guide:

  • Direct Lake cold-cache queries: 40 to 60 percent slower without V-Order.
  • SQL analytics endpoint and Warehouse: roughly 10 percent slower reads.
  • Spark: no read impact either way.

The fix is one line of config per environment. Set spark.fabric.resourceProfile to readHeavyForPBI on Gold workspaces, readHeavyForSpark on Silver if it's read-heavy, leave writeHeavy on Bronze.
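
The per-layer mapping above can be sketched as a tiny lookup, applied per notebook session; the durable fix is setting the same spark.fabric.resourceProfile property in the workspace Environment's Spark settings. A minimal sketch (the layer-to-profile mapping is the recommendation from this post, not an official API):

```python
# Minimal sketch: map each medallion layer to the Spark resource profile
# recommended above, then apply it per session in a Fabric notebook.
PROFILE_BY_LAYER = {
    "bronze": "writeHeavy",         # keep the default for ingestion
    "silver": "readHeavyForSpark",  # if the layer is mostly read
    "gold": "readHeavyForPBI",      # re-enables V-Order for Direct Lake
}

def profile_for(layer: str) -> str:
    """Return the recommended Spark resource profile for a medallion layer."""
    return PROFILE_BY_LAYER[layer.lower()]

# In a Fabric notebook session:
#   spark.conf.set("spark.fabric.resourceProfile", profile_for("gold"))
```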

A few other things that surprised me while digging into this:

  • Optimize Write, Auto Compact, and Low Shuffle Merge are all enabled by default in Fabric's Spark runtime. A lot of "optimization advice" on the internet is telling you to re-enable things that are already on.
  • Liquid Clustering is the recommended approach for new tables, but ALTER TABLE ... CLUSTER BY on an existing unpartitioned table requires Delta Lake 3.3. Fabric Runtime 1.3 is on Delta 3.2. So you can create new clustered tables today, but retrofitting existing tables requires a migration.
  • Runtime 2.0 (Spark 4.0, Delta 4.0) is Experimental Public Preview. Delta 4.0 features like type widening and variant type only work in Spark notebooks. If the table is read by Direct Lake, SQL endpoint, or Warehouse, those features break interoperability. Microsoft's own guidance is to stay on Runtime 1.3 for production.
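
To make the Liquid Clustering point concrete, here is a hedged sketch; the table name and columns are invented for illustration, and the retrofit path shown is a rewrite-and-swap rather than an in-place ALTER:

```python
# Creating a NEW clustered table works on Fabric Runtime 1.3 (Delta 3.2).
create_new_clustered = """
CREATE TABLE sales_gold (
    order_id   BIGINT,
    order_date DATE,
    region     STRING
)
CLUSTER BY (order_date, region)
"""

# ALTER TABLE ... CLUSTER BY on an existing unpartitioned table needs
# Delta 3.3, so retrofitting today means rewriting into a new clustered
# table and swapping it in.
retrofit_via_rewrite = """
CREATE TABLE sales_gold_clustered
CLUSTER BY (order_date, region)
AS SELECT * FROM sales_gold
"""

# In a Fabric notebook: spark.sql(create_new_clustered)
```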

I wrote up the full decision framework (five levers, per-layer profile mapping, Runtime 2.0 caveats) here if it's useful.

Curious what others have seen. Has anyone actually measured the V-Order difference on a real Direct Lake model, or is the 40 to 60 percent number holding up in practice?

psistla.com
u/alternative-cryptid — 5 hours ago

Compare notebooks in two Workspaces

Is there any reliable native way, or a third-party tool, to compare notebooks between two workspaces: checking whether every notebook in one workspace also exists in the other and, if it does, whether the code blocks match?

We are not using deployment pipelines, so that's not an option. I'm thinking of a Python library or some other tool.
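
One DIY route: list each workspace's notebooks via the Fabric REST API, fetch each definition, hash the decoded source, and diff the two inventories. The fetching side is sketched in comments (endpoints per the Fabric REST docs as I understand them; verify before relying on them). The comparison logic below is plain Python:

```python
import base64
import hashlib

# Fetching (sketch, implement with `requests` + an AAD bearer token):
#   GET  https://api.fabric.microsoft.com/v1/workspaces/{wsId}/notebooks
#   POST .../workspaces/{wsId}/notebooks/{itemId}/getDefinition
# getDefinition returns the source as base64-encoded payload parts.

def code_hash(b64_payload: str) -> str:
    """Stable fingerprint of a notebook's decoded source."""
    return hashlib.sha256(base64.b64decode(b64_payload)).hexdigest()

def compare_workspaces(ws_a: dict, ws_b: dict) -> dict:
    """Each arg maps notebook display name -> code hash (from code_hash)."""
    common = set(ws_a) & set(ws_b)
    return {
        "missing_in_b": sorted(set(ws_a) - set(ws_b)),
        "missing_in_a": sorted(set(ws_b) - set(ws_a)),
        "content_differs": sorted(n for n in common if ws_a[n] != ws_b[n]),
    }
```

The hash comparison catches any byte-level difference; if you only care about code cells, parse the decoded notebook first and hash just the cell sources.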

reddit.com
u/rabinjais789 — 3 hours ago

Power BI license required to work in Fabric?

I have a case where we want to use Fabric without Power BI. We have a trial started, but when my Fabric developers try to create a workspace on the trial capacity, they are prompted to start a Power BI trial. They already have Fabric (Free) licenses.

Is a Power BI license required to work in Fabric with notebooks, lakehouses etc (no PBI models)?

reddit.com
u/Mr_Mozart — 13 hours ago

Direct Lake Throttling?

I have a single semantic model in my F2 capacity that seems to be consuming quite a bit of CU resources and throttling. I'm in the process of stripping it down to improve performance, but wondering if there is a set of strategies to systematically vet a semantic model and set it up for Direct Lake success? I currently process everything in notebooks and store in lakehouses with a final stored procedure to write to a Gold Layer Warehouse.

https://preview.redd.it/met0jqslclwg1.png?width=975&format=png&auto=webp&s=132db410bb9ca0eddd7a8e01efc61b62cf1abdb0

https://preview.redd.it/jgukranlclwg1.png?width=975&format=png&auto=webp&s=9cca3d04ef1462075293b81f7b2c3749288df9ff

https://preview.redd.it/7fm35anlclwg1.png?width=960&format=png&auto=webp&s=61fdc4f011c7099b0d1748a61ebd31e5aa9fb85d

reddit.com
u/wjwilson206 — 2 hours ago

Something broken with running a Notebook under Workspace Identity... or is there very excessive CU overhead?

How much CU overhead is there when running a Notebook with a Workspace Identity?

We have lots of available capacity. We have different users and developers starting new sessions and running pipelines and complex scripts. We stopped all of that for this test.

I can run a test Pipeline that calls a Notebook.

Once I use a connection object with Workspace Identity auth for the notebook and set up the permissions, we get:

Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Exception, Error value - Failed to create Livy session for executing notebook. Error: {"code":"BadRequest","subCode":0,"message":"Encountered internal error while calling TokenProvider to get obo token. The return code is BadRequest, and no error details was provided."

I do not believe this is a capacity issue. If it is, how much additional overhead would running the Notebook through a connection use? I can't imagine this would increase it. Does it? By how much?

Otherwise -- there is something broken in connections + Workspace Identity.

reddit.com
u/Personal-Quote5226 — 2 hours ago

Notebook (Python/PySpark); get user or security context of running notebook

Is there a standard approach to get the security context (a guid that can be used to identify the account, user / account name) of the notebook execution?

I want to write that security context somewhere, so I want to know the standard way to retrieve it, bearing in mind that the expected security context differs depending on how the notebook is run. Either way, I want to capture it.
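
The calls usually cited for this are mssparkutils.env.getUserId() / getUserName(), which ship in the Synapse/Fabric runtimes (verify on your runtime version); a pipeline-triggered or service-principal run surfaces that principal instead of an interactive user. A hedged sketch that just normalizes whatever context you retrieve into an auditable record:

```python
from datetime import datetime, timezone

def audit_record(principal_id: str, principal_name: str, run_mode: str) -> dict:
    """Shape the captured security context for writing to an audit table."""
    return {
        "principal_id": principal_id,      # GUID: user oid OR service principal oid
        "principal_name": principal_name,  # UPN or app display name
        "run_mode": run_mode,              # e.g. "interactive", "pipeline"
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
    }

# In the notebook (assumption: mssparkutils is available on your runtime):
#   from notebookutils import mssparkutils
#   rec = audit_record(mssparkutils.env.getUserId(),
#                      mssparkutils.env.getUserName(), "interactive")
```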

reddit.com
u/Personal-Quote5226 — 4 hours ago

How are you handling the 8x/day refresh limit for the Fabric Capacity Metrics app on a Pro workspace?

We're currently running the out of the box Fabric Capacity Metrics app on a Pro workspace (shared capacity), which limits us to 8 refreshes per day. Since we already have F128 capacities available, can we assign this workspace to our F128 capacity to unlock higher refresh limits, or are there other alternatives to increase the refresh frequency?

reddit.com
u/Bulky_Combination884 — 7 hours ago

fabric-cicd v1.0.0 is here - a major milestone with breaking changes and other important updates!

The fabric-cicd library just hit v1.0.0 — a huge milestone for the project! This release brings some significant changes, so if you're already using the library, please read through the breaking changes before upgrading. Let's break it all down.

Breaking Changes — Read Before Upgrading!

This release introduces some important breaking changes that require action on your part:

  • Explicit token credential is now required. The default credential fallback has been removed. You must now explicitly pass a token credential — no more implicit authentication as a safety net.
  • Implicit authentication in Fabric Notebook has been removed. Explicit token credential is also required when running fabric-cicd from the Fabric Notebook context.
  • FabricWorkspace and deploy_with_config now require keyword-only arguments. Update your function calls to use explicit keyword arguments.
  • Identity logging has been removed, along with the disable_print_identity feature flag. If you set that flag, remove it; it no longer exists.

These changes make authentication more intentional and secure. Please review the updated documentation for more information on supported authentication here: Getting Started - fabric-cicd.
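
For anyone migrating, a hedged sketch of the v1.0 shape (parameter names follow the fabric-cicd docs as I understand them; the GUIDs and paths are placeholders; confirm against the v1.0.0 release notes):

```python
from azure.identity import ClientSecretCredential
from fabric_cicd import FabricWorkspace, publish_all_items

# v1.0: no implicit credential fallback; pass a token credential explicitly.
credential = ClientSecretCredential(
    tenant_id="<tenant-guid>",
    client_id="<app-guid>",
    client_secret="<secret>",
)

# v1.0: keyword-only arguments are now required on FabricWorkspace.
workspace = FabricWorkspace(
    workspace_id="<workspace-guid>",
    repository_directory="<local-repo-path>",
    item_type_in_scope=["Notebook", "DataPipeline"],
    token_credential=credential,
)
publish_all_items(workspace)
```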

New Item Support

  • Ontology item type is now supported. Ontology items are new to the Fabric ecosystem, and you can now deploy them as part of your CI/CD pipelines.

New Functionality

  • Better deployment transparency: deploy_with_config now gives clearer feedback on success and failure scenarios, making it much easier to debug pipelines.
  • Unpublish operations now collect API responses: API response collection has been extended to unpublish operations, giving you better visibility across the full deployment lifecycle.
  • New get_changed_items() utility function: Detect Fabric items changed via git diff and use them for selective deployment — a great way to speed up your pipelines by only deploying what actually changed. Special thanks to u/vipulb91 for this awesome contribution!

Bug Fixes

Quite a few important fixes landed in this release:

  • Prevent unintended GUID replacements in Variable Library item files during publish.
  • Fix YAML content check to properly reject non-YAML files (like Notebooks) during key_value_replace parameterization, preventing file corruption.
  • Fix Notebook deployment failures caused by non-deterministic ordering of definition files in the API payload (Convert py to ipynb failed with unexpected error).
  • Parameter file is now ignored when not explicitly defined in the config file — no more surprise behavior.
  • New enable_hard_delete feature flag to bypass the workspace recycle bin during unpublish operations.
  • Addition of a timeout for long-running operation polling to prevent pipelines from hanging indefinitely.

Get Started

Check out the full release notes here: https://github.com/microsoft/fabric-cicd/releases/tag/v1.0.0

To upgrade:

pip install fabric-cicd==1.0.0

reddit.com
u/fabshire25 — 24 hours ago

The dbt-fabricspark Lakehouse adapter now comes with a ridiculous amount of production-grade test coverage

There was a bit of a reputation that the dbt-fabricspark adapter is... janky.

My team has run it in production since January 2026, and many, many developers use it locally as a critical lifeline of our Data Engineering backbone.

So to protect our bottoms, I put up this giant PR that regression-tests every nook and cranny of the adapter's surface area. Every PR must pass these tests against a live local devcontainer with Spark + Livy, plus 2 Fabric Lakehouses, before merging to main.

Hopefully this builds more community confidence in the robustness of the adapter. IMO it's pretty rock solid nowadays.

If you find any bugs, please let us know in the GitHub Issue 🙂

feat: Containerized and parallelized tests with 669% faster CI by mdrakiburrahman · Pull Request #87 · microsoft/dbt-fabricspark

u/raki_rahman — 19 hours ago

Stuck: how to parameterize the Lakehouse connection in a semantic model?

Through a deployment pipeline, I'm promoting the Dev semantic model to the UAT workspace, but after deployment the data source setting is not swapping to the UAT warehouse (screenshot below). Data source rules are also disabled in the pipeline deployment. I tried parameterizing expression.tmdl with the code below, and it's not working. ChatGPT first said Fabric won't support changing expression.tmdl, then said to use the deployment pipeline and the M code would automatically swap from the Dev to the UAT warehouse connection, but that's not happening. How do I promote the semantic model?

https://preview.redd.it/lzi70zsxclwg1.png?width=2322&format=png&auto=webp&s=256aa2d0c153674095f5dfa031bb0a3b8808045c

expression Environment = "DEV" meta [IsParameterQuery=true, IsParameterQueryRequired=true, Type=type text, List={"DEV","UAT","PROD"}, DefaultValue="DEV"]

	lineageTag: ebbd659f-278f-45d5-a23d-53537661e1d1

	annotation PBI_ResultType = Text

expression 'DirectLake - fab_core_gld_dwh' =
	let
		Env = Environment,
		WorkspaceId = (
			if Env = "DEV" then "yyyyyyy-16e7-yyyy-a5fd-yyyyyyy"
			else if Env = "UAT" then "zzzzz-e834-4029-a3e4-cccccccc"
			else if Env = "PROD" then "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
			else error "Invalid Environment"
		),
		LakehouseId = (
			if Env = "DEV" then "xxxxxxxx-a9dc-44f1-b01b-xxxxxx"
			else if Env = "UAT" then "yyyyyy-934f-tttttt-9b5b-zzzzzz"
			else if Env = "PROD" then "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"
			else error "Invalid Environment"
		),
		OneLakeURL = "https://onelake.dfs.fabric.microsoft.com/" & WorkspaceId & "/" & LakehouseId,
		Source = AzureStorage.DataLake(OneLakeURL, [HierarchicalNavigation = true])
	in
		Source

	annotation PBI_IncludeFutureArtifacts = False
reddit.com
u/efor007 — 1 hour ago

Lakehouse data staging

Hello Fabric community,

what do you recommend as best practice? I have my work items like notebooks, pipelines, etc. in one workspace per stage (dev, test, prod), so three workspaces in total.

Additionally, I don't want my lakehouses in the same workspace as the other items, so I store the lakehouses and their data in a separate workspace. So my question is: do you recommend one workspace for all lakehouses (Bronze, Silver, Gold), with every stage (dev, test, prod) pointing to this central lakehouse workspace, or is it better to also separate the data into dev, test, and prod?

reddit.com
u/paul_1907 — 9 hours ago

Designing the data infrastructure for my org - looking for feedback

I’m currently working as a data/analytics engineer at a small/mid sized manufacturing company where I’ve more or less been tasked with building out our data platform from scratch. Also I’m the first data hire so I literally have no one to turn to in the company. This /r has been my guidance for everything lol. I’m also someone who does not have a lot of experience in DE. So I’m learning everything and then implementing it.

We’re still pretty early in our data maturity, with a lot of siloed systems (CRM, ERP, Sharepoint lists, etc.), so the goal has been to create something scalable but not overly complex.

Right now, I’ve set things up in Microsoft Fabric using a medallion-style architecture:

- Bronze (central lakehouse): raw data ingested from various APIs with minimal transformation

- Silver (central processing layer): cleaned and standardized using a config-driven pipeline

- Gold (department-level warehouses): business-ready tables in separate workspaces for different teams (Sales, Ops, etc.)

On top of that, I’m using workspace isolation so each department has its own workspace for reporting and access control, while keeping bronze and silver in a central workspace.

A lot of this is still evolving (e.g., handling schema changes, thinking about incremental loads vs overwrites, optimizing compute usage in Fabric, etc.), and I’m trying to strike a balance between doing things “right” and not over-engineering too early.

Curious to hear from others who’ve built something similar:

- Does this architecture make sense at this stage?

- Anything you’d strongly recommend changing early before it becomes painful later?

- How would you approach scaling this (especially around governance, costs, and team autonomy)?

Appreciate any thoughts or critiques.

reddit.com
u/The_curious_one9790 — 22 hours ago

Analyst, Data Management

Hi everyone

We have a 6-month contract opening for an Analyst (Data Management) in my team. This role is open only to candidates currently based in Canada.

Looking for someone with 2+ years experience in data analytics/engineering, data quality/governance, and tools like SQL, Python, Apache Spark and Microsoft Fabric/Power BI.

If you or someone you know might be interested, DM me, I can share more details.

reddit.com
u/data-navigator — 1 day ago

Deep dive into OneLake Security in Microsoft Fabric

I did a 1-hour deep dive with u/taylorsamy to figure out what actually works today… and what doesn’t.

Quick summary:

  • SQL endpoint & Power BI -> works really well
  • Spark notebooks -> not so much, especially with CLS and RLS

We saw pretty inconsistent behavior with Spark when granular security is applied.

Curious if others have run into the same issues with Spark + CLS/RLS?

Watch the full deep dive here:
https://youtu.be/4a5X6NwJSZ0

u/aleks1ck — 1 day ago
🔥 Hot ▲ 90 r/MicrosoftFabric+1 crossposts

Power BI + AI: Are we moving beyond dashboards toward conversational analytics?

I’ve been experimenting with combining AI tools (specifically Claude) with Power BI semantic models, and it’s starting to feel like a meaningful shift in how analytics might be consumed.

Traditionally, the workflow looks something like this:

  • Build reports and dashboards
  • Add visuals to answer anticipated questions
  • Users click, filter, and explore

That model works well, but it’s always limited by what was pre-built into the report.

What’s interesting now is the ability to connect an LLM directly to a semantic model and let users ask questions in natural language. Instead of navigating visuals, they can just ask things like:

  • Why did June outperform March?
  • What are the main drivers behind this trend?
  • How do two entities compare and what should we do about it?

The responses aren’t just filtered views of existing visuals either. The model can:

  • Pull in dimensions not surfaced in the report
  • Provide summaries across multiple metrics
  • Offer explanations and even recommendations

It starts to feel less like “report consumption” and more like interacting with the data itself.

A few things that stood out to me while testing this:

  • The semantic model becomes way more important than the report layer
  • You can’t realistically pre-build all the insights users might want
  • AI fills the gap between high-level dashboards and deep analysis
  • There’s real potential for scaling analytics to less technical users

That said, there are still some open questions:

  • How do you manage trust and validation of AI-generated insights?
  • What does governance look like when users can query anything?
  • Does this reduce the need for complex report design, or just shift it?
  • How do organizations handle access and security at scale?

Curious how others are thinking about this.

Are you exploring AI + Power BI integrations yet?
Do you see this replacing parts of the dashboard experience, or just augmenting it?

youtu.be
u/PowerBIBro — 3 days ago

Workspace Item level permission restriction ?

For our Power BI developer team, we want to grant only publish permission on the reporting workspace. Publishing requires the Contributor role, but with Contributor permission they can also create other items (data pipelines, etc.). We want to restrict item creation. For example: they should only be able to publish reports, and everything else should be restricted. Is that possible?

reddit.com
u/efor007 — 1 day ago

Passed the DP-700 Exam after implementing a lakehouse medallion architecture for an organization.

Feels good to have the certification for credibility.

Followed the learning path on Microsoft learn.

Practiced a lot of mock tests available online. (Certiace, skillcerpro, etc., as I wanted to crack it on the first attempt)

Generated a lot of mock tests using Claude.

reddit.com
u/dracarysmafu — 1 day ago
🔥 Hot ▲ 274 r/MicrosoftFabric+2 crossposts

How do I explain that SQL Server should not be used as a code repository?

This week my BI Developer colleague proudly showed me a new Power BI report that he'd vibe-coded. Here's how it works:

  1. Write a SQL query that selects the data needed for the report, concatenates it into one massive row, then formats that row as a JavaScript array.
  2. Write your custom report as an HTML web page, complete with styles and JS functions.
  3. Put the whole web-page code into one large string. Put the JS array containing your data from step 1 into that code string, so that you now have a JS variable containing all of your raw data hardcoded into your HTML.
  4. You now have a large string of HTML + JS that contains your custom report, complete with data! Sadly the string exceeds the length of VARCHAR(MAX), so you'll need to chop it up and insert each chunk into a table. Now all you need to do is set the table as a data source in PBI, re-join the rows into one long string, and voilà! A custom Power BI visual in 4 simple steps!

I'm fairly new to the data engineering role (transitioned from software dev) but this is insane, right? My colleague has very strong SQL skills but isn't really a programmer, so I'm guessing this is a case of 'when all you have is a hammer, everything looks like a nail'.

I don't even know how to begin trying to explain the problems with this approach to my colleague, or what to suggest as an alternative (maybe just make a custom visual using the dev tools provided by PBI?). I don't want to come off sounding condescending but I have to say something before this becomes our standard way of creating custom reports.

reddit.com
u/Firestone78 — 4 days ago

Variable library item reference - preview status

Hey there,

does anyone know anything about the development status of item references in variable libraries?

I learned my lesson regarding preview items the hard way in the past, but I can't help myself, I like this feature.

  • you can hover over the item references to see what workspace it is referencing. When using guids you have to look them up to see where they point to
  • you don't have to look up guids when setting up variables, simply search for the file in the file picker
  • previously I needed 2 variables for 1 reference (workspace guid & item guid). Now I can reference both properties through the same variable

All in all a great feature and imo better than static guids in every way

Now I am wondering

  • are there known issues?
  • when will there be support for item properties like an SQL endpoint guid?
  • according to the docs, Pipelines are not supported for these variable types, but they do in fact work. When will there be official support?
  • on a scale of 1 to 10, how dumb am I for considering to use a preview feature in production?
u/p-mndl — 1 day ago