
r/DuckDB

DuckDB 1.5 introduced the new VARIANT type, so we added support for it in Valentina Studio 17.3.
Current support includes:
- Schema Editor integration
- Visual inspection/editing of nested objects & arrays
- Special editors for images, blobs, UUIDs, etc.
- AI-assisted SQL Editor
- Direct editing of VARIANT values in Data Editor
Available on macOS, Windows and Linux.
Curious how DuckDB users currently work with semi-structured data and whether visual tooling around VARIANT is useful in practice.
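For anyone who hasn't tried the type itself yet, a minimal sketch in plain DuckDB SQL (my assumption of the standard syntax on DuckDB >= 1.5, not Valentina-specific):

CREATE TABLE events (id INTEGER, payload VARIANT);
-- A VARIANT column can hold a different type in every row:
INSERT INTO events VALUES
    (1, CAST('{"user": "alice", "tags": [1, 2]}'::JSON AS VARIANT)),
    (2, CAST(42 AS VARIANT));
SELECT id, payload FROM events;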
More details:
https://valentina-db.com/dokuwiki/doku.php?id=valentina:articles:vstudio_v173_duck_variant
🎬 Watch: https://youtu.be/PU3-mqbnfuM
Rosetta DBT Studio is an open-source desktop workspace for modern data teams. This presentation walks through the full DuckLake workflow — from creating and importing lakehouse instances to exploring metadata, running SQL queries, and building reusable SQL Notebooks — all inside one focused interface.
🔍 Topics covered in this video:
- Connecting databases and cloud storage (S3, Azure Blob, Google Cloud Storage)
- Creating and importing DuckLake instances with the guided setup wizard
- Browsing tables, views, schemas, statistics, Parquet files, partitions, and snapshot history
- DuckLake operations: schema evolution, row updates, deletes, and data imports
- SQL Editor with Monaco autocomplete, query execution, result p…
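For context, this is roughly the raw SQL the guided wizard automates when creating a local DuckLake instance (a sketch following the DuckLake quickstart; file and alias names are mine):

INSTALL ducklake;
LOAD ducklake;
-- Attach a DuckLake catalog backed by a local metadata file
ATTACH 'ducklake:metadata.ducklake' AS my_lake;
USE my_lake;
CREATE TABLE demo (id INTEGER, name VARCHAR);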
Built a DuckDB community extension for realtime CDC streams from DuckLake.
Repo:
https://github.com/ekkuleivonen/ducklake-cdc-extension
It turns DuckLake snapshot changes into durable row-level change streams that can be consumed from SQL or Python.
Current features:
- row-level DML CDC
- DDL/schema change subscriptions
- durable consumers with checkpointing
- per-consumer filtering
- replay from snapshots
Example:
-- Read row-level changes for the 'orders_sink' consumer of lake 'lake':
SELECT *
FROM cdc_dml_changes_read('lake', 'orders_sink');
Use cases I built this for:
- realtime sinks
- cache invalidation
- event-driven lakehouse automation
- search indexing
- lightweight streaming pipelines without external CDC infra
One design goal was to avoid requiring a separate Python daemon or Kafka setup for simple CDC workflows around DuckLake.
Still early and evolving, but I'd love feedback from people building on DuckDB/DuckLake.
Hi everyone!
I'm working on a no-code builder for conversational analytics agents that respond with interactive charts and UI. I've added dedicated DuckDB support to make it easier to hook up databases for agents to query.
Here's an example agent I've built using the project: https://console.thesys.dev/app/-2PqdNdGjSQb6WrdYI9pR
Feedback would be highly appreciated!
While building a local data warehouse on GitHub Archive data, we ran into a slightly frustrating DuckDB behavior: columns that happen to contain only NULL values in the first file DuckDB reads get inferred as the generic JSON type. When real values show up in later files, staging breaks.
Our fix: a synthetically generated canonical sample — a single JSON file with at least one non-null value for every column. It gets passed to read_json_auto alongside every real archive file, ensuring stable type inference from the start. A WHERE clause filters it out of the actual results.
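A sketch of the pattern (file names, the glob, and the sentinel value are illustrative, not the project's actual code):

-- canonical_sample.json holds one synthetic event with a non-NULL value for
-- every column, so inference never sees an all-NULL column in the first file.
CREATE TABLE staging AS
SELECT *
FROM read_json_auto(['canonical_sample.json', 'gharchive/2024-01-01-*.json.gz'])
WHERE id <> 'canonical-sample';  -- the WHERE clause drops the synthetic row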
The project is a fully local data warehouse on GitHub Archive — highly variable JSON events with ~700 attributes, Python schema discovery that auto-generates dbt models, and a star schema with mart tables that Rill can explore interactively at sub-second speed across 25M+ events.
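To make the auto-generated dbt models concrete, here's a hypothetical shape of one staging model (source, table, and column names invented for illustration):

-- Hypothetical auto-generated staging model over raw GH Archive events
SELECT
    raw->>'id'             AS event_id,
    raw->>'type'           AS event_type,
    raw->'actor'->>'login' AS actor_login
FROM {{ source('gharchive', 'raw_events') }}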
The canonical sample approach feels like something others must have hit too — curious how you've handled this, or whether there's a cleaner DuckDB-native solution we missed.
Four months ago I got tired of every "just look at this CSV" task turning into a 45-minute detour. So I built DuckViz, and it kept growing into something I didn't quite plan.
Here's what happens when you drop a CSV (or Parquet, or JSON) into app.duckviz.com:
Second 1 — it's a dashboard. DuckDB compiled to WebAssembly is doing the querying right there in your browser tab. Your file never uploads. Open devtools, check the network panel, you'll see nothing leave. Charts, filters, drilldowns, all of it — instant.
Second 30 — you ask it a question in plain English. "Show me revenue by region for customers older than 6 months." It generates the chart. The AI doesn't see your data either — it only sees your schema and writes the SQL.
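To illustrate, here's the kind of SQL the model might emit for that prompt from the schema alone (table and column names are my invention):

-- Generated from schema only; execution happens in the browser tab
SELECT region, SUM(revenue) AS total_revenue
FROM customers
WHERE signup_date < CURRENT_DATE - INTERVAL 6 MONTH
GROUP BY region;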
Minute 2 — it's a report. Switch to Report mode and you get a rich-text doc you can write narrative in, with live charts embedded inline. Like Notion, but the charts actually query your data.
Minute 5 — it's a slide deck. Same data, presentation layout. Drop charts on slides, present from the browser. Your quarterly review just wrote itself.
Minute 6 — it looks like your brand. 10 built-in themes plus a full custom theme builder. Pick a vibe or paste your brand colors.
Minute 10 — it's inside your own product. The same dashboards render inside your SaaS, and your customers' data never touches your servers either. That's the part that closes deals with security teams.
It's free to try, no signup for the demo, no email gate. The full thing — saved projects, sharing, AI — is behind a free account.
Check out the sample dashboards created with DuckViz using mock APIs.