
r/DuckDB

DuckDB 1.5 introduced the new VARIANT type, so we added support for it in Valentina Studio 17.3.
Current support includes:
- Schema Editor integration
- Visual inspection/editing of nested objects & arrays
- Special editors for images, blobs, UUIDs, etc.
- AI-assisted SQL Editor
- Direct editing of VARIANT values in Data Editor
Available on macOS, Windows and Linux.
Curious how DuckDB users currently work with semi-structured data and whether visual tooling around VARIANT is useful in practice.
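For anyone who hasn't tried the type itself yet, a minimal sketch in plain DuckDB SQL (my assumption of the standard syntax on DuckDB >= 1.5, not Valentina-specific):

CREATE TABLE events (id INTEGER, payload VARIANT);
-- A VARIANT column can hold a different type in every row:
INSERT INTO events VALUES
    (1, CAST('{"user": "alice", "tags": [1, 2]}'::JSON AS VARIANT)),
    (2, CAST(42 AS VARIANT));
SELECT id, payload FROM events;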
More details:
https://valentina-db.com/dokuwiki/doku.php?id=valentina:articles:vstudio_v173_duck_variant
🎬 Watch: https://youtu.be/PU3-mqbnfuM
Rosetta DBT Studio is an open-source desktop workspace for modern data teams. This presentation walks through the full DuckLake workflow — from creating and importing lakehouse instances to exploring metadata, running SQL queries, and building reusable SQL Notebooks — all inside one focused interface.
🔍 Topics covered in this video:
- Connecting databases and cloud storage (S3, Azure Blob, Google Cloud Storage)
- Creating and importing DuckLake instances with the guided setup wizard
- Browsing tables, views, schemas, statistics, Parquet files, partitions, and snapshot history
- DuckLake operations: schema evolution, row updates, deletes, and data imports
- SQL Editor with Monaco autocomplete, query execution, result p…
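For context, this is roughly the raw SQL the guided wizard automates when creating a local DuckLake instance (a sketch following the DuckLake quickstart; file and alias names are mine):

INSTALL ducklake;
LOAD ducklake;
-- Attach a DuckLake catalog backed by a local metadata file
ATTACH 'ducklake:metadata.ducklake' AS my_lake;
USE my_lake;
CREATE TABLE demo (id INTEGER, name VARCHAR);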
Built a DuckDB community extension for realtime CDC streams from DuckLake.
Repo:
https://github.com/ekkuleivonen/ducklake-cdc-extension
It turns DuckLake snapshot changes into durable row-level change streams that can be consumed from SQL or Python.
Current features:
- row-level DML CDC
- DDL/schema change subscriptions
- durable consumers with checkpointing
- per-consumer filtering
- replay from snapshots
Example:
-- Read row-level changes for the 'orders_sink' consumer of lake 'lake':
SELECT *
FROM cdc_dml_changes_read('lake', 'orders_sink');
Use cases I built this for:
- realtime sinks
- cache invalidation
- event-driven lakehouse automation
- search indexing
- lightweight streaming pipelines without external CDC infra
One design goal was to avoid requiring a separate Python daemon or Kafka setup for simple CDC workflows around DuckLake.
Still early and evolving, but I'd love feedback from people building on DuckDB/DuckLake.
Hi everyone!
I'm working on a no-code builder for conversational analytics agents that respond with interactive charts and UI. I've added dedicated DuckDB support to make it easier to hook up databases for agents to query.
Here's an example agent I've built using the project: https://console.thesys.dev/app/-2PqdNdGjSQb6WrdYI9pR
Feedback would be highly appreciated!
While building a local data warehouse on GitHub Archive data, we ran into a slightly frustrating DuckDB behavior: columns that happen to contain only NULL values in the first file DuckDB reads get inferred as the generic JSON type. When real values show up in later files, staging breaks.
Our fix: a synthetically generated canonical sample — a single JSON file with at least one non-null value for every column. It gets passed to read_json_auto alongside every real archive file, ensuring stable type inference from the start. A WHERE clause filters it out of the actual results.
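A sketch of the pattern (file names, the glob, and the sentinel value are illustrative, not the project's actual code):

-- canonical_sample.json holds one synthetic event with a non-NULL value for
-- every column, so inference never sees an all-NULL column in the first file.
CREATE TABLE staging AS
SELECT *
FROM read_json_auto(['canonical_sample.json', 'gharchive/2024-01-01-*.json.gz'])
WHERE id <> 'canonical-sample';  -- the WHERE clause drops the synthetic row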
The project is a fully local data warehouse on GitHub Archive — highly variable JSON events with ~700 attributes, Python schema discovery that auto-generates dbt models, and a star schema with mart tables that Rill can explore interactively at sub-second speed across 25M+ events.
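To make the auto-generated dbt models concrete, here's a hypothetical shape of one staging model (source, table, and column names invented for illustration):

-- Hypothetical auto-generated staging model over raw GH Archive events
SELECT
    raw->>'id'             AS event_id,
    raw->>'type'           AS event_type,
    raw->'actor'->>'login' AS actor_login
FROM {{ source('gharchive', 'raw_events') }}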
The canonical sample approach feels like something others must have hit too — curious how you've handled this, or whether there's a cleaner DuckDB-native solution we missed.
Four months ago I got tired of every "just look at this CSV" task turning into a 45-minute detour. So I built DuckViz, and it kept growing into something I didn't quite plan.
Here's what happens when you drop a CSV (or Parquet, or JSON) into app.duckviz.com:
Second 1 — it's a dashboard. DuckDB compiled to WebAssembly is doing the querying right there in your browser tab. Your file never uploads. Open devtools, check the network panel, you'll see nothing leave. Charts, filters, drilldowns, all of it — instant.
Second 30 — you ask it a question in plain English. "Show me revenue by region for customers older than 6 months." It generates the chart. The AI doesn't see your data either — it only sees your schema and writes the SQL.
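To illustrate, here's the kind of SQL the model might emit for that prompt from the schema alone (table and column names are my invention):

-- Generated from schema only; execution happens in the browser tab
SELECT region, SUM(revenue) AS total_revenue
FROM customers
WHERE signup_date < CURRENT_DATE - INTERVAL 6 MONTH
GROUP BY region;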
Minute 2 — it's a report. Switch to Report mode and you get a rich-text doc you can write narrative in, with live charts embedded inline. Like Notion, but the charts actually query your data.
Minute 5 — it's a slide deck. Same data, presentation layout. Drop charts on slides, present from the browser. Your quarterly review just wrote itself.
Minute 6 — it looks like your brand. 10 built-in themes plus a full custom theme builder. Pick a vibe or paste your brand colors.
Minute 10 — it's inside your own product. The same dashboards render inside your SaaS, and your customers' data never touches your servers either. That's the part that closes deals with security teams.
It's free to try, no signup for the demo, no email gate. The full thing — saved projects, sharing, AI — is behind a free account.
Check out the sample dashboards created with DuckViz using mock APIs.