u/IntroductionFlimsy56

Realtime CDC streams for DuckLake as a DuckDB extension

Built a DuckDB community extension for realtime CDC streams from DuckLake.

Repo:
https://github.com/ekkuleivonen/ducklake-cdc-extension

It turns DuckLake snapshot changes into durable row-level change streams that can be consumed from SQL or Python.

Current features:

  • row-level DML CDC
  • DDL/schema change subscriptions
  • durable consumers with checkpointing
  • per-consumer filtering
  • replay from snapshots

Example:

SELECT *
FROM cdc_dml_changes_read('lake', 'orders_sink');
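
For the Python side, a minimal sketch of running the same query and handing each change row to a callback could look like the following. The helper names (`read_changes`, `dispatch_changes`) are made up for illustration and are not part of the extension. The sketch only assumes a connection object with the usual `execute(...).fetchall()` shape that DuckDB's Python client provides. The post does not say how or when a consumer's checkpoint advances, so that is left out.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple[Any, ...]

# The query from the post: 'lake' is the attached DuckLake catalog and
# 'orders_sink' is the consumer name.
CDC_QUERY = "SELECT * FROM cdc_dml_changes_read('lake', 'orders_sink')"


def read_changes(con: Any, query: str = CDC_QUERY) -> List[Row]:
    """Run the CDC query on a DuckDB-style connection and return all rows."""
    return con.execute(query).fetchall()


def dispatch_changes(
    con: Any,
    handle: Callable[[Row], None],
    query: str = CDC_QUERY,
) -> int:
    """Read one batch of change rows and pass each row to `handle`.

    Returns the number of rows dispatched. Hypothetical helper: how the
    consumer's checkpoint advances is not covered here.
    """
    rows = read_changes(con, query)
    for row in rows:
        handle(row)
    return len(rows)
```

With the real extension, `con` would be a `duckdb.connect()` connection that already has the extension loaded and the DuckLake catalog attached as `lake`; those setup steps are not shown in the post, so they are omitted here.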

Use cases I built this for:

  • realtime sinks
  • cache invalidation
  • event-driven lakehouse automation
  • search indexing
  • lightweight streaming pipelines without external CDC infra
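
To make the cache invalidation case concrete, here is a rough sketch of evicting cache entries for a batch of change rows. The post does not describe the columns of a change row, so the caller supplies `key_of`, a hypothetical function that extracts the cache key from a row; `invalidate_cache` is likewise an illustrative name, not something the extension ships.

```python
from typing import Any, Callable, Hashable, Iterable, MutableMapping


def invalidate_cache(
    cache: MutableMapping[Hashable, Any],
    change_rows: Iterable[Any],
    key_of: Callable[[Any], Hashable],
) -> int:
    """Evict cache entries whose keys appear in a batch of change rows.

    `key_of` maps a change row to a cache key; it is supplied by the caller
    because the change-row schema is an assumption here.
    Returns the number of entries actually evicted.
    """
    evicted = 0
    for row in change_rows:
        key = key_of(row)
        if key in cache:
            del cache[key]
            evicted += 1
    return evicted
```

Rows whose key is not cached (for example, fresh inserts) are simply skipped, so the same batch can be replayed without raising errors.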

One design goal was to avoid requiring a separate Python daemon or Kafka setup for simple CDC workflows around DuckLake.

Still early and evolving, but would love feedback from people building on DuckDB/DuckLake.

u/IntroductionFlimsy56 — 2 days ago
▲ 26 r/DuckDB

u/IntroductionFlimsy56 — 7 days ago