
Gave an LLM an SQL interface to our CI logs, and sharing what we learned

Disclosure up front: I'm a co-founder at Mendral (YC W26). We build an agent that debugs CI failures. Not a pitch, sharing what we learned.

We run around 1.5B CI log lines and 700K jobs per week through ClickHouse for our agent to query. It writes its own SQL, no predefined tool API. The LLM-on-logs angle is covered to death. The CI-specific parts are what I haven't seen discussed much.

1) GitHub's rate limit is hard to deal with.

The limit is 15K requests per hour per App installation. We're continuously polling workflow runs, jobs, steps, and logs across dozens of active repos, while the agent itself also needs the API to pull PR diffs, post comments, and open PRs. A single big commit can spawn hundreds of parallel jobs, each producing logs you need to fetch.

Early on we'd burst, hit the ceiling, fall 30+ minutes behind, and the agent would be reasoning about stale data. Useless if an engineer is staring at a red build right now.

We cap ingestion at ~3 req/s steady and use durable execution (we're on Inngest): when we hit the limit, we read X-RateLimit-Reset, add 10% jitter, and suspend the workflow with full state checkpointed. When the window resets, execution picks up at the exact API call it left off on, so there's no retry logic, no dedup, no idempotency work. The rate limit becomes a pause button. P95 ingestion delay is under 5 minutes, usually seconds.
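A minimal sketch of the suspend timing, assuming the workflow runtime exposes a sleep step (this is illustrative, not our actual Inngest code; `pause_until_reset` and the jitter fraction are made up for the example):

```python
import random


def pause_until_reset(reset_epoch: float, now: float,
                      jitter_frac: float = 0.10) -> float:
    """Seconds to suspend ingestion after a rate-limit response.

    reset_epoch comes from the X-RateLimit-Reset response header (Unix
    seconds). Adding up to ~10% jitter keeps parallel workflows from all
    resuming at the same instant and re-bursting into the fresh window.
    """
    base = max(0.0, reset_epoch - now)
    return base * (1.0 + random.uniform(0.0, jitter_frac))
```

In a durable-execution runtime, this value feeds a checkpointed sleep step, so the resumed workflow continues from the exact API call that hit the limit.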

2) Raw SQL beat a constrained tool.

We started with the usual get_failure_rate(workflow, days), get_logs(job_id), etc. That capped the agent, so we switched to raw SQL against a documented schema, which unlocked investigations we never scripted. Recent models write good ClickHouse SQL because there's a huge amount of it in the training data. The median investigation across 52K queries is 4 queries and 335K rows scanned, at ~110ms per raw-log query.
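To make the shape of this concrete, here's a sketch of a single run-SQL tool with a crude read-only guard. The table and column names are hypothetical (our real schema and safety layer are different), but it shows the kind of ad-hoc query a fixed tool API never anticipates:

```python
def is_readonly(sql: str) -> bool:
    """Crude allowlist: the agent gets one tool, run_sql, limited to reads."""
    head = sql.strip().lower()
    banned = ("insert ", "alter ", "drop ", "truncate ", "delete ", "create ")
    return head.startswith(("select", "with")) and not any(b in head for b in banned)


# A query get_failure_rate() could never express -- cross-cutting filters
# over made-up columns (runner_label, pr_author) in one shot:
AGENT_SQL = """
SELECT runner_label, job_name,
       countIf(conclusion = 'failure') AS failures,
       count() AS runs
FROM ci_jobs
WHERE pr_author = 'alice'
  AND started_at > now() - INTERVAL 14 DAY
GROUP BY runner_label, job_name
ORDER BY failures DESC
LIMIT 20
"""
```

The point of the guard being crude is deliberate: the real constraint comes from a read-only database user, not string matching.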

3) ClickHouse for storage.

Every log line in our table carries 48 columns of run-level metadata: commit SHA, author, branch, PR title, workflow name, job name, runner info, timestamps. In a row store this is insane. In ClickHouse with ZSTD, commit_message compresses 301:1 because every log line in a run shares the same value. The whole table lands at ~21 bytes per log line on disk, including all 48 columns. The real win isn't the disk savings, it's that the agent can filter by any column without a join: "show me failures on this runner label, in the last 14 days, where the PR author is X" is a single WHERE clause.
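The repeated-value effect is easy to demonstrate, here with stdlib zlib rather than ZSTD (the codec ClickHouse uses) and a made-up commit message; the exact ratio differs, but the principle doesn't:

```python
import zlib

# One run's worth of log lines all sharing an identical commit_message value.
msg = b"fix: retry flaky network tests on arm64 runners\n"
column = msg * 100_000                 # a few MB of one repeated value

compressed = zlib.compress(column, level=6)
ratio = len(column) / len(compressed)  # hundreds-to-one on repeated data
```

A columnar store compresses each column independently, so a constant-per-run column costs almost nothing on disk no matter how many log lines the run produced.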

Questions:

  • Anyone running an ingestion layer against GitHub Actions (or Buildkite, CircleCI) that has to share API budget with other consumers? How are you splitting it? We ended up keeping ~4K req/hour headroom for the agent and tuning ingestion under 3 req/s. Trial and error.
  • Anyone using columnar stores (ClickHouse, DuckDB, Druid) for CI observability specifically, vs general log platforms (Loki, Elastic)? Tradeoffs?
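The budget split in the first question is just arithmetic against the 15K/hour installation limit (these are our numbers; yours will differ):

```python
LIMIT_PER_HOUR = 15_000          # GitHub App installation rate limit
INGEST_RPS = 3                   # steady ingestion cap

ingest_per_hour = INGEST_RPS * 3600                  # 10_800 requests/hour
agent_headroom = LIMIT_PER_HOUR - ingest_per_hour    # 4_200 -- the ~4K above
```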

We wrote a longer version in case it's useful: https://www.mendral.com/blog/llms-are-good-at-sql

u/samalba42 — 4 days ago