u/cardogio

We made our VIN decoder 100x faster. Again
r/selfhosted

Follow-up to our previous post.

First, the v3 rewrite: SQLite was killing us on batch operations - 1000 VINs meant 4000 queries. We switched to binary indexes and now it's:
- Cold start: 200ms -> 23ms
- Single decode: 30ms -> 0.3ms
- Batch 1000: 4 seconds -> 300ms

Still fully offline, still no API keys.

On the EU data feedback: this is the real problem we've been digging into. Vehicle data is a mess globally, but especially across regions:

- US sources use 37k+ boolean feature keys with values embedded in key names ("12.3\" display": true)
- Canadian sources use nested category structures - better, but incompatible
- EU sources have great mechanical specs but almost no feature data
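To make the US case concrete, here is a rough sketch of pulling a value back out of a key name. This is not Corgi code and not the VIS schema: `NormalizedFeature`, `normalizeFeatureKey`, and the inch-prefix pattern are all made up for illustration, and real feeds will need many more patterns than this one.

```typescript
// Hypothetical output shape, for illustration only (not the VIS schema).
interface NormalizedFeature {
  name: string;
  value: number | boolean;
  unit?: string;
}

// Matches a leading number followed by an inch mark, e.g. `12.3" display`.
const INCH_PREFIX = /^(\d+(?:\.\d+)?)"\s*(.+)$/;

function normalizeFeatureKey(key: string, present: boolean): NormalizedFeature | null {
  // A false flag means the vehicle does not have the feature, so skip it.
  if (!present) return null;
  const trimmed = key.trim();
  const match = INCH_PREFIX.exec(trimmed);
  if (match) {
    // The value was hiding in the key name; lift it into its own field.
    return { name: match[2].toLowerCase(), value: parseFloat(match[1]), unit: "in" };
  }
  // No embedded value: keep it as a plain boolean feature.
  return { name: trimmed.toLowerCase(), value: true };
}

function normalizeFeatures(raw: Record<string, boolean>): NormalizedFeature[] {
  const out: NormalizedFeature[] = [];
  for (const key of Object.keys(raw)) {
    const feature = normalizeFeatureKey(key, raw[key]);
    if (feature !== null) out.push(feature);
  }
  return out;
}

// Example input in the US style described above.
const features = normalizeFeatures({
  '12.3" display': true,
  "Heated seats": true,
  "Sunroof": false,
});
```

The point is only that one boolean key has to become a name, a value, and a unit before it can be compared with data from another region.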

Same car, three regions, three completely different data contracts. And trim names are chaos: a US "Premium Plus" is a Canadian "Progressiv" is a German "45 TFSI quattro S tronic".

We're working on a schema standard (VIS) to normalize this. The goal: decode a VIN anywhere, get the same structured output regardless of source. Will share more when it's ready. As always - fully open source - code here: https://github.com/cardog-ai/corgi/

u/cardogio — 21 hours ago
We rewrote our VIN decoder from SQLite to binary indexes - 100x faster, and our neural net reverse-engineered the VIN spec

We built Corgi, an open-source offline VIN decoder. v2 used SQLite, which worked fine until we needed to batch decode 1000 VINs and ended up running 4000 sequential queries.

v3 uses MessagePack-encoded binary indexes with O(log n) lookup:

- Cold start: 200ms -> 23ms
- Single decode: 30ms -> 0.3ms
- Batch 1000: 4 seconds -> 300ms
- npm package: 21MB -> 6.5MB (gzip)
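For anyone wondering what O(log n) lookup looks like without a database, here is a minimal sketch using a sorted in-memory array and binary search. It is an assumption about the general idea, not Corgi's actual index layout: the `IndexEntry` shape, the tiny `wmiIndex` table, and the sample VINs are invented for illustration, and the MessagePack decoding step is left out entirely.

```typescript
// Hypothetical index entry: a 3-character WMI prefix mapped to a make.
interface IndexEntry {
  key: string;
  make: string;
}

// Assumed to be sorted by key ahead of time (e.g. at build time),
// then loaded into memory once at startup.
const wmiIndex: IndexEntry[] = [
  { key: "1HG", make: "Honda" },
  { key: "5YJ", make: "Tesla" },
  { key: "WAU", make: "Audi" },
];

// Classic binary search: O(log n) comparisons, no query round trips.
function lookup(index: IndexEntry[], key: string): IndexEntry | undefined {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const entry = index[mid];
    if (entry.key === key) return entry;
    if (entry.key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

// A batch is just a loop over in-memory lookups.
function decodeBatch(vins: string[]): (string | undefined)[] {
  return vins.map((vin) => lookup(wmiIndex, vin.slice(0, 3).toUpperCase())?.make);
}

// Illustrative VIN-shaped strings, not real vehicles.
const makes = decodeBatch(["5YJ3E1EA7KF000000", "WAUZZZ00000000000", "ZZZ00000000000000"]);
```

Once every lookup is an in-memory search, batch size stops multiplying into query count, which is where the SQLite version lost its time.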

The architecture was inspired by @wonderooo's corgi-rs, which uses finite-state transducers.

While validating accuracy, we also trained a small transformer (6.6M params) on 50k VIN-vehicle pairs.

It learned the ISO 3779 encoding scheme from data alone - figured out that position 10 encodes model year, that VINs starting with 5YJ are Teslas, etc.
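For reference, the two patterns mentioned above are easy to write by hand, which is what makes it a nice sanity check for the model. This sketch is not taken from Corgi: the function names are made up, the year table is the standard 30-year cycle of position-10 codes (letters I, O, Q, U, Z and the digit 0 are not used), and the sample VIN is illustrative only.

```typescript
// Position-10 model year codes in order. The cycle repeats every 30 years,
// so a code alone is ambiguous (e.g. "K" can mean 1989 or 2019).
const YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789";

// Positions 1-3 are the World Manufacturer Identifier (WMI).
function wmi(vin: string): string {
  return vin.slice(0, 3).toUpperCase();
}

// Returns both candidate model years for the position-10 code,
// or an empty array if the character is not a valid year code.
function modelYearCandidates(vin: string): number[] {
  if (vin.length < 10) throw new Error("need at least 10 characters");
  const idx = YEAR_CODES.indexOf(vin[9].toUpperCase());
  if (idx === -1) return [];
  return [1980 + idx, 2010 + idx];
}

// Illustrative VIN-shaped string, not a real vehicle.
const sample = "5YJ3E1EA7KF000000";
const sampleWmi = wmi(sample); // "5YJ"
const sampleYears = modelYearCandidates(sample); // [1989, 2019]
```

Picking between the two candidate years needs extra context, such as when the manufacturer was actually building cars.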

The embeddings cluster vehicles by body type with 0.99 cosine similarity between similar vehicles. All from a 10-character string.
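If cosine similarity is unfamiliar, this is roughly how that number is computed. The vectors below are toy 3-dimensional values invented for illustration, not real embeddings from the model.

```typescript
// Cosine similarity: dot product divided by the product of the vector norms.
// 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero vector has no direction");
  return dot / Math.sqrt(normA * normB);
}

// Made-up embeddings: two sedans pointing the same way, one pickup elsewhere.
const sedanA = [0.9, 0.1, 0.3];
const sedanB = [0.8, 0.2, 0.3];
const pickup = [0.1, 0.9, 0.2];

const sameBody = cosineSimilarity(sedanA, sedanB); // close to 1
const differentBody = cosineSimilarity(sedanA, pickup); // much lower
```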

Blog post with details: https://cardog.app/blog/corgi-v3-binary-indexes

u/cardogio — 22 hours ago