
We rewrote our VIN decoder from SQLite to binary indexes - 100x faster, and our neural net reverse-engineered the VIN spec
We built Corgi, an open-source offline VIN decoder. v2 used SQLite, which worked fine until we needed to batch-decode 1000 VINs and hit 4000 sequential queries.
v3 uses MessagePack-encoded binary indexes with O(log n) lookup:
Cold start: 200ms -> 23ms
Single decode: 30ms -> 0.3ms
Batch 1000: 4 seconds -> 300ms
npm package: 21MB -> 6.5MB (gzip)
The architecture was inspired by @wonderooo's corgi-rs which uses finite-state transducers.
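To make the O(log n) claim concrete, here is a minimal sketch of a sorted-key lookup using plain binary search. It is not Corgi's actual index format: the `IndexEntry` shape, `buildIndex`, `lookup`, and the three sample prefixes are hypothetical, and the MessagePack encoding step is left out entirely.

```typescript
// Hypothetical index entry: a VIN prefix key and an opaque payload.
interface IndexEntry {
  key: string;
  value: string;
}

// Build step: sort once by key so every later lookup can binary-search.
function buildIndex(entries: IndexEntry[]): IndexEntry[] {
  return [...entries].sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Exact-match lookup in O(log n) comparisons over the sorted array.
function lookup(index: IndexEntry[], key: string): string | undefined {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const entry = index[mid]!;
    if (entry.key === key) return entry.value;
    if (entry.key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

// Illustrative data only: three well-known manufacturer prefixes.
const wmiIndex = buildIndex([
  { key: "5YJ", value: "Tesla" },
  { key: "1HG", value: "Honda" },
  { key: "WBA", value: "BMW" },
]);
```

Calling `lookup(wmiIndex, "5YJ")` walks the sorted array in memory instead of issuing a query, which is the general reason this kind of layout avoids per-lookup database overhead.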
While validating accuracy, we also trained a small transformer (6.6M params) on 50k VIN-vehicle pairs.
It learned the ISO 3779 encoding scheme from data alone - figured out that position 10 encodes model year, that VINs starting with 5YJ are Teslas, etc.
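For reference, the structure the model had to rediscover can be written down by hand. The sketch below splits a VIN into its three standard sections and reads the position-10 year code. The function and field names are hypothetical, the sample VIN in the usage note is made up (its check digit is not verified), and the year-code table follows the common North American convention, which repeats every 30 years, so the code returns both candidate years rather than guessing.

```typescript
// Position-10 year codes in order. I, O, Q, U, Z and 0 are not used.
const YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789";

interface VinFields {
  wmi: string; // positions 1-3: world manufacturer identifier
  vds: string; // positions 4-9: vehicle descriptor section
  vis: string; // positions 10-17: vehicle identifier section
  yearCode: string; // position 10
  candidateYears: number[]; // the code repeats every 30 years
}

function splitVin(vin: string): VinFields {
  const v = vin.trim().toUpperCase();
  // VINs are 17 characters and never contain I, O or Q.
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(v)) {
    throw new Error("expected 17 characters, excluding I, O and Q");
  }
  const yearCode = v.charAt(9);
  const offset = YEAR_CODES.indexOf(yearCode);
  const candidateYears = offset < 0 ? [] : [1980 + offset, 2010 + offset];
  return {
    wmi: v.slice(0, 3),
    vds: v.slice(3, 9),
    vis: v.slice(9),
    yearCode,
    candidateYears,
  };
}
```

For the made-up VIN `5YJSA1E26HF000001`, `splitVin` yields the `5YJ` prefix and year code `H`, which maps to 1987 or 2017.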
The embeddings cluster vehicles by body type, with 0.99 cosine similarity between similar vehicles. All from a 17-character string.
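For anyone unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths, so 1.0 means the embeddings point in the same direction. A minimal sketch follows; `cosineSimilarity` is a hypothetical helper, not part of Corgi.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}
```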
Blog post with details: https://cardog.app/blog/corgi-v3-binary-indexes



