I built a GST-aware invoice parsing API — GSTIN validation, CGST/SGST split, semantic search across invoices. Free tier available.
If you're building anything in India that touches vendor payments, procurement, or expense management, you've probably had to parse GST invoices. The problem isn't extraction; it's that generic parsers return a flat tax_amount and call it done. They don't know what CGST is. They don't validate the GSTIN format. They don't check whether CGST/SGST and IGST are both applied on the same line, which is invalid under GST rules: an intra-state supply carries CGST + SGST, an inter-state supply carries IGST, never both.
I built this to handle all of that natively, so you get validated, structured JSON out of the box — not raw text you have to clean up yourself.
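To make the CGST/SGST vs IGST rule concrete, here's a minimal Python sketch of a per-line check. It only illustrates the rule; it is not the API's actual code, and the function name and argument names are made up for the example.

```python
from decimal import Decimal


def tax_split_is_consistent(cgst, sgst, igst):
    """Return True if a line uses CGST/SGST or IGST, but not both.

    Hypothetical helper: amounts may be numbers, numeric strings, or None.
    """
    # Normalise missing values to zero and avoid float rounding issues.
    cgst, sgst, igst = (Decimal(str(x or 0)) for x in (cgst, sgst, igst))
    intra_state = cgst > 0 or sgst > 0   # CGST/SGST present
    inter_state = igst > 0               # IGST present
    # A single line may be intra-state or inter-state, never both.
    return not (intra_state and inter_state)
```

A line with CGST 90 + SGST 90 passes, a line with IGST 180 passes, and a line carrying all three gets flagged.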
What it does:
You POST any invoice PDF, PNG, JPG, or DOCX, and get back fully structured JSON — including:
- GSTIN validation (15-char format check)
- CGST / SGST / IGST split with mutual exclusivity check
- HSN / SAC codes per line item
- Grand total arithmetic validation
- Confidence score + extraction warnings for anything that looks off
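For a rough idea of what two of these checks involve, here's a small Python sketch. It's an illustration under assumptions, not the service's implementation: the regex is the commonly used GSTIN layout (2-digit state code, 10-character PAN, entity code, a default "Z", and a check character) and does not verify the checksum, and the `taxable_value` / `tax` field names and the rounding tolerance are hypothetical.

```python
import re
from decimal import Decimal

# Commonly used 15-character GSTIN layout (format only, no checksum check).
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def gstin_format_ok(gstin: str) -> bool:
    """Return True if the string has the 15-character GSTIN shape."""
    return bool(GSTIN_PATTERN.fullmatch(gstin.strip().upper()))


def grand_total_ok(line_items, grand_total, tolerance=Decimal("1.00")):
    """Return True if taxable values plus taxes add up to the grand total.

    `line_items` is a list of dicts with hypothetical keys
    "taxable_value" and "tax"; `tolerance` absorbs invoice round-off.
    """
    computed = sum(
        (Decimal(str(item["taxable_value"])) + Decimal(str(item["tax"]))
         for item in line_items),
        Decimal("0"),
    )
    return abs(computed - Decimal(str(grand_total))) <= tolerance
```

A format check like this catches truncated or mistyped GSTINs; it can't tell you whether the number is actually registered.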
There's also a /ask endpoint — you can query your uploaded invoices in plain English. Things like "total GST paid to this vendor in October" or "which invoices are overdue by more than 30 days" return structured JSON answers, not just a text blob.
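Here's roughly how a query could be sent from Python. Treat the request shape as an assumption: the `/v1/ask` path, the `question` field name, and the helper name are all guesses for illustration (only the `X-API-Key` header comes from the quick start), so check the Swagger docs for the real contract.

```python
import json
import urllib.request

BASE_URL = "https://invoice-intelligence-api.up.railway.app"


def build_ask_request(question, api_key, path="/v1/ask", field="question"):
    """Build (but do not send) a POST request for a plain-English query.

    `path` and `field` are hypothetical defaults; adjust them to match
    the published API reference.
    """
    body = json.dumps({field: question}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected first.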
Stack (since this crowd will ask): FastAPI + Celery + Redis + Supabase (Postgres + pgvector) + Railway. LLM extraction via GPT-4o-mini for both text and scanned vision. Embeddings via text-embedding-3-small stored in pgvector.
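If you haven't used embeddings for search before, the idea is just nearest-neighbour ranking over vectors. Below is a toy pure-Python sketch using cosine similarity; in practice pgvector does the ranking inside Postgres (its `<=>` operator is cosine distance). Whether this service uses cosine distance specifically is my assumption, and the function names and tiny 2-d vectors are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, rows, k=3):
    """Return the ids of the k rows most similar to the query vector.

    `rows` is a list of (invoice_id, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), invoice_id)
              for invoice_id, vec in rows]
    scored.sort(reverse=True)  # highest similarity first
    return [invoice_id for _, invoice_id in scored[:k]]
```

Real embeddings from text-embedding-3-small have far more dimensions, but the ranking logic is the same.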
Quick start:
```bash
# Register (free, no card)
curl -X POST https://invoice-intelligence-api.up.railway.app/v1/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "name": "Your Name"}'

# Upload an invoice
curl -X POST https://invoice-intelligence-api.up.railway.app/v1/invoices \
  -H "X-API-Key: sk_..." \
  -F "file=@invoice.pdf"

# Returns immediately with job_id; poll for result
curl https://invoice-intelligence-api.up.railway.app/v1/invoices/{job_id} \
  -H "X-API-Key: sk_..."
```
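If you'd rather poll from code than from the shell, here's a minimal Python sketch of the loop. The endpoint and header come from the curl example above; everything else is an assumption, in particular the helper names and the idea that the job JSON exposes a `status` field (check the docs for the real response shape).

```python
import json
import time
import urllib.request

BASE_URL = "https://invoice-intelligence-api.up.railway.app"


def fetch_job(job_id, api_key):
    """GET /v1/invoices/{job_id} and decode the JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/invoices/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_until_done(fetch, is_done, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Call fetch() until is_done(result) is true or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Hypothetical usage ("status" and its values are assumed, not documented here):
# result = poll_until_done(
#     lambda: fetch_job(job_id, api_key),
#     lambda r: r.get("status") in ("completed", "failed"),
# )
```

Keeping the fetch and the "done" test as parameters means the loop doesn't hard-code the response format, so it's easy to adapt once you've seen a real response.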
There's also a /v1/invoices/sync endpoint if you want a single blocking call with the full result — no polling needed.
Free tier is 50 invoices/month forever, no card required.
Docs: https://invoice-intelligence-api.up.railway.app/docs (Swagger)
I'd genuinely love feedback — especially if you're currently doing invoice parsing in a project and have edge cases I should know about.