
I made a small thing for carrying metadata with embedding vectors. Keen for a sanity check.
Hey all,
I’ve been mucking around with vector databases a bit lately and ran into something I wasn’t really expecting.
I had some vectors in one vector store, then started playing with moving them to another, and quickly realised the vectors themselves weren’t the annoying part. It was all the stuff around them.
Where did this vector come from?
What document was it from?
Which chunk was it?
What chunking approach did I use?
Which embedding model created it?
Has the source changed since then?
If I move this to another vector DB, does any of that useful context come with it?
Maybe this is obvious to people who spend more time in this space, but it felt a bit messy to me.
So I built a small project called Vector Passport.
The idea is pretty simple. Each vector gets a small JSON “passport” that carries the useful metadata with it: source details, content hashes, chunk info, embedding model details, and timestamps. On top of that there are basic staleness checks and optional signing.
Nothing too clever. Just a way of making the vector a bit less anonymous.
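To make that a bit more concrete, here is a rough sketch of the general shape. The field names, function names, and the HMAC signing approach are made up for this post and are not necessarily what the repo's schema or helper actually uses. It only relies on the Python standard library.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    """Hex SHA-256 of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_passport(source_uri, source_text, chunk_index, chunk_text,
                  chunk_strategy, model, dimensions):
    """Build a passport dict. Field names are illustrative, not the real schema."""
    return {
        "source": {"uri": source_uri, "sha256": sha256_hex(source_text)},
        "chunk": {
            "index": chunk_index,
            "strategy": chunk_strategy,
            "sha256": sha256_hex(chunk_text),
        },
        "embedding": {"model": model, "dimensions": dimensions},
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def is_stale(passport, current_source_text):
    """Stale means the source no longer hashes to what was recorded."""
    return passport["source"]["sha256"] != sha256_hex(current_source_text)


def sign(passport, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding (assumed signing scheme)."""
    payload = json.dumps(passport, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(passport, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(passport, key), signature)


# Hypothetical example values, just to show the flow.
doc = "Vectors need context. This is the second sentence."
passport = make_passport(
    source_uri="docs/intro.md",
    source_text=doc,
    chunk_index=0,
    chunk_text="Vectors need context.",
    chunk_strategy="sentence",
    model="example-embedding-model",
    dimensions=384,
)
key = b"demo-key-not-for-real-use"
signature = sign(passport, key)

print(json.dumps(passport, indent=2))
print("stale after edit:", is_stale(passport, doc + " Edited."))
print("signature valid:", verify(passport, signature, key))
```

The point is that the passport is just a plain dict you can serialise next to the vector in whatever store you use, and the staleness check is nothing more than re-hashing the source and comparing.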
I’m not trying to build another vector database or another RAG framework. There are plenty of those already, and I’m nowhere near silly enough to add another one to the pile.
This is more of a small attempt at making vectors easier to move around, audit, rebuild, or reason about later.
At the moment it’s very early. There’s a schema, a Python helper/CLI, and a few examples across different vector stores.
I’m mainly posting it here to get a sanity check.
A few things I’d love feedback on:
- Is this actually a real problem for people, or have I just found a way to create paperwork for vectors?
- Is this the right kind of metadata to capture?
- Is this better as a lightweight spec/convention rather than a tool?
- Are there existing projects or standards I should be looking at?
- What would make this more useful in real ingestion pipelines?
I’m an infrastructure person, not an ML expert, so I’m very open to being corrected here.
Repo: https://github.com/saworbit/vector_passport
Keen for any thoughts, even if the thought is “nice idea, but no”.