Hi everyone,
I’m looking for a self-hosted solution for managing and querying a collection of large RPG rulebook PDFs, mainly D&D / Pathfinder books I own.
My original idea for the setup was something like:
- Paperless-ngx for document/library management
- Paperless-AI for asking questions over the documents
The problem is that Paperless-ngx turned out not to be a great fit for these PDFs. They are GIGANTIC (400+ MB per file because of the images).
Paperless tends to run out of memory when opening a document (ingestion was fine: slow, but fine). I do not really need OCR, because the PDFs already have selectable text and I would rather avoid redoing it. (I tried optimizing the files, but I would lose a lot of the OCR text associated with the images.)
So I thought that maybe something more like a structured PDF/manual library would be a better idea.
Ideally I'm trying to find something with:
- Web-based PDF viewing
- Categories/collections/tags
- Search inside documents
- AI/RAG chat over the content
- Citations with document name and page number
- Ideally links from the AI answer back to the PDF page
- Local storage, preferably plain files or an easy-to-back-up data directory
- Local LLM/embedding support if possible
I have looked at tools like Komga, Kavita, AnythingLLM, Open WebUI, RAGFlow, and some PDF reader + AI projects such as NimbusPDF.
Komga/Kavita look good for reading and organizing PDFs, while RAGFlow/Open WebUI/AnythingLLM look better for the AI side, but I have not found anything that integrates both cleanly. (I thought I could build it myself, but maybe you already have something in mind.)
The closest thing I imagine would be:
PDF library / reader + text extraction from selectable PDFs + chunking per page/chapter
+ vector/keyword index + AI chat with page citations + click citation -> open PDF at that page
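To make the idea concrete, here is a rough Python sketch of the chunking, keyword index, and citation-link part as I imagine it. It is only an illustration under some assumptions: the page texts are assumed to be already extracted from the selectable PDF text (for example with a library such as pypdf), the names `PageChunk`, `chunk_by_page`, `search`, and `citation_link` are made up for this example, and the link relies on the `#page=N` URL fragment, which most browser PDF viewers honor. A vector/embedding index would sit next to the keyword scoring shown here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class PageChunk:
    doc: str   # PDF file name
    page: int  # 1-based page number inside that PDF
    text: str  # text extracted from that page


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_by_page(doc: str, page_texts: list[str]) -> list[PageChunk]:
    """One chunk per page; blank pages are skipped but numbering is kept."""
    return [
        PageChunk(doc, number, text)
        for number, text in enumerate(page_texts, start=1)
        if text.strip()
    ]


def search(chunks: list[PageChunk], query: str, top_k: int = 3) -> list[PageChunk]:
    """Rank chunks with a simple TF-IDF style keyword score."""
    term_counts = [Counter(tokenize(c.text)) for c in chunks]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    total = len(chunks)
    query_terms = tokenize(query)
    scored = []
    for chunk, counts in zip(chunks, term_counts):
        score = sum(
            counts[term] * math.log(1 + total / doc_freq[term])
            for term in query_terms
            if term in counts
        )
        if score > 0:
            scored.append((score, chunk))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def citation_link(base_url: str, chunk: PageChunk) -> str:
    """Build a link that opens the PDF at the cited page (assumes #page=N support)."""
    return f"{base_url}/{quote(chunk.doc)}#page={chunk.page}"


# Placeholder page texts, not real rulebook content.
pages = [
    "Chapter 1: character creation and ability scores.",
    "",
    "Grappling: make an attack roll to grab a target.",
    "Spells and magic items.",
]
chunks = chunk_by_page("core rules.pdf", pages)
for hit in search(chunks, "grappling rules"):
    # The base URL is a hypothetical address where the PDFs are served.
    print(hit.doc, "p.", hit.page, citation_link("http://localhost:8080/pdfs", hit))
```

The retrieved chunks, each carrying its document name and page number, would then be passed to the LLM so that the answer can cite them and link back to the page.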
I am fine with Docker, LXCs, or anything else. I would prefer not to rely on cloud APIs. I would not be too worried about running out of memory in this case, as I would move the project to a server with 256 GB of RAM that I have lying around, with a GPU with up to 12 GB of memory for now (I think I may be able to upgrade it in the future).
Any recommendations or existing projects?
Thanks in advance