u/Azokul

Hi everyone,

I’m looking for a self-hosted solution for managing and querying a collection of large RPG rulebook PDFs, mainly the D&D / Pathfinder books I own.

My ideal setup at first would have been something similar to:

  • Paperless-ngx for document/library management
  • Paperless-AI for asking questions over the documents

But the problem is that Paperless-ngx was not a great fit for these PDFs. They are GIGANTIC (400+ MB per file because of the images).
Paperless tends to run out of memory when opening them (ingestion was slow, but fine). I do not really need OCR, because the PDFs already have selectable text and I would rather avoid redoing it (I tried optimizing the files, but I would lose a lot of the OCR related to the images).

So I thought that maybe something more like a structured PDF/manual library would be a better idea.
Ideally, I'm trying to find something with:

  • Web-based PDF viewing
  • Categories/collections/tags
  • Search inside documents
  • AI/RAG chat over the content
  • Citations with document name and page number
  • Ideally links from the AI answer back to the PDF page
  • Local storage, preferably plain files or an easy-to-back-up data directory
  • Local LLM/embedding support if possible

I have looked at tools like Komga, Kavita, AnythingLLM, Open WebUI, RAGFlow, and some PDF reader + AI projects such as NimbusPDF.

Komga/Kavita look good for reading and organizing PDFs, while RAGFlow/Open WebUI/AnythingLLM look better for AI, but I have not found anything that integrates both sides cleanly. (I thought I could build it myself, but maybe you knowledgeable folks already have something in mind.)

The closest thing I imagine would be:

PDF library / reader + text extraction from selectable PDFs + chunking per page/chapter + vector/keyword index + AI chat with page citations + click citation -> open PDF at that page
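
To make that pipeline a bit more concrete, here is a rough Python sketch of the "chunk per page, index, cite with a page link" part. It is only an illustration under assumptions, not an existing project: `PageChunk`, `extract_pages`, `search`, `citation` and the `/library` base URL are names made up for this post, the TF-IDF scoring is a stand-in for a real vector/keyword index, and `extract_pages` assumes the third-party `pypdf` package is installed. It reads the existing text layer, so no OCR is involved.

```python
import math
import os
import re
from collections import Counter
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class PageChunk:
    doc: str   # file name of the PDF
    page: int  # 1-based page number
    text: str  # text layer of that page


def extract_pages(path):
    """Read the existing text layer page by page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party, imported lazily

    reader = PdfReader(path)
    name = os.path.basename(path)
    return [
        PageChunk(name, number, page.extract_text() or "")
        for number, page in enumerate(reader.pages, start=1)
    ]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(chunks, query, top_k=3):
    """Rank pages with a simple TF-IDF score (placeholder for a real index)."""
    counts = [Counter(tokenize(chunk.text)) for chunk in chunks]
    doc_freq = Counter()
    for tf in counts:
        doc_freq.update(tf.keys())
    terms = set(tokenize(query))
    scored = []
    for chunk, tf in zip(chunks, counts):
        score = sum(
            tf[t] * math.log(1 + len(chunks) / doc_freq[t])
            for t in terms
            if t in tf
        )
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def citation(chunk, base_url="/library"):
    """Return 'doc, p. N' plus a link using the #page=N URL fragment."""
    link = f"{base_url}/{quote(chunk.doc)}#page={chunk.page}"
    return f"{chunk.doc}, p. {chunk.page} ({link})"
```

The retrieved chunks would be passed to the local LLM as context, and the `citation` strings shown next to the answer. The `#page=N` fragment is understood by the built-in PDF viewers of most browsers, but whether a given web reader honours it is something I would have to check per tool.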

I am fine with Docker, LXCs, or anything else. I would prefer not to rely on cloud APIs. I wouldn't be too worried about running out of memory in this case, as I would migrate the project to a 256 GB RAM server I have lying around, with a GPU with up to 12 GB for now (I think I may be able to upgrade it in the future).

Any recommendations or existing projects?

Thanks in advance

reddit.com
u/Azokul — 17 days ago