r/pdf

▲ 3 r/pdf

Adobe, "Correct Recognized Text" is terrible and OCR needs work

Why can't you run OCR text through a spell check? Why can't I just let my computer do something instead of being prompted to fix every. single. misspelled. word.

What is swmnmoncd supposed to mean?

Acrobat is on its 40th version or something. It's like 30 years old. When will it do spell check for OCR text? How many dollars per month will that feature cost?! Am I really supposed to fix 3,000 questionable OCR words per document? I didn't scan this in, but it is at a reasonable 144 to 150 dpi. Not great. Not terrible.
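For illustration, here is a minimal sketch of what "run the OCR text through a spell check" could look like outside Acrobat, using only Python's standard `difflib`. The vocabulary, the function names, and the 0.6 similarity cutoff are assumptions made for this example, not anything Acrobat exposes. A real run would load a full word list, and a heavily garbled token may still not come close enough to any word to be replaced.

```python
import difflib
import re

# Hypothetical tiny vocabulary; a real run would load a full word list.
VOCAB = ["summoned", "committee", "intelligence", "report", "the", "was", "to"]


def correct_token(token, vocab=VOCAB, cutoff=0.6):
    """Return the closest vocabulary word, or the token unchanged if none is close."""
    lower = token.lower()
    if lower in vocab:
        return token  # already a known word; keep original casing
    matches = difflib.get_close_matches(lower, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text):
    """Correct alphabetic runs only; digits, punctuation, and spacing are untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_token(m.group(0)), text)
```

Calling `correct_text` on the extracted text layer would fix near-misses automatically and leave anything without a close match for manual review, which is much less work than confirming every word by hand.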

Acrobat Pro 2026 64-bit

Here is the document (shown) I wanted to re-run through a better OCR and fix, but I can't easily do it. (I have many more).

https://nsarchive2.gwu.edu/NSAEBB/NSAEBB493/docs/intell_ebb_026.PDF

u/publiusvaleri_us — 6 hours ago
▲ 1 r/pdf

PDFe SCAM took £350 out of my account, unauthorised

I’ve sent multiple emails to PDFe asking for a refund since I noticed they have taken £350 out of my account over the last few months without authorisation. They aren’t replying to my emails. Scam.

u/Loud-Instruction-150 — 5 hours ago
▲ 1 r/pdf

Working on a text to PDF tool. Do people actually want AI features in this space?

I have been working on a text to PDF tool (TextToPDFNet), mainly focused on clean output from raw text. The main idea was to fix messy formatting issues like broken lines, uneven spacing, and unstructured text before converting it into a proper PDF.

Now here’s the thing. I keep seeing AI features being added everywhere: things like auto formatting, structure detection, and smart cleanup of extracted text.

But I am not really sure if people actually want that in PDF tools or if they just want something fast and accurate that works every time.

From your experience, what matters more to you?

  • Better text accuracy and formatting
  • Faster conversion
  • AI-based features that try to improve the output automatically

u/Ok_Celebration8093 — 11 hours ago
▲ 9 r/pdf

What are the risks of using online PDF unlock services?

I’ve been trying to unlock my password-encrypted PDFs lately, and most of the solutions I find online are web-based tools where you upload your PDF file and download the unlocked version instantly.

But honestly, I’m a bit concerned about the risks involved. The PDFs contain confidential information, and I’m not sure how reliable these online services really are. Like:

  • Can they store or misuse my data?
  • What if the file gets leaked or accessed by someone else?
  • Are there any hidden malware or security threats?

Because of this, I started looking into offline solutions instead. I recently tried a desktop-based PDF unlocker, and it felt much safer since everything happens locally on my system without uploading files anywhere.

Still, I’m curious — what do you guys think?

Are online PDF unlock tools safe to use, or is it better to stick with offline software? Any experiences (good or bad)?

u/Expert_Weird6460 — 1 day ago
▲ 2 r/pdf

PDF tools for creating print ready board game manuals (Adobe keeps messing up).

Hi all, I have a PDF file of a board game manual (The Night Cage: Shrieking Hollow) that was made with a ton of transparent elements and some layers that behave like masks to achieve certain visual effects. I am translating the rules, and every time I touch the page and section titles, the whole thing breaks: the transparent design elements turn into monochrome boxes. I tried to convert it to InDesign, but the conversion fails.

Does anybody know of tools that would be able to edit it without breaking it, or possibly which tool created it?

The PDF rulebook can be found here.

u/ENDerke_ — 1 day ago
▲ 2 r/pdf

Fast & cheap OCR on 50M PDF pages to build PDF search engine

I need to OCR 50M PDF pages in Dutch, French, and German. Most are computer-written text that was printed out and scanned in. Sometimes there's a stamp or a little handwriting, but it's not important to capture that information.

The aim would be to build a search engine on top of those PDFs. Not necessarily for AI, but just for humans to search PDFs based on the text in the PDFs.

I have a limited budget of less than 1k and would like to finish the job in under 4 days. I think most VLMs are probably too expensive to run at this scale with this budget?
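To put those constraints in numbers, here is a quick back-of-the-envelope calculation. The page count, deadline, and budget come from the post; the variable names are just for illustration, and the currency is left unspecified because the post does not state it.

```python
PAGES = 50_000_000   # pages to OCR (from the post)
DAYS = 4             # deadline (from the post)
BUDGET = 1_000       # upper bound on budget; currency not stated in the post

seconds = DAYS * 24 * 60 * 60                  # 345,600 seconds
pages_per_second = PAGES / seconds             # sustained rate needed, about 144.7
budget_per_1000_pages = BUDGET / PAGES * 1000  # about 0.02 per 1,000 pages

print(f"{pages_per_second:.1f} pages/s sustained")
print(f"{budget_per_1000_pages:.2f} per 1,000 pages")
```

So whatever engine is chosen has to sustain roughly 145 pages per second across all workers, at no more than about 0.02 per thousand pages, which is the bar any candidate would need to be benchmarked against.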

Options I'm looking at: Tesseract, Paddle OCR, Surya OCR, Mindee DocTR, Rapid OCR, ...

So far I'm thinking of picking Rapid OCR with PP-OCRv5, but this seems optimized for Chinese, so I'm not sure if it will work well for my languages.

Some VLMs I'm looking at, but they will probably be too slow and expensive: LightOnOCR 2 1B, SmolVLM-256M, HunyuanOCR 1B, Docling Granite, ...

Do I run these models natively, or is it better to go with something like Docling, PyMuPDF4LLM, Marker, ...? Or do these add a lot of overhead?

Any recommendations on how to run this in parallel?
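One common pattern for the parallel part, sketched here with Python's standard `concurrent.futures`: fan page paths out to a pool of workers and collect the text per page. `ocr_page` is a hypothetical placeholder, not a call into any of the engines listed above, and the `chunksize` default is an arbitrary assumption to tune.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def ocr_page(path):
    """Hypothetical placeholder for a real OCR call; returns (path, text)."""
    return path, f"text of {path}"


def ocr_all(paths, worker=ocr_page, executor_cls=ProcessPoolExecutor,
            max_workers=None, chunksize=64):
    """Run `worker` over all paths in a pool and return {path: text}.

    ProcessPoolExecutor suits CPU-bound engines; pass ThreadPoolExecutor
    when the worker mostly waits on a GPU server or a remote service.
    `chunksize` batches tasks per process to cut inter-process overhead
    (thread pools accept it but ignore it).
    """
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(worker, paths, chunksize=chunksize))
```

With a process pool, call `ocr_all(...)` from under an `if __name__ == "__main__":` guard. At 50M pages a single machine is unlikely to be enough, so the same function would typically run on several machines, each given its own shard of the path list, with results written to disk rather than held in one dictionary.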

Am I missing anything? Tips on how to build the search engine afterward?
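On the search side, the core data structure is an inverted index that maps each term to the documents containing it. Below is a minimal in-memory sketch in plain Python. The function names and sample documents are made up for illustration; at 50M pages a dedicated full-text engine would be the realistic choice, but the principle is the same.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on word characters (Unicode-aware, so accents survive)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "a.pdf": "De factuur is betaald",
    "b.pdf": "La facture est payée",
    "c.pdf": "Die Rechnung ist bezahlt",
}
index = build_index(docs)
print(search(index, "factuur betaald"))  # {'a.pdf'}
```

This sketch does exact term matching only. Because OCR output contains misrecognized words, a production setup would likely also want fuzzy matching and language-specific stemming.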

u/vroemboem — 1 day ago
▲ 1 r/pdf

If invoice data was auto-extracted and you could just review/edit + export to Excel, would that actually help?

I’ve been trying different ways to extract data from invoices and PDFs, but most tools either:

  • break when formats change
  • struggle with multi-page tables
  • or need a lot of manual correction

Curious how others are handling this in real workflows.

Are you still doing manual entry, or using some tools that actually work reliably?

u/Impressive-Rise7510 — 1 day ago
▲ 2 r/pdf+1 crossposts

Is there a single button I could press to convert an email that contains another email and a .PDF into a .PDF?

u/Primary-Ad-5843 — 3 days ago
▲ 3 r/pdf+1 crossposts

Why you can’t just "click and type" on a PDF (The history of the "Digital Paper" dream) 📜

u/AtomlitLabs — 2 days ago
▲ 2 r/pdf+1 crossposts

From Frustration to building ihatepdf.cv to 30K+ users

u/SaaSForge — 3 days ago
▲ 2 r/pdf

Recommendations for a good desktop editor which is Mac (ARM) compatible, and without a subscription

u/alehel — 4 days ago
▲ 2 r/pdf+1 crossposts

How to make a continuous seamless scrollable pdf?

u/the-nc7 — 5 days ago