u/Blues003


5090 desktop build for a medical NLP project?

Hey everyone! I posted about this a few months ago, but after more research (and some new products hitting the market) I'd love a second opinion. I'm about to pull the trigger on a local AI machine and would love input from people who actually run this stuff daily, since I'm unsure what hardware to pick. I've seen mixed reviews on Reddit about what these machines can and cannot do.

The task: I'm building a pipeline to automatically extract and classify Portuguese electronic health records (EHRs) into ICD-10/11 codes. Think clinical notes, discharge summaries, that kind of thing — at most 20 pages per document, so context length shouldn't (?) be a huge concern. The notes are already in electronic format, so there's no handwriting recognition involved. There's also no need to recognize images or cross-reference data of any sort — just extract what's explicitly stated in the text.

This is exploratory at this stage — I'm not shipping a finished product tomorrow — but I do want the hardware to be production-capable down the line. Ideally I'd end up with either a product, or at least a tool that speeds up my own ICD-10/11 coding work.

The pipelines I'm considering

  • **NER + RAG + LLM**: A fine-tuned BERTimbau or similar does named entity recognition on the clinical text, a retrieval layer narrows down candidate ICD codes from the full code tree, and a 27B-class LLM (MedGemma-27B or similar) does the final reasoning and classification. This seems like the most robust approach.
  • **End-to-end LLM**: Feed the full record directly to a capable 27B+ model with a well-engineered prompt and get structured output. Simpler pipeline, but more dependent on model quality, probably needs a bigger LLM, and is much less deterministic.
  • **Fine-tuned encoder classifier**: Train a classification head on top of a BERT-style model for direct ICD prediction. Lightweight but needs labelled data and struggles with the 70k+ code label space.

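For concreteness, here's a toy sketch of the retrieval step in the first pipeline — entities come out of the NER stage, and a shortlist of candidate codes goes to the LLM for final reasoning. The miniature ICD slice and the token-overlap scoring are purely illustrative stand-ins; a real system would retrieve over the full code tree with embeddings.

```python
# Toy sketch of the retrieval layer in pipeline 1: narrow the ICD code
# tree down to a candidate shortlist for the LLM to reason over.
# The code dictionary and overlap scoring are illustrative only.

# Hypothetical miniature slice of the ICD-10 code tree.
ICD_CODES = {
    "I21": "acute myocardial infarction",
    "J18": "pneumonia, unspecified organism",
    "E11": "type 2 diabetes mellitus",
    "N39": "urinary tract infection",
}

def retrieve_candidates(entities, top_k=2):
    """Score each ICD description by token overlap with the NER entities."""
    entity_tokens = {tok for ent in entities for tok in ent.lower().split()}
    scored = []
    for code, desc in ICD_CODES.items():
        overlap = len(entity_tokens & set(desc.split()))
        if overlap:
            scored.append((overlap, code))
    scored.sort(reverse=True)
    return [code for _, code in scored[:top_k]]

# Entities the NER model might extract from a discharge summary.
candidates = retrieve_candidates(["myocardial infarction", "type 2 diabetes"])
print(candidates)  # shortlist handed to the 27B model for final coding
```

The point is just the shape of the stage: the LLM never sees all 70k+ codes, only a handful of plausible candidates plus the source text, which is what makes the 27B tier viable for this.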
Importantly, accuracy matters far more than speed for this use case. Wrong ICD codes have real clinical and billing consequences. Token speed should be usable, but it doesn't have to be blazing fast.

The reason I'm going local is that real EHRs must stay local — full stop, non-negotiable, GDPR. However, I'm completely open to generating synthetic Portuguese clinical text to train or fine-tune models in the cloud. If I can build a solid synthetic dataset, cloud fine-tuning is fair game.

So, for this build, I'm considering either a custom 5090 desktop with 64 GB of RAM (around €7K), a Strix Halo mini PC, or a DGX Spark. There will be *no* second GPU on this machine, for budget reasons - not now, and likely not ever.

A couple of extra details:

  • I also want to eventually explore ultrasound and fluoroscopy image segmentation, so multimodal capability is a nice-to-have.
  • The machine will also be used for some gaming, though that's not a priority — it's more of a bonus than a requirement.

My current lean: The 5090 build feels right for the 27-31B model tier where production accuracy is achievable, and the speed advantage matters for a product that clinicians would actually use. The Strix Halo and DGX Spark are interesting if I end up needing 70B+ models, but I'm not convinced I do for this task. They also seem more limited as machines, overall.

But I'd genuinely love to hear from anyone who's run medical NLP pipelines locally, or who has experience with Strix Halo or DGX Spark in production-ish workloads. Am I missing something? Is there a strong argument for the unified-memory approach that I'm not weighing correctly? Is the 5090 capable enough for this sort of task? Or am I about to spend €7K that I'll regret sooner rather than later?

Thanks in advance!

u/Blues003 — 20 hours ago