r/LocalAIServers

If I ever won the lottery, no one would know. But there would be signs!!

Who else is thirsty for a beefy server GPU to test AI models locally?

u/Medical_Ask_6169 — 1 day ago

Need advice for a $10,000 AI workstation build (video, image, voice, LLMs, training, everything)

I’m planning to go very deep into the AI space and I want to build a serious workstation with around a $10,000 budget.

Main use cases:

- Local LLMs
- AI image generation
- AI video generation
- Voice cloning / speech models
- Fine-tuning and training
- Running multiple AI tools simultaneously
- Heavy VRAM workloads
- Stable Diffusion / Flux / ComfyUI
- Open-source models
- Maybe some game dev / rendering too

I want something that will still be powerful and relevant for the next few years instead of becoming obsolete immediately.

What hardware configuration would you recommend today for this budget?

Questions I’m specifically confused about:

  1. CPU:
    Should I go Intel or AMD for AI workloads?
    Is Intel actually better for compatibility/stability or is AMD better now?

  2. GPU:
    I know NVIDIA is basically mandatory for CUDA, but which setup makes the most sense?

- Single RTX 5090?
- Dual 4090s?
- Multiple GPUs?
- Used enterprise GPUs?
- Wait for newer cards?

  3. Motherboard:
    Does Intel CPU + NVIDIA GPU + Intel motherboard work “best together” in terms of compatibility/stability?

Or does motherboard brand/platform not really matter much as long as PCIe lanes, RAM support, and power delivery are good?

  4. RAM:
    How much RAM is realistically needed now?
    128GB?
    256GB?

  5. Storage:
    What’s the smartest storage setup for AI workloads?
    Separate NVMe drives for models/cache/projects?

  6. Cooling + PSU:
    How crazy do cooling and PSU requirements get once you start doing heavy AI workloads 24/7?

  7. Linux vs Windows:
    Do most serious AI people just use Linux at this point?
    Is Windows still okay for heavy AI work?

I’d really appreciate recommendations from people actually doing AI locally instead of generic gaming-PC advice.

If you were building the best possible AI workstation around $10k today, what exact parts would you choose and why?

u/Mission_Objective163 — 6 hours ago

Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline

Shipped this for the AMD x lablab hackathon. Attached video is one of the actual reels the pipeline produced - one English sentence in, finished mp4 with characters, story, music, and voice-over out. ~45 minutes end-to-end on a single AMD Instinct MI300X. Every model is Apache 2.0 or MIT.

Pipeline (8 stages, all sequential on the same GPU; a control-flow sketch follows the list):

  1. Director Agent - Qwen3.5-35B-A3B (vLLM + AITER MoE) plans 6 shots from one sentence, returns structured JSON with character bibles, shot prompts, music brief, per-shot voice-over script, narration language
  2. Character masters - FLUX.2 [klein] paints one canonical portrait per character. No LoRA training step - reference editing pins identity across shots by construction
  3. Per-shot keyframes - FLUX.2 again with reference image. Sub-second per keyframe after warmup
  4. Animation - Wan2.2-I2V-A14B, 81 frames @ 16 fps native. FLF2V for cut:false continuation arcs (last frame of shot N anchors first frame of shot N+1)
  5. Vision critic - same Qwen3.5-35B reloaded with 10 structured failure labels (character drift, extras invade frame, camera ignored, walking backwards, object morphing, hand/finger artifact, wardrobe drift, neon glow leak, stylized AI look, random intimacy). Bad clips re-render with targeted retry strategies (different seed, FLF2V anchor, prompt simplification)
  6. Music - ACE-Step v1 generates a 30s instrumental from Director's brief
  7. Narration - Kokoro-82M, 9 languages. Director picks language to match setting (Tokyo→Japanese, Paris→French, Mumbai→Hindi)
  8. Mix - ffmpeg with per-shot vo aligned via adelay
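
To make the control flow concrete, here is a minimal orchestration sketch. The function names, dict keys, and the 2-retry cap are hypothetical stand-ins for illustration, not the studiomi300 API:

```python
# Sketch of the 8-stage control flow above. `stages` is a dict of
# callables standing in for the real models; every name and field
# here is an assumed placeholder, not the repo's actual interface.

def make_reel(sentence, stages):
    plan = stages["director"](sentence)              # 1. shots, bibles, briefs
    masters = {c["name"]: stages["master"](c)        # 2. one FLUX.2 portrait
               for c in plan["characters"]}          #    per character
    clips, prev_last = [], None
    for shot in plan["shots"]:
        key = stages["keyframe"](shot, masters)      # 3. FLUX.2 + reference
        anchor = None if shot["cut"] else prev_last  #    FLF2V continuation
        clip = stages["animate"](key, shot, anchor)  # 4. Wan2.2-I2V, 81f @ 16fps
        for _ in range(2):                           # 5. critic; 2 retries assumed
            verdict = stages["critic"](clip, shot)
            if verdict["ok"]:
                break
            # targeted retry: new seed / FLF2V anchor / simpler prompt
            clip = stages["animate"](key, shot, anchor, retry=verdict["label"])
        prev_last = clip["last_frame"]
        clips.append(clip)
    music = stages["music"](plan["music_brief"])     # 6. ACE-Step, 30s
    vo = [stages["tts"](s["vo"], plan["language"])   # 7. Kokoro narration
          for s in plan["shots"]]
    return stages["mix"](clips, music, vo)           # 8. ffmpeg, vo via adelay
```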

Wan 2.2 specifics (the bit this sub will care about):

  • 1280×720, not 640×640 default. Costs more but matches what producers want
  • 121 frames at 24 fps was my first attempt - gave temporal rippling. Switched to 81 @ 16 fps native (the distribution Wan was trained on) and it cleaned up
  • flow_shift = 5 for hero shots, 8 for b-roll (upstream wan_i2v_A14B.py defaults)
  • Negative prompt: verbatim Chinese trained negative from shared_config.py. umT5 was multilingual-pretrained against those exact tokens. English translation is observably weaker
  • Camera language: ONE camera verb per shot, sentence-case, placed first ("Tracking shot following from behind"). Multiple verbs in one prompt cancel each other out
  • Avoid the word "cinematic" - triggers Wan's stylization branch, gives the AI look. Use lens/film tags instead ("Arri Alexa, anamorphic, 35mm film grain")
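
The last two rules are mechanical enough to enforce in code. A tiny prompt-builder sketch; the verb list, lens tags, and function name are my own illustrative choices, not from the repo:

```python
# Enforce the camera-language rules above when composing a shot prompt.
CAMERA_VERBS = ("Tracking shot", "Dolly in", "Crane up", "Static shot")
LENS_TAGS = "Arri Alexa, anamorphic, 35mm film grain"  # instead of "cinematic"

def build_shot_prompt(camera: str, action: str) -> str:
    assert camera in CAMERA_VERBS, "exactly ONE camera verb, placed first"
    assert "cinematic" not in action.lower(), "avoid Wan's stylization branch"
    # Camera verb first, sentence case, then the action, then lens/film tags.
    return f"{camera} {action}. {LENS_TAGS}"

print(build_shot_prompt("Tracking shot", "following the courier from behind"))
# Tracking shot following the courier from behind. Arri Alexa, anamorphic, ...
```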

Performance work:

  • ParaAttention FBCache (lossless 2× on Wan2.2)
  • torch.compile on transformer_2 (selective, the dual-expert MoE makes full compile flaky) - another 1.2×; see the sketch after this list
  • AITER MoE acceleration on Qwen director (vLLM)
  • End-to-end: 25.9 min → 10.4 min per 720p clip on MI300X
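
For anyone reproducing the selective-compile trick, a sketch assuming a diffusers-style pipeline object that exposes transformer_2 (the attribute name comes from the post; the compile mode is my assumption):

```python
import torch

def selective_compile(pipe):
    """Compile only the second expert transformer; full-pipeline compile
    is flaky on the dual-expert MoE, per the post."""
    pipe.transformer_2 = torch.compile(
        pipe.transformer_2,
        mode="max-autotune-no-cudagraphs",  # assumed; tune for your stack
        dynamic=False,
    )
    return pipe
```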

Why a single MI300X: 192 GB HBM3 lets a 35B MoE, 4B diffusion, 14B I2V MoE, 3.5B music, and a TTS share the same card sequentially. Same stack on a 24 GB consumer GPU would need 4-5 boxes wired together.
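
The sequential sharing boils down to a load-run-free loop, sketched here under the assumption that models are swapped between stages; load_fn and infer_fn are placeholders:

```python
import gc
import torch

def run_stage(load_fn, infer_fn, *args, **kwargs):
    """Load one model, run it, then hand HBM back before the next stage.
    On ROCm builds of PyTorch the torch.cuda namespace maps to HIP, so
    this works unchanged on an MI300X."""
    model = load_fn()
    try:
        return infer_fn(model, *args, **kwargs)
    finally:
        del model
        gc.collect()
        torch.cuda.empty_cache()
```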

Code (public, Apache 2.0): https://github.com/bladedevoff/studiomi300

Hugging Face (documentation; please like the Space 🙏): https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/studiomi300

The live demo on the HF Space is temporarily offline while the infra is restored; it should be back within hours. In the meantime, the showcase reels in the repo are real pipeline outputs, with no human-re-edited shots.

Happy to dig into AITER MoE setup, FBCache tuning, FLF2V anchoring, or the vision critic's failure taxonomy in comments.

u/Inevitable-Log5414 — 1 day ago

How much storage do you need to hoard models locally?

Hi all, I'm wondering how many TBs of storage you would fill with locally saved LLMs if you thought they would become unavailable for download online. I'm thinking about both large and small models: a snapshot of the best of everything available online right now. Could be for coding, for writing, or for automation/robotics. Assuming you also have the hardware to run models of any size, what's in your bug-out loadout if the grid goes down?

u/somebodys-something — 2 days ago

MI50 16GB or V100 16GB?

Hey everyone! I'm checking out the GPU market for a local LLM build. I'm interested in the MI50 16GB and the V100 16GB (the 32GB versions of both GPUs are unjustifiably expensive).

Here’s what I’ve noticed while researching the topic:

V100 - the "safe" option that just works. But there's a catch: it's SXM2, so you need to buy a PCIe adapter + cooling. Ideally you could mount cooling from a 4090/5090 (or something simpler), and then you can probably forget about overheating.

The only downside is that everything will cost more, but it'll work fine if you set it up right.

MI50 - in terms of specs it's better than the V100, but I see some serious (in my view) problems:

- Different BIOS versions need to be flashed depending on the task (e.g., the Radeon VII BIOS to make it work in consumer motherboards), but sellers usually ship them already flashed, so that shouldn't be an issue.

- "Insufficient multithreading" - https://www.reddit.com/r/LocalAIServers/comments/1koltfb/comment/mt1ihpe/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button - the commenter is likely talking about vLLM.

- Old ROCm - requires some tricks with .env (which isn't a problem), but if you need anything beyond LLM inference (for example, fine-tuning a model), big problems start to arise. With the V100 these issues are much less frequent (CUDA, after all).

On the plus side, the MI50 is cheaper than a bare V100 SXM2 (and it comes with a heatsink and a PCIe interface out of the box).

Also, a downside for both is the lack of flash-attention-2 support, which means newer models might just not work (though it's unclear whether they break in vLLM, llama.cpp, or both).
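
Not a full answer to that, but if you end up loading such a model through Hugging Face transformers (rather than vLLM/llama.cpp), many architectures can fall back from flash-attention-2 to PyTorch's SDPA kernel. A sketch with a placeholder model id:

```python
from transformers import AutoModelForCausalLM

# On GPUs without flash-attention-2 (MI50, V100), request the SDPA
# kernel instead. Whether a given new model supports this depends on
# its architecture; check the model card.
model = AutoModelForCausalLM.from_pretrained(
    "some/new-model",               # placeholder id
    attn_implementation="sdpa",     # instead of "flash_attention_2"
    torch_dtype="auto",
    device_map="auto",
)
```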

So the question remains: knowing these nuances, which is the better choice? Keeping in mind that I'll likely buy several GPUs.

u/CommonResearch3314 — 4 days ago

I switched fully to local AI for a week — something changed

I stopped using cloud AI tools entirely for the past week.

Everything now runs locally.

What surprised me wasn’t performance — it was how my workflow started changing in unexpected ways.

Feels like we’re closer to personal AI stacks becoming normal than I thought.

Has anyone else fully committed to a local setup?

u/Classic-Space-5705 — 3 days ago

On-premises enterprise AI coding deployment is harder than vendors say and easier than IT teams fear

I've done on-premises enterprise AI coding deployments at three different organizations. The gap between vendor documentation and operational reality is consistent enough to be worth writing up.

What vendors undersell is that the initial model selection and sizing is more consequential than they imply. The model that produces acceptable inference latency for 50 developers on your hardware may produce unacceptable latency for 200. Getting sizing right before committing to hardware is genuinely difficult and vendor estimates are optimistic. Context engine configuration is also more work than "connect it to your repos" on complex enterprise codebases.
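
To make the sizing point concrete, the back-of-envelope arithmetic looks like this; every number below is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope seat capacity for an inference box (all assumptions).
aggregate_tok_s = 1200   # total throughput of the server at full batch
per_user_tok_s = 30      # what a developer perceives as "responsive"
active_fraction = 0.5    # share of seats with a request in flight (agents!)

concurrent_streams = aggregate_tok_s / per_user_tok_s      # 40
supported_seats = concurrent_streams / active_fraction     # 80

print(f"{concurrent_streams:.0f} streams -> ~{supported_seats:.0f} seats")
# Comfortable for 50 developers; saturated well before 200, and that's
# before longer per-request contexts shrink aggregate_tok_s further.
```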

What IT teams overestimate is the ongoing operational overhead. Once the deployment is stable it's much lower than most internal teams expect. It's infrastructure maintenance. The tools designed for enterprise AI coding deployments have admin interfaces that don't require deep AI expertise to operate. The things that go wrong are things IT teams already know how to handle.

The organizations that struggle with on-premises AI coding are the ones that either chose hardware before understanding real sizing requirements or tried to do it without someone who's done a deployment before owning the initial configuration.

u/Major-Language8609 — 3 days ago

Hi everyone !

I've vibecoded something for testing purposes (it probably needs more human work if there's interest).
Testing on hardware is planned (I need to buy some first).
It's basically a mainframe-like implementation, but built on simpler hardware.

The concept is built around RDMA for shared memory and is aimed at AI infrastructure.
I'm not really great at Kubernetes & RDMA (I'm more of a DevOps/coding profile), so help would be appreciated.

It's called Frame and it's available on GitHub: https://github.com/Plume-Labs/frame

Happy to discuss!

u/Electronic_Horse_752 — 7 days ago

Need your help - question about agentic AI Agent OS

Hey guys,

I'm building what I think is a pretty advanced Agent OS right now. I'm not here to advertise, and I hope you can help me out:

Tell me the most important things that come to mind when you think about agentic AI systems, specifically Agent OS systems. Which capabilities should a system you'd actually use have?

Are you guys even interested in local-first, GDPR-compliant architectures?

You'd really help me by sharing your thoughts.

Thanks in advance!

u/Competitive_Book4151 — 5 days ago