u/BitGreen1270

Is it worth getting a 5090 for my needs?

I'm considering biting the bullet and getting a PC with the following specs:

  • 5090
  • AMD 9950X3D
  • X870 motherboard
  • 32 GB RAM (2×16 GB) CL32

EDIT2: The price for this lands somewhere in the range of 5500-6000 USD where I live.

Obviously it costs a bomb, but I'm hoping it becomes cost-effective over time (10 years, probably), since I intend to use it to learn as much as I can about LLMs and to ideate and build use cases around them. I also feel the future is going to involve LLMs in some form or other, and it's better late than never to try to keep up.

My questions:

  1. How does it perform with dense models like qwen3.6-27B and gemma4-31B? These are most likely the models I'll be building applications around.
  2. The alternative is renting ad hoc compute on vast.ai, or spending more on Google Cloud or something. But that also gets expensive fast. I can keep costs down by keeping it ad hoc, but that adds friction.
  3. My only use case is LLMs. I don't play games or run anything else that needs a GPU like this.

Edit: forgot to mention, my current system is a Lenovo E14 laptop with a 780M iGPU and 32 GB RAM.
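The rent-vs-buy question in (2) is easy to sanity-check with back-of-the-envelope arithmetic. The rental rate below is an assumption I plugged in, not a quote, so swap in whatever vast.ai actually charges:

```python
# Rough rent-vs-buy breakeven for local LLM work.
# The rental rate is an ASSUMED figure for a 5090-class instance,
# not an actual vast.ai quote; adjust to real prices.
build_cost_usd = 6000          # upper end of the quoted 5500-6000 USD
rental_rate_usd_per_hr = 0.60  # assumed hourly rental price

breakeven_hours = build_cost_usd / rental_rate_usd_per_hr
print(f"Breakeven: {breakeven_hours:.0f} rented GPU-hours")

# Spread over the hoped-for 10-year horizon (52 weeks/year):
hours_per_week = breakeven_hours / (10 * 52)
print(f"That is about {hours_per_week:.1f} rented hours/week for 10 years")
```

Under these assumed numbers the build pays for itself only if usage averages roughly 19 rented hours per week for a decade; heavier or lighter usage moves the answer a lot, which is the whole point of running the numbers first.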

u/BitGreen1270 — 17 hours ago

Why is opencode so slow at processing the prompt with llama-server?

I'm running opencode and llama-server locally. I have 32 GB RAM and a 780M iGPU. With Qwen3.6 I get around 21 t/s, which should be decent, but opencode takes far too long to process every input. What exactly is it doing?

Tmux shows the available RAM at the bottom (8+ GB free). The server startup command is below the video.

Once it starts thinking, everything goes fine.

https://reddit.com/link/1ta0pws/video/4r3b899svh0h1/player

./llama-server \
-m models/Qwen3.6-35B-A3B-UD-Q3_K_S.gguf \
--temp 0.6 \
--top_p 0.95 \
--top_k 20 \
--min_p 0.0 \
--presence_penalty 0.0 \
--repeat_penalty 1.0 \
-c 65536 \
-ctk q8_0 \
-ctv q8_0 \
--flash-attn on \
-t 16 \
-ngl 99 \
--mlock \
--host 0.0.0.0

EDIT:

Tried pi.dev, and it does seem related to the system prompt: pi.dev is noticeably faster, most likely because its system prompt is much smaller.
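That would fit the prefill math: the delay before the first token is dominated by prompt processing, which on an iGPU is much slower than token generation feels. The numbers below are assumptions (I haven't measured the 780M's prompt-processing speed or opencode's actual prompt size), just to show the shape of the problem:

```python
# Rough prefill-time estimate. Both the prompt-processing speed and the
# prompt sizes are ASSUMED numbers for illustration, not measurements.

def prefill_seconds(prompt_tokens: int, pp_tokens_per_sec: float) -> float:
    """Seconds spent processing the prompt before generation starts."""
    return prompt_tokens / pp_tokens_per_sec

pp_speed = 60.0  # assumed prompt-processing speed on a 780M iGPU, tokens/s

# Assumed prompt sizes: a big agentic system prompt vs a lean one.
for name, tokens in [("large agentic prompt", 12000), ("lean prompt", 1500)]:
    secs = prefill_seconds(tokens, pp_speed)
    print(f"{name}: ~{secs:.0f} s before the first token")
```

Under these assumptions a 12k-token system prompt costs minutes of prefill per fresh request, while a 1.5k-token one costs seconds, which would explain the difference between the two clients even at the same 21 t/s generation speed.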

https://reddit.com/link/1ta0pws/video/nt1tpf9x7i0h1/player

u/BitGreen1270 — 3 days ago

Why is exa not working in the chat? It connects fine in the admin panel

I have exa configured correctly in the admin panel. When I click Verify, it says it verified successfully. Exa is configured as MCP streamable, and the time server is openai (running locally).

https://preview.redd.it/3m0jj0iysa0h1.png?width=1295&format=png&auto=webp&s=3860e5dd1aee2e365b67ecc9e93b64b79a2eafcc

When I start a new chat, both of them are disabled by default:

https://preview.redd.it/hmyvqhq7ta0h1.png?width=1176&format=png&auto=webp&s=33bd64fe3c3d3cfce7429ca63171817631f1c25a

But if I enable them both, I get this error message for exa (with the id 1):

https://preview.redd.it/w2q7a20cta0h1.png?width=891&format=png&auto=webp&s=6974431570b0eabca5803c25ac85335909a1074f

Am I doing something wrong? Apologies if I'm missing something basic.

u/BitGreen1270 — 4 days ago

I spent a weekend and hand-coded a Python script that can use tools to do math calculations, fetch news articles, and convey the results with sarcasm. I used opencode with qwen3.6, and it added a robust URL-fetch tool.

Am I naive in thinking this is a good starting point for building out agentic automation for specific use cases? Or is it really that much more powerful to learn LangChain, AutoGen, etc.?

I look at the docs and I'm confused about what value they actually add. Are they meant for people without coding experience? Or for large-scale automation?
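For context, the core of a hand-rolled agent really is small. A minimal sketch of the tool-dispatch part (the tool name, reply format, and helper below are made up for illustration, not from my script or any framework):

```python
import json

def calc(expression: str) -> float:
    """Evaluate a basic arithmetic expression (digits and + - * / . ( ) only)."""
    allowed = set("0123456789+-*/. ()")
    if not set(expression) <= allowed:
        raise ValueError("unsupported characters in expression")
    return eval(expression)  # guarded by the character whitelist above

# Registry of callable tools; a real script would add news fetch, URL fetch, etc.
TOOLS = {"calc": calc}

def dispatch(model_reply: str) -> str:
    """Parse a {'tool': ..., 'args': ...} JSON reply and run the named tool."""
    call = json.loads(model_reply)
    result = TOOLS[call["tool"]](**call["args"])
    return json.dumps({"tool": call["tool"], "result": result})

print(dispatch('{"tool": "calc", "args": {"expression": "2 * (3 + 4)"}}'))
```

Frameworks like LangChain or AutoGen mostly standardize this loop and add retries, tracing, memory, and multi-agent plumbing on top; whether that abstraction is worth it probably depends on how far past a weekend script the automation needs to go.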

u/BitGreen1270 — 11 days ago