Dialling in an LLM agent on a VPS for performance & efficiency?
Hi all, long-time lurker, first-time poster.
Spent a few hours last night with Claude Code setting up a Hermes Agent on a VPS and connecting it via API to the models listed below. It's also hooked up to my Second Brain vault.
QUESTIONS:
- People are telling me I need to dial in performance & efficiency. Any tips or tricks here?
- How do I keep costs down and efficiency up? (The one concrete idea I've come across so far, prompt caching, is sketched under the primary model below.)
- Any other advice for getting this setup firing as effectively as possible?
Use cases: general brainstorming, documents, proposals, CV generation, image generation, prototype development, etc.
🟣 Primary Model — Anthropic
Model: claude-sonnet-4-6
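On the cost question above: the main lever I've found documented for the primary model is Anthropic's prompt caching, so the big stable chunk of the prompt (e.g. the vault context) gets billed at the cheaper cache-read rate on repeat calls. A minimal sketch, assuming the anthropic Python SDK; VAULT_CONTEXT is my placeholder, not a real variable in my setup:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for the large, stable context block (e.g. a Second Brain excerpt).
VAULT_CONTEXT = "...vault notes go here..."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": VAULT_CONTEXT,
            # Mark the stable prefix as cacheable; repeat calls that reuse it
            # are billed at the reduced cache-read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Draft a one-page proposal outline."}],
)
print(response.content[0].text)
```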
🟠 OpenRouter
📸 Vision (image analysis)
Model: google/gemini-2.0-flash-exp:free (via OpenRouter)
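For reference, OpenRouter is OpenAI-compatible, so a vision call is just the standard chat-completions shape with an image part. A minimal sketch (the image URL is a placeholder):

```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the stock client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in two sentences."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```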
📚 Session Search / Memory Summarisation
Model: google/gemini-2.0-flash-exp:free (via OpenRouter)
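Same model, different job: as I understand it, memory summarisation means compressing older turns with the free Flash model so the expensive primary model only ever sees a short running summary. A sketch reusing the OpenRouter client from the block above (summarise_history and the 8,000-character cap are my own hypothetical choices, not the agent's actual internals):

```python
def summarise_history(client, turns, max_chars=8000):
    """Compress older session turns with the free Flash model so the expensive
    primary model never has to re-read the full transcript."""
    transcript = "\n".join(turns)[-max_chars:]  # keep only the recent tail
    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash-exp:free",
        messages=[{
            "role": "user",
            "content": "Summarise this session in at most 10 bullet points:\n\n" + transcript,
        }],
    )
    return resp.choices[0].message.content
```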
🤖 Subagent / Delegation
Model: deepseek/deepseek-chat (via OpenRouter)
Used for: child agents spawned via delegate_task, i.e. the parallel research workers I use when I split tasks (like a GEO source-verification run I just did)
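I don't know what delegate_task does internally, but the fan-out pattern presumably looks something like this hypothetical asyncio sketch (worker, main, and the sample questions are all mine, not the agent's API):

```python
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def worker(question: str) -> str:
    # Each child task goes to the cheap deepseek-chat model.
    resp = await client.chat.completions.create(
        model="deepseek/deepseek-chat",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    questions = ["Verify source A", "Verify source B", "Verify source C"]
    # Run the research workers in parallel rather than one by one.
    answers = await asyncio.gather(*(worker(q) for q in questions))
    for q, a in zip(questions, answers):
        print(q, "->", a[:80])

asyncio.run(main())
```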
🔵 DeepSeek (direct provider)
Status: configured as a provider, but with no separate API key in .env; currently routing through OpenRouter
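If I ever add a direct key, DeepSeek's own API is also OpenAI-compatible, so switching providers should just be a base-URL change. A hedged sketch of that fallback (the env-var names are assumptions based on my .env):

```python
import os

from openai import OpenAI

# Prefer the direct DeepSeek endpoint when a key is present; otherwise keep
# routing the same model through OpenRouter.
if os.environ.get("DEEPSEEK_API_KEY"):
    client = OpenAI(base_url="https://api.deepseek.com",
                    api_key=os.environ["DEEPSEEK_API_KEY"])
    model = "deepseek-chat"
else:
    client = OpenAI(base_url="https://openrouter.ai/api/v1",
                    api_key=os.environ["OPENROUTER_API_KEY"])
    model = "deepseek/deepseek-chat"
```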
🔍 Web Search & Extract — Exa
API Key: ✅ Set (EXA_API_KEY)
Used for: all web_search and web_extract calls; the AI-native search engine powering my research
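For completeness, here's roughly what those calls look like through Exa's Python SDK (exa-py), as far as I understand it; the query is just an example:

```python
import os

from exa_py import Exa  # pip install exa-py

exa = Exa(api_key=os.environ["EXA_API_KEY"])

# search_and_contents does search + page text in one round trip, which keeps
# the per-task call count (and cost) down versus separate search/extract calls.
results = exa.search_and_contents(
    "best practices for running LLM agents on a small VPS",
    num_results=5,
    text=True,
)
for r in results.results:
    print(r.title, r.url)
```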