Dialling in an LLM agent on a VPS for performance & efficiency?
Hi all, long-time lurker, first-time poster.
Spent a few hours last night with Claude Code setting up a Hermes Agent on a VPS and connecting it via API to the models listed below. It's also hooked up to my Second Brain vault.
QUESTIONS:
- People are telling me I need to dial in performance & efficiency. Any tips or tricks here?
- How do I keep costs down and efficiency up? (The one concrete idea I've come across so far, prompt caching, is sketched under the primary model below.)
- Any other advice for getting this setup firing as effectively as possible?
Use cases: general brainstorming, documents, proposals, CV generation, image generation, prototype development, etc.
🟣 Primary Model — Anthropic
Model: claude-sonnet-4-6
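On the cost question above: the main lever I've found documented for the primary model is Anthropic's prompt caching, so the big stable chunk of the prompt (e.g. the vault context) gets billed at the cheaper cache-read rate on repeat calls. A minimal sketch, assuming the anthropic Python SDK; VAULT_CONTEXT is my placeholder, not a real variable in my setup:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for the large, stable context block (e.g. a Second Brain excerpt).
VAULT_CONTEXT = "...vault notes go here..."

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": VAULT_CONTEXT,
            # Mark the stable prefix as cacheable; repeat calls that reuse it
            # are billed at the reduced cache-read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Draft a one-page proposal outline."}],
)
print(response.content[0].text)
```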
🟠 OpenRouter
📸 Vision (image analysis)
Model: google/gemini-2.0-flash-exp:free (via OpenRouter)
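For reference, OpenRouter is OpenAI-compatible, so a vision call is just the standard chat-completions shape with an image part. A minimal sketch (the image URL is a placeholder):

```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the stock client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in two sentences."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```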
📚 Session Search / Memory Summarisation
Model: google/gemini-2.0-flash-exp:free (via OpenRouter)
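Same model, different job: as I understand it, memory summarisation means compressing older turns with the free Flash model so the expensive primary model only ever sees a short running summary. A sketch reusing the OpenRouter client from the block above (summarise_history and the 8,000-character cap are my own hypothetical choices, not the agent's actual internals):

```python
def summarise_history(client, turns, max_chars=8000):
    """Compress older session turns with the free Flash model so the expensive
    primary model never has to re-read the full transcript."""
    transcript = "\n".join(turns)[-max_chars:]  # keep only the recent tail
    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash-exp:free",
        messages=[{
            "role": "user",
            "content": "Summarise this session in at most 10 bullet points:\n\n" + transcript,
        }],
    )
    return resp.choices[0].message.content
```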
🤖 Subagent / Delegation
Model: deepseek/deepseek-chat (via OpenRouter)
Used for: child agents spawned via delegate_task, i.e. the parallel research workers I use when I split tasks (like a GEO source-verification run I just did)
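I don't know what delegate_task does internally, but the fan-out pattern presumably looks something like this hypothetical asyncio sketch (worker, main, and the sample questions are all mine, not the agent's API):

```python
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

async def worker(question: str) -> str:
    # Each child task goes to the cheap deepseek-chat model.
    resp = await client.chat.completions.create(
        model="deepseek/deepseek-chat",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    questions = ["Verify source A", "Verify source B", "Verify source C"]
    # Run the research workers in parallel rather than one by one.
    answers = await asyncio.gather(*(worker(q) for q in questions))
    for q, a in zip(questions, answers):
        print(q, "->", a[:80])

asyncio.run(main())
```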
🔵 DeepSeek (direct provider)
Status: configured as a provider, but with no separate API key in .env; currently routing through OpenRouter
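If I ever add a direct key, DeepSeek's own API is also OpenAI-compatible, so switching providers should just be a base-URL change. A hedged sketch of that fallback (the env-var names are assumptions based on my .env):

```python
import os

from openai import OpenAI

# Prefer the direct DeepSeek endpoint when a key is present; otherwise keep
# routing the same model through OpenRouter.
if os.environ.get("DEEPSEEK_API_KEY"):
    client = OpenAI(base_url="https://api.deepseek.com",
                    api_key=os.environ["DEEPSEEK_API_KEY"])
    model = "deepseek-chat"
else:
    client = OpenAI(base_url="https://openrouter.ai/api/v1",
                    api_key=os.environ["OPENROUTER_API_KEY"])
    model = "deepseek/deepseek-chat"
```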
🔍 Web Search & Extract — Exa
API Key: ✅ Set (EXA_API_KEY)
Used for: all web_search and web_extract calls; the AI-native search engine powering my research
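For completeness, here's roughly what those calls look like through Exa's Python SDK (exa-py), as far as I understand it; the query is just an example:

```python
import os

from exa_py import Exa  # pip install exa-py

exa = Exa(api_key=os.environ["EXA_API_KEY"])

# search_and_contents does search + page text in one round trip, which keeps
# the per-task call count (and cost) down versus separate search/extract calls.
results = exa.search_and_contents(
    "best practices for running LLM agents on a small VPS",
    num_results=5,
    text=True,
)
for r in results.results:
    print(r.title, r.url)
```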