u/Cool_Literature2565

Introducing model-router: Cost-aware auto-routing plugin for Hermes Agent

Hey r/hermesagent!

I built a plugin that routes each conversation turn to the cheapest model that can handle it — automatically, with no manual switching.

5 tiers; the cheapest one that can probably handle the task wins:

| Tier | Model | Notes |
|------|-------|-------|
| T1 | qwen3.5-flash | |
| T2 | deepseek-v4-flash | |
| T3 | minimax-m2.7 | |
| T4 | deepseek-v4-pro + reasoning | |
| T5 | claude-sonnet + reasoning | manual only |

Every turn, Flash (the T1 model) classifies the message's complexity in ~1s, then the cheapest viable model runs. Simple messages stay on T1; complex ones go to T2-T4. For normal daily use, cost drops dramatically.
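A minimal Python sketch of the per-turn routing decision, assuming the classifier returns the lowest tier it thinks can handle the message. `TIERS`, `classify_complexity` and `route` are hypothetical names, not the plugin's API, and the keyword heuristic only stands in for the real ~1s Flash call.

```python
from __future__ import annotations

# Hypothetical tier table mirroring the post; not the plugin's real config format.
TIERS = {
    1: "qwen3.5-flash",
    2: "deepseek-v4-flash",
    3: "minimax-m2.7",
    4: "deepseek-v4-pro",  # run with reasoning enabled
    5: "claude-sonnet",    # run with reasoning enabled; manual only
}
MAX_AUTO_TIER = 4  # auto-routing never selects T5


def classify_complexity(message: str) -> int:
    """Stand-in for the ~1s Flash classification call.

    A real router would ask the T1 model to rate the message; this toy
    heuristic only looks at keywords and length.
    """
    text = message.lower()
    if any(word in text for word in ("architecture", "design", "refactor")):
        return 4
    if "review" in text or len(text) > 400:
        return 3
    if len(text) > 120:
        return 2
    return 1


def route(message: str) -> tuple[int, str]:
    """Return (tier, model) for the cheapest tier the classifier deems viable."""
    tier = min(max(classify_complexity(message), 1), MAX_AUTO_TIER)
    return tier, TIERS[tier]
```

For example, `route("what time is it")` returns `(1, "qwen3.5-flash")`. Because tiers are ordered by cost, taking the classifier's minimum viable tier is the same as picking the cheapest viable model; the clamp to `MAX_AUTO_TIER` keeps T5 out of auto-routing, matching "manual only".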

## Auto-escalation

If tool calls fail twice in a row, the router bumps up one tier automatically, and it keeps escalating up to T4 if needed. The error counter resets after each escalation, so later failures can escalate again. When the turn ends, it drops back to the base tier.
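The escalation rules can be modelled as a small per-turn state machine. This is an illustrative sketch, not the plugin's code: `EscalationState` and its methods are hypothetical, and it assumes a successful tool call resets the failure count (one reading of "in a row") and that a turn already above T4 is never moved.

```python
class EscalationState:
    """Hypothetical per-turn escalation tracker.

    - two consecutive tool failures bump the tier by one, capped at T4
    - the failure counter resets after each escalation
    - a successful tool call also resets the counter (assumption)
    - end_turn() drops back to the base tier
    """

    FAILURES_TO_ESCALATE = 2
    MAX_AUTO_TIER = 4

    def __init__(self, base_tier: int) -> None:
        self.base_tier = base_tier
        self.tier = base_tier
        self.consecutive_failures = 0

    def record_tool_result(self, ok: bool) -> int:
        """Record one tool call outcome and return the tier to use next."""
        if ok:
            self.consecutive_failures = 0
            return self.tier
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.FAILURES_TO_ESCALATE:
            if self.tier < self.MAX_AUTO_TIER:
                self.tier += 1
            self.consecutive_failures = 0  # allow a later escalation
        return self.tier

    def end_turn(self) -> int:
        """Reset to the base tier once the turn is over."""
        self.tier = self.base_tier
        self.consecutive_failures = 0
        return self.tier
```

Starting from T2, two failures move the turn to T3, two more move it to T4, and further failures stay at T4; `end_turn()` then returns to T2.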

You never think about it. It just works.

## Explicit hints

Write `T3` or `tier4` anywhere in your message to force that tier and skip the Flash classification call:

- "T3 review this code" → MiniMax

- "use tier4 for this architecture design" → DeepSeek V4 Pro

- "T1 what time is it" → Flash

If a message mentions several different tiers (e.g. "T1 T2 T3"), it's treated as discussing tiers rather than requesting one, and routing falls back to the Flash classifier.
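Hint detection could look roughly like this. The regex, the rule that repeating the same tier still counts as a single request, and allowing `T5` through a hint are all assumptions; `explicit_tier` is a hypothetical helper, not part of the plugin.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "T3", "t3", "tier4" or "Tier 4" as standalone words.
TIER_HINT = re.compile(r"\b(?:t|tier\s?)([1-5])\b", re.IGNORECASE)


def explicit_tier(message: str) -> Optional[int]:
    """Return the single tier requested in the message, or None.

    None means no explicit request: either no tier was mentioned, or several
    different tiers were (read as discussing tiers), so the caller should
    fall back to the Flash classifier.
    """
    tiers = {int(digit) for digit in TIER_HINT.findall(message)}
    if len(tiers) == 1:
        return tiers.pop()
    return None
```

The word boundaries keep strings like "GPT4" from being read as a T4 hint.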

## Manual control

- `/t1` through `/t5` — pin session to a tier

- `/auto` — resume auto-routing

- `/model` — also pins (same as manual tier)

- A pin persists until `/auto` or a new session
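A sketch of the session pin state behind these commands, assuming a fresh `SessionRouting` object per session (so a new session starts in auto mode) and that `/model` takes the model name as its argument. The class and its method names are hypothetical.

```python
from typing import Optional


class SessionRouting:
    """Hypothetical session-level pin state for the routing slash commands."""

    def __init__(self) -> None:
        self.pinned_tier: Optional[int] = None   # set by /t1../t5
        self.pinned_model: Optional[str] = None  # set by /model

    def handle_command(self, command: str) -> bool:
        """Apply a slash command; return True if it was a routing command."""
        parts = command.strip().split(maxsplit=1)
        if not parts:
            return False
        name = parts[0].lower()
        if name in {"/t1", "/t2", "/t3", "/t4", "/t5"}:
            self.pinned_tier = int(name[2])
            self.pinned_model = None
            return True
        if name == "/auto":
            self.pinned_tier = None
            self.pinned_model = None
            return True
        if name == "/model" and len(parts) == 2:
            # Picking a model by hand pins the session just like /tN.
            self.pinned_model = parts[1].strip()
            self.pinned_tier = None
            return True
        return False

    @property
    def is_auto(self) -> bool:
        """True when no pin is active and the classifier should run."""
        return self.pinned_tier is None and self.pinned_model is None
```

The router would check `is_auto` before each turn and only call the classifier when no pin is set.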

## Status bar

Shows `[T1]`, `[T2]`, etc. in real time so you always know what's running.

Install is super simple, and the plugin can auto-fix Hermes core files after updates if needed. It works with profiles, tiers are fully customizable, and it integrates with Hermes WebUI. I only use it with OpenRouter, but it should work with other providers too.

---

**Repo:** https://github.com/open-world-project/model-router

Feedback welcome!
