9Router Key Insights
What is 9Router?

9Router is a free, open-source local AI proxy and token optimisation gateway built for developers who use AI coding tools like Claude Code, Cursor, Codex, Cline, and GitHub Copilot. It sits between your CLI tools and over 40 upstream LLM providers, exposing a single OpenAI-compatible endpoint at localhost:20128/v1. Its core function is to eliminate coding downtime caused by quota exhaustion or rate limits through a 3-tier automatic fallback system that cascades from premium subscriptions to budget API tiers to genuinely free providers.
On top of routing, its built-in RTK Token Saver compresses tool outputs such as git diff and log dumps before they reach the LLM, cutting input token costs by 20 to 40 percent on every single request. For teams and solo developers tired of mid-session interruptions and runaway API bills, 9Router is the infrastructure layer that keeps the code flowing at near-zero cost.

9Router's RTK (Rust Token Kompressor, ported to JavaScript) intercepts every tool_result in your prompt before it is dispatched to the LLM. It peeks at the first kilobyte of data, identifies the content type (git diff, log dump, file tree, grep output), and applies targeted lossless filters. The result is 20 to 40 percent fewer input tokens per request with absolutely no loss of context. If a filter fails or makes output larger, RTK silently discards it so your request is never broken.
The fallback engine is the beating heart of 9Router. You configure a “Combo” that chains up to five model tiers: your active subscription (e.g. Claude Code Pro at $20 per month), a cheap backup (e.g. GLM at $0.60 per million tokens), and a free unlimited provider (e.g. Kiro AI). The instant a rate-limit or quota-exhausted error is detected, 9Router catches it silently and re-routes the exact same request to the next tier. Your coding tool never sees the switch.

Inspired by the viral “why use many token when few token do trick” prompt by Julius Brussee, Caveman Mode injects a system-level instruction that forces the LLM to respond in terse, stripped-down language. The technical substance of the reply is preserved in full, but verbose filler is dropped, delivering up to 65 percent fewer output tokens. For high-volume agentic coding workflows, this alone can eliminate a meaningful chunk of monthly API expenditure.

9Router acts as a universal protocol translator. Your CLI tool sends a standard OpenAI JSON request to localhost, and 9Router unpacks it, restructures it into the native format of the destination provider (Claude, Gemini, Vertex, Kiro, Cursor), fires it off, then translates the response back into OpenAI format before handing it to your tool. The result is that Claude Code, Codex, Cline, Roo, and 12 other supported CLI tools all route through a single configuration point.
9Router's dashboard provides live token consumption data, reset countdowns (5-hour, daily, weekly), and per-model cost estimation. Multi-account round-robin per provider lets you load-balance across multiple accounts so one hitting its limit does not stall the entire stack. OAuth tokens are refreshed automatically in the background so there is no manual re-authentication during active sessions.
9Router ships with first-class support for three genuinely free providers: Kiro AI (free unlimited Claude 4.5, GLM-5, MiniMax via AWS/Google OAuth), OpenCode Free (no auth, models auto-fetched), and Vertex AI ($300 Google Cloud credits for new accounts). Combined with RTK, a developer using only the free tier pays exactly $0 per month while still accessing production-grade models like Claude Sonnet 4.5.
9Router Pricing Plans
| Tier | Cost (USD) | What You Get |
|---|---|---|
| 9Router Software | $0 forever | Full proxy, all features, open-source MIT licence |
| Free Providers (Kiro, OpenCode) | $0 | Unlimited Claude 4.5, GLM-5, MiniMax, no API key |
| Vertex AI | $0 (new GCP: $300 credit) | Gemini 3 Pro, DeepSeek, GLM-5 via Google Cloud |
| Cheap Backup (GLM-5.1) | $0.60 per 1M tokens | Daily reset, great for overflow routing |
| Cheap Backup (MiniMax M2.7) | $0.20 per 1M tokens | 5-hour reset, cheapest per-token option |
| Kimi K2.5 Flat | $9 per month | 10M tokens monthly at $0.90 per 1M effective |
| Claude Code Pro/Max | $20–$200 per month | Premium subscription maximised via 9Router |
| GitHub Copilot | $10–$19 per month | Routed via MITM for model flexibility |
Deployment Flexibility — Local, VPS, Docker
9Router is not locked to a single machine. For shared teams or remote workflows, it deploys on any VPS with a straightforward npm run build && npm run start process and a handful of environment variables. Docker images are published to both Docker Hub (decolua/9router) and GitHub Container Registry for multi-platform linux/amd64 and linux/arm64 support.
Cloudflare Tunnel integration means remote tools like Cursor on a laptop can route through a server-hosted 9Router instance without opening firewall ports. The SQLite database persists all provider configs, combos, and usage history via a mounted volume.
Pros and Cons
- Free forever, MIT licence.
- 20 to 40% token savings via RTK.
- 65% output token reduction with Caveman.
- 12 CLI tools supported natively.
- Genuine zero-cost free provider support.
- Docker, VPS, and Cloudflare deployment ready.
- Requires Node.js setup (not GUI-only).
- No per-model latency benchmarking dashboard.
- Some free providers (iFlow, Qwen) discontinued in 2026.
Best 9Router Alternatives
| AI Gateway / LLM Router / Token Saver | Multi-Provider Fallback | Token Optimisation |
|---|---|---|
| OmniRoute | 4-tier, 36+ providers, TypeScript | Semantic cache only |
| LiteLLM | 100+ providers, enterprise focus | No built-in compression |
| OpenRouter | Cloud-based, no local install | No token saving features |
| LobeChat Gateway | Limited provider list | No RTK-equivalent |
