Link
Rethinking AI Agents: Why a Simple Router May Be All You Need
My thoughts
As AI Agents have become more popular and are being deployed in the real world, many developers have started gravitating towards frameworks like LangChain and CrewAI to build AI Agent applications.
This article presents an interesting pushback, suggesting that you don’t need a complex agent system.
Further, it identifies ways to use an LLM as a router, rather than as an agent, that benefit your users.
The core argument is that instead of using an AI agent to reason through multiple steps, a simple router can often decide which function to call.
Using a router turns many AI agent problems into classification problems: you already know most of the choices the LLM could make, so you can encode them into your system ahead of time.
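To make the classification framing concrete, here is a minimal sketch of my own (not code from the article). The `Route` labels and the `fake_llm` stand-in are hypothetical; the point is only that the model's reply is mapped onto a closed set of known choices, and anything else falls back.

```python
from enum import Enum
from typing import Callable


class Route(Enum):
    # Hypothetical label set; a real application would list its own tools.
    CHECK_ORDER = "check_order"
    REFUND = "refund"
    FALLBACK = "fallback"


def classify(query: str, llm: Callable[[str], str]) -> Route:
    """Ask the model for one label and map it onto the closed set of routes."""
    labels = ", ".join(r.value for r in Route if r is not Route.FALLBACK)
    prompt = f"Reply with exactly one of: {labels}.\nQuery: {query}"
    raw = llm(prompt).strip().lower()
    try:
        return Route(raw)
    except ValueError:
        # Anything outside the known labels is treated as a fallback.
        return Route.FALLBACK


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: keyword matching on the query line only.
    query = prompt.rsplit("Query:", 1)[-1].lower()
    return "Refund" if "money back" in query else "check_order"
```

Because the output space is a handful of labels, the router can be evaluated like any other classifier, with a labeled set of queries and an accuracy number.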
The article looks at three issues with AI agents:
- High Token Usage: Agents may make multiple LLM calls, each generating many tokens (costing time and money)
- Latency: More token generation means higher latency (time to response)
- Non-deterministic Behavior: LLMs can be unpredictable, which isn’t ideal for many applications (you have usually thought of most of the choices ahead of time and want the application to pick one of them)
The article proposes three concepts to address these problems:
1. Thin Router Layer: Instead of complex agent reasoning, use an LLM just to route to the right function/tool. In one example the post works through, this reduces token usage from ~50 tokens to ~3 tokens per decision; the 3 tokens are simply the name of the chosen tool.
2. Code-First Business Logic: Rather than letting LLMs make all decisions, encode known workflows in code. For example, a Text-to-SQL application might have fixed steps:
- Get table definitions
- Generate SQL query
- Validate query
- Execute query
3. Proper Encapsulation: Keep context (user roles, chat history) properly encapsulated in your code structure. This encapsulation helps make the system more deterministic.
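The fixed Text-to-SQL steps from the second concept could be sketched roughly like this. This is my own illustration, assuming SQLite and a `generate_sql` callable that stands in for the single LLM call; the function names are hypothetical and not taken from the article.

```python
import sqlite3
from typing import Callable, List, Tuple


def get_table_definitions(conn: sqlite3.Connection) -> str:
    """Step 1: collect CREATE TABLE statements to give the model as context."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def validate_query(conn: sqlite3.Connection, sql: str) -> None:
    """Step 3: accept only statements that start with SELECT and that SQLite can compile."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # EXPLAIN compiles the statement without running it; bad SQL raises sqlite3.Error.
    conn.execute(f"EXPLAIN {sql}")


def answer(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, str], str],
) -> List[Tuple]:
    """Run the four steps in a fixed order; only step 2 involves the model."""
    schema = get_table_definitions(conn)   # 1. get table definitions
    sql = generate_sql(question, schema)   # 2. generate SQL query
    validate_query(conn, sql)              # 3. validate query
    return conn.execute(sql).fetchall()    # 4. execute query
```

The order of the steps lives in ordinary code, so the model never has to decide what happens next; it only fills in the one step that needs language understanding.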
The article then provides a practical Python/Flask implementation showing how to:
- Define base tools/functions
- Create a router to select tools
- Wrap it in a web application
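As a rough sketch of those three pieces (mine, not the article's code), the example below registers two placeholder tools, routes by name, and exposes a `handle_request` function that a web endpoint could call with the parsed JSON body. The tool names, the `choose_tool` callable, and the payload shape are all assumptions; I have left out the Flask wiring itself so the sketch stays dependency-free.

```python
from typing import Callable, Dict, List, Optional

# 1. Base tools: a registry keyed by function name.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Register a function under its own name so the router can select it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(text: str) -> str:
    return "weather lookup for: " + text  # placeholder body


@tool
def search_docs(text: str) -> str:
    return "doc search for: " + text  # placeholder body


# 2. Router: the model only has to emit a tool name.
def route(text: str, choose_tool: Callable[[str, List[str]], str]) -> Optional[str]:
    name = choose_tool(text, sorted(TOOLS)).strip()
    return name if name in TOOLS else None


# 3. Web layer: what an endpoint would call with the request body.
def handle_request(payload: dict, choose_tool: Callable[[str, List[str]], str]) -> dict:
    text = payload.get("message", "")
    name = route(text, choose_tool)
    if name is None:
        return {"error": "no matching tool"}
    return {"tool": name, "result": TOOLS[name](text)}
```

In a Flask app, the view function would presumably do little more than pass the request's JSON to `handle_request` and return the resulting dictionary.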
What I found particularly interesting was the discussion of scale.
Using OpenAI/Anthropic APIs works for prototypes. At scale, though, I have found, as the article mentions, that most LLMs have been tuned to be chatty, so they generate more tokens than are necessary for making decisions.
Where latency counts (the user needs a fast response) or many LLM calls need to be made, the terser the response, the better.
Open-source models like Functionary can be helpful here.
Functionary describes itself as a “Chat language model that can use tools and interpret the results.”
Functionary also allows you to do what’s called Grammar Sampling.
Per Functionary’s docs:
We also offer our own function-calling grammar sampling feature which constrains the LLM’s generation to always follow the prompt template, and ensures 100% accuracy for function name. The parameters are generated using the efficient lm-format-enforcer, which ensures that the parameters follow the schema of the tool called
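To show the general idea behind constrained generation, here is a toy, character-level sketch. It is not Functionary's implementation: real systems constrain model tokens using logits, while this example uses a hypothetical `score` callable and assumes no function name is a prefix of another.

```python
from typing import Callable, Sequence


def constrained_name(names: Sequence[str], score: Callable[[str, str], float]) -> str:
    """Greedy decoding where each step may only extend a prefix of a valid name."""
    if not names or "" in names:
        raise ValueError("need at least one non-empty function name")
    out = ""
    while out not in names:
        # Characters that keep `out` a prefix of at least one valid name.
        allowed = sorted({n[len(out)] for n in names if n.startswith(out)})
        # The scorer ranks candidates, but it can never pick outside `allowed`.
        out += max(allowed, key=lambda ch: score(out, ch))
    return out
```

Whatever the scorer prefers, the result is always one of the supplied names, which is the property the quoted docs describe as 100% accuracy for the function name.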
These patterns of using focused LLM calls and deterministic routing reflect a sturdier architectural approach to AI applications - one that values simplicity and predictability over complex agent interactions.
This conclusion reminds me of Gall’s Law, which states that “a complex system that works is invariably found to have evolved from a simple system that worked.”
This approach suggests starting with simple, reliable routing patterns that can evolve as needed rather than building complex agent architectures from scratch.
Key Takeaways for AI Agent Development
This approach reminds me of the Unix philosophy: “Do one thing and do it well.”
Instead of having one agent try to do everything, have specialized functions and a simple LLM router to coordinate them.
For AI application developers, this means:
- Start simple - don’t reach for agents if a router will do
- Consider latency and determinism in your architecture
- Use code for fixed workflows, LLMs for specific tasks
- Think about scaling early - what works in the prototype might not work at scale
The article concludes by noting that you can still mix this approach with traditional agents when needed.
For example, you could have one of your router’s functions be a full agent for complex tasks while keeping simple tasks streamlined.
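A hedged sketch of that hybrid setup: one route is a plain function, another hands off to a bounded multi-step loop. The `run_agent` loop, the `FINAL:` convention, and the route names are hypothetical placeholders of mine, not the article's design.

```python
from typing import Callable, Dict, List


def lookup_order(task: str) -> str:
    return "order status for: " + task  # simple, single-step handler


def run_agent(task: str, step: Callable[[str, List[str]], str], max_steps: int = 5) -> str:
    """Call `step` repeatedly until it returns a line starting with 'FINAL:'."""
    history: List[str] = []
    for _ in range(max_steps):
        result = step(task, history)
        if result.startswith("FINAL:"):
            return result[len("FINAL:"):].strip()
        history.append(result)
    return "stopped after %d steps" % max_steps


def dispatch(
    task: str,
    choose: Callable[[str], str],
    step: Callable[[str, List[str]], str],
) -> str:
    """Route to a cheap handler when possible, and to the agent only when chosen."""
    routes: Dict[str, Callable[[str], str]] = {
        "lookup_order": lookup_order,
        "agent": lambda t: run_agent(t, step),
    }
    name = choose(task)
    if name not in routes:
        raise ValueError("unknown route: " + name)
    return routes[name](task)
```

Only tasks routed to `agent` pay for multiple model calls; everything else stays on the short path.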
The architectural decision to start with simple, reliable routing patterns that evolve as needed is a pragmatic middle ground between pure agent frameworks and traditional application development.