Link
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
What the article covers / Abstract
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role.
To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving.
ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search…
My Thoughts
Overall takeaway
This 2023 paper introduces the idea of having the LLM generate several possible next steps (“thoughts”) and then using tree search to decide which path to pursue next.
They explore Breadth-first search (BFS) and Depth-first search (DFS) and say they’ll leave A* tree search (A*) and Monte Carlo tree search (MCTS) for future research.
This new idea (Tree of Thoughts) differs from Chain-of-Thought reasoning because the Chain-of-Thought decision-making process works “left to right”: the LLM is asked to lay out its next steps and then execute them (either in one LLM interaction or several sequential LLM interactions).
In this case, a tree-search algorithm explores the LLM’s possible next steps and determines what to do next.
This deliberate problem-solving strategy is helpful for two reasons: it lets the LLM explore the solution space in greater detail, and it forces us, the humans, to define an evaluation function for what counts as a good next step, which makes us think harder about the problem.
Applying Tree of Thoughts to AI Agents
The next evolution in AI Agent decision-making isn’t just about better prompts; it’s about better search strategies.
Per Anthropic’s article on Building Effective Agents, Dec 2024
Agents…are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
We can have an AI Agent generate several possible next steps and then use a tree-search algorithm to decide which of them to pursue.
This lets the AI Agent direct its own processes and provides observability for what the agent did and why it chose to pursue the path it chose.
Tree of Thoughts Technical Implementation
Using the Tree of Thoughts (ToT) requires answering four questions:
- How to decompose the intermediate process into thought steps
- How to generate potential thoughts from each state
- How to heuristically evaluate states
- What search algorithm to use
For step one, the paper suggests:
a thought should be “small” enough so that LMs can generate promising and diverse samples…yet “big” enough so that LMs can evaluate its prospect toward problem solving
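To make “state” concrete, here is a minimal sketch (my own framing, not code from the paper): a state is just the original input plus the thoughts generated so far, which the paper writes as s = [x, z1…i].

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """A ToT state: the problem input plus the thoughts generated so far."""
    problem: str                                        # x: the original task description
    thoughts: list[str] = field(default_factory=list)   # z1..zi: one entry per thought step

    def as_prompt(self) -> str:
        # Serialize the state back into text so the LLM can extend or evaluate it.
        return "\n".join([self.problem, *self.thoughts])
```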
For step two, the paper suggests:
Sample…when the thought space is rich (e.g. each thought is a paragraph), and i.i.d. samples lead to diversity; or Propose… when the thought space is more constrained (e.g. each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication.
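A rough sketch of the two generation strategies. The `call_llm` helper, its signature, and the prompts below are placeholders of my own, not the paper’s code; the later sketches reuse the same convention.

```python
def call_llm(prompt: str, n: int = 1) -> list[str]:
    """Placeholder for your LLM client; returns n completions for the prompt."""
    raise NotImplementedError

def sample_thoughts(state: str, k: int) -> list[str]:
    # "Sample" strategy: draw k i.i.d. continuations from the same CoT-style prompt.
    # Works well when thoughts are rich (e.g. a paragraph), so duplicates are unlikely.
    prompt = f"{state}\n\nWrite the next step toward solving the problem."
    return call_llm(prompt, n=k)

def propose_thoughts(state: str, k: int) -> list[str]:
    # "Propose" strategy: ask for k *different* next steps in a single completion.
    # Better when thoughts are short (a word or a line) and i.i.d. sampling would repeat.
    prompt = f"{state}\n\nPropose {k} different possible next steps, one per line."
    completion = call_llm(prompt, n=1)[0]
    return [line.strip() for line in completion.splitlines() if line.strip()][:k]
```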
For step three, the paper proposes letting the LLM itself be the evaluator, rather than using a programmed heuristic (like Deep Blue) or a learned one (like AlphaGo):
propose a third alternative, by using the LM to deliberately reason about states. When applicable, such a deliberate heuristic can be more flexible than programmed rules, and more sample-efficient than learned models
then either
value each state independently or Vote across states
Then, aggregate the value or voting results to figure out which step to take next.
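A sketch of both evaluation modes, again using the hypothetical `call_llm` helper; the 1-10 scale and the prompts are my own choices, not the paper’s.

```python
from collections import Counter

def call_llm(prompt: str, n: int = 1) -> list[str]:
    """Placeholder for your LLM client; returns n completions for the prompt."""
    raise NotImplementedError

def value_state(state: str, n_samples: int = 3) -> float:
    # "Value" mode: score each state independently, averaging a few samples for stability.
    prompt = f"{state}\n\nRate how likely this partial solution is to succeed, from 1 to 10. Answer with a number only."
    scores = []
    for answer in call_llm(prompt, n=n_samples):
        try:
            scores.append(float(answer.strip()))
        except ValueError:
            pass  # ignore malformed answers
    return sum(scores) / len(scores) if scores else 0.0

def vote_over_states(states: list[str], n_votes: int = 5) -> list[int]:
    # "Vote" mode: show all candidates together and ask which one is most promising.
    numbered = "\n\n".join(f"Candidate {i}:\n{s}" for i, s in enumerate(states))
    prompt = f"{numbered}\n\nWhich candidate is most promising? Answer with its number only."
    votes = Counter()
    for answer in call_llm(prompt, n=n_votes):
        try:
            votes[int(answer.strip())] += 1
        except ValueError:
            pass
    return [votes[i] for i in range(len(states))]  # vote count per candidate
```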
Finally, for step four, the paper suggests using a tree search algorithm depending on the tree structure you generated before.
Breadth-first search (BFS) maintains a set of the b most promising states per step or Depth-first search (DFS) (Algorithm 2) explores the most promising state first, until the final output is reached or the state evaluator deems it impossible to solve the problem from the current state
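Putting the pieces together, here is a minimal BFS loop in the spirit of the paper’s Algorithm 1. The `generate`, `evaluate`, and `is_final` callables are assumed task-specific plug-ins (like the sketches above), and the default breadth and depth are arbitrary.

```python
from typing import Callable, List

def tot_bfs(
    problem: str,
    generate: Callable[[str], List[str]],          # state -> candidate next states
    evaluate: Callable[[List[str]], List[float]],  # states -> one score per state
    is_final: Callable[[str], bool],               # task-specific "is this a finished answer?"
    breadth: int = 5,
    max_steps: int = 4,
) -> str:
    """Breadth-first Tree of Thoughts: keep the `breadth` best states at each step."""
    frontier = [problem]  # start the search from the bare problem statement
    for _ in range(max_steps):
        # Expand every state in the frontier into candidate next states.
        candidates = [nxt for state in frontier for nxt in generate(state)]
        if not candidates:
            break
        # Score all candidates and keep only the most promising ones.
        scores = evaluate(candidates)
        ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
        frontier = [state for state, _ in ranked[:breadth]]
        # Stop early if any surviving state already completes the task.
        for state in frontier:
            if is_final(state):
                return state
    return frontier[0]  # best remaining state if no final answer was reached
```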
Applying this framework to AI agents lets them handle very complex workflows more effectively by breaking down the tasks into manageable steps that can be evaluated with simple tree search algorithms.
Even better, we can parallelize the evaluation of the states.
The whole process allows for greater exploration without drastically increasing the time to solve the problem.
For example, in an AI Agent coding task:
- Decomposition: Break into functions/modules
- Generation: Propose different implementation approaches
- Evaluation: Assess code quality and correctness (in parallel; see the sketch after this list)
- Search: Use Breadth-first search (BFS) to explore multiple implementations
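To illustrate the parallel-evaluation point, a sketch using a thread pool; `call_llm`, `review_implementation`, and the 1-10 scoring prompt are hypothetical helpers, not any particular library’s API.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Placeholder for your LLM client; returns one completion."""
    raise NotImplementedError

def review_implementation(code: str) -> float:
    # Ask the LLM to grade one candidate implementation for quality and correctness.
    prompt = (
        "Review the following implementation for correctness and code quality.\n"
        "Answer with a single score from 1 (broken) to 10 (ship it).\n\n" + code
    )
    try:
        return float(call_llm(prompt).strip())
    except ValueError:
        return 0.0

def evaluate_candidates(candidates: list[str], max_workers: int = 8) -> list[float]:
    # The reviews are independent of each other, so they can run concurrently:
    # wall-clock time grows with tree depth, not with the number of candidates per step.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(review_implementation, candidates))
```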
Strategic Implications
For technical leaders, Tree of Thoughts problem solving offers several advantages:
- Explainability: Generating potential actions, evaluating them, and recording the results makes it possible to reconstruct why the agent did what it did
- Better Complex Problem Solving: The agent explores the solution space one step at a time while considering many more options at each step
- Improved Accuracy: Evaluating each potential next step catches weak or incorrect steps before the agent commits to them
- Modularity: Generation, evaluation, and search can all be changed independently
- Scalability: Parallelizable evaluation enables handling more complex workflows without linear time increase
Implementation Framework
For teams looking to implement ToT with AI Agents:
- Start with well-defined problems
  - Choose tasks with clear success criteria
  - Begin with problems that have multiple valid approaches
- Build clear evaluation criteria
  - Define metrics for comparing different paths
  - Create standardized evaluation functions
- Implement simple search strategies first (BFS/DFS)
  - Start with breadth-first search for exploration
  - Move to depth-first search when solution paths are clearer
- Add observability from day one (see the sketch after this list)
  - Track decision paths
  - Monitor evaluation metrics
  - Document search patterns
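For the observability items above, a minimal sketch of the kind of record worth keeping at each search step; the field names and the JSONL format are my own suggestion.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class SearchStepRecord:
    """One row of the agent's decision log: what was considered, scored, and kept."""
    step: int
    candidates: list[str]   # every thought the agent considered at this step
    scores: list[float]     # the evaluator's score for each candidate
    kept: list[str]         # the candidates that survived pruning
    timestamp: float

def log_step(path: str, record: SearchStepRecord) -> None:
    # Append one JSON line per search step so the full decision path can be replayed.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Example: record that the agent kept 2 of 3 candidates at step 0.
log_step("tot_trace.jsonl", SearchStepRecord(
    step=0,
    candidates=["plan A", "plan B", "plan C"],
    scores=[7.5, 6.0, 3.0],
    kept=["plan A", "plan B"],
    timestamp=time.time(),
))
```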
Key Takeaways for AI Agent Development
Tree of Thoughts will allow for greater AI Agent agency while delivering more explainability.
Structuring the LLM interaction as a tree-search process allows for deliberate planning and decision-making.
As the paper says:
Our proposed ToT approach extends existing planning formulations by considering multiple potentially feasible plans simultaneously at each problem-solving step, and proceeding with the most promising ones.