Venue
Neural Information Processing Systems
Domain
Artificial Intelligence, Natural Language Processing
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, "Tree of Thoughts" (ToT), which generalizes over the popular "Chain of Thought" approach to prompting language models, and enables exploration over coherent units of text ("thoughts") that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.
37th Conference on Neural Information Processing Systems (NeurIPS 2023).
The paper introduces the Tree of Thoughts (ToT) framework for enhancing problem solving in large language models (LMs). Traditional LMs, though capable across a broad range of textual tasks, are limited on tasks requiring strategic reasoning and planning by their autoregressive, token-level decision-making process. The ToT framework lets LMs explore multiple reasoning paths, evaluate choices, and backtrack when necessary by framing problem solving as search over a tree whose nodes are coherent "thoughts" serving as intermediate steps toward a solution. The authors empirically demonstrate the effectiveness of ToT on three novel tasks: Game of 24, Creative Writing, and Mini Crosswords, achieving substantial improvements in success rates over standard prompting methods. The framework draws on cognitive-science accounts of human decision making, specifically the distinction between fast (System 1) and deliberative (System 2) reasoning, and enhances LMs' problem-solving capabilities without additional training.
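The propose-evaluate-select loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's code: `propose_thoughts` and `value_thought` are hypothetical stand-ins for the LM's "propose" and "value" prompts, here replaced by a trivial string-building problem so the loop is runnable.

```python
# Sketch of the ToT search loop (breadth-first variant).
# propose_thoughts / value_thought are stand-ins for LM calls (assumption).

def propose_thoughts(state):
    """Stand-in for an LM call that proposes candidate next thoughts."""
    return [state + ch for ch in "abc"]

def value_thought(state, target="abc"):
    """Stand-in for an LM call that scores how promising a partial state is."""
    return sum(1 for a, b in zip(state, target) if a == b)

def tot_bfs(initial="", steps=3, breadth=2):
    frontier = [initial]
    for _ in range(steps):
        # Expand every frontier state, score all candidates,
        # and keep only the `breadth` most promising ones.
        candidates = [s for state in frontier for s in propose_thoughts(state)]
        candidates.sort(key=value_thought, reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]

print(tot_bfs())  # → "abc" under this toy scorer
```

With a real LM, each candidate is a short text passage (an equation step, a writing plan) and the value function is a separate prompt asking the model to rate or vote on candidates.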
This paper employs the following methods:
- Tree of Thoughts (ToT)
- Breadth-First Search (BFS)
- Depth-First Search (DFS)
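The DFS variant listed above explores one branch at a time, pruning unpromising states and backtracking. A minimal runnable sketch under the same toy assumptions as before (the heuristic scorer stands in for an LM value prompt; names are illustrative, not the paper's code):

```python
# Sketch of the DFS variant with value-based pruning and backtracking.
# The inline score is a stand-in for an LM "value" call (assumption).

def dfs(state, target, depth, threshold=1):
    if state == target:
        return state
    if depth == 0:
        return None
    for ch in "abc":
        child = state + ch
        # Prune children whose heuristic score falls below the threshold
        # (capped by the child's length); otherwise recurse, and backtrack
        # to the next candidate if the subtree yields no solution.
        score = sum(1 for a, b in zip(child, target) if a == b)
        if score < min(len(child), threshold):
            continue
        found = dfs(child, target, depth - 1, threshold)
        if found is not None:
            return found
    return None

print(dfs("", "cab", depth=3))  # → "cab"
```

BFS suits tasks with shallow, wide trees (Game of 24, Creative Writing), while DFS with backtracking suits deeper searches where most branches can be pruned early (Mini Crosswords).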
The following datasets were used in this research:
- Game of 24
- Creative Writing
- Mini Crosswords
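For context on the first task: Game of 24 asks whether four given numbers can be combined with +, -, *, / to reach 24. A small brute-force checker (illustrative only, not the paper's evaluation code) makes the search space concrete:

```python
# Brute-force feasibility check for Game of 24: repeatedly pick two
# values, combine them with an arithmetic operation, and recurse.
from itertools import permutations

def solvable_24(nums, target=24, eps=1e-6):
    def search(vals):
        if len(vals) == 1:
            return abs(vals[0] - target) < eps
        for a, b in permutations(range(len(vals)), 2):
            rest = [v for i, v in enumerate(vals) if i not in (a, b)]
            results = [vals[a] + vals[b], vals[a] - vals[b], vals[a] * vals[b]]
            if abs(vals[b]) > eps:  # skip division by zero
                results.append(vals[a] / vals[b])
            if any(search(rest + [r]) for r in results):
                return True
        return False
    return search([float(n) for n in nums])

print(solvable_24([4, 9, 10, 13]))  # → True: (10 - 4) * (13 - 9) = 24
print(solvable_24([1, 1, 1, 1]))    # → False
```

In the paper's setup, each ToT "thought" for this task is one intermediate equation (e.g. combining two of the remaining numbers), so a solution is a depth-3 path in the search tree.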
The following evaluation metrics were used:
- Success Rate
- Coherency Score
The main results include:
- ToT achieved a success rate of 74% on Game of 24, compared to 4% with chain-of-thought prompting.
- ToT generated more coherent passages with an average score of 7.56 compared to 6.19 for IO and 6.93 for CoT in Creative Writing.
- ToT improved word-level success rates to 60% in Mini Crosswords.
The authors identified the following limitations:
- None specified
Compute resources used:
- Number of GPUs: None specified
- GPU Type: None specified
Keywords
Tree of Thoughts, problem solving, search algorithms, prompting techniques, large language models