What the article covers
In this article, we’ll walk you through a simple yet effective technique for retrieving more, and more relevant, context for a given query: query expansion.
TL;DR: Query expansion increases the number of results, so it increases recall (as opposed to precision). In general, BM25 favors precision while embedding retrieval favors recall (see this explanation by Nils Reimers). So it makes sense to combine BM25 with query expansion to increase recall in cases where you want to rely on keyword search.
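To make the recall/precision trade-off concrete, here is a minimal Python sketch of BM25 keyword search with and without query expansion. It assumes whitespace tokenization, the common BM25 defaults `k1=1.5` and `b=0.75`, and a hand-written `SYNONYMS` table; the toy corpus, the table, and the helper names (`expand`, `recall`, `precision`) are hypothetical, and in a real pipeline the expansion terms would more likely come from an LLM or a thesaurus.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over an in-memory list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        # Document frequency: how many documents contain each term.
        self.df = Counter(term for doc in self.docs for term in set(doc))

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, query_terms: list[str], idx: int) -> float:
        doc = self.docs[idx]
        tf = Counter(doc)
        total = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * len(doc) / self.avgdl
            total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def search(self, query_terms: list[str]) -> list[int]:
        """Return indices of documents with a positive score, best first."""
        scores = [self.score(query_terms, i) for i in range(self.n_docs)]
        ranked = sorted(range(self.n_docs), key=lambda i: scores[i], reverse=True)
        return [i for i in ranked if scores[i] > 0]


# Hypothetical synonym table; an LLM or thesaurus would normally supply these.
SYNONYMS = {"reset": ["recover", "change"], "password": ["credentials", "passphrase"]}


def expand(terms: list[str]) -> list[str]:
    """Append synonyms of each query term, skipping duplicates."""
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def recall(retrieved: list[int], relevant: set[int]) -> float:
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved: list[int], relevant: set[int]) -> float:
    return len(set(retrieved) & relevant) / len(retrieved) if retrieved else 0.0


docs = [
    "how to reset a forgotten password",          # relevant
    "steps to recover your account credentials",  # relevant
    "change the login passphrase in settings",    # relevant
    "change your billing address",                # not relevant
    "download last month's invoice",              # not relevant
]
relevant = {0, 1, 2}
index = BM25(docs)
query = tokenize("reset password")

plain = index.search(query)
expanded = index.search(expand(query))
print("plain:", plain, recall(plain, relevant), precision(plain, relevant))
print("expanded:", expanded, recall(expanded, relevant), precision(expanded, relevant))
```

On this toy corpus, the plain query finds only the document that literally contains "reset" and "password" (recall 1/3, precision 1.0). The expanded query also finds the two paraphrased documents, raising recall to 1.0, but the synonym "change" pulls in the unrelated billing document, so precision drops to 0.75.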
My Thoughts
Overall takeaway
LLMs are very sensitive to how they are prompted and to what is in their training data.
To address this, some organizations train their own models, or fine-tune existing ones, on the data they want the model to know.
Both approaches have proved costly: training a whole new LLM is expensive, and fine-tuning is complicated because you need someone who knows how to fine-tune a model on your data.
Rather than training a new model, researchers developed Retrieval-Augmented Generation (RAG), in which the system uses the user’s prompt to retrieve relevant data from an external data source, combines that data with the prompt, and sends the result to the LLM to process.
However, RAG is still sensitive to prompting, because the system has to work out, from the wording of the prompt, which data to retrieve from that data source before passing it to the LLM.
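The RAG flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the in-memory `KNOWLEDGE_BASE`, the word-overlap scoring in `retrieve`, and the prompt template in `build_prompt` are all hypothetical stand-ins, and the actual LLM call is left out because it depends on which model API you use.

```python
import re

# Hypothetical in-memory data source standing in for a document store.
KNOWLEDGE_BASE = [
    "Car batteries usually last three to five years.",
    "Rotate your car tires every 8,000 kilometres.",
    "Our support desk is open on weekdays.",
]


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return up to k documents ranked by how many words they share with the query."""
    query_words = tokens(query)
    scored = [(len(query_words & tokens(doc)), doc) for doc in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep KB order
    return [doc for score, doc in scored if score > 0][:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context with the user's question into one LLM prompt."""
    context_block = "\n".join(f"- {doc}" for doc in context) or "(no relevant context found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


for question in ["How long does a car battery last?", "automobile battery lifespan"]:
    print(build_prompt(question, retrieve(question)))
    print("---")
```

Asking how long a "car battery" lasts retrieves the relevant document, but the same question phrased as "automobile battery lifespan" shares no words with the knowledge base, so the LLM would receive no context at all. That gap is the prompt sensitivity described here.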
Enter Query Expansion as a potential solution to this sensitivity problem.
Query Expansion generates similar queries from the user’s original query and feeds them into the RAG system, whose retrieved results are then passed to the LLM.
From the same initial prompt, this strategy gives the LLM more data and examples to work with.
Instead of asking users to supply synonyms or alternative phrasings of their initial prompt, a Query Expansion RAG system does that work for them.
This improves results by giving the LLM more context.
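A rough sketch of query expansion in front of a retriever follows. It assumes a hypothetical `SYNONYMS` table and a hypothetical `generate_variants` helper that swaps one word at a time; an LLM-based system would instead prompt a model to write the alternative queries. The toy word-overlap retriever is included so the block runs on its own, and the results of all variants are merged with duplicates removed.

```python
import re

# Toy knowledge base (hypothetical).
KNOWLEDGE_BASE = [
    "Car batteries usually last three to five years.",
    "Rotate your car tires every 8,000 kilometres.",
    "Our support desk is open on weekdays.",
]

# Hypothetical synonym table; an LLM would normally generate the variants.
SYNONYMS = {
    "automobile": ["car"],
    "battery": ["batteries"],
    "lifespan": ["last", "life"],
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = tokens(query)
    scored = [(len(query_words & tokens(doc)), doc) for doc in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0][:k]


def generate_variants(query: str, max_variants: int = 4) -> list[str]:
    """Return the original query plus up to max_variants single-word substitutions."""
    words = query.lower().split()
    variants = [" ".join(words)]
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word, []):
            variant = " ".join(words[:i] + [synonym] + words[i + 1:])
            if variant not in variants:
                variants.append(variant)
    return variants[: max_variants + 1]


def expanded_retrieve(query: str, k_per_variant: int = 2, max_variants: int = 4) -> list[str]:
    """Retrieve for every variant and merge the results, keeping first-seen order."""
    merged: list[str] = []
    for variant in generate_variants(query, max_variants):
        for doc in retrieve(variant, k_per_variant):
            if doc not in merged:
                merged.append(doc)
    return merged


query = "automobile battery lifespan"
print(generate_variants(query))
print(retrieve(query))
print(expanded_retrieve(query))
```

Here the original wording matches nothing, while the variants "car battery lifespan", "automobile batteries lifespan", and "automobile battery last" recover the battery document, plus the loosely related tire document because it also mentions cars. `max_variants` caps how many extra queries, and therefore how many retrieval calls, each user query costs.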
Strategic Implications
For technical leaders, Query Expansion offers several advantages:
- Implementation Simplicity: Works with existing RAG systems
- User Experience: Reduces prompt engineering burden
- Cost Efficiency: Cheaper than model training/fine-tuning
- Result Quality: Better context leads to better outputs
- Scalability: Can improve over time with usage patterns
Implementation Framework
For teams implementing Query Expansion:
- Start with an existing RAG pipeline
- Add query expansion layer
  - Choose expansion strategy (synonym, semantic, hybrid)
  - Define expansion limits
  - Implement result consolidation
- Monitor and tune
  - Track query coverage
  - Measure result quality
  - Optimize expansion parameters
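One way to wire these steps together is sketched below in Python. It is an illustration under stated assumptions: the `ExpansionLayer` class, the canned toy retriever and expander, and the `max_expansions` limit are hypothetical, and result consolidation uses reciprocal rank fusion (RRF, with the commonly used constant `k=60`), a standard merging technique swapped in here rather than something the article prescribes. The `coverage` property is a simple stand-in for "track query coverage": the share of queries that returned at least one result.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Retriever = Callable[[str], list[str]]  # query -> ranked document IDs
Expander = Callable[[str], list[str]]   # query -> alternative queries


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) across the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


@dataclass
class ExpansionLayer:
    retriever: Retriever
    expander: Expander
    max_expansions: int = 3  # expansion limit: extra queries per user query
    queries_seen: int = 0
    queries_with_results: int = 0

    def search(self, query: str, top_k: int = 5) -> list[str]:
        variants = [query] + self.expander(query)[: self.max_expansions]
        fused = reciprocal_rank_fusion([self.retriever(v) for v in variants])[:top_k]
        # Monitoring: record whether this query found anything.
        self.queries_seen += 1
        if fused:
            self.queries_with_results += 1
        return fused

    @property
    def coverage(self) -> float:
        """Fraction of queries so far that returned at least one result."""
        return self.queries_with_results / self.queries_seen if self.queries_seen else 0.0


# Hypothetical canned rankings standing in for a real search backend.
CANNED = {
    "refund policy": ["doc_refunds", "doc_terms"],
    "money back guarantee": ["doc_guarantee", "doc_refunds"],
    "return an item": ["doc_returns"],
}
EXPANSIONS = {"refund policy": ["money back guarantee", "return an item"]}

layer = ExpansionLayer(
    retriever=lambda q: CANNED.get(q, []),
    expander=lambda q: EXPANSIONS.get(q, []),
    max_expansions=2,
)
results = layer.search("refund policy")
missed = layer.search("warranty length")
print(results)
print(layer.coverage)
```

For "refund policy", `doc_refunds` appears in two of the three rankings and rises to the top after fusion. The second query finds nothing, bringing coverage down to 0.5, which is the kind of signal you would watch while tuning expansion parameters.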
Key Takeaways
Query Expansion represents a pragmatic evolution in RAG systems by making them work smarter rather than harder.
Instead of requiring perfect prompts or expensive model training, it automatically enhances queries to get better results from existing systems.
For AI application developers, this means:
- More robust applications with less prompt engineering
- Better results without model retraining
- Improved user experience through automatic context enhancement
Any time the system can expand the human’s thinking while delivering more of what the human wants, it is a win.