LLM RAG Query Expansion For Better Prompting Results

Advanced RAG: Query Expansion

What the article covers

In this article, we’ll walk through a simple yet effective technique for retrieving more, and more relevant, context for a given query: query expansion.

TL;DR: Query expansion increases the number of results, so it increases recall (vs precision). In general, BM25 favors precision while embedding retrieval favors recall (See this explanation by Nils Reimers). So, it makes sense to use BM25+query expansion to increase recall in cases where you want to rely on keyword search.
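The recall effect described above can be shown with a toy keyword retriever: the original query matches only documents sharing its exact terms, while adding an expanded variant pulls in documents that phrase the same need differently. The corpus, queries, and expansion below are hypothetical illustrations, not from the article.

```python
# Hypothetical mini-corpus for demonstrating recall with keyword search.
CORPUS = {
    "doc1": "how to reset your password",
    "doc2": "steps to change login credentials",
    "doc3": "billing and invoice questions",
}

def keyword_hits(query: str) -> set[str]:
    """Return ids of docs sharing at least one keyword with the query."""
    terms = set(query.lower().split())
    return {doc_id for doc_id, text in CORPUS.items()
            if terms & set(text.split())}

def expanded_hits(query: str, expansions: list[str]) -> set[str]:
    """Union the hits of the original query and its expansions,
    trading precision for recall."""
    hits = keyword_hits(query)
    for alt in expansions:
        hits |= keyword_hits(alt)
    return hits

original = keyword_hits("reset password")               # matches doc1 only
expanded = expanded_hits("reset password",
                         ["change login credentials"])  # adds doc2
```

The expanded query surfaces doc2, which keyword matching alone would miss because it shares no terms with "reset password".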

My Thoughts

Overall takeaway

LLMs are very sensitive to prompting and their training data.

Some organizations train their own models, or fine-tune existing models on the data they want included, to work around this sensitivity.

These two solutions are either expensive (training a whole new LLM) or complicated (you need someone who knows how to fine-tune a model on your data).

Rather than training a new model, researchers developed Retrieval-Augmented Generation (RAG), in which the system takes the user’s prompt, retrieves relevant data from an external data source, and combines the two before sending them to the LLM for processing.

However, RAG is still sensitive to prompting, because the system must figure out what data to retrieve from that data source before providing it to the LLM.

Enter Query Expansion as a potential solution to this sensitivity problem.

Query Expansion generates variants of the user’s query, feeds each one into the RAG retrieval step, and passes the combined results to the LLM.

This strategy supplies the LLM with more data and examples from the same initial prompt.

In effect, the system does this work for the human, rather than asking them to supply synonyms or rephrasings of their original query.

The extra context retrieved this way is what improves the final results.
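The expansion step itself is usually a single LLM call asking for paraphrases of the user’s query. A minimal sketch, where `generate_variants` is a hypothetical stand-in for a real LLM call (swap in your provider’s client and a prompt like "Rewrite this search query N different ways"):

```python
def generate_variants(query: str, n: int = 3) -> list[str]:
    """Placeholder for an LLM paraphrase call. Returns canned output so
    the sketch runs without an API key; replace with a real client."""
    canned = {
        "how do I cancel my plan": [
            "steps to end a subscription",
            "terminate my membership",
            "stop recurring billing",
        ],
    }
    return canned.get(query, [])[:n]

def expand_query(query: str, n: int = 3) -> list[str]:
    """Original query first, then deduplicated LLM-generated variants.
    Each returned query is sent to the retriever separately."""
    seen, out = {query}, [query]
    for variant in generate_variants(query, n):
        if variant not in seen:
            seen.add(variant)
            out.append(variant)
    return out

queries = expand_query("how do I cancel my plan")
```

Keeping the original query first ensures the user’s literal wording is always retrieved against, with the variants acting as recall boosters.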

Strategic Implications

For technical leaders, Query Expansion offers several advantages:

  1. Implementation Simplicity: Works with existing RAG systems
  2. User Experience: Reduces prompt engineering burden
  3. Cost Efficiency: Cheaper than model training/fine-tuning
  4. Result Quality: Better context leads to better outputs
  5. Scalability: Can improve over time with usage patterns

Implementation Framework

For teams implementing Query Expansion:

  1. Start with an existing RAG pipeline
  2. Add query expansion layer
    • Choose expansion strategy (synonym, semantic, hybrid)
    • Define expansion limits
  3. Implement result consolidation
  4. Monitor and tune
    • Track query coverage
    • Measure result quality
    • Optimize expansion parameters
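Step 3 above (result consolidation) needs a way to merge the ranked hits from each expanded query. The article doesn’t prescribe a method; one reasonable option is Reciprocal Rank Fusion (RRF), sketched here with hypothetical document ids:

```python
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc as the sum over result
    lists of 1 / (k + rank), then sort by score descending. Docs ranked
    well across several expanded queries rise to the top."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge([
    ["doc_a", "doc_b", "doc_c"],   # hits for the original query
    ["doc_b", "doc_d"],            # hits for an expanded variant
])
# doc_b appears in both lists, so it outranks every single-list doc.
```

RRF is convenient here because it needs only ranks, not comparable scores, so it works even when the expanded queries hit different retrievers (e.g. BM25 plus embeddings).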

Key Takeaways

Query Expansion represents a pragmatic evolution in RAG systems by making them work smarter rather than harder.

Instead of requiring perfect prompts or expensive model training, it automatically enhances queries to get better results from existing systems.

For AI application developers, this means:

  1. More robust applications with less prompt engineering
  2. Better results without model retraining
  3. Improved user experience through automatic context enhancement

Anytime the system can expand the human’s thinking while delivering more of what the human wants, it is a win.