Reranking

A second ranking stage applied after the initial search. Vector search retrieves, say, the 50 most similar chunks; a reranker model then scores each of those 50 against the query and reorders them by relevance, so that only the best ones are sent to the LLM.

It's worth the extra cost because similarity search is fast but coarse: its ranking only approximates relevance. The reranker is slower but finer, so combining the two (retrieve broadly, then reorder carefully) tends to raise the quality of the final context.
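
The retrieve-then-rerank flow can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the two-dimensional vectors are made up, the function names (retrieve, rerank, toy_score) are hypothetical, and toy_score is a simple word-overlap stand-in for a real reranker model. A real system would typically use an approximate nearest-neighbor index for stage 1 and a learned model for stage 2.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             index: List[Tuple[str, Sequence[float]]],
             k: int) -> List[str]:
    """Stage 1 (fast, coarse): the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(query: str,
           candidates: List[str],
           score: Callable[[str, str], float],
           top_n: int) -> List[str]:
    """Stage 2 (slower, finer): score each candidate against the query, keep the top_n."""
    ranked = sorted(candidates, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:top_n]


def toy_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in for a reranker model: fraction of query words found in the chunk."""
    q_words = set(re.findall(r"\w+", query.lower()))
    c_words = set(re.findall(r"\w+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


# Made-up chunks and embeddings, chosen so that the coarse stage ranks an
# irrelevant chunk first and the reranker corrects the order.
index = [
    ("Cats sleep most of the day.", [0.9, 0.1]),
    ("How to reset a router password.", [0.8, 0.3]),
    ("Dogs need daily walks.", [0.1, 0.9]),
]
query = "reset router password"
query_vec = [1.0, 0.2]  # assumed embedding of the query

candidates = retrieve(query_vec, index, k=2)          # retrieve broadly
best = rerank(query, candidates, toy_score, top_n=1)  # reorder carefully

print(candidates)  # ['Cats sleep most of the day.', 'How to reset a router password.']
print(best)        # ['How to reset a router password.']
```

In this toy data the vector stage places the cat chunk first because its embedding happens to sit slightly closer to the query vector; the second stage, which looks at the query and the chunk text together, moves the router chunk to the top. The reranker only sees the k candidates from stage 1, which is what keeps its higher per-item cost affordable.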