Agentic retrieval should not make every request expensive. We added complexity evaluation so the system can distinguish direct questions from requests that need decomposition.
This is a latency and accuracy trade-off. Decomposition can improve recall for multi-part questions, but it adds work. The gate keeps that cost visible and avoids treating "more retrieval" as automatically better.


