New benchmarks and frameworks measure how agentic systems handle constrained evidence, multimodal perception, and complex sequential reasoning.