Archive — Page 3
67 articles total
Research: Partial Grounding Offers Middle Ground for Classical Planning Problems
Researchers explore hybrid encoding that avoids exponential blowup from full grounding while maintaining computational tractability.
Research: DEAF Benchmark Tests Whether Audio Models Actually Hear
New diagnostic tool reveals whether audio language models process acoustic signals or fake understanding through text inference.
Research: Generative AI Transforms Stakeholder Problem-Solving in Environmental Planning
New research demonstrates how large language models bridge the gap between natural language stakeholder input and formal computational models.
Research: Transformers Are Bayesian Networks, Researchers Formally Prove
New mathematical framework shows transformer architecture implements belief propagation, offering precise theoretical understanding of why these models work.
Research: Open-Source Mamba 3 Outperforms Transformers With Lower Latency
New state-space model architecture achieves 4% better language modeling while reducing computational overhead and inference latency.
Research: Popular Data Analysis Agents Stumble on Real-World Timeseries Tasks
Study finds six commercial and open-source agents struggle with stateful queries and incident-specific scenarios.
Research: Large Reasoning Models Struggle With Computational Imbalance
New research identifies how frontier LRMs waste computation on simple tasks while failing on complex ones, limiting real-world deployment.
Research: New Framework Scales Diversity in Agent Training for Better Tool Use
DIVE method addresses brittleness in LLM agents by synthesizing more diverse tasks while maintaining executability and verifiability.
Research: Autonomous Driving Shifts from Perception to Reasoning Bottleneck
Survey finds LLMs and multimodal models could address a fundamental deficit in how self-driving systems handle long-tail scenarios and social judgment.
Research: New Distillation Method Targets Student Model Learning Sweet Spot
Researchers show standard LLM distillation wastes compute; PACED framework concentrates training on problems at the frontier of student capability.
Research: Researchers Propose Framework-Agnostic Evaluation for Multi-Agent LLM Systems
New MASEval benchmark measures entire system performance, not just model capabilities, addressing a critical gap in agentic AI evaluation.
Research: Researchers Propose Behavior Trees to Lock Down Unsafe AI Agent Decisions
A new approach distills LLM agent behavior into verifiable decision trees, addressing the black-box problem in autonomous systems.