Friday, May 1, 2026

Archive — Page 3

67 articles total
Partial Grounding Offers Middle Ground for Classical Planning Problems

Researchers explore a hybrid encoding that avoids the exponential blowup of full grounding while keeping computation tractable.

DEAF Benchmark Tests Whether Audio Models Actually Hear

A new diagnostic tool reveals whether audio language models actually process acoustic signals or merely fake understanding by inferring answers from text.

Generative AI Transforms Stakeholder Problem-Solving in Environmental Planning

New research demonstrates how large language models bridge the gap between natural-language stakeholder input and formal computational models.

Transformers Are Bayesian Networks, Researchers Formally Prove

A new mathematical framework shows that the transformer architecture implements belief propagation, offering a precise theoretical account of why these models work.

Open-Source Mamba 3 Outperforms Transformers With Lower Latency

The new state-space model architecture achieves 4% better language modeling while reducing computational overhead and inference latency.

Popular Data Analysis Agents Stumble on Real-World Time-Series Tasks

A study finds that six commercial and open-source agents struggle with stateful queries and incident-specific scenarios.

Large Reasoning Models Struggle With Computational Imbalance

New research shows how frontier large reasoning models (LRMs) spend excess computation on simple tasks yet fail on complex ones, limiting real-world deployment.

New Framework Scales Diversity in Agent Training for Better Tool Use

The DIVE method addresses brittleness in LLM agents by synthesizing more diverse training tasks that remain executable and verifiable.

Autonomous Driving Shifts from Perception to Reasoning Bottleneck

A survey finds that LLMs and multimodal models could address a fundamental deficit in how self-driving systems handle long-tail scenarios and make social judgments.

New Distillation Method Targets Student Model Learning Sweet Spot

Researchers show that standard LLM distillation wastes compute; the PACED framework concentrates training on problems at the frontier of the student model's capability.

Researchers Propose Framework-Agnostic Evaluation for Multi-Agent LLM Systems

The new MASEval benchmark measures whole-system performance, not just model capabilities, addressing a critical gap in agentic AI evaluation.

Researchers Propose Behavior Trees to Lock Down Unsafe AI Agent Decisions

A new approach distills LLM agent behavior into verifiable behavior trees, addressing the black-box problem in autonomous systems.