Four papers benchmark language models on real scientific workflows, moving beyond toy tasks to complex simulation and reasoning.