New benchmark reveals how large language models drift from original constraints during multi-turn collaborative refinement.