Independent 30-question benchmark reveals how Google's new models stack up against competitors in practical scenarios.

New position paper argues standard accuracy metrics fail to detect memorization, data leakage, and brittle shortcuts in machine learning models.