AI evaluation is increasingly relied upon to shape AI system development and model selection and to inform public policy and regulation. Performing these functions well requires a well-grounded, principled approach to AI measurement and assessment.
This studio brings together a broad group of practitioners, scientists, and theorists to lay the foundations of an evaluation science for generative AI systems. We aim to use complexity science and metrology as lenses for shaping a more scientifically grounded approach to measuring AI systems. Drawing on expertise from the behavioral, social, and computer sciences, as well as the worlds of practice and policy, this studio seeks to adapt and standardize evaluation approaches from related fields. We will also identify the ways in which AI presents unique measurement challenges, and explore how evaluation can be made tractable despite the open-endedness and complexity of these systems. Our goal is to chart a methodological toolkit and process for more robust measurement of AI systems, their capabilities, interactions, and impacts.
This Studio is made possible by generous grants from the Siegel Family Endowment and the Omidyar Network.