Study: Rethinking how we report on AI research


April 17, 2023

Suppose that an artificial intelligence algorithm distinguishes between female and male faces with 90 percent accuracy. Sounds impressive, right? But now suppose that the same algorithm is wrong 34.5 percent of the time for darker-skinned female faces, while erring on only 0.8 percent of lighter-skinned male faces. That’s a big problem, but right now AI research papers typically report only aggregate results, without the granular detail that would allow other researchers to spot issues like these.
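To make that gap concrete, here is a minimal Python sketch of how a single aggregate number can hide very different subgroup error rates. The group sizes and error counts are hypothetical, chosen only so that the totals reproduce the 90 percent, 34.5 percent, and 0.8 percent figures quoted above. The rates for the two other groups are invented for illustration and do not come from the paper.

```python
from collections import namedtuple

# One row per subgroup: how many faces were tested and how many were misclassified.
Group = namedtuple("Group", ["name", "n_faces", "n_errors"])

# HYPOTHETICAL counts, picked only to match the percentages in the article.
# The middle two rows are made up so the overall accuracy comes out to 90 percent.
GROUPS = [
    Group("lighter-skinned male", 1000, 8),     # 0.8 percent error
    Group("lighter-skinned female", 500, 60),   # illustrative only
    Group("darker-skinned male", 300, 63),      # illustrative only
    Group("darker-skinned female", 200, 69),    # 34.5 percent error
]


def aggregate_accuracy(groups):
    """Overall accuracy across all groups pooled together."""
    total_faces = sum(g.n_faces for g in groups)
    total_errors = sum(g.n_errors for g in groups)
    return 1 - total_errors / total_faces


def error_rate_by_group(groups):
    """Error rate computed separately for each subgroup."""
    return {g.name: g.n_errors / g.n_faces for g in groups}


if __name__ == "__main__":
    print(f"aggregate accuracy: {aggregate_accuracy(GROUPS):.1%}")
    for name, rate in error_rate_by_group(GROUPS).items():
        print(f"{name}: {rate:.1%} error")
```

Because the largest group is also the one with the fewest errors, it dominates the pooled figure, which is why the headline number alone says little about the smaller groups.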

SFI Professor Melanie Mitchell coauthored a paper, published in Science on April 13, pointing out this problem and proposing solutions.

The problem of aggregation is made worse in the case of models like ChatGPT, because such systems don’t have a single, clearly defined goal. Benchmarks like “Beyond the Imitation Game” have been developed for such models, combining more than 200 tasks. A single overall score on that benchmark tells researchers little about the strengths or weaknesses of a given model. Furthermore, the culture of AI centers on outdoing the current state-of-the-art performance rather than carefully understanding existing models.

Mitchell and her colleagues propose two primary solutions. The first is that scientific journals should require far more granular analyses of the performance of AI models, revealing how well they do on all relevant subgroups. This is essential for understanding a model’s behavior. For example, one computer vision system distinguished between objects like ships and horses with high precision, but analysis showed that it knew nothing about ships or horses. Instead, it was recognizing features of the surrounding background or watermarks naming the image’s source, features that wouldn’t help in the real world.

The second recommendation is that data should be released showing the model’s results on every instance it’s tested on, so that outside researchers can do further analyses.
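As a rough illustration of why instance-level data matters, the sketch below writes one row per test instance to a CSV file and then recomputes accuracy along whichever column an outside researcher chooses. The column names (`instance_id`, `subgroup`, `label`, `prediction`), the sample rows, and the helper functions are all hypothetical. The paper does not prescribe a file format, so this is only one plausible way such a release could look.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL column names for a per-instance results file.
FIELDNAMES = ["instance_id", "subgroup", "label", "prediction"]


def write_instance_results(records, fileobj):
    """Write one CSV row per evaluated instance."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)


def accuracy_by(fileobj, column):
    """Recompute accuracy separately for each distinct value in `column`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(fileobj):
        key = row[column]
        total[key] += 1
        correct[key] += int(row["label"] == row["prediction"])
    return {key: correct[key] / total[key] for key in total}


# Made-up sample rows, for illustration only.
SAMPLE_RECORDS = [
    {"instance_id": "img-001", "subgroup": "A", "label": "ship", "prediction": "ship"},
    {"instance_id": "img-002", "subgroup": "A", "label": "horse", "prediction": "horse"},
    {"instance_id": "img-003", "subgroup": "B", "label": "ship", "prediction": "horse"},
    {"instance_id": "img-004", "subgroup": "B", "label": "horse", "prediction": "horse"},
]


def demo():
    """Round-trip the sample rows through an in-memory CSV and break them down."""
    buffer = io.StringIO()
    write_instance_results(SAMPLE_RECORDS, buffer)
    buffer.seek(0)
    return accuracy_by(buffer, "subgroup")
```

With rows like these available, anyone can group by a column the original authors never reported on, which a single published score does not allow.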

Mitchell acknowledges that this is just a start. Because so much AI development is happening in industry rather than academia, changing publication practices can’t do all that’s needed.

“There’s a lot of discussion of whether AI systems should go through regulatory approval like we have for medical products, where the FDA requires that certain tests or studies be done,” Mitchell says. “Perhaps that’s the next step for machine learning products being deployed in the world.”

Read the paper, "Rethink reporting of evaluation results in AI," in Science (April 14, 2023). DOI: 10.1126/science.adf6369


NSF Grant Award No. 2020103 "AI Institute: Planning: Foundations of Intelligence in Natural and Artificial Systems"
