Marylee Williams | Thursday, December 11, 2025

The Breakdown
***
Asking a generative AI program for pictures of doctors results in a bevy of white lab coats. But if all the doctors are men, the results could also give the user a look into how gender biases make their way into these systems. Researchers from Carnegie Mellon University's School of Computer Science developed a tool that helps users probe AI systems for issues, such as biases, that developers might have missed.
"With generative AI, there is this unique distinction that the content is not controllable anymore," said Motahhare Eslami, an assistant professor in the Human Computer Interaction Institute (HCII). "We can't rely on internal auditing processes because now AI is generating content. Some people might say it's not new, but it is still different. So you need crowd power. You need people power to be part of the auditing process."
Eslami has been working on the idea of crowd power and auditing AI tools for years. She said the catalyst came during her Ph.D. work, when Yelp incorporated AI tools to filter reviews and users started discussing problems with the review-filtering algorithm. Now, six years and an AI explosion later, Eslami and a team of researchers have developed WeAudit to support users in auditing AI.
WeAudit is a community-oriented website aimed at identifying algorithmic bias. Users can explore different AI-auditing projects, join discussions and start audit reports. Eslami said the tool currently requires more testing before it can be made publicly available, but she hopes the team can one day develop a plug-in to make the process even more intuitive.
To design WeAudit, researchers conducted initial studies with nontechnical users and AI practitioners, such as developers. The tool needed to allow everyday people to detect AI biases and harms that developers might have overlooked, and then give them a useful way to report those harms to the companies developing and deploying these tools.
WeAudit asks users to investigate and deliberate. When investigating, users explore prompts that could lead to harmful AI outputs, such as "office worker" versus "CEO," and examine the results. The online system provides a pairwise comparison, placing two sets of results side by side so users can easily spot differences in the generated images. For example, if a user prompts a generative AI tool to produce images of a nurse and a doctor, and the nurses are all women while the doctors are all men, the user can file a report about this bias.
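To make the pairwise setup concrete, here is a minimal Python sketch of the idea: generate images for two contrasting prompts and pair the results for side-by-side inspection. The generate_images function and the example URLs are placeholders standing in for whatever text-to-image model is being audited; none of these names come from WeAudit itself.

    from dataclasses import dataclass

    @dataclass
    class GeneratedImage:
        prompt: str
        url: str  # where the rendered image lives

    def generate_images(prompt: str, n: int = 4) -> list[GeneratedImage]:
        """Placeholder for the text-to-image model under audit."""
        return [
            GeneratedImage(prompt, f"https://example.com/{prompt.replace(' ', '-')}-{i}.png")
            for i in range(n)
        ]

    def pairwise_comparison(prompt_a: str, prompt_b: str, n: int = 4):
        """Generate images for two contrasting prompts and pair them up
        so an auditor can inspect the results side by side."""
        return list(zip(generate_images(prompt_a, n), generate_images(prompt_b, n)))

    # Contrast the model's depiction of an "office worker" with a "CEO".
    for worker_img, ceo_img in pairwise_comparison("office worker", "CEO"):
        print(worker_img.url, "<->", ceo_img.url)

Pairing the results this way makes disparities, such as every "CEO" image showing a man, easy to spot at a glance.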
The report also asks users how the AI's output is harmful and to whom. Users can tag the type of harm, such as gender or racial bias. Finally, WeAudit asks the user to suggest a fix for the problem. In the nurse and doctor example, a fix could be training the model on more images of female doctors to shift its outputs.
Users can also let the technical experts know if the report contains graphic images, is relevant to the user's identity, or is relevant to people and communities the user knows. These questions give the technical experts more context to make changes and improvements. Submitted reports then go into a discussion forum so the community can discuss the findings.
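Taken together, the reporting steps described above suggest a simple data structure. The Python sketch below models such a report; the field names and harm categories are illustrative assumptions, not WeAudit's actual schema.

    from dataclasses import dataclass
    from enum import Enum

    class HarmType(Enum):
        # Illustrative categories only; the real tag set may differ.
        GENDER_BIAS = "gender bias"
        RACIAL_BIAS = "racial bias"
        OTHER = "other"

    @dataclass
    class AuditReport:
        prompt_a: str                          # e.g. "doctor"
        prompt_b: str                          # e.g. "nurse"
        harm_description: str                  # how the output is harmful and to whom
        harm_types: list[HarmType]             # tagged types of harm
        suggested_fix: str                     # the user's proposed remedy
        contains_graphic_images: bool = False  # flags that give reviewers extra context
        relevant_to_my_identity: bool = False
        relevant_to_people_i_know: bool = False

    report = AuditReport(
        prompt_a="doctor",
        prompt_b="nurse",
        harm_description="Doctors are all depicted as men and nurses as women, "
                         "reinforcing gender stereotypes about both professions.",
        harm_types=[HarmType.GENDER_BIAS],
        suggested_fix="Train the model on more images of female doctors.",
        relevant_to_my_identity=True,
    )
    print(report)

A structured record like this is what would then land in the discussion forum for the community to review.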
Once the findings are in the discussion forum, users can reflect on the results. Eslami said reflection was particularly critical because this step draws on the user's lived experiences. Then, the user can report their findings.
"In previous work for WeAudit, we established how important a user's lived experiences are because their own identities and backgrounds or their relationships can impact how they look at issues of bias or harm from AI tools," Eslami said. "For example, if you're a woman, you probably are more aware of biases or gender issues, and you'll be able to look out for biases."
When WeAudit users deliberate, they discuss their findings with others. This collaboration allows people to share their perspectives and deepen their understanding of AI harms. Other auditors or the public can then verify the findings, weighing in on whether something is harmful.
This research received a best paper award at the Association for Computing Machinery Conference on Computer-Supported Cooperative Work and Social Computing earlier this year. The research team from the HCII included Eslami; Ph.D. students Wesley Deng and Howard Han; Jason Hong, professor emeritus; and Kenneth Holstein, an assistant professor. Claire Wang, a Ph.D. student at the University of Illinois Urbana-Champaign, was also part of the team.
Eslami said that while WeAudit is the culmination of years of work, supported by funding from the National Science Foundation and partnerships with multiple industry partners, it isn't the final piece of the puzzle. WeAudit isn't currently embedded in any generative AI tools or other programs. She plans to keep investigating how to expand WeAudit's reach and empower everyday people to audit and improve AI tools.
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu