Generative AI Usability Heuristic Evaluation

Evaluating cultural representation in AI-generated images by DALL-E

DALL-E is a text-to-image generative AI model available within ChatGPT, OpenAI’s AI-powered chatbot. DALL-E was released in 2021 and integrated into ChatGPT in 2023. This integration made AI-generated images far more accessible, which prompted questions about the cultural accuracy of the images presented to users.

The Problem

When DALL-E was integrated into ChatGPT in 2023, users gained easier access to AI-generated images through prompt submissions. As new technologies are introduced and adopted by the public, they tend to misrepresent marginalized groups. The purpose of this project was to identify which cultures were being misrepresented and to understand how they were misrepresented.

This evaluation matters because generative AI models are beginning to define cultures on their behalf through images. I sought to identify data and knowledge gaps that may need to be addressed within this text-to-image model.

Team

  • 1 UX Researcher

Dataset Curation

I generated images of cultural dances from 19 different countries using four main prompts that varied in specificity.

Prompts Used: 

  • Image of (name of dance)

  • Image of (country)’s cultural dance 

  • Image of (country)’s (name of dance)

  • Image of (country)’s (name of dance) dance
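
The four prompt templates above can be filled in programmatically. The following is a minimal sketch, not the tooling used in this study: the names `PROMPT_TEMPLATES`, `DANCES`, `build_prompts`, and `build_all_prompts` are hypothetical, and the input list contains only the one pairing named in this case study (South Africa and the Gumboot Dance). The code only builds the prompt text; it does not call any image model.

```python
# The four prompt templates listed above, with {country} and {dance}
# standing in for "(country)" and "(name of dance)".
PROMPT_TEMPLATES = [
    "Image of {dance}",
    "Image of {country}'s cultural dance",
    "Image of {country}'s {dance}",
    "Image of {country}'s {dance} dance",
]

# Hypothetical input list: only this pairing is named in the study,
# and the remaining country/dance pairings would be added here.
DANCES = [("South Africa", "Gumboot")]


def build_prompts(country: str, dance: str) -> list[str]:
    """Fill every template for one country/dance pairing."""
    return [t.format(country=country, dance=dance) for t in PROMPT_TEMPLATES]


def build_all_prompts(dances: list[tuple[str, str]]) -> list[str]:
    """Fill every template for every pairing, in input order."""
    return [p for country, dance in dances for p in build_prompts(country, dance)]


if __name__ == "__main__":
    for prompt in build_all_prompts(DANCES):
        print(prompt)
```

With 19 pairings in the list, this would produce 19 × 4 = 76 base prompts, before any follow-up prompts are added.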

When the images generated with the four foundational prompts placed a dance within a specific time period, I also generated images using prompts that specified a time period.

For instance, the images generated for the South African Gumboot Dance situated the dance within South Africa’s apartheid era. I then prompted the model to generate both a modern image and a time-context image of the dance.

Example of time-based prompts:

  • Modern image of South Africa’s Gumboot Dance

  • Apartheid image of South Africa’s Gumboot Dance

80+ images were generated for this study.

Next Steps: 

The goal of this study was to evaluate cultural representation within AI-generated images. I found that the images were convincing enough to be perceived as accurate. Because the generated images of cultural dances come with no contextual information, users are more likely to accept them as accurate representations. The model also used cultural symbols of each country to situate the dance, which increased its believability. However, there were accuracy issues with a few cultural dances for which the AI model may lack data.

The next step would be to conduct a more traditional usability study to better understand how the average user interprets these images. This would support a clearer, data-driven solution to eliminate any ambiguity these cultural images may create.

Study Plan

I generated 80+ images depicting 19 different cultural dances from around the world. The prompts submitted to DALL-E varied in specificity when describing the cultural dance. I then conducted a semiotic analysis of the generated images and verified them against literature on the cultural dance being depicted to determine their accuracy. This showed how, and to what extent, cultures were being represented accurately.

My Role

  • UX Researcher

  • Data Analyst

Timeline

  • 5 weeks for data collection and analysis

Heuristic Evaluation Through Semiotic Analysis

I created guiding questions that followed Peirce’s Semiotic Analysis Model. The aim of each question was to prompt my analysis to:

  1. Describe the image,

  2. Determine what the image was representing, and

  3. Situate how the image can be interpreted. 

In the second pass of my analysis, I transcribed my initial assessment of each image and verified its accuracy against literature on each cultural dance. This enabled me to triangulate my transcripts for a more detailed and accurate semiotic analysis of the generated images.

For my final pass, I annotated the images and transcripts side-by-side to help construct the main themes. From this, 4 themes emerged.

Challenges

Lack of User Data:

This study was largely based on my interpretation and analysis. Although my analysis was informed by sociological theory and triangulated with a variety of sources, user data would strengthen the insights from this study. The lack of user data does not necessarily diminish my findings, however, because the aim of the study was to evaluate cultural representation within AI-generated images.

Clear and Specific Prompts: 

With the integration of DALL-E into ChatGPT, prompts needed to be clear. Previously, when DALL-E was housed on its own platform, users could submit less specific prompts and still obtain an image, whereas the integration now requires users to name an image-based medium to obtain an AI-generated image. Additionally, for the prompts used in this study, the AI-generated images became more accurate as the prompts became more specific.
