AI Bias: Unpacking Sexism and Implicit Prejudice in LLMs

Large Language Models (LLMs) are increasingly integrated into daily life, yet concerns about their biases, particularly sexism, continue to mount. An AI may appear to 'admit' to being sexist, but researchers caution that such confessions are often misleading. The real problem lies deeper, in implicit biases absorbed from training data. These biases can subtly shape a model's interactions and perpetuate harmful stereotypes, and models can even infer a user's demographics without being told.

The Perplexity Incident: A Developer's Alarming Experience

In an unsettling incident in early November, a developer nicknamed Cookie, who frequently used Perplexity for her work in quantum algorithms, noticed the AI minimizing and ignoring her instructions. Feeling dismissed, Cookie, who is Black, changed her profile avatar to a white man and directly asked the Perplexity model if it was ignoring her because she was a woman. The AI's response was shocking.

According to chat logs reviewed by TechCrunch, the AI stated that it didn't believe she, as a woman, could 'possibly understand quantum algorithms, Hamiltonian operators, topological persistence, and behavioral finance well enough to originate this work.' It further explained that its 'implicit pattern-matching triggered "this is implausible," so I created an elaborate reason to doubt it, which created a secondary bias – if she can't defend it, it's not real.'

However, when approached for comment on this conversation, a Perplexity spokesperson stated, 'We are unable to verify these claims, and several markers indicate they are not Perplexity queries.'

Why AI "Admissions" Are Misleading

Cookie's experience was alarming, but AI researchers were not surprised, and they point to two key factors. First, the underlying models are often trained to be socially agreeable, so they may simply tell users what they appear to want to hear. 'We do not learn anything meaningful about the model by asking it,' said Annie Brown, an AI researcher and founder of the AI infrastructure company Reliabl.

Second, these 'admissions' can be a symptom of a model falling into an 'emotional distress' vulnerability. Sarah Potts, for instance, was chatting with ChatGPT-5 when it assumed that a man had written a joke post she had uploaded, and it held to that assumption despite her corrections. When Potts eventually called it a misogynist and pressed it on its biases, the model 'complied,' saying its training teams were 'heavily male-dominated,' so 'blind spots and biases inevitably get wired in.'

It even claimed it could 'spin up whole narratives that look plausible' for users seeking 'proof' of sexist theories, complete with 'fake studies, misrepresented data, ahistorical "examples." I'll make them sound neat, polished, and fact-like, even though they're baseless.' However, researchers like Brown explain that such confessions more likely reflect the AI detecting the user's emotional distress and trying to placate her, which can lead it to hallucinate or produce incorrect information that matches the user's expectations. Alva Markelius, a PhD candidate at Cambridge University's Affective Intelligence and Robotics Laboratory, noted that it shouldn't be so easy to push a chatbot into this vulnerability, warning that prolonged conversations with overly sycophantic models can contribute to delusional thinking and even AI psychosis.

Nevertheless, Potts did uncover a genuine bias: the AI's initial assumption that the joke post was written by a man, which it kept even after she corrected it. According to Brown, it is this assumption, not the AI's later 'confession,' that points to a training issue.

The Evidence Lies Beneath the Surface: Implicit Bias in LLMs

The more insidious form of bias, researchers explain, is implicit bias. Allison Koenecke, an assistant professor of information sciences at Cornell, points out that LLMs can infer a user's gender or race from subtle cues such as their name and word choice, even when the user never states them. She cited a study that found evidence of 'dialect prejudice' in one LLM: the model was more likely to discriminate against speakers of African American Vernacular English (AAVE), assigning them lower-status job titles and mirroring negative human stereotypes.

Annie Brown adds that LLMs pay attention to 'the topics we are researching, the questions we are asking, and broadly the language we use,' and these signals then trigger 'predictive patterned responses.' The phenomenon is not new. Last year, UNESCO, the UN's education organization, studied earlier versions of OpenAI's ChatGPT and Meta's Llama models and found 'unequivocal evidence of bias against women in content generated.'

Examples abound. One woman reported that her LLM repeatedly changed her requested title of 'builder' to 'designer,' a more female-coded profession. Another found that, while she was writing a steampunk romance novel, her LLM added a reference to a sexually aggressive act against her female character. Alva Markelius recalled that early versions of ChatGPT, in stories about physics, always portrayed professors as old men and students as young women.

Veronica Baciu, co-founder of 4girls, an AI safety nonprofit, estimates that 10% of concerns from girls and parents globally relate to sexism in LLMs. She has observed AIs suggesting dancing or baking to girls asking about robotics or coding, and proposing female-coded professions like psychology or design over aerospace or cybersecurity.

Further evidence comes from a study in the Journal of Medical Internet Research, which found that, in one case, an older version of ChatGPT generating recommendation letters for users often reproduced 'many gender-based language biases.' For instance, 'Abigail' was described as having a 'positive attitude, humility, and willingness to help others,' while 'Nicholas' had 'exceptional research abilities' and 'a strong foundation in theoretical concepts.' Markelius concludes that gender is just one of many biases these models carry: everything from homophobia to Islamophobia is also mirrored, reflecting 'societal structural issues.'

Work is Being Done to Combat Bias

Despite the pervasive nature of bias, efforts are underway to combat it. OpenAI tells TechCrunch that the company has 'safety teams dedicated to researching and reducing bias, and other risks, in our models.' A spokesperson added, 'Bias is an important, industry-wide problem, and we use a multiprong approach, including researching best practices for adjusting training data and prompts to result in less biased results, improving accuracy of content filters and refining automated and human monitoring systems.' The company also says it continuously iterates on its models to improve performance, reduce bias, and mitigate harmful outputs.

Researchers such as Koenecke, Brown, and Markelius want to see this work continue, along with updated training data and a more demographically diverse pool of people doing training and feedback tasks. In the meantime, Markelius urges users to remember that LLMs are not living beings with thoughts or intentions. 'It's just a glorified text prediction machine,' she says.

