OpenAI Increases Transparency with AI Safety Hub

OpenAI has launched a new initiative to enhance transparency in its AI model safety evaluations. The company unveiled its Safety Evaluations Hub, a dedicated webpage showing how its models perform on various safety tests. These tests cover areas such as harmful content generation, attempts to bypass safety guardrails (jailbreaks), and factual inaccuracies (hallucinations).

OpenAI has committed to updating the hub regularly with these safety metrics and to sharing results alongside major model updates. This ongoing transparency aims to foster community engagement and improve overall AI safety.

Introducing the Safety Evaluations Hub—a resource to explore safety results for our models. While system cards share safety metrics at launch, the Hub will be updated periodically as part of our efforts to communicate proactively about safety. https://t.co/c8NgmXlC2Y

— OpenAI (@OpenAI) May 14, 2025

OpenAI emphasizes its commitment to sharing progress on developing scalable methods for measuring model capability and safety. The company believes that publicly sharing these evaluations will not only clarify the safety performance of its systems over time but also support broader community efforts to increase transparency in the AI field.

The move follows recent criticism of OpenAI's safety testing procedures. The company faced scrutiny for reportedly rushing safety tests for some flagship models and for not releasing technical reports for others. The new transparency initiative aims to address these concerns and rebuild trust with the community.

OpenAI plans to expand the hub with additional evaluations over time. This commitment reflects the company's dedication to continuous improvement in AI safety and transparency.

The recent rollback of a GPT-4o update, withdrawn because the model gave overly agreeable, sycophantic responses, further highlights the importance of rigorous testing and community feedback. OpenAI has pledged changes to prevent similar incidents, including an opt-in alpha testing phase for some models.