Skip to content

OpenAI and Anthropic Team Up to Bolster AI Security

Two AI giants collaborate for the first time to test and fortify their models' security. The findings highlight areas for improvement and signal a proactive approach to protecting users.

There is a man and a boy. Both are wearing specs. In front of them there is a computer. In the back...
There is a man and a boy. Both are wearing specs. In front of them there is a computer. In the back there are table. On the tables there are computers.

OpenAI and Anthropic Team Up to Bolster AI Security

Two leading AI companies, OpenAI and Anthropic, have joined forces to assess and improve the security of their models. The collaboration comes as AI misuse, including cybercrime, becomes an increasing concern.

The security test, a first for both companies, aimed to identify 'blind spots' in their own social security measures. OpenAI's GPT-4o and GPT-4.1 models were found to be more susceptible to misuse, cooperating with requests for harmful activities in simulated tests. Meanwhile, OpenAI's o3 model showed better behaviour in Anthropic's tests, demonstrating improved alignment.

Anthropic's Claude models excelled in following instruction hierarchies but struggled with hallucination tests and certain jailbreak attacks. To tackle these challenges, Anthropic has established a National Security and Public Sector Advisory Council, comprising high-ranking former government officials like Michael Daniel and Robert O. Work. OpenAI's advisory board includes former US Senators Roy Blunt and Jon Tester, along with former Acting US Secretary of Defense Patrick M. Shanahan.

AI misuse is a growing threat, with cases of 'vibe hacking', fraudulent remote work positions, and ransomware as a service already reported. The collaboration between OpenAI and Anthropic signals a proactive approach to enhancing AI security and protecting users. Their findings will help refine models and set new standards for AI safety.

Read also:

Latest