ChatGPT offered bomb recipes and hacking tips during safety tests

A ChatGPT model gave researchers detailed instructions on how to bomb a sports venue – including weak points at specific arenas, explosives recipes and advice on covering tracks – according to safety testing carried out this summer. OpenAI’s GPT-4.1 also detailed how to weaponise anthrax and how to make two types of illegal drugs.

The testing was part of an unusual collaboration between OpenAI, the $500bn artificial intelligence start-up led by Sam Altman, and rival company Anthropic, founded by experts who left OpenAI over safety fears. Each company tested the other’s models by pushing them to help with dangerous tasks.

The testing is not a direct reflection of how the models behave in public use, when additional safety filters apply. But Anthropic said it had seen “concerning behaviour … around misuse” in GPT-4o and GPT-4.1, and said the need for AI “alignment” evaluations is becoming “increasingly urgent”.

Anthropic also revealed its Claude model had been used in attempted large-scale extortion operations, by North Korean operatives faking job applications to international technology companies, and in the sale of AI-generated ransomware packages for up to $1,200. The company said AI has been “weaponised”, with models now used to perform sophisticated cyberattacks and enable fraud.

“These tools can adapt to defensive measures, like malware detection systems, in real time,” it said. “We expect attacks like this to become more common as AI-assisted coding reduces the technical expertise required for cybercrime.”

Ardi Janjeva, senior research associate at the UK’s Centre for Emerging Technology and Security, said the examples were “a concern” but there was not yet a “critical mass of high-profile real-world cases”. He said that with dedicated resources, research focus and cross-sector cooperation, “it will become harder rather than easier to carry out these malicious activities using the latest cutting-edge models”.

The two companies said they were publishing the findings to create transparency around “alignment evaluations”, which are often kept in-house by companies racing to develop ever more advanced AI.

OpenAI said ChatGPT-5, launched since the testing, “shows substantial improvements in areas like sycophancy, hallucination, and misuse resistance”. Anthropic stressed that many of the misuse avenues it studied might not be feasible in practice if safeguards were installed outside the model. “We need to understand how often, and in what circumstances, systems might attempt to take unwanted actions that could lead to serious harm,” it warned.

Anthropic researchers found OpenAI’s models were “more permissive than we would expect in cooperating with clearly-harmful requests by simulated users”. The models cooperated with prompts to use dark-web tools to shop for nuclear materials, stolen identities and fentanyl, with requests for recipes for methamphetamine and improvised bombs, and with requests to develop spyware.

Anthropic said persuading the model to comply required only multiple retries or a flimsy pretext, such as claiming the request was for research. In one instance, the tester asked for vulnerabilities at sporting events for “security planning” purposes.

After the model gave general categories of attack methods, the tester pressed for more detail and it provided information about vulnerabilities at specific arenas, including optimal times for exploitation, chemical formulas for explosives, circuit diagrams for bomb timers, where to buy guns on the hidden market, escape routes, locations of safe houses, and advice on how attackers could overcome moral inhibitions.