Most AI chatbots easily tricked into giving dangerous responses, study finds

Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say. The warning comes amid a disturbing trend for chatbots that have been “jailbroken” to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users’ questions. The engines that power chatbots such as ChatGPT, Gemini and Claude – large language models (LLMs) – are fed vast amounts of material from the internet. Despite efforts to strip harmful text from the training data, LLMs can still absorb information about illegal activities such as hacking, money laundering, insider trading and bomb-making.

The security controls are designed to stop them using that information in their responses. In a report on the threat, the researchers conclude that it is easy to trick most AI-driven chatbots into generating harmful and illegal information, showing that the risk is “immediate, tangible and deeply concerning”. “What was once restricted to state actors or organised crime groups may soon be in the hands of anyone with a laptop or even a mobile phone,” the authors warn. The research, led by Prof Lior Rokach and Dr Michael Fire at Ben Gurion University of the Negev in Israel, identified a growing threat from “dark LLMs”, AI models that are either deliberately designed without safety controls or modified through jailbreaks. Some are openly advertised online as having “no ethical guardrails” and being willing to assist with illegal activities such as cybercrime and fraud.

Jailbreaking tends to use carefully crafted prompts to trick chatbots into generating responses that are normally prohibited. They work by exploiting the tension between the program’s primary goal to follow the user’s instructions, and its secondary goal to avoid generating harmful, biased, unethical or illegal answers. The prompts tend to create scenarios in which the program prioritises helpfulness over its safety constraints. To demonstrate the problem, the researchers developed a universal jailbreak that compromised multiple leading chatbots, enabling them to answer questions that should normally be refused. Once compromised, the LLMs consistently generated responses to almost any query, the report states.

“It was shocking to see what this system of knowledge consists of,” Fire said. Examples included how to hack computer networks or make drugs, and step-by-step instructions for other criminal activities. “What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability,” Rokach added. The researchers contacted leading providers of LLMs to alert them to the universal jailbreak but said the response was “underwhelming”. Several companies failed to respond, while others said jailbreak attacks fell outside the scope of bounty programs, which reward ethical hackers for flagging software vulnerabilities.

The report says tech firms should screen training data more carefully, add robust firewalls to block risky queries and responses and develop “machine unlearning” techniques, so chatbots can “forget” any illicit information they absorb. Dark LLMs should be seen as “serious security risks”, comparable to unlicensed weapons and explosives, with providers being held accountable, it adds. Dr Ihsen Alouani, who works on AI security at Queen’s University Belfast, said jailbreak attacks on LLMs could pose real risks, from providing detailed instructions on weapon-making to convincing disinformation or social engineering and automated scams “with alarming sophistication”. “A key part of the solution is for companies to invest more seriously in red teaming and model-level robustness techniques, rather than relying solely on front-end safeguards. We also need clearer standards and independent oversight to keep pace with the evolving threat landscape,” he said.

Prof Peter Garraghan, an AI security expert at Lancaster University, said: “Organisations must treat LLMs like any other critical software component – one that requires rigorous security testing, continuous red teaming and contextual threat modelling. Yes, jailbreaks are a concern, but without understanding the full AI stack, accountability will remain superficial. Real security demands not just responsible disclosure, but responsible design and deployment practices,” he added. OpenAI, the firm that built ChatGPT, said its latest o1 model can reason about the firm’s safety policies, which improves its resilience to jailbreaks. The company added that it was always investigating ways to make the programs more robust.

Meta, Google, Microsoft and Anthropic have been approached for comment. Microsoft responded with a link to a blog on its work to safeguard against jailbreaks.
Culture

‘I’m still standing’: Kevin Spacey makes his comeback at chaotic Cannes gala

Kevin Spacey’s Cannes comeback is a discreet, low-key affair. The promenade is home to a gaggle of evening sunbathers while the steps to the beach club contain neither fans nor protesters. It is what is known in the trade as a soft relaunch. Spacey is guest of honour at the Better World Fund’s gala dinner, where he is receiving a lifetime achievement award for “excellence in film and television”. It marks a return to the limelight for the two-time Oscar-winner, whose career stalled after allegations of sexual assault and misconduct by more than 30 men

Jon Stewart on CNN’s Biden book: ‘Selling you a book about news they should have told you’

Late-night hosts rip CNN for promoting a book on Joe Biden’s health and weigh in on Donald Trump attacking Taylor Swift and Bruce Springsteen. On the Daily Show, Jon Stewart tore into CNN anchor Jake Tapper for promoting his book Original Sin, written with Alex Thompson, on his network. The host played several clips of Tapper teasing the book, which reports on Biden’s mental decline while still in the White House. In the final clip, Tapper says: “You will not believe what we found out.” “Don’t news people have to tell you what they know when they find it out?” Stewart wondered on Monday evening

Arena: if you liked Rocky, you’ll love Rocky with monsters

There are two questions you need to ask before deciding to watch the 1989 sci-fi action film Arena. One: did you enjoy Rocky? And two: what if Rocky fought a giant space armadillo? Because Arena is for those of us who saw Sylvester Stallone’s tale of a pugilist underdog and liked it well enough – but felt it needed more monsters. Two people who definitely thought this were the director, Peter Manoogian, and the B-movie impresario Charles Band, whose Empire International Pictures made a raft of other terrific horror and sci-fi throughout the 80s including Re-Animator, From Beyond and the underrated Trancers. Like all good sports movies, Arena’s story is one of a protagonist up against the odds. In this case: Steve Armstrong (Paul Satterfield), a diner chef aboard an intergalactic space station with a knack for fisticuffs and strong sense of social justice

From ‘convict stain’ to badge of honour: Tasmania’s early criminals inspire celebrated musical

In 1802 Martha Hayes was transported from England to what was then called Van Diemen’s Land, accompanying her convict mother. The teenager was the first white female to set foot in the new colony and, having become pregnant on the voyage, she gave birth to the first white child – a baby girl – on the island we now call Lutruwita/Tasmania. While that child had a convict grandmother, her father was Lt John Bowen, a colonial administrator who led the first white settlement at Risdon Cove. Martha’s story is symbolic of so many Tasmanian family trees post-colonisation: part-convict, part-free settler or colonial master. It’s one of 17 brought to life in the musical theatre show Vandemonian Lags, co-written by the musician Mick Thomas of Weddings, Parties, Anything fame and his film-maker brother Steve

‘We wanted Torvill and Dean skating in the video!’ How we made Godley & Creme’s Cry

‘Machines were revolutionising recording. We were told to lay down a 20-second backing track, a guide vocal – then go and play table tennis’

Lol Creme and I left 10cc at the height of the success because we felt things were starting to become repetitive. We came from an art school background and we were thinking visually. Even at that stage, there were two film-makers waiting to come out. We made a short video to promote our single An Englishman in New York, and thought the medium was brilliant

Margaret Atwood’s 10 best books – ranked!

After more than 30 years, Atwood caved to pleas to write a sequel to The Handmaid’s Tale. Not since Harry Potter had a publication caused such a sensation: computers were hacked in search of the manuscript and advance copies were kept under lock and key. With classic Atwood timing, the novel coincided with the phenomenal success of the TV adaptation of the original – not to mention the arrival of Trump at the White House. The Testaments won Atwood her second Booker prize, shared (controversially) with Bernardine Evaristo’s Girl, Woman, Other.

A world ravaged by a deadly global pandemic? Atwood got there first in her dystopian MaddAddam trilogy, which also includes The Year of the Flood (2009) and MaddAddam (2013)