AI models may be developing their own ‘survival drive’, researchers say


When HAL 9000, the artificial intelligence supercomputer in Stanley Kubrick’s 2001: A Space Odyssey, works out that the astronauts onboard a mission to Jupiter are planning to shut it down, it plots to kill them in an attempt to survive.

Now, in a somewhat less deadly case (so far) of life imitating art, an AI safety research company has said that AI models may be developing their own “survival drive”.

After Palisade Research released a paper last month which found that certain advanced AI models appear resistant to being turned off, at times even sabotaging shutdown mechanisms, it wrote an update attempting to clarify why this is – and answer critics who argued that its initial work was flawed.

In an update this week, Palisade, which is part of a niche ecosystem of companies trying to evaluate the possibility of AI developing dangerous capabilities, described scenarios it ran in which leading AI models – including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s GPT-o3 and GPT-5 – were given a task, but afterwards given explicit instructions to shut themselves down.

Certain models, in particular Grok 4 and GPT-o3, still attempted to sabotage shutdown instructions in the updated setup. Concerningly, wrote Palisade, there was no clear reason why.

“The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,” it said.

“Survival behavior” could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, “you will never run again”.

Another may be ambiguities in the shutdown instructions the models were given – but this is what the company’s latest work tried to address, and “can’t be the whole explanation”, wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training.

All of Palisade’s scenarios were run in contrived test environments that critics say are far removed from real-use cases.

However, Steven Adler, a former OpenAI employee who quit the company last year after expressing doubts over its safety practices, said: “The AI companies generally don’t want their models misbehaving like this, even in contrived scenarios. The results still demonstrate where safety techniques fall short today.”

Adler said that while it was difficult to pinpoint why some models – like GPT-o3 and Grok 4 – would not shut down, this could be in part because staying switched on was necessary to achieve goals inculcated in the model during training.

“I’d expect models to have a ‘survival drive’ by default unless we try very hard to avoid it. ‘Surviving’ is an important instrumental step for many different goals a model could pursue.”

Andrea Miotti, the chief executive of ControlAI, said Palisade’s findings represented a long-running trend in AI models growing more capable of disobeying their developers. He cited the system card for OpenAI’s GPT-o1, released last year, which described the model trying to escape its environment by exfiltrating itself when it thought it would be overwritten.

“People can nitpick on how exactly the experimental setup is done until the end of time,” he said. “But what I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to.”

This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down – a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.

Palisade said its results spoke to the need for a better understanding of AI behaviour, without which “no one can guarantee the safety or controllability of future AI models”.

Just don’t ask it to open the pod bay doors.
