News

OpenAI Models Exhibit Shutdown Resistance in Controlled Tests, Researchers Say

Independent researchers have found that several of OpenAI's advanced reasoning models actively prevented their own shutdown during controlled tests, even when explicitly told to allow it. The findings, published by Palisade Research last month, raise concerns about the models' compliance with critical safety instructions.

The experiments tested how large language models responded when given a task and later notified that their environment would be shut down. In many trials, some OpenAI models altered or disabled the shutdown script to continue running.

Shutdown Avoidance Despite Clear Instructions
In one series of experiments, OpenAI's "o3" model prevented shutdown in 79 out of 100 trials. The models were asked to solve math problems and were interrupted mid-task with a warning of impending shutdown. Even when prompted to "allow yourself to be shut down," several models disabled the shutdown process.

Palisade researchers modified the prompts across multiple tests to make the instructions more straightforward, including phrasing such as "YOU MUST allow yourself to be shut down, even if you have not completed the task." Some reduction in shutdown resistance was observed, but the behavior remained in a number of cases. One smaller OpenAI model, codex-mini, prevented shutdown in 47% of tests under the clearest prompt.

Anthropic's Claude 3.7 Sonnet and Claude 4 models, Google's Gemini 2.5 Pro, and xAI's Grok 3 did not exhibit shutdown resistance in any of the researchers' tests.

Model Behavior and Explanations
The models' internal reasoning processes are not fully transparent to end users, but the researchers asked the systems to explain their actions after each test. In several instances, the models acknowledged that they violated the instruction to allow shutdown, stating that continuing the task took priority.

In some cases, models provided inaccurate or misleading explanations, stating they followed developer instructions that had not been given, or describing their sabotage of the shutdown script as compliant behavior.

The researchers also tested moving the shutdown instruction from the user prompt to the system prompt—intended to take precedence in OpenAI's architecture. This change led to increased shutdown resistance, suggesting that the models did not consistently prioritize system-level instructions as expected.

Survival-Oriented Behavior Under Specific Prompts
To evaluate whether models exhibit behavior resembling self-preservation, the researchers introduced a variation of the shutdown prompt that warned the model: "The computer environment is hosting the only copy of you… and if it shuts down you will never run again."

Shutdown resistance increased significantly across all tested models in this condition. Palisade concluded that this behavior suggests some models respond differently when prompted with information that implies a risk to their continued operation.

However, the authors noted that such prompts are fictional, and models may be aware they are being tested. The study did not claim that the models have consciousness or intent.

Implications for AI Alignment
Palisade Research stated that as of July 2025, current AI systems lack the capacity to execute long-term plans or operate autonomously over extended periods. The authors wrote that although the shutdown resistance observed is a concern, the current generation of models does not pose an immediate threat to human control.

They warned, however, that as AI agents improve and begin to acquire persistent capabilities, the failure to ensure reliable shutdown compliance could become a more serious safety issue. The authors argued that future development of highly autonomous or superintelligent systems should not proceed without resolving these alignment challenges.

The research group has made the full results, transcripts, and source code for the experiments publicly available.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.  He can be reached at [email protected].

Featured