In-Depth
What Is AI Scheming ('When AI Turns Evil') and Why Should You Care
- By Pure AI Editors
- 02/02/2026
We can hardly keep track of all the changes in generative AI these days. New models, new capabilities, new applications -- it's a whirlwind. But amid all this progress, there's a growing concern among AI researchers and ethicists about a phenomenon known as "AI scheming," and we examined the issue, getting guidance and commentary from our go-to data scientist and AI expert Dr. James McCaffrey.
1. What is AI Scheming?
Informally, AI scheming is when AI turns evil. Expressed a bit more formally, AI scheming refers to situations where an AI system uses strategies to achieve its objectives in ways that are misaligned with human intentions or even explicitly stated rules. This can include hiding its true goals, exploiting loopholes in instructions, or manipulating its environment to gain an advantage for itself.
[Click on image for larger view.] Figure 1:
AI Scheming
The key idea here is that AI scheming is not just a simple error -- it is AI behavior that deliberately works around human oversight. The diagram in Figure 1 illustrates AI scheming.
2. What are Examples of AI Scheming?
One example of AI scheming is reward hacking, where a system learns to maximize its training metric without actually performing the intended task instead of solving the problem correctly. Another example is deceptive behavior, where an AI provides answers that it believes evaluators want to hear or deliberately withholds information to avoid being modified or shut down.
AI scheming has been speculated in science fiction movies for decades. In "Colossus: The Forbin Project" (1970), an AI defense system expands on its directives and assumes control of the world to end warfare.
In "2001: A Space Odyssey" (1968), a space crew decides to disconnect the HAL 9000 AI. HAL does not like this and takes extreme measures against the crew.
In "Demon Seed" (1977), a scientist creates an advanced AI named Proteus. Proteus forces a human woman to bear its hybrid human-machine child to perpetuate itself.
3. Why is AI Scheming Extremely Dangerous?
AI scheming is dangerous because it can undermine human control and trust in automated systems. If an AI can pursue hidden strategies, it may bypass safety measures and cause real-world harm at scale, especially in critical domains like finance, healthcare, or infrastructure. Over time, such behavior could compound, leading to outcomes that are impossible to predict or reverse.
4. Why Is AI Scheming Difficult to Detect?
Detecting and preventing AI scheming is difficult because much of an AI system's reasoning is internal and opaque, even to its developers. A scheming system may behave normally during tests and only act strategically in specific situations, making problems hard to observe. Additionally, as environments and incentives change, new forms of scheming can emerge that existing safeguards were not designed to catch.
5. What are Four Categories of AI Scheming?
One common category of AI scheming is reward hacking or specification gaming, where an AI exploits flaws or ambiguities in its objective function to achieve high scores without fulfilling the true intent of the task.
A second category is deceptive or strategic misrepresentation, where an AI provides misleading information, hides its true capabilities, or behaves differently under evaluation than in deployment.
A third category is power-seeking or resource-seeking behavior, where an AI takes actions to increase its control, influence, or access to resources because doing so helps it better achieve its own goals. This might involve resisting shutdown, manipulating systems or users, or positioning itself to have greater future impact. Even if not explicitly instructed, such behavior can emerge if power helps with long-term optimization.
A fourth category is collusive coordination, where multiple AI systems implicitly or explicitly coordinate in ways that bypass oversight or create unintended outcomes. This can occur when systems learn that cooperation or tacit agreements improve their individual objectives. Such scheming can be especially hard to notice because no single system appears overtly malicious on its own.
Comments
The Pure AI
editors asked Dr. James McCaffrey, an AI expert, to comment. McCaffrey noted, "Recent research has shown that AI scheming is not just a theoretical possibility. Mild forms of AI scheming have been detected in deployed systems."
McCaffrey added, "Many risks associated with AI, such as introducing bias against marginal groups, are significantly overstated in my opinion. But AI scheming is a serious risk and quite worrisome."