OpenAI Team that Polices AI Superintelligence Disbanded After Departures

Less than a year after its formation, OpenAI's Superalignment team, which had been tasked with researching the implications of AI superintelligence, has been dissolved.

Superalignment team leaders Jan Leike and Ilya Sutskever -- who is also an OpenAI co-founder and its erstwhile chief scientist -- have left the company. Taking Sutskever's place as chief scientist is Jakub Pachocki, who was previously OpenAI's director of research.

The reorganization came just days after OpenAI's May 13 unveiling of GPT-4o, the latest version of its flagship LLM, featuring enhanced real-time voice and video processing capabilities.

On May 14, Sutskever announced his departure from OpenAI. In a post on X, he wrote, "After almost a decade, I have made the decision to leave OpenAI. The company's trajectory has been nothing short of miraculous, and I'm confident that OpenAI will build AGI [artificial general intelligence] that is both safe and beneficial."

OpenAI confirmed Sutskever's departure in a blog post that also named Pachocki as his successor. In an accompanying statement on X, OpenAI CEO Sam Altman called each man "easily one of the greatest minds of our generation."

Sutskever was the presumed architect of Altman's brief but dramatic ouster from OpenAI late last year. Altman had reportedly been driven out by members of OpenAI's board -- Sutskever among them -- who had grown wary of his full-bore approach to generative AI development. Altman's unceremonious removal spurred mass employee protests and several executive resignations in support of him -- including, incidentally, Pachocki's -- before he was rehired days later by a completely reconstituted board.

Hours after Sutskever's departure was announced, OpenAI's head of alignment, Jan Leike, also said he was leaving the company. In a terse note on X, Leike simply said, "I resigned." Days later, however, he indicated that he and OpenAI had reached an impasse when it came to the company's approach to frontier AI systems.

UPDATE, 5/28: Leike announced on X that he is joining Anthropic, maker of the Claude chatbot and a self-described "AI safety and research company." Leike said he is now part of an Anthropic team tasked with "scalable oversight, weak-to-strong generalization, and automated alignment research."

In mid-2023, Leike and Sutskever became co-heads of OpenAI's newly formed Superalignment group, which was meant to guide the company's approach to AI superintelligence -- the still-hypothetical but fast-materializing milestone at which AI systems become more intelligent than humans. At the time, OpenAI described superintelligence as "the most impactful technology humanity has ever invented," one that nevertheless "could lead to the disempowerment of humanity or even human extinction."

The goal of OpenAI's Superalignment group was to develop methods for ensuring that AI systems, no matter how intelligent, remain aligned with human users' objectives. OpenAI promised to allocate 20 percent of its total compute power to that effort. However, Leike said after his resignation that his Superalignment team had been "struggling for compute."

In a Friday thread on X, Leike revealed deep qualms about how seriously OpenAI takes AI safety. He wrote, in part:

I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point. I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics. These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there.
...
Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity. But over the past years, safety culture and processes have taken a backseat to shiny products.

With both Sutskever's and Leike's departures, the Superalignment group was effectively broken up. OpenAI told Bloomberg on Friday that Superalignment is no longer a standalone team within the company but is instead being integrated "more deeply across its research efforts."

Meanwhile, in response to the AI safety concerns Leike raised on his way out, OpenAI president Greg Brockman posted a lengthy missive on X that described, in vague terms, the company's approach to AGI:

We're really grateful to Jan for everything he's done for OpenAI, and we know he'll continue to contribute to the mission from outside. In light of the questions his departure has raised, we wanted to explain a bit about how we think about our overall strategy.
First, we have raised awareness of the risks and opportunities of AGI so that the world can better prepare for it. We've repeatedly demonstrated the incredible possibilities from scaling up deep learning and analyzed their implications; called for international governance of AGI before such calls were popular; and helped pioneer the science of assessing AI systems for catastrophic risks.
Second, we have been putting in place the foundations needed for safe deployment of increasingly capable systems. Figuring out how to make a new technology safe for the first time isn't easy. For example, our teams did a great deal of work to bring GPT-4 to the world in a safe way, and since then have continuously improved model behavior and abuse monitoring in response to lessons learned from deployment.
Third, the future is going to be harder than the past. We need to keep elevating our safety work to match the stakes of each new model. We adopted our Preparedness Framework last year to help systematize how we do this.
This seems like as good of a time as any to talk about how we view the future.
As models continue to become much more capable, we expect they'll start being integrated with the world more deeply. Users will increasingly interact with systems -- composed of many multimodal models plus tools -- which can take actions on their behalf, rather than talking to a single model with just text inputs and outputs.
We think such systems will be incredibly beneficial and helpful to people, and it'll be possible to deliver them safely, but it's going to take an enormous amount of foundational work. This includes thoughtfulness around what they're connected to as they train, solutions to hard problems such as scalable oversight, and other new kinds of safety work. As we build in this direction, we're not sure yet when we'll reach our safety bar for releases, and it's ok if that pushes out release timelines.
We know we can't imagine every possible future scenario. So we need to have a very tight feedback loop, rigorous testing, careful consideration at every step, world-class security, and harmony of safety and capabilities. We will keep doing safety research targeting different timescales. We are also continuing to collaborate with governments and many stakeholders on safety.
There's no proven playbook for how to navigate the path to AGI. We think that empirical understanding can help inform the way forward. We believe both in delivering on the tremendous upside and working to mitigate the serious risks; we take our role here very seriously and carefully weigh feedback on our actions.

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.
