News
New OpenAI o1 Model Shakes AI Research Community
- By Pure AI Editors
- 12/02/2024
The recently released o1 large language model from OpenAI has generated intense interest in the AI research community. The preview version of o1 was announced to the public in September 2024. Early investigations show that, compared to current models such as the GPT-4 series behind ChatGPT, the o1 model generates significantly improved results on tasks that require non-trivial reasoning.
Briefly, according to one of our Pure AI technical experts, systems based on current large language models (LLMs), such as ChatGPT, which is based on GPT-4, operate at roughly the capability level of a high school senior. The version of ChatGPT that uses the o1 model appears to operate at roughly the level of a university PhD student.
How Is o1 Different from GPT-x Models?
The OpenAI announcement for o1 states, "We've developed a new series of AI models designed to spend more time thinking before they respond."
OpenAI has not explained exactly how the new o1 models (currently "preview" and "mini") are designed and trained. Most information from non-OpenAI sources about the details of o1 is speculative. That said, the AI experts contacted by Pure AI believe they can infer most of the high-level concepts about how o1 works, but not the specific details.
According to OpenAI blog posts, o1 has been trained using a new optimization algorithm that includes a reinforcement learning component, along with a custom dataset designed specifically for it. One of our Pure AI technical experts speculates that, for a given prompt, instead of immediately generating a reply in which each word is conditioned largely on the words already generated, the o1 model generates multiple candidate replies that approximate a chain-of-thought process, and then merges the candidates to produce the final reply.
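The speculated generate-then-select process can be illustrated with a toy best-of-n sketch. To be clear, everything below is hypothetical: the sampler, the scorer, and the selection step are stand-ins invented for illustration, since OpenAI has not disclosed how o1 actually works.

```python
import random

# Toy stand-ins for the speculated o1 pipeline: sample several
# candidate chain-of-thought replies, score each one, and keep
# the best. All names and logic here are hypothetical -- this is
# NOT OpenAI's actual mechanism, which has not been published.

def sample_candidate(prompt: str, rng: random.Random) -> dict:
    """Pretend to sample one chain-of-thought reply plus a quality score."""
    n_steps = rng.randint(2, 5)  # fake "reasoning" length
    chain = [f"step {i + 1} for: {prompt}" for i in range(n_steps)]
    score = rng.random()         # stand-in for a learned reward/verifier score
    return {"chain": chain, "answer": f"answer after {n_steps} steps", "score": score}

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> dict:
    """Sample n candidate replies and return the highest-scoring one."""
    rng = random.Random(seed)
    candidates = [sample_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: c["score"])

best = best_of_n("What is 17 * 24?", n=8)
print(best["answer"])
```

A real system would presumably replace the random scorer with a learned verifier or reward model, and might merge candidates rather than simply picking one; the sketch only shows the extra "thinking" compute spent before a single reply is returned.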
The o1 model has shown some impressive research results. On the difficult American Invitational Mathematics Examination, ChatGPT-o1 solved 12.5 out of 15 problems, far better than the 1.8 out of 15 solved by GPT-4o. Additionally, the ChatGPT-o1 system scored in the 89th percentile in Codeforces programming competitions.
One of our Pure AI technical experts submitted a non-trivial request to ChatGPT-o1: "generate python language code that uses the pytorch library to create and train and exercise a neural network regression model for data that has five numeric input predictor variables. the neural network has one hidden layer with 10 hidden nodes that have tanh activation. the training algorithm uses sgd with a batch size of 16 and max iterations of 1000." See Figure 1.
According to our AI expert, the program code generated by ChatGPT-o1 was technically correct, in the sense that it executed without errors, and was roughly 90% correct from a subjective point of view. The one flaw: the generated code used an implicit training batch size of 1 rather than the requested explicit batch size of 16. Our expert notes that the same request submitted to ChatGPT-3.5 several months ago generated code that did not execute and was only about 60% correct.
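For readers who want a concrete picture of the task, here is a minimal sketch of the kind of program the prompt asks for. This is our own illustration, not the code ChatGPT-o1 produced; the synthetic data and learning rate are assumptions, while the architecture (five inputs, one hidden layer of 10 tanh nodes, one output), SGD optimizer, explicit batch size of 16, and 1000-iteration limit come directly from the prompt.

```python
import torch

torch.manual_seed(0)

# Sketch of the requested model: regression with 5 numeric
# predictors, one hidden layer of 10 tanh nodes, trained with
# SGD using an explicit batch size of 16 and at most 1000
# iterations. The synthetic dataset below is an assumption,
# standing in for real training data.

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = torch.nn.Linear(5, 10)   # 5 predictors -> 10 hidden nodes
        self.output = torch.nn.Linear(10, 1)   # 10 hidden nodes -> 1 target
    def forward(self, x):
        z = torch.tanh(self.hidden(x))
        return self.output(z)

# synthetic data: 200 rows, 5 numeric input predictor variables
X = torch.randn(200, 5)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(200, 1)

ds = torch.utils.data.TensorDataset(X, y)
loader = torch.utils.data.DataLoader(ds, batch_size=16, shuffle=True)

net = Net()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

max_iters = 1000   # iterations = individual weight updates
it = 0
while it < max_iters:
    for xb, yb in loader:
        if it >= max_iters:
            break
        opt.zero_grad()
        loss = loss_fn(net(xb), yb)
        loss.backward()
        opt.step()
        it += 1

# exercise the trained model on one unseen input
pred = net(torch.randn(1, 5))
print(pred.shape)
```

Note that the batch size must be passed explicitly to the `DataLoader`; the flaw our expert found was precisely that the generated code fed samples one at a time instead of in the requested batches of 16.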
What Does It Mean?
The Pure AI editors asked Dr. James McCaffrey from Microsoft Research to offer a few technical opinions. McCaffrey replied, "I'm very impressed by the new OpenAI o1 model. I have experimented with it quite a bit, and for reasoning tasks, ChatGPT-o1 is not just a little bit better than ChatGPT-x, ChatGPT-o1 usually gives significantly better results."
McCaffrey noted, "Because new developments in AI are coming so fast, it's easy to get AI news fatigue. I get the feeling that the o1 model may not be fully appreciated due to this fatigue effect."
McCaffrey observed, "It's too early to tell, but it's possible that the o1 model, and its successors and other large language models that use o1-like architectures, could become the de facto standards for AI applications. And when combined in an agentic architecture, o1 models have exceptional promise."
Another one of our AI experts, who requested anonymity, commented that, "It's somewhat disappointing that OpenAI is releasing so little technical information about the o1 model. Some of my colleagues and I feel that OpenAI is steadily becoming a purely for-profit organization and is turning its back on the principles that founded the company."