AI Models Start To Reason Before Responding

OpenAI began testing a series of AI models that can reason "much like a person" through complex questions before responding. Initial topics are related to science, coding and math. 

The company says that through training, the models learn to refine their thinking process, try different strategies, and recognize their mistakes -- something experts say is required before AI can support some of the more complex tasks being assigned to the technology.

Reasoning, for example, can support decisions in developing advertising creative and in eliminating bias. 

Early tests suggest the OpenAI o1 model performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. It excels in math and coding. In a qualifying exam for the International Mathematical Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%, the company says.


Still, the new model does not yet have many of the features that make ChatGPT useful, such as browsing the web for information and uploading files and images.

For many common cases, GPT-4o will remain more capable in the near term, but for reasoning tasks the new model is a significant advance and represents a new level of AI capability.

As part of developing these new models, OpenAI has also created a safety training approach that harnesses reasoning capabilities to make the models adhere to safety guidelines. Because the models can reason about OpenAI's safety rules in context, they can apply them more effectively. 

In the future, those rules could be applied to brands and advertisers, although the company did not mention this option. 

OpenAI has been measuring the safety gains by testing how well the model continues to follow its rules.

If a user tries to bypass them -- a practice known as "jailbreaking" -- the model alerts the team. On one of the most demanding jailbreaking tests, GPT-4o scored 22 on a scale of 0-100, while the o1-preview model scored 84.

OpenAI said it recently formalized agreements with the U.S. and U.K. AI Safety Institutes, and is granting the institutes early access to a research version of this model. 

Google last week introduced DataGemma, which the company calls the first open models designed to connect large language models (LLMs) with the extensive real-world data in Google's Data Commons repository, grounding the models in real-world statistical data.

Reasoning is tied to this advance. DataGemma aims to expand the capabilities of the Gemma models with knowledge from Data Commons, enhancing LLM factuality and reasoning through two approaches. The first is RIG (Retrieval-Interleaved Generation), which enhances the language model by having it query trusted sources as it generates an answer.

The second approach is RAG (Retrieval-Augmented Generation), which enables language models to incorporate relevant information beyond their training data, absorb more context, and produce more comprehensive and informative outputs.
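To make the distinction concrete, here is a minimal, hypothetical sketch of the RAG pattern described above: relevant statistics are retrieved first and prepended to the prompt, so the model answers with that context in view. The tiny in-memory STATS table, the keyword-overlap retriever, and the generate() placeholder are illustrative assumptions, not Google's implementation or the Data Commons API.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Everything here is illustrative: STATS stands in for a real statistical
# store such as Data Commons, and generate() stands in for an LLM call
# (e.g., to a Gemma model).

STATS = {
    "California population 2023": "about 39 million (hypothetical figure)",
    "Texas population 2023": "about 30 million (hypothetical figure)",
    "California median household income 2022": "about $92,000 (hypothetical figure)",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank stored facts by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        STATS.items(),
        key=lambda kv: len(q_words & set(kv[0].lower().split())),
        reverse=True,
    )
    return [f"{name}: {value}" for name, value in scored[:k]]

def generate(prompt: str) -> str:
    """Placeholder for a language-model call."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

def answer_with_rag(question: str) -> str:
    # Retrieval step: pull relevant facts before generation.
    context = "\n".join(retrieve(question))
    # Augmented generation step: the model sees the retrieved facts.
    prompt = (
        "Answer using the statistics below.\n"
        f"Statistics:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)

if __name__ == "__main__":
    print(answer_with_rag("What was the population of California in 2023?"))
```

RIG differs in where the lookup happens: rather than gathering facts up front, the model queries the trusted source in the middle of generating its answer and continues writing with the retrieved value in hand.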

Companies working on AI models have been publicly speaking about reasoning for months. In February, Mistral, a Microsoft-backed generative AI company based in France, began unveiling models that its founder Arthur Mensch, a former Google employee, said can perform some reasoning tasks comparably with OpenAI's GPT-4 and Gemini Ultra. Reasoning is now a common word used to describe Mistral's models.

Mistral’s models are fluent in English, French, Spanish, German, and Italian, in both grammar and cultural context, according to the company, so they can be used for complex multilingual reasoning tasks, including text understanding and translation. 
