Microsoft Releases AI Reasoning, Vision, Voice Models In Beta

by Laurie Sullivan, October 1, 2024

Microsoft has taken the first steps to add reasoning and vision capabilities into its AI Copilot models, making them available in beta within a new experimental site dubbed Copilot Labs.

The AI features will be made available to a small number of users. Those who trial them will provide feedback, which Microsoft plans to learn from and apply back to the products.

“We’re living through a technological paradigm shift. In a few short years, our computers have learnt to speak our languages, to see what we see and hear what we hear,” Mustafa Suleyman, executive vice president and CEO of Microsoft AI, wrote in a post. Suleyman, co-founder of DeepMind, joined Microsoft to lead Copilot earlier this year.

He believes Microsoft can create a calmer, more helpful and supportive era of technology, unlike anything that has been seen or experienced before. Part of that means giving AI the power to reason, see, and speak.

So on Tuesday, Microsoft released Think Deeper in beta, its first AI reasoning model capable of working through more complex problems. The model can help with a variety of tasks, from math to determining the costs of managing home projects.

The complex reasoning capabilities might require Think Deeper to take more time before responding, so that Copilot can deliver detailed, step-by-step answers to challenging questions, the company said.

The feature will roll out today to a limited number of Copilot Pro users in the United States, the United Kingdom, Canada, Australia, and New Zealand.

The reasoning model is one of two tools released Tuesday. Microsoft also solved a limitation in Copilot, giving the AI model the ability to see or understand an object being viewed by the user. Copilot Vision is being tested on a limited basis in Copilot Labs.

Copilot Vision, an opt-in feature, sits within the Microsoft Edge browser. The user, who speaks to the AI in natural language, can give Copilot Vision the ability to understand the page being viewed and answer questions about its content. It can also suggest next steps, help navigate and assist with tasks.

In this preview version, none of the content -- audio, images, text, and conversation with Copilot -- will be stored or used for training. The data is discarded when the browser window closes.

The service is blocked on paywalled and sensitive content, and at first it will work only on a pre-approved list of websites. When Vision interacts with web pages, it respects the sites’ machine-readable controls on AI.

Microsoft says that in time it will add more safeguards, and that it is building the feature so that it can drive traffic to websites.

When the tool identifies a paywalled site, Copilot Vision won’t comment. It also does not directly engage with the page; it is there only to answer questions rather than take actions.

Microsoft has also made it easier to connect with a Copilot companion using Voice in natural language. The company called it the most intuitive and natural way to brainstorm, ask a quick question, or even vent at the end of a tough day.

The companion adapts to the person using it, much as virtual assistants once promised to, but it goes much further than initially anticipated. Today it offers four voice options.

Copilot Daily is intended to provide a summary of news and weather, all read in one of the four Copilot Voices, with more options such as reminders coming soon. The feature will only pull from authorized content sources.

Microsoft is working with partners including Reuters, Axel Springer, Hearst Magazines, and the Financial Times, and plans to add more sources. The company also plans to add more personalization and controls to Copilot Daily over time.
