Google DeepMind on Wednesday detailed two speech-generation technologies developed to help people worldwide interact with more natural, conversational and intuitive digital assistants and AI tools.
Advertisers can use these technologies to generate long-form, multi-speaker dialogue from written documents for ads or other types of content, whether paid, owned or organic.
The first, NotebookLM Audio Overviews, turns uploaded documents into dialogue. The second, Illuminate, creates formal AI-generated discussions about research papers to help make knowledge more accessible and digestible.
The post includes an example of a multi-speaker dialogue generated by NotebookLM Audio Overviews from a few documents. To make this work, the technology required scaling single-speaker generation models up to multi-speaker models through additional data and model capacity.
Longer speech segments require a more efficient speech codec, one that compresses audio into a sequence of tokens at rates as low as 600 bits per second without compromising output quality, according to Google DeepMind.
Producing a two-minute dialogue requires generating more than 5,000 tokens, the company says.
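As a rough back-of-the-envelope check, those two figures together imply roughly 14 bits per token and about 42 tokens per second of audio. The short sketch below derives those numbers; the per-token and per-second rates are inferred here, not stated by Google DeepMind:

```python
# Back-of-the-envelope check of the reported figures. The 600 bits/sec
# rate and the 5,000-token count come from the post; the per-token and
# per-second figures below are derived, not reported by Google DeepMind.

BITRATE_BPS = 600        # codec rate reported by Google DeepMind
DURATION_S = 2 * 60      # two-minute dialogue
TOKENS = 5_000           # "more than 5,000 tokens"

total_bits = BITRATE_BPS * DURATION_S    # 72,000 bits of compressed audio
bits_per_token = total_bits / TOKENS     # ~14.4 bits per token
tokens_per_sec = TOKENS / DURATION_S     # ~41.7 tokens per second

print(f"{total_bits} bits total, ~{bits_per_token:.1f} bits/token, "
      f"~{tokens_per_sec:.1f} tokens/sec")
```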
And while AI can create dialogue from documents, it can also write code for some of the most sophisticated technology at Google.
Google CEO Sundar Pichai said Tuesday during the company's Q3 earnings call that more than one-quarter of the new code at Google is generated by AI and then checked by employees.
Pichai said AI-assisted coding continues to increase productivity and efficiency within Google, and it is beginning to show the potential to boost profits as well.
He also spoke about the timeline for launching AI agents, saying the company is building experiences in which AI can see and reason about the world around users.
"Project Astra is a glimpse of that future," he said. "We’re working to ship experiences like this as early as 2025."
Project Astra is the latest AI prototype from Google DeepMind, the company's AI division focused on artificial general intelligence (AGI), and was presented at Google I/O 2024.
The technology acts as a universal assistant, intended to improve how users interact with their phones and other devices.
It is built on multimodal processing that ingests speech and video: the system encodes video frames, combines them with speech input, and orders them in a timeline to provide greater context, in what Google describes as a human-like conversational flow. It is Google's version of an AI agent.
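For illustration only, here is a minimal sketch of that kind of timeline ordering, assuming timestamped per-modality events. The Event structure, field names and merge logic are assumptions made for this example, not Google's actual Project Astra implementation:

```python
# Illustrative sketch of ordering encoded video frames and speech
# segments into a single timeline, as the article describes. The Event
# structure and field names are assumptions, not Google's actual
# Project Astra implementation.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Event:
    timestamp: float                        # seconds since session start
    kind: str = field(compare=False)        # "video_frame" or "speech"
    encoding: bytes = field(compare=False)  # modality-specific encoding

def build_timeline(video: list[Event], speech: list[Event]) -> list[Event]:
    """Merge the two per-modality streams into one chronological context."""
    return list(heapq.merge(sorted(video), sorted(speech)))

# Frames and an utterance interleave by timestamp:
timeline = build_timeline(
    [Event(0.0, "video_frame", b"..."), Event(1.0, "video_frame", b"...")],
    [Event(0.5, "speech", b"...")],
)
print([(e.timestamp, e.kind) for e in timeline])
# [(0.0, 'video_frame'), (0.5, 'speech'), (1.0, 'video_frame')]
```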
Pichai also addressed the cost of processing queries, something analysts have been waiting to hear, and offered specifics. He said that since Google first began testing AI Overviews, the company has significantly lowered machine cost per query.
Over 18 months, Google reduced the cost of these queries by more than 90% through hardware, engineering and technical breakthroughs, while doubling the size of its custom Gemini model.
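As a purely hypothetical illustration of what that reduction means per query (the starting cost below is invented; Google disclosed only the percentage, not absolute costs):

```python
# Hypothetical illustration of a >90% per-query cost reduction. The
# $0.010 starting cost is invented for the example; Google disclosed
# only the percentage, not absolute costs.
initial_cost = 0.010                       # assumed cost per query, in dollars
reduction = 0.90                           # "more than 90%" per Pichai
new_cost = initial_cost * (1 - reduction)  # at most $0.001 per query
print(f"${initial_cost:.3f} -> under ${new_cost:.3f} per query")
```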
The news sent investors scrambling to buy Google's stock, which had risen to a high of $182 per share on Wednesday morning as of this writing.
Alphabet, Google's parent company, has also begun to show that its investments in AI are paying off in other areas, such as its cloud-computing business.
“In cloud, our AI solutions are helping drive deeper product adoption with existing customers, attract new customers and win larger deals,” Alphabet Chief Executive Officer Sundar Pichai said in the statement.
Sales in the cloud division rose 35% to $11.4 billion, compared with the year-ago period. Google is third in the market behind Amazon and Microsoft.