
Google DeepMind on Wednesday detailed two speech-generation
technologies developed to help people worldwide interact with more natural, conversational and intuitive digital assistants and AI tools.
Advertisers can use these technologies to generate long-form, multi-speaker dialogue from written documents for ads and other types of content, whether paid, owned or organic.
The first, NotebookLM Audio Overviews, turns uploaded
documents into dialogue. The second, Illuminate, creates formal AI-generated discussions about research papers to help make knowledge more accessible and digestible.
The post includes an example of a multi-speaker dialogue generated by NotebookLM Audio Overview from a few documents. Making this work required scaling single-speaker generation models into multi-speaker models through additional training data and model capacity.
Longer speech segments require a more efficient speech codec, one that compresses audio into a sequence of tokens at rates as low as 600 bits per second without compromising output quality, according to Google DeepMind.
Producing a two-minute dialogue requires generating more than 5,000 tokens, the company says.
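Those two figures are easy to sanity-check. The quick calculation below is our own back-of-the-envelope arithmetic, assuming the 600 bits per second is the codec's output bitrate and the 5,000 tokens span the full two minutes; the derived numbers are not stated by Google DeepMind.

```python
# Back-of-the-envelope check on the article's figures. Assumptions: the
# 600 bits per second is the codec's output bitrate, and the 5,000 tokens
# cover the full two-minute dialogue; the derived numbers are ours.

BITRATE_BPS = 600      # codec bitrate cited by Google DeepMind
DURATION_S = 120       # two-minute dialogue
TOKENS = 5_000         # tokens the article says must be generated

total_bits = BITRATE_BPS * DURATION_S      # 72,000 bits of compressed audio
tokens_per_second = TOKENS / DURATION_S    # ~41.7 tokens per second
bits_per_token = total_bits / TOKENS       # ~14.4 bits of audio per token

print(f"{total_bits=} {tokens_per_second=:.1f} {bits_per_token=:.1f}")
```

In other words, the model has to sustain roughly 42 tokens per second, with each token carrying about 14 bits of compressed audio.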
And while AI can create dialogue from documents, it can also write code for some of the most sophisticated technology at Google.
Google CEO Sundar Pichai said Tuesday during the company's Q3 earnings call that more than one-quarter of the new code at Google is
generated by AI and then checked by employees.
Coding with AI continues to increase productivity and efficiency within Google, and it is beginning to show it can boost profits.
He also spoke about the timeline for launching AI agents, saying the company is building experiences in which AI can see and reason about the world around users.
"Project Astra is a glimpse of that future," he said. "We’re working to ship experiences like this as early as 2025."
Project Astra is the latest AI prototype from Google DeepMind, the company's AI division focused on artificial general intelligence (AGI), and was presented at Google I/O 2024.
The technology acts as a universal assistant, intended to improve how users interact with their phones and other devices.
It is built on multimodal processing that digests speech and video: the system encodes video frames, combines them with speech, and orders them in a timeline to provide greater context in what Google describes as a human-like conversational flow. It is Google's version of an AI agent.
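To make that pipeline concrete, here is a minimal sketch of the timeline-ordering step, assuming timestamped frame and speech events. The Event structure, field names and merge logic are our own illustration; Google has not published Project Astra's internals.

```python
# Illustrative only: a toy version of the timeline-ordering step described
# above. The Event type, field names, and merge logic are our assumptions;
# Google has not published Project Astra's internals.
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float   # seconds since the session started
    kind: str          # "frame" (an encoded video frame) or "speech"
    payload: str       # stand-in for an embedding or token sequence

def build_timeline(frames: list[Event], speech: list[Event]) -> list[Event]:
    """Merge both modality streams into one chronologically ordered list."""
    return sorted(frames + speech, key=lambda e: e.timestamp)

frames = [Event(0.0, "frame", "<frame 0>"), Event(1.0, "frame", "<frame 1>")]
speech = [Event(0.5, "speech", "what am I looking at?")]

for event in build_timeline(frames, speech):
    print(f"{event.timestamp:4.1f}s {event.kind:>6}: {event.payload}")
```

Ordering both modalities on one clock is what lets a model answer a spoken question about whatever was on screen a moment earlier.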
Pichai also spoke about lowering the cost of processing queries. This is what analysts have been waiting to hear, and he gave them examples.
He shared that since Google first began testing AI Overviews, the company has significantly lowered machine costs per query.
In 18 months, Google reduced costs for these queries by more than 90% through hardware, engineering, and technical breakthroughs, while doubling the size of its custom Gemini model.
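As a quick sanity check on those figures: a more than 90% reduction means each query now costs less than a tenth of what it did, and because the model doubled in size over the same period, the implied cost per model parameter fell by more than a factor of 20. The arithmetic below uses only the two numbers from the call; the per-parameter factor is our inference, assuming serving cost scales roughly with model size.

```python
# Arithmetic on the two figures from the call; the per-parameter factor is
# our inference, assuming serving cost scales roughly with model size.
cost_reduction = 0.90    # "more than 90%" lower machine cost per query
model_growth = 2.0       # Gemini model size roughly doubled

remaining_fraction = 1 - cost_reduction          # <= 0.10 of the original cost
query_cost_factor = 1 / remaining_fraction       # queries are >= 10x cheaper
per_parameter_factor = query_cost_factor * model_growth  # >= 20x cheaper per parameter

print(f"{remaining_fraction=:.2f} {query_cost_factor=:.0f}x {per_parameter_factor=:.0f}x")
```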
The news sent investors scrambling to buy Google's stock, which rose to a high of $182 per share on Wednesday morning, as of this writing.
Alphabet, Google's parent company, has also begun to show that its investments in AI are paying off in other areas, such as its cloud-computing business.
“In cloud, our AI solutions are helping drive deeper product adoption with existing customers, attract new customers and win larger
deals,” Pichai said in a statement.
Sales in the cloud division rose 35% to $11.4 billion, compared with the year-ago period. Google
is third in the market behind Amazon and Microsoft.