One of the many bits of news from Google I/O 2019 was that Google would soon begin displaying podcasts in search results. "Soon" turned out to be very soon, as we're already seeing these results surface.
While the feature itself is interesting, and the fact that the main result goes to Apple while the episodes go to Google is entertaining, the talk out of I/O suggested something much more intriguing -- that Google would soon be indexing podcast content and returning audio clips in search results.
Can Google transcribe audio content?
Is this currently possible? In a word: yes. We know that Google has offered a speech-to-text service as part of Google Cloud Platform since 2017, which has already undergone a few iterations and upgrades. Earlier this year, Android Police spotted source-code changes which suggested that Google was proactively transcribing some podcasts on the Google Podcasts platform.
I’ve seen evidence of this capability in the broader Google ecosystem, in an automatic transcript on my Google Pixel phone for a recent call. The message left from Dick’s Sporting Goods said “Hi, this message is for Pete Meyer. This is Sarah calling from Dick’s Sporting Goods. You just placed an order for a bike to pick up in our store here in Bloomingdale. We do have the bike in stock. I was just calling to let you know I do not have a bike tech that is working tonight. I do have one tomorrow morning. So the bike will be ready after 10 a.m.”
We even see evidence of this capability in search results, but in a different medium. As early as April 2017, Google was testing suggested clips in YouTube videos. What's interesting is that variations on this search not only produce different videos in some cases, but different clips within the same video.
Now, the suggested clip is 101 seconds long and starts at the 1:54 mark. It's clear from some suggested clips that the feature is still in its infancy, but it's difficult to imagine Google being able to implement this feature dynamically without creating a transcript of the audio portion of these videos.
Why start with video? For Google, it just makes bottom-line sense. YouTube is a planetary system to the pleasant suburb of Google Podcasts and has an immensely powerful infrastructure backing it. If Google can return results based on the audio portion of a video, it's only natural they can do the same for audio files.
How will audio surface in search?
The obvious starting points will be extensions of the podcast engine, including automatic transcription and full-text (full-audio) search – both of which already seem to be in the works. Once you can search within Google Podcasts, though, expect that search capability to broaden to general Google searches.
One big question is whether Google will return audio content directly or will use transcribed text. In some cases, returning audio clips may be a better match to searcher intent. If you're searching for a movie clip or something you heard in a podcast, returning the original is a richer experience than returning plain text.
The big advantage, though, will be to voice devices, such as Google Home. Returning audio would fill a content gap for voice devices and provide a direct bridge into full podcasts and other non-text content.
How many podcasts should I start?
We do seem to be in the midst of a minor podcast revival, and audio search may fuel that revival. As always, however, expect Google to release changes gradually and test them for weeks or months. If you're already producing a podcast and want to make it accessible to search, make sure you're part of the Google Podcasts ecosystem and are entering and updating the currently available metadata.
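If you want a quick sanity check on that metadata, a small script can flag obvious gaps in your RSS feed. The sketch below is only an illustration: the list of fields it checks (title, link, description, cover art, and an audio enclosure per episode) is my assumption about a sensible baseline, not Google's published requirements, and the function name and sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used for podcast-specific tags such as <itunes:image>.
ITUNES = "http://www.itunes.com/dtds/podcast-1.0.dtd"

def missing_feed_metadata(rss_text):
    """Return a list of human-readable problems found in a podcast RSS feed."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return ["no <channel> element"]

    problems = []

    # Show-level text fields (assumed baseline, not an official checklist).
    for tag in ("title", "link", "description"):
        node = channel.find(tag)
        if node is None or not (node.text or "").strip():
            problems.append(f"channel is missing <{tag}>")

    # Cover art may be a plain <image> or an <itunes:image>.
    has_image = (channel.find("image") is not None
                 or channel.find(f"{{{ITUNES}}}image") is not None)
    if not has_image:
        problems.append("channel is missing cover art")

    # Each episode should have a title and an audio file to point to.
    for i, item in enumerate(channel.findall("item"), start=1):
        title = item.find("title")
        if title is None or not (title.text or "").strip():
            problems.append(f"episode {i} is missing <title>")
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            problems.append(f"episode {i} is missing an <enclosure> audio URL")

    return problems

# Hypothetical feed used only to exercise the check.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <link>https://example.com/show</link>
    <description>A hypothetical podcast.</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="12345"/>
    </item>
    <item>
      <title>Episode 2</title>
    </item>
  </channel>
</rss>"""

for problem in missing_feed_metadata(SAMPLE_FEED):
    print(problem)
```

For the sample feed, the script reports the missing cover art and the second episode's missing audio file. In practice you would read your real feed from disk or over HTTP and adjust the checked fields to whatever your podcast platform currently asks for.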
Other than having clean audio in a format Google can process, there is probably nothing specific you will have to do down the road to get that content transcribed. It may be worth thinking about how your audio content is structured.
Completely free-form content, while it certainly has a place, may be harder for Google to evaluate. Is the theme of your podcast and each episode evident? Is there a structure where a machine could potentially parse questions and answers? Are there concise takeaways – maybe a summary at the end of each episode?
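To make "a machine could potentially parse questions and answers" a bit more concrete, here is a deliberately naive sketch that pulls question-and-answer pairs out of a transcript. It says nothing about how Google actually processes audio; the heuristic (a sentence ending in a question mark, followed by everything up to the next question) and the function name are my own assumptions, chosen only to show why clearly structured episodes are easier to parse.

```python
import re

def extract_qa_pairs(transcript):
    """Pair each question with the sentences that follow it.

    Naive heuristic: a question is any sentence ending in "?", and its
    answer is every sentence up to the next question. Abbreviations such
    as "a.m." will confuse the simple sentence splitter used here.
    """
    # Split on sentence-ending punctuation, keeping the punctuation.
    sentences = [s.strip() for s in re.findall(r"[^.?!]+[.?!]", transcript)]

    pairs = []
    question, answer = None, []
    for sentence in sentences:
        if sentence.endswith("?"):
            # A new question closes out the previous pair, if it had an answer.
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = sentence, []
        elif question:
            answer.append(sentence)

    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs

# Hypothetical transcript snippet used only for illustration.
transcript = (
    "Welcome to the show. What is audio SEO? It is making audio findable "
    "in search. It starts with clean audio. How long should an episode be? "
    "That depends on your audience."
)

for q, a in extract_qa_pairs(transcript):
    print("Q:", q)
    print("A:", a)
```

An episode that states its questions outright and answers them right away gives even a crude parser like this something to work with, while a rambling, free-form conversation gives it very little.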
Ultimately, audio SEO will mean treating our audio content in a more structured and deliberate way. The broader evolution of Google across many devices also means that we need to be more aware of what type of content best fits our audience's needs.
Is the searcher looking for text, video, or audio? Each modality fits a different need and a different device (or set of devices) in the broader search ecosystem.