How OpenAI Got 'Her' Voice

Scarlett Johansson has threatened to sue OpenAI for using a likeness of her voice to create "Sky," a voice for its assistant, after the company had approached the actress about voicing it.

Days after Johansson rejected the proposal, OpenAI released GPT-4o with a replica voice, demonstrating it at a live, online event.

Observers noted that the assistant's voice was similar to that of "Samantha," the character Johansson voiced in the 2013 sci-fi film "Her."

OpenAI then paused the Sky voice.

Johansson's lawyers are now demanding that OpenAI disclose how it developed the AI personal assistant voice, which the actress says sounds similar to her own.

OpenAI said it worked with industry-leading casting and directing professionals to narrow down more than 400 submissions before selecting five voices with specific characteristics.

The criteria included actors from diverse backgrounds or who could speak multiple languages; a voice that feels timeless; an approachable voice that inspires trust; a warm, engaging, confidence-inspiring, charismatic voice with rich tones; and a voice that sounds natural and is easy to hear.

OpenAI also plans to give ChatGPT Plus users access to a new Voice Mode for GPT-4o in the coming weeks.

According to OpenAI, GPT-4o handles interruptions smoothly, manages group conversations effectively, filters out background noise, and adapts to tone.

OpenAI CEO Sam Altman has said the 2013 film "Her" is his favorite movie. In a post on X on Sunday, OpenAI said the voice would be paused while it addresses "questions about how we chose the voices in ChatGPT."

The X post links to a blog post on the company's website detailing how OpenAI chose and developed the voices.

In April, the company announced work on a voice engine with many possible uses in advertising, marketing, and other media. At the time, the engine used text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.

OpenAI first developed the voice model in 2022 and has used it to power the preset voices available in its text-to-speech API, as well as ChatGPT Voice and Read Aloud.

Others have moved into voice content creation. In April, CNET co-founder Shelby Bonnie sat down with MediaPost to talk about the creation of a video-production service that uses AI to turn a basic video shot on a mobile phone into engaging content. The system uses a variety of voices.

More than 100 voices, all performed by paid actors, are available to enhance the content. The user writes a script and selects a voice, and the technology does the rest.
