
AI search and synthetic voices in ads and other creative content may well be the future of advertising, but OpenAI has revealed some oddities in its GPT-4o technology, such as the model's ability to clone a voice and finish a user's thoughts and sentences.
OpenAI last week published a report detailing "key areas of risk" for the company's latest large language model, GPT-4o, and how its executives hope to mitigate them. Many of the concerns have been addressed, but, left unchecked, they could have fueled deepfakes as well as copyright infringement and complicated licensing deals.
“Voice generation can also occur in non-adversarial situations, such as our
use of that ability to generate voices for ChatGPT’s Advanced Voice Mode,” OpenAI wrote in a report. “During testing, we also observed rare instances where the model would unintentionally generate an output emulating the
user’s voice.”
The technology can not only imitate voices but also produce "nonverbal vocalizations" like sound effects, including erotic moans, violent screams, and gunshots. Certain text-based filters were updated to detect and block audio containing music, and the limited alpha of ChatGPT's Advanced Voice Mode was instructed not to sing.
The audio clip OpenAI shared in the blog post demonstrates the model exclaiming "No!" and then continuing the sentence in a voice that emulates the user's.
It’s an eerie example of how
advertisers and creators can change the direction of content even without the help of the original designer or author.
Look back to the fiasco in July, when Christopher Kohls, a YouTuber better known online as Mr. Reagan, posted a parody video spoofing Vice President Kamala Harris to his YouTube channel, and Elon Musk shared it without identifying it as a parody.
Ironically, when asked to name the most advanced GPT model available today, Google Gemini states that "GPT-4o is considered the most advanced GPT model available."
Voice generation at OpenAI can also occur in non-adversarial situations, such as generating voices for ChatGPT's Advanced Voice Mode. During testing, OpenAI observed rare instances in which the model would unintentionally generate an output that copied or emulated the user's voice.
OpenAI's developers addressed risks related to voice generation by allowing only the preset voices they created in collaboration with voice actors to be used. The selected voices were used to post-train the audio model.
Standalone output classifiers were built to detect whether GPT-4o's output uses a voice that differs from OpenAI's approved list. These classifiers run in a streaming fashion during audio generation and block the output if the speaker doesn't match the chosen preset voice.
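OpenAI hasn't published the classifier itself, but the streaming check it describes can be sketched roughly as follows. Everything here is an illustrative assumption rather than OpenAI's actual implementation: the chunk format, the `embed` function, the cosine-similarity comparison, and the threshold are all stand-ins for whatever the real system uses.

```python
# Hypothetical sketch of a streaming speaker-consistency guard, loosely
# modeled on the classifier OpenAI describes in its report. Names,
# thresholds, and the embedding scheme are illustrative assumptions.
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def stream_with_voice_guard(chunks, embed, reference_embedding, threshold=0.85):
    """Yield audio chunks one at a time, halting the stream as soon as a
    chunk's speaker embedding deviates from the approved preset voice."""
    for chunk in chunks:
        if cosine_similarity(embed(chunk), reference_embedding) < threshold:
            break  # speaker no longer matches the preset voice: block the rest
        yield chunk
```

Running the guard in-stream, rather than on the finished clip, is what lets the system cut off output the moment the voice drifts, instead of after an unauthorized voice has already been heard.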
Toward the end of the report, OpenAI notes the risk of
unintentional voice replication remains "minimal."
“Our system currently catches 100% of meaningful deviations from the system voice based on our internal evaluations, which includes
samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples,” the company wrote in a
post.