In a move to capture the attention of professional musicians, creators and advertisers, Meta has launched AudioCraft, a new generative AI framework that bundles three separate sound-creation models, enabling users to create audio content solely from text prompts.
Each AI model included focuses on a different area of sound creation.
MusicGen, which became available as a public demo in June, generates music from text inputs. AudioGen creates sound effects -- such as laughter, barking dogs and footsteps -- from written prompts. Lastly, Meta’s EnCodec decoder enables higher-quality music generation with fewer artifacts.
“Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument,” Meta wrote in a blog post. “Or an indie game developer populating virtual worlds with realistic sound effects and ambient noise on a shoestring budget. Or a small business owner adding a soundtrack to their latest Instagram post with ease.”
Meta adds that, because the models are open-sourced, researchers and practitioners can also train their own versions on their own datasets in an effort to “help advance the state of the art.”
More specifically, Meta acknowledges that its models are trained on datasets that “lack diversity,” stating that “the music dataset used contains a larger portion of Western-style music and only contains audio-text pairs with text and metadata written in English.”
“By sharing the code for AudioCraft, we hope other researchers can more easily test new approaches to limit or eliminate potential bias in and misuse of generative models,” the company adds.
Although record labels and music artists have publicly protested against generative AI models over copyright infringement concerns, Meta believes its new product could become “a new type of instrument” and shape music creation much as synthesizers did for electronic music.
Meta says it developed AudioCraft in part because it felt generative AI for audio “has seemed to lag a bit behind” other media.
Still, merging music and AI has gained traction over the past year, with tech giants like Google releasing MusicLM this past winter. There is also Riffusion, which composes music by visualizing it as spectrograms; Boomy; and OpenAI’s Jukebox, which has drawn controversy for covering the voices of famous artists.