MusicLM is Google’s new artificial intelligence (AI) system built to generate music in any genre from simple text descriptions, such as “song played at the end of a sad movie” or “arcade game sounds.”
Here’s the catch: Google says it has no immediate plans to release the technology, citing potential risks.
There are plenty of AI music generators on the market, such as Riffusion, which composes music by generating images of audio spectrograms and converting them to sound; Boomy, which has already been used to create millions of original songs; and OpenAI’s Jukebox, which has already caused conflict in the music industry due to its ability to rework existing music and produce deepfake-style covers in the voices of famous artists.
What excites tech folks about MusicLM in particular is its potential ability to produce songs that are more complex in composition and sound, which other programs have not yet been able to accomplish.
According to the academic paper describing the system, MusicLM can generate high-fidelity music from detailed descriptions such as “a calming violin melody backed by a distorted guitar riff.”
MusicLM “generates music at 24 kHz that remains consistent over several minutes,” the paper reads. “Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description.”
The paper also shows that MusicLM can be conditioned on both text and a melody. In other words, it can transform whistled and hummed melodies according to the style described in a text caption.
The system was trained on a dataset of 280,000 hours of music to learn to generate coherent songs.
The samples often include unwanted distortion, and their lyrics remain basic or difficult to understand. Even so, MusicLM’s output sounds notably higher in quality than that of other AI music generators.
However, unlike OpenAI, which released ChatGPT without accounting for, among other things, the dire effects it would have on early education, Google has noted the ethical challenges posed by such technology, including the incorporation of copyrighted material.
This concern is based on the fact that about 1% of the music MusicLM generated was directly replicated from the songs it was trained on.
“We acknowledge the risk of potential misappropriation of creative content associated to the use case,” the co-authors of the paper wrote. “We strongly emphasize the need for more future work in tackling these risks associated to music generation.”