AudioCraft AI Music Generation – AI Research From The Lab


By Diesel Laws

August 2023


AudioCraft AI Music Generation: Has AI-generated music arrived?

Overview

Our team has recently been exploring generative AI music technology, specifically Meta's AudioCraft, a generative AI tool for audio and music. AudioCraft lets you generate high-quality audio and music from text.

While the AI community has focused predominantly on image, language and video generation, music generation is now gaining traction, especially with impressive offerings such as this.

Key Concepts

AudioCraft consists of three models: MusicGen, AudioGen and EnCodec. This brief focuses on MusicGen.

MusicGen is a single language model that operates over several streams of compressed, discrete music representations.

It introduces a novel approach called Token Interleaving Patterns, which helps model the multiple parallel audio streams efficiently, resulting in high-quality and coherent music generation.
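To make the interleaving idea concrete, here is a minimal sketch of one such pattern, a "delay" arrangement in which each parallel token stream is shifted one step later than the one before it. This is an illustration rather than AudioCraft's own implementation: the function names and the PAD placeholder are hypothetical, and the figures of 4 streams at 50 token frames per second are assumptions taken from the MusicGen paper, not from our tests.

```python
PAD = None  # hypothetical placeholder meaning "no token at this step"


def delay_interleave(streams):
    """Shift stream k right by k steps so all streams share one timeline."""
    num_streams = len(streams)
    return [
        [PAD] * k + list(stream) + [PAD] * (num_streams - 1 - k)
        for k, stream in enumerate(streams)
    ]


def delay_deinterleave(rows):
    """Undo delay_interleave and recover the original aligned streams."""
    num_streams = len(rows)
    num_frames = len(rows[0]) - (num_streams - 1)
    return [row[k:k + num_frames] for k, row in enumerate(rows)]


def decoding_steps(seconds, frame_rate=50, num_streams=4):
    """Compare sequence lengths: one token per step vs. the delay pattern."""
    frames = seconds * frame_rate  # assumed 50 frames per second
    return {
        "flattened": frames * num_streams,
        "delayed": frames + num_streams - 1,
    }


streams = [[1, 2, 3], [4, 5, 6]]
rows = delay_interleave(streams)
print(rows)                # [[1, 2, 3, None], [None, 4, 5, 6]]
print(decoding_steps(30))  # {'flattened': 6000, 'delayed': 1503}
```

Under these assumptions, a 30-second clip needs roughly 1,500 decoding steps with the delay pattern instead of 6,000 when every token is generated one after another, which is the kind of saving interleaving is meant to deliver.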

Deep Dive

Model Variations: AudioCraft offers several MusicGen models, ranging from Melody, which accepts both text and a melody, to large-scale transformer decoders with up to 3.3B parameters.

Input Flexibility: The ‘melody’ model uniquely allows reference audio, combining text prompts with derived melodies to produce customised outputs.

Sound Mechanics: AudioCraft’s unified system excels in both sound compression and generation. Its adaptability ensures scalability for further sound enhancements or compression techniques.

Setup & Testing

Deploying AudioCraft on Google Colab was seamless, requiring just four lines of code. Across a series of test generations, we observed strong results:

Effective handling of diverse genres using varying prompts
Fantastic audio quality at both default and enhanced settings
High relevance and accuracy in the output, staying true to the keywords
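For readers who want to try something similar, the sketch below shows roughly what a text-to-music call looks like with the open-source audiocraft Python package. It is not the exact code we ran: the helper name generate_clips, the output file names, the 8-second duration and the choice of the facebook/musicgen-small checkpoint are all assumptions for illustration, and the package must be installed separately (a GPU runtime is strongly recommended).

```python
MAX_DURATION_S = 30  # single-pass limit noted in this brief


def generate_clips(prompts, duration_s=8, model_name="facebook/musicgen-small"):
    """Generate one clip per text prompt and save each as a WAV file.

    Hypothetical helper: requires the audiocraft package at call time.
    """
    if duration_s <= 0 or duration_s > MAX_DURATION_S:
        raise ValueError(f"duration_s must be in (0, {MAX_DURATION_S}]")
    if not prompts:
        return []

    # Imported lazily so the checks above work without AudioCraft installed.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration_s)
    wavs = model.generate(prompts)  # one waveform tensor per prompt

    paths = []
    for index, wav in enumerate(wavs):
        stem = f"clip_{index}"
        # audio_write appends the ".wav" suffix and normalises loudness.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        paths.append(stem + ".wav")
    return paths


# Example call (downloads model weights on first use):
# generate_clips(["bossa nova with nylon guitar and soft percussion"])
```

Smaller checkpoints trade some quality for speed, so they are a reasonable starting point when experimenting with prompts before moving to a larger model.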

Our tests showcased the tool’s prowess. For instance, a prompt for a Bossa Nova inspired tune produced an output that captured the melody, rhythm, composition, and essence of the genre.

Hitting The Limits

AudioCraft’s aptitude in music generation isn’t without its limitations.

Duration Constraints: The tool’s default limit is 30 seconds of generated audio, which can limit extended compositions.

Processing Time: Depending on settings and prompt complexity, generation can take 5 to 10 minutes, which users may find slow.

Output Constraints: The primary output is a single merged-track MP4 file. While WAV outputs are possible, they are not the default. Additionally, the system does not natively offer separated instrumental tracks, posing challenges for users wanting more granular control.
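If WAV output is needed outside the tool, audio samples can be written with Python's standard wave module. This is a minimal sketch, not AudioCraft's export path: write_wav and sine_tone are hypothetical helpers, the sine tone stands in for model output, and the 32 kHz sample rate is an assumption based on MusicGen's published configuration.

```python
import math
import struct
import wave


def sine_tone(freq_hz, seconds, sample_rate=32000, amplitude=0.5):
    """Return mono float samples in [-1, 1]; a stand-in for generated audio."""
    count = int(seconds * sample_rate)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(count)
    ]


def write_wav(target, samples, sample_rate=32000):
    """Write mono float samples as 16-bit PCM WAV.

    target may be a file path or a writable, seekable binary file object.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # avoid integer overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Example: write_wav("tone.wav", sine_tone(440.0, 1.0))
```

Writing uncompressed PCM keeps the file lossless, which matters if the track will be mixed or processed further.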

Continuity Hurdles: Seamlessly extending a generated track requires added steps, making it less straightforward for users. Moreover, while AudioCraft is generally adept at keyword interpretation, nuanced musical expectations might occasionally be unmet.
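One common way to extend a track past a fixed generation window is to slide the window forward and reuse the tail of the previous audio as context for the next pass. The sketch below only plans those windows and does not call AudioCraft. The function name is hypothetical, and the 18-second stride is an assumed value chosen for illustration against the 30-second limit noted above.

```python
def plan_windows(total_s, window_s=30.0, stride_s=18.0):
    """Return (start, end, context) tuples, in seconds, covering total_s.

    context is how much already-generated audio each pass is conditioned on.
    """
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    if not 0 < stride_s < window_s:
        raise ValueError("stride_s must be between 0 and window_s")

    start = 0.0
    end = min(window_s, total_s)
    windows = [(start, end, 0.0)]  # first pass has no prior audio
    while end < total_s:
        start += stride_s
        context = end - start      # overlap with audio generated so far
        end = min(start + window_s, total_s)
        windows.append((start, end, context))
    return windows


for window in plan_windows(60.0):
    print(window)
# (0.0, 30.0, 0.0)
# (18.0, 48.0, 12.0)
# (36.0, 60.0, 12.0)
```

A shorter stride keeps more context between passes, which should help continuity at the cost of more generation passes; a longer stride does the opposite.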

What’s Next?

We will continue to push the models further and see whether we can generate separate instrumental tracks and WAV files, which would enable full control over the mix and a higher-quality end result for each track.

What’s the verdict?

AudioCraft, though still at the research stage, is a testament to the growing potential of generative AI music technology.

The quality, relevance to the prompts, and musical finesse of the generated tracks are astounding. We expect fast advancements over the next few months.

Thanks for checking out our business articles. If you want to learn more, feel free to reach out to Red Marble AI. You can click on the "Let's Talk" button on our website or email Diesel, our AI expert at d.laws@redmarble.ai.

We appreciate your interest and look forward to sharing more with you!
