
Picturing music

After discovering the joy, and dare I say magic, of making music with the help of an AI, I also wanted to understand it better. I've always been curious about what makes things work the way they do, and this was no different.

While I haven't formally studied artificial intelligence beyond the basic university courses, I've always kept up with the latest developments in the field and understand its principles at least at an elementary level. But even with that background I couldn't work out the theory behind generating something as abstract as music. For a moment I was feeling hopeful that there was some emotional quality we had finally managed to teach the machine.

But just like the generative language models spitting out wise-sounding words while still being statistical models imitating the accumulated wisdom of humankind, the "creations" of the music-making models are as mechanical as anything. I would say even more so, since they don't really generate music at all.

How these text-to-music models work is that they are actually just image generation models, trained on spectrograms: images that visualize the frequencies of sound over time, in this case music. The generated spectrogram images are then transformed back into audio.
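To make the "image back into sound" step less hand-wavy: a spectrogram typically stores only the magnitude of each frequency at each moment, so the phase has to be re-estimated before a waveform can be rebuilt. One classic way to do that is the Griffin-Lim algorithm. The following is a toy numpy-only sketch of that round trip (a pure tone, not any particular model's actual pipeline; all function names and parameters here are my own illustration):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window."""
    win = np.hanning(n_fft)
    frames = [np.fft.rfft(x[s:s + n_fft] * win)
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.array(frames).T  # shape: (freq_bins, time_frames)

def istft(S, n_fft=512, hop=128):
    """Inverse STFT via overlap-add with window-sum normalization."""
    win = np.hanning(n_fft)
    length = n_fft + hop * (S.shape[1] - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i in range(S.shape[1]):
        s = i * hop
        x[s:s + n_fft] += np.fft.irfft(S[:, i]) * win
        norm[s:s + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128):
    """Recover a waveform from a magnitude-only spectrogram by
    iteratively re-estimating the missing phase."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# Round trip: a 440 Hz tone -> magnitude "image" -> audio again.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
mag = np.abs(stft(tone))      # this picture is what the model "paints"
recovered = griffin_lim(mag)  # and this is how it becomes sound again
```

The reconstructed signal is not sample-for-sample identical to the original, but its dominant pitch is the same 440 Hz, which is the essence of the trick: the picture carries the musical content, and the phase is merely plausible.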

Sometimes you should just keep the magic alive and not peek behind the curtain.