In my recent post, Using Generative AI to produce Spotify Clips, I listed a bunch of challenges but missed a big one. While listening to IJ by Sam Gendel, I realized I had no answer for instrumental songs, solos, or songs with few lyrics. What do you do about those?

Some sort of audio-to-video technique?

Maybe using metadata like title, genre, and beats-per-minute (BPM), along with audio features such as a spectrogram and the musical notes, would work?
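
To make that idea a bit more concrete, here is a minimal sketch of how track features could be turned into a text prompt for some text-to-video model. Everything in it is an assumption for illustration: the `TrackFeatures` fields, the tempo thresholds, and the prompt wording are made up, the example values are placeholders rather than real data for any song, and no particular model or API is implied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrackFeatures:
    """Hypothetical container; field names are illustrative, not a real schema."""
    title: str
    genre: str
    bpm: float
    notes: list[str] = field(default_factory=list)


def describe_tempo(bpm: float) -> str:
    # Arbitrary thresholds chosen only for this sketch.
    if bpm < 80:
        return "slow, drifting"
    if bpm < 120:
        return "steady, mid-tempo"
    return "fast, energetic"


def build_video_prompt(track: TrackFeatures) -> str:
    # Assemble a plain-text prompt from the available features.
    parts = [
        f"An abstract looping visual for an instrumental {track.genre} "
        f"track titled '{track.title}'",
        f"{describe_tempo(track.bpm)} motion synced to about "
        f"{round(track.bpm)} beats per minute",
    ]
    if track.notes:
        parts.append("color shifts following the notes " + ", ".join(track.notes))
    return "; ".join(parts) + "."


# Placeholder values, not measurements of any real track.
example = TrackFeatures(title="Example Track", genre="jazz", bpm=92.0,
                        notes=["D", "F", "A"])
prompt = build_video_prompt(example)
print(prompt)
```

The spectrogram is left out here because it does not reduce to a short text description; it would more likely be fed to a model directly as an image or as a conditioning signal.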

Or generating an image and animating it, or just its background, to add depth, similar to how the image of the Mona Lisa is animated in the VideoPoet blog post.

It would be a fun experiment.

