This project aims to generate a single image for each input acoustic music audio file using deep learning techniques. The student's role will involve:

- Creating a dataset of acoustic music audio files to train the AI model on
- Experimenting with different audio captioning models to generate text descriptions from the audio
- Testing text-to-image generation models to convert the text descriptions into images
- Comparing the performance of various model combinations
- Constructing an end-to-end pipeline that takes in an acoustic music clip and outputs a corresponding generated image
The focus is specifically on acoustic music rather than speech or other audio types. The goal is to find a suitable combination of AI models that can accurately capture the mood, instruments, genre, and other characteristics of a music clip and render them in an automatically generated image.
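The end-to-end pipeline described above can be sketched in Python. This is a minimal illustration only: the `AudioCaptioner` and `TextToImageModel` classes are hypothetical placeholders standing in for whatever captioning and text-to-image models the student ultimately selects, and the stub return values are invented for demonstration.

```python
from dataclasses import dataclass


class AudioCaptioner:
    """Placeholder for a model that maps an audio clip to a text description.

    A real implementation would wrap an audio-captioning model and run
    inference on the clip; this stub returns a fixed caption.
    """

    def caption(self, audio_path: str) -> str:
        return "a gentle acoustic guitar piece with a calm, warm mood"


class TextToImageModel:
    """Placeholder for a model that renders an image from a text prompt.

    A real implementation would return image data (e.g. PNG bytes); this
    stub echoes the prompt so the pipeline wiring can be exercised.
    """

    def generate(self, prompt: str) -> bytes:
        return f"<image for: {prompt}>".encode()


@dataclass
class MusicToImagePipeline:
    """Chains audio captioning and text-to-image generation."""

    captioner: AudioCaptioner
    renderer: TextToImageModel

    def run(self, audio_path: str) -> bytes:
        # Step 1: describe the music clip in text.
        caption = self.captioner.caption(audio_path)
        # Step 2: render the description as an image.
        return self.renderer.generate(caption)


pipeline = MusicToImagePipeline(AudioCaptioner(), TextToImageModel())
image = pipeline.run("clip.wav")
```

Swapping in different captioning or text-to-image models only requires replacing the two placeholder classes, which makes comparing model combinations straightforward.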