Evaluating Deep Audio Embeddings as Fitness Function for Genetic Music Generation


Samuel Berry

08/05/2025

Supervised by Crispin Cooper; Moderated by Carolina Fuentes Toro

This study evaluates the effectiveness of deep audio embeddings as a fitness function for evolutionary music generation, specifically within the domain of ambient music. Using deep audio embedding similarity as the sole fitness metric, I implemented a graph-based genetic algorithm that evolves probabilistic MIDI generators to produce audio similar to a reference track. The research addresses whether such embeddings can guide evolutionary search without relying on explicit music-theoretic rules or human feedback. Results from multiple evolutionary runs demonstrate that the embedding-based fitness function provides an optimisable gradient, with consistent improvements in mean and maximum fitness across generations, alongside characteristic diversity patterns. Qualitative analysis reveals that while high-fitness individuals successfully capture timbral and textural elements of the reference, they often lack musical coherence, particularly in harmonic relationships. My findings suggest that deep audio embeddings offer a promising but incomplete solution for automated fitness evaluation in generative music systems. I conclude that combining embeddings with lightweight musical constraints would likely yield more aesthetically satisfying results while maintaining genre flexibility. This research contributes to understanding how learned representations can guide creative evolutionary systems and highlights the gap between signal similarity and musical quality.
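The fitness function described above, similarity between deep audio embeddings of a candidate and a reference track, can be sketched as cosine similarity over embedding vectors. This is a minimal illustration only: the choice of cosine similarity, the 512-dimensional embeddings, and the `embedding_fitness` helper are assumptions for the sketch, not necessarily the study's exact setup (real embeddings would come from an audio embedding model rather than random vectors).

```python
import numpy as np

def embedding_fitness(candidate_emb: np.ndarray, reference_emb: np.ndarray) -> float:
    """Cosine similarity between two deep audio embeddings, used as a GA fitness
    score: higher means the candidate's audio sounds more like the reference."""
    c = candidate_emb / np.linalg.norm(candidate_emb)
    r = reference_emb / np.linalg.norm(reference_emb)
    return float(np.dot(c, r))

# Toy example with stand-in 512-d embedding vectors. In the real system these
# would be produced by embedding the rendered audio of each evolved individual.
rng = np.random.default_rng(0)
reference = rng.normal(size=512)
near = reference + 0.1 * rng.normal(size=512)  # candidate close to the reference
far = rng.normal(size=512)                     # unrelated candidate

# A candidate whose embedding lies near the reference scores higher, giving
# the genetic algorithm a gradient to climb.
print(embedding_fitness(near, reference) > embedding_fitness(far, reference))
```

In a full run, this score would rank individuals for selection each generation; the gap the report identifies (signal similarity without harmonic coherence) arises because nothing in this measure inspects pitch relationships directly.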


Initial Plan (03/02/2025) [Zip Archive]

Final Report (08/05/2025) [Zip Archive]

Publication Form