Latent Terrain Synthesis

Building new musical instruments that compose and interact with AI audio generators.

Last modified 2025-11-03

Jasper Shuoyang Zheng

Welcome

Latent terrain is a tool for building corpus-based sound spaces, maps, and materials that steer neural audio autoencoders and codecs (such as RAVE, the Stable Audio Open codec, and Music2Latent). A terrain is a surface map for the autoencoder's latent space: it takes coordinates in a control space as inputs and produces continuous, real-time latent trajectories that can be used for sound synthesis.
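To make the idea concrete, the sketch below is a toy illustration (not the actual nn.terrain~ implementation): a small network maps 2D control-space coordinates to latent vectors, and a pretrained decoder (for example an exported RAVE model) would then turn the resulting trajectory into sound. The network, dimensions, and the commented decoder call are all hypothetical.

```python
# Toy sketch of the "terrain" idea (not the nn.terrain~ implementation):
# a small network maps 2D control-space coordinates to latent vectors,
# which a pretrained autoencoder's decoder could turn into audio.
import torch
import torch.nn as nn

LATENT_DIM = 8  # hypothetical latent size; varies per model

class ToyTerrain(nn.Module):
    """Maps (x, y) coordinates in [0, 1]^2 to latent vectors."""
    def __init__(self, latent_dim: int = LATENT_DIM, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (n_points, 2) -> latents: (n_points, latent_dim)
        return self.net(coords)

terrain = ToyTerrain()

# A straight-line walk across the control space...
steps = torch.linspace(0.0, 1.0, 100)
path = torch.stack([steps, 0.5 * torch.ones_like(steps)], dim=-1)  # (100, 2)

# ...becomes a continuous latent trajectory:
with torch.no_grad():
    trajectory = terrain(path)  # (100, LATENT_DIM)

# A decoder would then synthesise audio from the trajectory, e.g.
# audio = rave.decode(trajectory.T.unsqueeze(0))  # shape conventions vary by model
print(trajectory.shape)
```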

Latent terrain aims to open up the creative possibilities of latent space navigation: it lets you map the latent space of an autoencoder onto easy-to-navigate interfaces (such as gestural controllers, stylus and tablet, XY-pads, and more), explore it like walking across a terrain surface, and build new musical instruments that compose and interact with AI audio generators.

An example latent space walk with Music2Latent:
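A similar walk can be sketched offline in Python. The example below assumes the music2latent package's EncoderDecoder encode/decode interface and two hypothetical source files; exact tensor shapes and return types may vary between versions, so treat it as an illustration rather than a reference implementation.

```python
# Sketch of a simple latent space walk, assuming the music2latent package's
# EncoderDecoder interface (encode/decode). File names are hypothetical.
import torch
import librosa
from music2latent import EncoderDecoder

encdec = EncoderDecoder()

# Encode two source sounds into latent sequences (44.1 kHz mono assumed here).
wv_a, _ = librosa.load("sound_a.wav", sr=44100, mono=True)  # hypothetical file
wv_b, _ = librosa.load("sound_b.wav", sr=44100, mono=True)  # hypothetical file
lat_a = encdec.encode(wv_a)  # roughly (channels, latent_dim, frames)
lat_b = encdec.encode(wv_b)

# Trim to a common length and interpolate frame-wise between the two latents
# (assuming encode() returns torch tensors).
n = min(lat_a.shape[-1], lat_b.shape[-1])
alphas = torch.linspace(0.0, 1.0, n, device=lat_a.device, dtype=lat_a.dtype)
walk = (1 - alphas) * lat_a[..., :n] + alphas * lat_b[..., :n]

# Decode the interpolated trajectory back into audio.
wv_walk = encdec.decode(walk)
# e.g. sf.write("walk.wav", wv_walk.squeeze().cpu().numpy(), 44100)  # via soundfile
```

In nn.terrain~ itself, the latent trajectory comes from coordinates in a control space (an XY-pad, a tablet, a gestural controller) rather than from a precomputed interpolation.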

Example applications

Supported autoencoders

Latent terrain can work with any audio autoencoder, as long as it exposes its latent variables. However, only a limited number of them have been implemented for MaxMSP, and we have only tested the following models:

We plan to test the following model in the future:

Get started

Get in touch

Hi, this is Shuoyang (Jasper). nn.terrain~ is part of my ongoing PhD work on Discovering Musical Affordances in Neural Audio Synthesis, supervised by Anna Xambó Sedó and Nick Bryan-Kinns. Part of this work has been (and will be) about putting AI audio generators into the hands of composers and musicians.

Therefore, I would love to have you involved: if you have any feedback, a feature request, a demo, a device, or anything else made with nn.terrain, I would love to hear about it. If you would like to collaborate on anything, please leave a message in this feedback form.

Acknowledgements

Shuoyang Zheng, the author of this work, is supported by the UKRI Centre for Doctoral Training in Artificial Intelligence and Music [EP/S022694/1].