Latent Terrain Synthesis

Building new musical instruments that compose and interact with AI audio generators.

Last modified 2025-08-18

Jasper Shuoyang Zheng

Welcome

Latent terrain is a coordinates-to-latents mapping model for neural audio autoencoders (such as RAVE and Music2Latent). A terrain is a surface map of the autoencoder's latent space: it takes coordinates in a control space as input and produces continuous latent vectors in real time.
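As a rough illustration, the mapping can be pictured as a small network from control coordinates to latent vectors. The PyTorch sketch below is a hypothetical stand-in for a terrain model; the architecture and sizes are illustrative assumptions, not the nn.terrain~ implementation:

```python
# A minimal sketch of a coordinates-to-latents mapping, assuming a small
# MLP as the terrain model and the 64-dimensional latents used by
# Music2Latent. Illustrative only, not the nn.terrain~ implementation.
import torch
import torch.nn as nn

LATENT_DIM = 64  # e.g. Music2Latent latents are 64-dimensional

class Terrain(nn.Module):
    def __init__(self, coord_dim: int = 2, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(coord_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (n, 2) control-space positions -> (n, latent_dim) latents
        return self.net(coords)

terrain = Terrain()
xy = torch.tensor([[0.25, 0.75]])  # one point on an XY pad, in [0, 1]^2
z = terrain(xy)                    # one continuous latent vector
```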

Latent terrain aims to open up the creative possibilities of latent space navigation, allowing one to adapt an autoencoder to easier-to-navigate interfaces (such as gestural controllers, styluses and tablets, XY pads, and more) and to build new musical instruments that compose and interact with AI audio generators.

An example latent space walk with Music2Latent (video).
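For illustration, here is a minimal offline sketch of such a walk using the music2latent Python package: a straight line through a 2-D control space is mapped to a sequence of latent vectors (here by an untrained stand-in for a trained terrain model) and decoded to audio. Tensor shapes and device handling are assumptions that may need adjusting to your package version:

```python
# A sketch of an offline latent space walk, assuming the music2latent
# package (pip install music2latent). The terrain here is an untrained
# stand-in; in practice a trained coordinates-to-latents model is used.
import torch
import soundfile as sf
from music2latent import EncoderDecoder

terrain = torch.nn.Linear(2, 64)  # stand-in for a trained terrain model
encdec = EncoderDecoder()         # downloads model weights on first use

# Walk a straight line across the control space.
steps = 256
t = torch.linspace(0.0, 1.0, steps)
path = torch.stack([t, 1.0 - t], dim=1)  # (steps, 2) coordinates

with torch.no_grad():
    z = terrain(path)          # (steps, 64) latent vectors
    z = z.T.unsqueeze(0)       # (1, 64, steps), assuming decode expects
                               # (batch, latent_dim, timeframes)
    audio = encdec.decode(z)   # decoded waveform

sf.write("walk.wav", audio.squeeze().cpu().numpy(), 44100)
```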

Example applications

Supported autoencoders

Latent terrain can work with any audio autoencoder that exposes latent variables. However, only a limited number of them have been implemented for MaxMSP, and we have tested only the following models:

We plan to test the following model in the future:

Get started

Get in touch

Hi, this is Shuoyang (Jasper). nn.terrain~ is part of my ongoing PhD work on Discovering Musical Affordances in Neural Audio Synthesis, supervised by Anna Xambó Sedó and Nick Bryan-Kinns. Part of this work has been, and will continue to be, about putting AI audio generators into the hands of composers and musicians.

I would therefore love to have you involved: if you have feedback, a feature request, or a demo, a device, or anything else made with nn.terrain, I would be delighted to hear about it. If you would like to collaborate on anything, please leave a message in this feedback form.

Acknowledgements

Shuoyang Zheng, the author of this work, is supported by the UKRI Centre for Doctoral Training in Artificial Intelligence and Music [EP/S022694/1].