Overview

What is neural audio autoencoder and why using latent terrain?

Last modified 2025-08-18

Jasper Shuoyang Zheng

A neural audio autoencoder (such as RAVE) is an AI audio generation tool, it has two components: an encoder and a decoder.

Trajectories
Audio sample, latent trajectories, and latent terrain.

It allows you to navigate latent space of the generative AI like walking on a terrain surface, tailoring the latent space to a low-dimensional (e.g., a 2D plane) control space.

And this terrain surface is nonlinear (i.e., able to produce complex sequential patterns), continuous (i.e., allows for smooth interpolations), and tailorable (i.e., DIY your own materials with interactive machine learning).

An example latent space walk with Music2Latent: