18  Latent Generative Models

Author

Mark Fuge

Published

November 11, 2025

In the previous chapters, we’ve seen powerful generative models like GANs, VAEs, and Flow Matching that can learn complex data distributions. However, many of these models face challenges when data lies on a complicated, low-dimensional manifold embedded in a high-dimensional space. For example, imagine trying to learn a distribution of images of faces. The “space of all possible faces” is a complex, twisted surface within the vast space of all possible pixel combinations.

A powerful strategy to tackle this is to combine representation learning with generative modeling. The idea is to first learn a “simpler” latent space that “unrolls” or “flattens” the data manifold. Then, we can train a generative model in this much simpler latent space, where tasks like density estimation or interpolation are significantly easier. Finally, we can map the generated latent points back to the original data space.
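Schematically, the recipe looks like the following minimal sketch (the callables encoder, decoder, and train_gen here are placeholders for the components we build later in this notebook, not a specific API):

import torch

def latent_generate(encoder, decoder, train_gen, X, n_samples):
    """Sketch of the latent generative recipe: encode, model in latent space, decode."""
    with torch.no_grad():
        Z = encoder(X)            # 1) "unroll" the data manifold into a simpler latent space
    sample_fn = train_gen(Z)      # 2) fit a generative model to the latent data; returns a sampler
    z_new = sample_fn(n_samples)  # 3) sample new points in the simpler latent space...
    with torch.no_grad():
        return decoder(z_new)     #    ...and decode them back onto the data manifold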

This is the core idea behind Latent Generative Models. In this notebook, we will explore this concept by:

  1. Learning a Latent Space: We’ll train a simple autoencoder to “unroll” the classic Swiss Roll dataset, a 2D spiral manifold.
  2. Training Generative Models: We will train two types of generative models, GANs and Flow Matching, first directly on the complex Swiss Roll data and then on the simple latent representation, so we can compare different model families.
  3. Comparing the Results: We will visualize the generated samples from both approaches to see how learning in the latent space can lead to superior results.

18.1 Learning Objectives

  • Understand the motivation for latent generative models.
  • Implement a simple autoencoder to learn a latent representation of a manifold.
  • Train Flow Matching and GAN models in both the original data space and the learned latent space.
  • Visualize and compare the quality of samples generated from both approaches.

18.2 Setup and Data Generation

We’ll generate the Swiss Roll dataset and visualize it. We can see that the data lies on a complex 2D manifold embedded in 3D space.

Show Code
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.nn.utils.parametrizations import spectral_norm
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = "notebook"

# for reproducibility
torch.manual_seed(42)
np.random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

# Pick device (GPU if available)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')

ambient_dim = 3
epochs_generative_model = 10000
epochs_latent_model = 20000
num_generated_samples = 2000

# Generate 3D Swiss Roll data
X_raw, color = make_swiss_roll(n_samples=num_generated_samples, noise=0.06)
# Standardize each dimension
X_raw = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = torch.tensor(X_raw, dtype=torch.float32, device=device)  # shape [N, 3]

print(f"Data shape (should be N x 3): {X.shape}")

# Interactive Plotly 3D scatter to verify swiss roll
X_np = X.detach().cpu().numpy()
fig = go.Figure()
fig.add_trace(go.Scatter3d(x=X_np[:, 0], y=X_np[:, 1], z=X_np[:, 2],
                           mode='markers',
                           marker=dict(size=3, color=color, colorscale='Viridis', opacity=0.8)))
fig.update_layout(title='Swiss Roll Dataset (3D)', height=600, width=600)
# Set a reasonable default camera angle
camera = dict(eye=dict(x=1.2, y=1.2, z=0.6))
fig.update_layout(scene_camera=camera)
fig.show()
Using device: cuda
Data shape (should be N x 3): torch.Size([2000, 3])

18.3 Initial Baselines: Generative Models in the Original Data Space

First, we will train two generative models in the original data space, a GAN and a Flow Matching model, to see how they handle this complex distribution. This will serve as a baseline for comparison when we introduce learning within a latent space. These models are largely identical to the implementations we used in prior chapters, with minor adjustments to make them compatible with the Swiss Roll data.

18.3.1 Flow Matching Model
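As a quick refresher, the training loop below implements the conditional Flow Matching objective: draw a noise sample $x_0 \sim \mathcal{N}(0, I)$, a data sample $x_1$, and a time $t \sim \mathcal{U}[0, 1]$, then regress the velocity network $v_\theta$ onto the straight-line displacement between them:

$$x_t = (1 - t)\,x_0 + t\,x_1, \qquad \mathcal{L}_{\text{FM}} = \mathbb{E}_{t,\,x_0,\,x_1}\left[\left\lVert v_\theta(x_t, t) - (x_1 - x_0) \right\rVert^2\right].$$

Sampling then integrates the learned ODE $\dot{x} = v_\theta(x, t)$ from $t = 0$ to $t = 1$; the sample_flow function below uses a simple Euler scheme, $x \leftarrow x + v_\theta(x, t)/\text{steps}$.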

Show Code
class VelocityField(nn.Module):
    def __init__(self, data_dim=2, time_emb_dim=16, hidden_dim=64):
        super().__init__()
        self.data_dim = data_dim
        self.time_mlp = nn.Sequential(
            nn.Linear(1, time_emb_dim),
            nn.Softplus(),
            nn.Linear(time_emb_dim, time_emb_dim),
            nn.Softplus(),
            nn.Linear(time_emb_dim, time_emb_dim)
        )
        self.net = nn.Sequential(
            nn.Linear(data_dim + time_emb_dim, hidden_dim),
            nn.Softplus(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Softplus(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Softplus(),
            nn.Linear(hidden_dim, data_dim)
        )
    def forward(self, x, t):
        t_emb = self.time_mlp(t)
        x_in = torch.cat([x, t_emb], dim=-1)
        return self.net(x_in)
    
def sample_flow(model, data_dim, steps=50, n_samples=500):
    model.eval()
    dev = next(model.parameters()).device
    x = torch.randn(n_samples, data_dim, device=dev)
    trajectory = [x.detach().cpu().clone().numpy()]
    for i in range(steps):
        t = torch.full((n_samples, 1), i / steps, device=dev)
        with torch.no_grad():
            v = model(x, t)
        x = x + v / steps
        trajectory.append(x.detach().cpu().clone().numpy())
    return x.detach().cpu(), np.array(trajectory)

def train_flow(model, data, lr=1e-2, epochs=4000):
    model = model.to(data.device)
    opt = optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for epoch in range(epochs):
        idx = torch.randint(0, len(data), (512,), device=data.device)
        x1 = data[idx]
        x0 = torch.randn_like(x1)
        t = torch.rand(len(x1), 1, device=data.device)
        x_t = (1 - t) * x0 + t * x1
        v_target = x1 - x0
        v_pred = model(x_t, t)
        loss = ((v_pred - v_target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if epoch % 1000 == 0:
            losses.append(loss.item())
            print(f"Flow Loss ({epoch}): {losses[-1]:.4f}")
    print(f"Final loss: {losses[-1]:.4f}")
    return model
# Train data-space flow model
print("Training data-space flow model (3D)...")
flow_data = train_flow(VelocityField(data_dim=ambient_dim).to(device), X,
                       epochs=epochs_generative_model)

# Generate samples
samples_flow_data, traj_data = sample_flow(flow_data, data_dim=ambient_dim, n_samples=num_generated_samples)
samples_flow_data_np = samples_flow_data.detach().cpu().numpy() if hasattr(samples_flow_data, 'detach') else np.array(samples_flow_data)
Training data-space flow model (3D)...
Flow Loss (0): 2.1931
Flow Loss (1000): 1.5469
Flow Loss (2000): 1.5517
Flow Loss (3000): 1.5562
Flow Loss (4000): 1.5008
Flow Loss (5000): 1.4315
Flow Loss (6000): 1.5096
Flow Loss (7000): 1.5889
Flow Loss (8000): 1.5116
Flow Loss (9000): 1.4904
Final loss: 1.4904

18.3.2 GAN Model
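As a reminder, the GAN below uses the standard non-saturating losses: the discriminator $D$ is trained to output high logits on real data and low logits on generated data, while the generator $G$ is trained so that $D$ labels its samples as real:

$$\mathcal{L}_D = -\mathbb{E}_x\left[\log D(x)\right] - \mathbb{E}_z\left[\log\left(1 - D(G(z))\right)\right], \qquad \mathcal{L}_G = -\mathbb{E}_z\left[\log D(G(z))\right].$$

In the code, both terms are implemented with BCEWithLogitsLoss, since the discriminator returns raw logits rather than probabilities.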

Show Code
# Simple MLP GAN components and training utilities
class MLPGen(nn.Module):
    def __init__(self, noise_dim: int, out_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, out_dim)
        )
    def forward(self, z):
        return self.net(z)

class MLPDisc(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1)
        )
    def forward(self, x):
        return self.net(x).squeeze(-1)

def train_gan(data: torch.Tensor,
              data_dim: int,
              noise_dim: int = 8,
              epochs: int = 1200,
              batch_size: int = 512,
              lr: float = 1e-3,
              device: str = 'cpu'):
    """Train a tiny GAN on provided data tensor; returns (G,D)."""
    G = MLPGen(noise_dim, data_dim).to(device)
    D = MLPDisc(data_dim).to(device)
    optG = optim.AdamW(G.parameters(), lr=lr)
    optD = optim.AdamW(D.parameters(), lr=lr*3)
    bce = nn.BCEWithLogitsLoss()

    data = data.to(device)
    N = data.shape[0]

    for step in range(epochs):
        # Discriminator
        idx = torch.randint(0, N, (batch_size,), device=device)
        real = data[idx]
        z = torch.randn(batch_size, noise_dim, device=device)
        fake = G(z).detach()
        D_real = D(real)
        D_fake = D(fake)
        lossD = bce(D_real, torch.ones_like(D_real)) + bce(D_fake, torch.zeros_like(D_fake))
        optD.zero_grad(); lossD.backward(); optD.step()
        # Generator
        z = torch.randn(batch_size, noise_dim, device=device)
        gen = G(z)
        D_gen = D(gen)
        lossG = bce(D_gen, torch.ones_like(D_gen))
        optG.zero_grad(); lossG.backward(); optG.step()
        if (step + 1) % 1000 == 0:
            print(f"GAN step {step+1:4d}: D={lossD.item():.3f} | G={lossG.item():.3f}")
    return G, D

def sample_gan(G: nn.Module, n_samples: int = 2000, noise_dim: int = 8, device: str = 'cpu'):
    G.eval()
    with torch.no_grad():
        z = torch.randn(n_samples, noise_dim, device=device)
        x = G(z)
    return x.detach().cpu()
# Train GAN in original 3D data space
print("Training data-space GAN (3D)...")
G_data, D_data = train_gan(X, data_dim=ambient_dim, noise_dim=8, epochs=epochs_generative_model, device=device)

# Generate samples
samples_gan_data = sample_gan(G_data, n_samples=num_generated_samples, noise_dim=8, device=device)
samples_gan_data_np = samples_gan_data.detach().cpu().numpy() if hasattr(samples_gan_data, 'detach') else np.array(samples_gan_data)
Training data-space GAN (3D)...
GAN step 1000: D=1.191 | G=0.967
GAN step 2000: D=1.283 | G=0.817
GAN step 3000: D=1.305 | G=0.795
GAN step 4000: D=1.341 | G=0.805
GAN step 5000: D=1.352 | G=0.740
GAN step 6000: D=1.372 | G=0.690
GAN step 7000: D=1.332 | G=0.733
GAN step 8000: D=1.341 | G=0.779
GAN step 9000: D=1.339 | G=0.785
GAN step 10000: D=1.295 | G=0.762

18.3.3 Comparison of the Baseline Models in the Original Data Space

Now let’s compare these baseline models against the original data distribution in 3D space.

Show Code
# Convert to numpy for Plotly
X_np = X.detach().cpu().numpy()

# Build interactive Plotly 3-panel figure
fig = make_subplots(rows=3, cols=1,
                    specs=[[{'type':'scene'}], [{'type':'scene'}], [{'type':'scene'}]],
                    subplot_titles=('Ground truth (3D)', 'Data-space flow (3D)', 'Data-space GAN (3D)'))

# Ground truth
fig.add_trace(go.Scatter3d(x=X_np[:,0], y=X_np[:,1], z=X_np[:,2],
                           mode='markers', marker=dict(size=3, color=color, colorscale='Viridis', opacity=0.8)),
              row=1, col=1)

# Data-space samples
fig.add_trace(go.Scatter3d(x=samples_flow_data_np[:,0], y=samples_flow_data_np[:,1], z=samples_flow_data_np[:,2],
                           mode='markers', marker=dict(size=3, color=samples_flow_data_np[:,0], colorscale='Viridis', opacity=0.8)),
              row=2, col=1)

# GAN Samples in Data Space
fig.add_trace(go.Scatter3d(x=samples_gan_data_np[:,0], y=samples_gan_data_np[:,1], z=samples_gan_data_np[:,2],
                           mode='markers', marker=dict(size=3, color=samples_gan_data_np[:,0], colorscale='Viridis', opacity=0.8)),
              row=3, col=1)

# set same camera for consistency
camera = dict(eye=dict(x=-0.2, y=2.2, z=0.6))
fig.update_scenes(camera=camera)
fig.update_layout(height=1800, width=600, showlegend=False, title_text='Comparing Data-Space Generative Models')
fig.show()

As we can see above, training the generative models directly in data space works reasonably well in this simple example, although the models may struggle on higher-dimensional problems. To demonstrate a different strategy, let’s first learn a low-dimensional compressed representation, and then train the generative models on that latent representation to see how it compares. Models built this way are generally referred to as Latent Generative Models.

18.4 Build a Least Volume Autoencoder to Learn the Latent Space

Our first goal is to learn a mapping from the complex Swiss Roll manifold to a simpler, “unrolled” latent space. As we saw in the prior chapter on unsupervised neural network models, an autoencoder is a good tool for this, although any dimension reduction method that allows for both compression and reconstruction could work. It consists of two parts:

  1. Encoder: A neural network that compresses the input data into a low-dimensional latent representation.
  2. Decoder: A neural network that reconstructs the original data from the latent representation.

By training the autoencoder to minimize the reconstruction error (the difference between the original and reconstructed data), the encoder learns to capture the most important features of the data in its latent space. Specifically, we will use a Least Volume Autoencoder, so that we can identify and select the lowest-dimensional representation of the data. For the Swiss Roll, we can see from above that this should result in a latent space where the spiral structure is flattened out, in which case the generative models only have to learn a 2D distribution rather than a 3D one.
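Concretely, the training loop in this section minimizes the mean squared reconstruction error plus a volume penalty. Writing $\sigma_i$ for the standard deviation of latent dimension $i$ over the data, with a small stabilizer $\eta$ and penalty weight $\lambda$ (set to 0.01 and 0.005 below), the loss is

$$\mathcal{L} = \mathrm{MSE}\big(D(E(x)),\, x\big) + \lambda \left(\prod_{i=1}^{d}\big(\sigma_i + \eta\big)\right)^{1/d},$$

where the code computes the product term as $\exp\!\big(\tfrac{1}{d}\sum_i \log(\sigma_i + \eta)\big)$ for numerical stability. The penalty is the geometric mean of the (stabilized) latent standard deviations, which encourages the encoder to collapse unneeded dimensions toward zero variance. The spectral normalization on the decoder bounds its Lipschitz constant, so the encoder cannot evade the penalty by uniformly shrinking the latent space and letting the decoder scale it back up.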

Show Code
class _Combo(nn.Module):
    def forward(self, input):
        return self.model(input)

class LinearCombo(_Combo):
    def __init__(self, in_features, out_features, activation=nn.LeakyReLU(0.2)):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(in_features, out_features),
            activation
        )

class MLP(nn.Module):
    """Regular fully connected network generating features.

    Args:
        in_features: The number of input features.
        out_feature: The number of output features.
        layer_width: The widths of the hidden layers.
        combo: The layer combination to be stacked up.

    Shape:
        - Input: `(N, H_in)` where H_in = in_features.
        - Output: `(N, H_out)` where H_out = out_features.
    """
    def __init__(
        self, in_features: int, out_features:int, layer_width: list,
        combo = LinearCombo
        ):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.layer_width = list(layer_width)
        self.model = self._build_model(combo)

    def forward(self, input):
        return self.model(input)

    def _build_model(self, combo):
        model = nn.Sequential()
        idx = -1
        for idx, (in_ftr, out_ftr) in enumerate(self.layer_sizes[:-1]):
            model.add_module(str(idx), combo(in_ftr, out_ftr))
        model.add_module(str(idx+1), nn.Linear(*self.layer_sizes[-1])) # type:ignore
        return model

    @property
    def layer_sizes(self):
        return list(zip([self.in_features] + self.layer_width,
        self.layer_width + [self.out_features]))
    
class SNLinearCombo(_Combo):
    def __init__(self, in_features, out_features, activation=nn.LeakyReLU(0.2)):
        super().__init__()
        self.model = nn.Sequential(
            spectral_norm(nn.Linear(in_features, out_features)),
            activation
        )

class SNMLP(MLP):
    def __init__(
        self, in_features: int, out_features: int, layer_width: list,
        combo=SNLinearCombo):
        super().__init__(in_features, out_features, layer_width, combo)

    def _build_model(self, combo):
        model = nn.Sequential()
        idx = -1
        for idx, (in_ftr, out_ftr) in enumerate(self.layer_sizes[:-1]):
            model.add_module(str(idx), combo(in_ftr, out_ftr))
        model.add_module(str(idx+1), spectral_norm(nn.Linear(*self.layer_sizes[-1])))
        return model
    
width = ambient_dim * 16
# Note in particular the lack of the bottleneck choice below
# That is, we don't need to actually pick a bottleneck dimension -- LVA automatically determines this, like PCA
encoder = MLP(ambient_dim, ambient_dim, [width] * 4).to(device)
# Note also the change in the decoder to have spectral normalization
decoder = SNMLP(ambient_dim, ambient_dim, [width] * 4).to(device)

opt = torch.optim.AdamW(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
Show Code
η, λ = 0.01, 0.005

for i in range(epochs_latent_model):
    opt.zero_grad()
    z = encoder(X)
    rec_loss = F.mse_loss(decoder(z), X)
    # Note below the least volume loss
    vol_loss = torch.exp(torch.log(z.std(0) + η).mean())
    loss = rec_loss + λ * vol_loss
    loss.backward()
    opt.step()

    if (i+1) % 1000 == 0:
        # Print floats with 5 significant digits and fill epoch with leading spaces if under 4 digits
        print(f'Epoch {i:4}: rec = {rec_loss:.5g}, vol = {vol_loss:.5g}')
Epoch  999: rec = 0.041504, vol = 0.90292
Epoch 1999: rec = 0.026399, vol = 1.0715
Epoch 2999: rec = 0.019554, vol = 1.1883
Epoch 3999: rec = 0.015653, vol = 1.28
Epoch 4999: rec = 0.013543, vol = 1.3123
Epoch 5999: rec = 0.010756, vol = 1.3138
Epoch 6999: rec = 0.0091636, vol = 1.3596
Epoch 7999: rec = 0.0096072, vol = 1.3558
Epoch 8999: rec = 0.0073822, vol = 1.3531
Epoch 9999: rec = 0.0071178, vol = 1.3433
Epoch 10999: rec = 0.0066327, vol = 1.3494
Epoch 11999: rec = 0.0060909, vol = 1.34
Epoch 12999: rec = 0.0060377, vol = 1.3203
Epoch 13999: rec = 0.0056103, vol = 1.3167
Epoch 14999: rec = 0.0053546, vol = 1.3116
Epoch 15999: rec = 0.005258, vol = 1.3159
Epoch 16999: rec = 0.0050283, vol = 1.2932
Epoch 17999: rec = 0.0050594, vol = 1.2874
Epoch 18999: rec = 0.004842, vol = 1.2819
Epoch 19999: rec = 0.0050161, vol = 1.2694
Show Code
# Plotly version of AE reconstruction comparison (original vs reconstructed)
with torch.no_grad():
    X_np = X.cpu().numpy()
    X_rec = decoder(encoder(X)).cpu().numpy()

fig = go.Figure()
fig.add_trace(go.Scatter3d(x=X_np[:,0], y=X_np[:,1], z=X_np[:,2],
                           mode='markers', marker=dict(size=3, color='rgba(50,50,50,0.4)'),
                           name='Original'))
fig.add_trace(go.Scatter3d(x=X_rec[:,0], y=X_rec[:,1], z=X_rec[:,2],
                           mode='markers', marker=dict(size=3, color='red'),
                           name='Reconstructed'))
fig.update_layout(height=600, width=600, title='AE Reconstruction: Original vs Reconstructed')
fig.show()
Show Code
encoder.eval()
decoder.eval()

with torch.no_grad():
    z = encoder(X)

# Move latent to CPU and detach
z_cpu = z.cpu().detach()
# stable python integer indices for indexing
idx_cpu = z_cpu.std(0).argsort(descending=True).cpu().numpy().astype(int)
i0, i1, i2 = int(idx_cpu[0]), int(idx_cpu[1]), int(idx_cpu[2])

plt.scatter(z_cpu[:, i0].numpy(), z_cpu[:, i1].numpy(), s=10)
plt.gca().set_aspect('equal')
plt.xlabel('$z_0$')
plt.ylabel('$z_1$')
plt.show()

plt.scatter(z_cpu[:, i0].numpy(), z_cpu[:, i2].numpy(), s=10)
plt.gca().set_aspect('equal')
plt.xlabel('$z_0$')
plt.ylabel('$z_2$')
plt.show()

# Plot the latent STDs by magnitude in the sorted order:
stds = z_cpu.std(0).numpy()
ordered_stds = stds[idx_cpu]
plt.figure()
plt.bar(np.arange(stds.size), ordered_stds)
plt.title('latent STDs (autoencoder)')
plt.show()

As you can see, the autoencoder has unrolled the spiral structure of the Swiss Roll into an essentially 2D distribution in the latent space: the bar plot shows that the third latent dimension has a much smaller standard deviation than the first two. This is the “unrolled” manifold. Let’s see whether learning on this simpler distribution is easier for the two generative models; we can then use the decoder to project the generated points back into 3D space.

18.5 Retrain the Generative Models on the Latent Space

Now we will retrain our two generative models from above, except this time on the latent space learned by the autoencoder (i.e., Z = encoder(X)). We will keep the same architectures and training procedures for the generative models; the key difference is just the input data, which now lives in the flattened latent space. (In the code below we keep the top_k = 3 latent dimensions with the largest standard deviations; since the volume penalty compresses most of the variance into two of them, the distribution the models must learn is effectively 2D.)

##### Latent Flow Matching #####
# Train latent-space flow model using LVA encoder (top-k dims)
with torch.no_grad():
    Z_full = encoder(X)                 # shape [N, ambient_dim]
    latent_std = Z_full.std(0)
    top_k = 3                           # choose latent dimensionality to model
    latent_idx = torch.argsort(latent_std, descending=True)[:top_k]
    Z = Z_full[:, latent_idx]           # shape [N, top_k]

print(f"Training latent-space flow model (LVA top-{top_k} latent)...")
flow_latent = train_flow(VelocityField(data_dim=top_k).to(device), Z,
                         epochs=epochs_generative_model)

# Use the same top-k (and indices) selected during training for latent flow
samples_latent_flow, traj_latent = sample_flow(flow_latent, data_dim=top_k, n_samples=num_generated_samples)
with torch.no_grad():
    # Pad latent samples back to ambient_dim with zeros in the non-selected dims
    full_latent_flow = torch.zeros(samples_latent_flow.shape[0], ambient_dim, device=device)
    full_latent_flow[:, latent_idx] = samples_latent_flow.to(device)
    decoded_latent = decoder(full_latent_flow).detach().cpu()
print("Latent Flow sampling complete.")

print(f"Training latent-space GAN (top-{top_k} LVA dims)...")
G_latent, D_latent = train_gan(Z, data_dim=top_k, noise_dim=8, epochs=epochs_generative_model, device=device)

# Sample from both GANs
samples_latent_gan = sample_gan(G_latent, n_samples=num_generated_samples, noise_dim=8, device=device)

# Decode latent GAN samples back to 3D
with torch.no_grad():
    full_latent_gan = torch.zeros(samples_latent_gan.shape[0], ambient_dim, device=device)
    full_latent_gan[:, latent_idx] = samples_latent_gan.to(device)
    decoded_gan_latent = decoder(full_latent_gan).detach().cpu()

print("Latent GAN sampling complete.")
Training latent-space flow model (LVA top-3 latent)...
Flow Loss (0): 134.2461
Flow Loss (1000): 14.5121
Flow Loss (2000): 16.1711
Flow Loss (3000): 13.6451
Flow Loss (4000): 14.6157
Flow Loss (5000): 12.9395
Flow Loss (6000): 10.8772
Flow Loss (7000): 16.4385
Flow Loss (8000): 15.1911
Flow Loss (9000): 12.5841
Final loss: 12.5841
Latent Flow sampling complete.
Training latent-space GAN (top-3 LVA dims)...
GAN step 1000: D=1.067 | G=0.994
GAN step 2000: D=1.104 | G=0.885
GAN step 3000: D=1.140 | G=0.879
GAN step 4000: D=1.440 | G=0.645
GAN step 5000: D=1.300 | G=0.675
GAN step 6000: D=1.252 | G=0.813
GAN step 7000: D=1.295 | G=0.842
GAN step 8000: D=1.330 | G=0.839
GAN step 9000: D=1.420 | G=0.791
GAN step 10000: D=1.360 | G=0.844
Latent GAN sampling complete.

18.6 Sampling and Decoding to Compare to Original Data

Unlike the previous baseline models trained directly in data space, the latent models produce samples in the latent space, so we must pass these samples through the decoder to map them back into the original 3D data space for comparison.
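For reference, the decode step used above can be collected into a small helper (a sketch reusing the decoder, latent_idx, and ambient_dim defined earlier; the function name is ours):

def decode_latent_samples(samples_k, decoder, latent_idx, ambient_dim, device):
    """Scatter generated top-k latent coordinates back into a full latent vector, then decode."""
    with torch.no_grad():
        z_full = torch.zeros(samples_k.shape[0], ambient_dim, device=device)
        z_full[:, latent_idx] = samples_k.to(device)  # non-selected dims stay at zero
        return decoder(z_full).detach().cpu()

# e.g., decoded_latent = decode_latent_samples(samples_latent_flow, decoder,
#                                              latent_idx, ambient_dim, device)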

Below we will first plot how well the generated models do in the latent space itself, and then we will decode those samples back to 3D and compare to the original data and the data-space generative model samples.

Show Code
# Matplotlib comparison of latent-space generative models in 2D: the encoded
# top-2 LVA latent dimensions vs. those generated by the latent flow and latent GAN.
fig = plt.figure(figsize=(15,5))
ax1 = fig.add_subplot(1, 3, 1)
ax1.scatter(z_cpu[:, i0].numpy(), z_cpu[:, i1].numpy(), s=10, alpha=0.5)
ax1.set_title('Encoded LVA Top-2 Latent Dimensions')
ax2 = fig.add_subplot(1, 3, 2)
ax2.scatter(samples_latent_flow[:, 0].numpy(), samples_latent_flow[:, 1].numpy(),
            s=10, alpha=0.5, color='orange')
ax2.set_title('Latent Flow Generated Top-2 Latent Dimensions')
ax3 = fig.add_subplot(1, 3, 3)
ax3.scatter(samples_latent_gan[:, 0].numpy(), samples_latent_gan[:, 1].numpy(),
            s=10, alpha=0.5, color='green')
ax3.set_title('Latent GAN Generated Top-2 Latent Dimensions')
plt.show()

Show Code
# Convert to numpy for Plotly
X_np = X.detach().cpu().numpy()
decoded_latent_np = decoded_latent.detach().cpu().numpy() if hasattr(decoded_latent, 'detach') else np.array(decoded_latent)

# Build interactive Plotly 6-panel figure
fig = make_subplots(rows=3, cols=2,
                    specs=[[{'type':'scene'},{'type':'scene'}], [{'type':'scene'},{'type':'scene'}], [{'type':'scene'},{'type':'scene'}]],
                    subplot_titles=('Ground truth (3D)', 'Autoencoder Reconstruction', 'Data-space flow (3D)', 'Data-space GAN (3D)', 'Latent Flow → decoded (3D)', 'Latent GAN → decoded (3D)'))

# Ground truth
fig.add_trace(go.Scatter3d(x=X_np[:,0], y=X_np[:,1], z=X_np[:,2],
                           mode='markers', marker=dict(size=3, color=color, colorscale='Viridis', opacity=0.8)),
              row=1, col=1)

# AE Reconstruction
fig.add_trace(go.Scatter3d(x=X_np[:,0], y=X_np[:,1], z=X_np[:,2],
                           mode='markers', marker=dict(size=3, color='rgba(50,50,50,0.4)'),
                           name='Original'),
                           row=1,col=2)
fig.add_trace(go.Scatter3d(x=X_rec[:,0], y=X_rec[:,1], z=X_rec[:,2],
                           mode='markers', marker=dict(size=3, color='red'),
                           name='Reconstructed'),
                           row=1,col=2)

# Data-space samples
fig.add_trace(go.Scatter3d(x=samples_flow_data_np[:,0], y=samples_flow_data_np[:,1], z=samples_flow_data_np[:,2],
                           mode='markers', marker=dict(size=3, color=samples_flow_data_np[:,0], colorscale='Viridis', opacity=0.8)),
              row=2, col=1)

# GAN Samples in Data Space
fig.add_trace(go.Scatter3d(x=samples_gan_data_np[:,0], y=samples_gan_data_np[:,1], z=samples_gan_data_np[:,2],
                           mode='markers', marker=dict(size=3, color=samples_gan_data_np[:,0], colorscale='Viridis', opacity=0.8)),
              row=2, col=2)

# Decoded latent flow samples (color by first latent coordinate)
latent_color_flow = samples_latent_flow.numpy()[:, 0] if hasattr(samples_latent_flow, 'numpy') and samples_latent_flow.ndim==2 else None
fig.add_trace(go.Scatter3d(x=decoded_latent_np[:,0], y=decoded_latent_np[:,1], z=decoded_latent_np[:,2],
                           mode='markers', marker=dict(size=3, color=latent_color_flow, colorscale='Viridis', opacity=0.8)),
              row=3, col=1)

# Decoded latent GAN samples (color by first latent coordinate)
latent_color_gan = samples_latent_gan.detach().cpu().numpy()[:,0] if hasattr(samples_latent_gan, 'detach') and samples_latent_gan.ndim==2 else None
decoded_gan_latent_np = decoded_gan_latent.detach().cpu().numpy() if hasattr(decoded_gan_latent, 'detach') else np.array(decoded_gan_latent)
fig.add_trace(go.Scatter3d(x=decoded_gan_latent_np[:,0], y=decoded_gan_latent_np[:,1], z=decoded_gan_latent_np[:,2],
                           mode='markers', marker=dict(size=3, color=latent_color_gan, colorscale='Viridis', opacity=0.8)),
              row=3, col=2)

# set same camera for consistency
camera = dict(eye=dict(x=1.2, y=1.2, z=0.6))
fig.update_scenes(camera=camera)
fig.update_layout(height=1800, width=1200, showlegend=False, title_text='Comparing Data-Space vs Latent-Space Generative Models (interactive)')

fig.show()

18.7 Summary

This notebook demonstrated the concept of Latent Generative Models by combining representation learning with generative modeling. By first learning a low-dimensional latent space that “unrolls” the complex Swiss Roll manifold, we were able to train generative models that only need to sample points in a lower-dimensional, and hopefully simpler, space. In this particular example, the original 3D distribution was fairly straightforward, so both Flow Matching and the GAN did fine, and we perhaps did not need the additional compression offered by the autoencoder. However, latent generative models are typically deployed on problems with much higher complexity and dimensionality, such as images or 3D shapes. For example, commonly used latent generative models include VAE-GANs and Latent Diffusion Models, which have shown great success in generating high-quality images and 3D data. We also see similar approaches in robotics and control, such as in model-based reinforcement learning, where learning a latent representation of the environment dynamics can greatly simplify planning and control tasks (e.g., PlaNet, Dreamer).

18.8 Review of Generative Models Covered So Far and Next Steps

This chapter concludes our exploration of generative models for now, with the exception of auto-regressive models (e.g., transformers), which we will cover later. Now is a good time to briefly recap what we’ve discussed so far and to compare the models’ strengths, weaknesses, and areas of common application. Specifically, we have covered:

  • Generative Adversarial Networks (GANs): Powerful for generating high-quality samples, especially in image domains. However, they can be difficult to train and may suffer from mode collapse.
  • Variational Autoencoders (VAEs): Provide a principled way to learn latent representations and generate samples. They are generally more stable to train than GANs but historically were known to produce blurrier samples, though this is not always the case in modern implementations. Importantly, they provide a fully probabilistic framework that allows us to compute posterior distributions over latent variables as well as data likelihoods.
  • Normalizing Flows: Allow for exact likelihood computation and efficient sampling through the use of invertible transformations and the change-of-variables formula. They are flexible but can be computationally intensive when many layers are stacked, and their main downside is that they require careful architecture design to maintain invertibility. Like VAEs, they provide a fully probabilistic framework, but without the variational approximations that VAEs require.
  • Continuous Normalizing Flows (CNFs) and Flow Matching: These offer a continuous-time perspective on flows, enabling more flexible transformations that do not require chaining together many discrete transformations and log-determinant terms. They can be more efficient in certain scenarios but can also be complex to implement (e.g., CNFs require differentiable ODE solvers and may not work well in high dimensions). Flow Matching sidesteps some of these issues by avoiding backpropagation through an ODE solver.
  • Diffusion Models: These model the data generation process as a Stochastic Differential Equation (SDE) or a discrete-time process that gradually denoises pure noise into a data sample, trained via score matching. They are known for their ability to generate high-quality samples, particularly in image generation tasks. They are generally more robust to train than GANs but can be much slower to sample from, due to the iterative denoising process needed to generate samples. In contrast, direct push-forward models like GANs or VAEs can generate samples significantly faster.
  • Latent Generative Models: These combine representation learning with generative modeling, often handling complex, high-dimensional data distributions more effectively. They are particularly useful when data lies on a low-dimensional manifold within a high-dimensional space that can be “unrolled” into a simpler latent space.

To put some of these models into an engineering context, the next chapter will explore common applications of generative models in practical engineering settings, such as Inverse Design and Surrogate Modeling. After that, we will address the last major class of generative models: Auto-Regressive Models, which have become extremely popular in recent years, especially in the context of large language models (LLMs) and transformers, and more generally for data expressed as sequences.