← ML Research Wiki / 2506.17182

Variational Learning of Disentangled Representations

(2025)

Paper Information
arXiv ID

2506.17182
Abstract

Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. This separation is essential in domains such as biomedical data analysis, where generalization to new treatments, patients, or species depends on isolating stable biological signals from context-dependent effects. While extensions of the variational autoencoder (VAE) framework have been proposed to address this problem, they frequently suffer from leakage between latent representations, limiting their ability to generalize to unseen conditions. Here, we introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. DISCoVeR integrates three key components: (i) a dual-latent architecture that models shared and specific factors separately; (ii) two parallel reconstructions that ensure both representations remain informative; and (iii) a novel max-min objective that encourages clean separation without relying on handcrafted priors, while making only minimal assumptions. Theoretically, we show that this objective maximizes data likelihood while promoting disentanglement, and that it admits a unique equilibrium. Empirically, we demonstrate that DISCoVeR achieves improved disentanglement on synthetic datasets, natural images, and single-cell RNA-seq data. Together, these results establish DISCoVeR as a principled approach for learning disentangled representations in multi-condition settings.
Model assumptions

  • Latent variable conditional independence: given the condition y, the latent representations z and w are conditionally independent: z ⊥ w | y.
  • Sufficiency of the shared latent representation: the input x is conditionally independent of the condition y given w: x ⊥ y | w.

Note, however, that in this model z and w are no longer independent once we also condition on x, that is, z ⊥̸ w | x, y.

Target posterior structure

In our model, the observed data x is generated from two latent variables, z and w. The marginal distribution of z is independent of the condition label y, making z condition-invariant, whereas w depends on y and is therefore condition-aware. Our goal is to learn probabilistic data representations such that the marginal distributions of z and w preserve the structure z ⊥ y and w ⊥̸ y, and therefore yield disentangled representations. For this, we approximate the joint posterior p(z, w | x, y) via variational inference, enforcing that z remains independent of y while w retains its dependence on y. Our approach follows the empirical Bayes paradigm [Robbins, 1956], and in particular aligns with the principles of f-modeling introduced by Efron [2014], where the prior structure is inferred from the data to ensure consistency with observed marginals.
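To make the dual-latent structure concrete, here is a minimal NumPy sketch of how the two approximate posteriors q(z | x) and q(w | x, y) could be parameterized with the reparameterization trick: the condition-invariant encoder sees only x, while the condition-aware encoder also receives the one-hot label y. The linear "encoders", dimensions, and variable names are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W, b):
    """Toy linear 'encoder': returns the mean and log-variance of a diagonal Gaussian."""
    h = x @ W + b
    d = h.shape[-1] // 2
    return h[..., :d], h[..., d:]                 # mu, log_var

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy sizes: 8-d input x, one-hot condition y with 3 classes, 2-d latents z and w.
x = rng.standard_normal((4, 8))                   # batch of 4 inputs
y = np.eye(3)[rng.integers(0, 3, size=4)]         # batch of one-hot conditions

# q(z | x): the condition-invariant encoder sees only x ...
Wz, bz = rng.standard_normal((8, 4)), np.zeros(4)
mu_z, lv_z = encode(x, Wz, bz)
z = reparameterize(mu_z, lv_z, rng)

# ... whereas q(w | x, y): the condition-aware encoder also sees the label y.
xy = np.concatenate([x, y], axis=-1)
Ww, bw = rng.standard_normal((11, 4)), np.zeros(4)
mu_w, lv_w = encode(xy, Ww, bw)
w = reparameterize(mu_w, lv_w, rng)

print(z.shape, w.shape)                           # (4, 2) (4, 2)
```

The asymmetry in the inputs is what lets the training objective push condition information into w while keeping z condition-invariant.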

Summary

This paper introduces DISCoVeR, a novel variational framework for learning disentangled representations that effectively separates condition-invariant and condition-specific factors, particularly in the context of biomedical data analysis. The method incorporates a dual-latent architecture, parallel reconstructions, and a unique max-min objective to enhance disentanglement without relying on strict priors. The authors demonstrate that DISCoVeR outperforms existing methods in disentangling shared and condition-specific structures across various datasets, including synthetic data, natural images such as MNIST and CelebA, and single-cell RNA-seq data.

Methods

This paper employs the following methods:

  • Variational Autoencoder (VAE)
  • Max-Min Optimization
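The paper's max-min objective is not reproduced here; as a generic illustration of the alternating-update pattern that max-min optimization uses, the sketch below runs gradient ascent on one variable and gradient descent on the other for a toy saddle objective f(a, b) = -a^2 + ab + b^2 (concave in the max player a, convex in the min player b), whose unique equilibrium is (0, 0). The objective and all names are assumptions for illustration only.

```python
# Gradients of the toy saddle objective f(a, b) = -a^2 + a*b + b^2.
def grad_a(a, b):
    return -2.0 * a + b      # d f / d a

def grad_b(a, b):
    return a + 2.0 * b       # d f / d b

a, b = 1.0, 1.0
lr = 0.1
for _ in range(500):
    a = a + lr * grad_a(a, b)   # ascent step: a maximizes f
    b = b - lr * grad_b(a, b)   # descent step: b minimizes f

# Both players converge toward the unique equilibrium (0, 0).
print(abs(a) < 1e-6, abs(b) < 1e-6)
```

Because this toy objective is strongly concave in a and strongly convex in b, the alternating updates contract toward the saddle point; the paper's theoretical claim of a unique equilibrium plays the analogous role for its own objective.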

Models Used

  • DISCoVeR

Datasets

The following datasets were used in this research:

  • MNIST
  • CelebA
  • Single-cell RNA-seq data

Evaluation Metrics

  • Negative Log-Likelihood (NLL)
  • Root Mean Squared Error (RMSE)
  • Mutual Information (I(z; w))
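As a sketch of how the listed metrics might be computed, the functions below give textbook definitions of RMSE and Gaussian NLL, plus a crude histogram plug-in estimate of the mutual information I(z; w) between two 1-d latents. These are illustrative helpers, not the paper's evaluation code.

```python
import numpy as np

def rmse(x, x_hat):
    """Root mean squared error between data and reconstruction."""
    return np.sqrt(np.mean((x - x_hat) ** 2))

def gaussian_nll(x, mu, var):
    """Average negative log-likelihood under a Gaussian observation model."""
    return 0.5 * np.mean(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def mi_histogram(z, w, bins=8):
    """Crude plug-in estimate of the mutual information I(z; w) for 1-d z, w."""
    joint, _, _ = np.histogram2d(z, w, bins=bins)
    p = joint / joint.sum()
    pz = p.sum(axis=1, keepdims=True)
    pw = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (pz @ pw)[mask])))

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
w_indep = rng.standard_normal(10_000)           # independent of z
w_dep = z + 0.1 * rng.standard_normal(10_000)   # strongly dependent on z

# A disentangled model should drive I(z; w) toward the independent case.
print(mi_histogram(z, w_indep) < mi_histogram(z, w_dep))  # True
```

Low I(z; w) between the two latent codes is the quantitative signature of minimal information leakage between the condition-invariant and condition-aware representations.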

Results

  • Improved disentanglement on synthetic datasets
  • Enhanced reconstruction quality on MNIST and CelebA datasets
  • Minimal information leakage between latent variables

Technical Requirements

  • Number of GPUs: 1
  • GPU Type: H100
  • Compute Requirements: None specified
