
Image-to-Image Translation with Conditional Adversarial Networks

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. Berkeley AI Research (BAIR) Laboratory, University of California, Berkeley (2016)

Paper Information

  • arXiv ID: 1611.07004
  • Venue: Computer Vision and Pattern Recognition (CVPR)
  • Domain: computer vision, machine learning
  • SOTA Claim: Yes
  • Code: not available
  • Reproducibility: 8/10

Abstract

Abstract not available.

Summary

This paper investigates image-to-image translation with conditional adversarial networks (cGANs). The authors observe that convolutional networks trained to minimize Euclidean (L2) loss tend to produce blurry images, and they propose letting a GAN learn the loss function instead, so that outputs are penalized for looking unrealistic. Their contributions are to demonstrate that cGANs produce reasonable results on a broad range of translation tasks and to present a simple, general framework for them, together with an analysis of the key architectural choices. The method is validated on datasets including Cityscapes and ImageNet, generating realistic images from inputs such as semantic label maps and sketches. The authors also argue that generated images should be evaluated perceptually, report qualitative and quantitative results accordingly, and conclude that cGANs are a versatile solution for many image-to-image translation tasks.
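
The learned loss described above is made concrete in the paper's objective: the generator G is trained against a conditional discriminator D, with an L1 reconstruction term (weighted by λ, set to 100 in the paper) added to keep outputs close to the target:

```latex
% Conditional GAN loss: D sees (input, output) pairs.
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)]
                        + \mathbb{E}_{x,z}[\log(1 - D(x, G(x,z)))]
% L1 term keeps generated images close to ground truth.
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x,z) \rVert_1]
% Full objective.
G^{*} = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)
```

Here x is the input image, y the target image, and z a noise source, realized in practice as dropout in the generator.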

Methods

This paper employs the following methods:

  • Convolutional Neural Networks
  • Generative Adversarial Networks
  • Conditional Generative Adversarial Networks (training-step sketch below)
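
As a concrete illustration of the conditional-GAN method, here is a minimal PyTorch sketch of one training step under the objective above. It assumes hypothetical generator `G` and discriminator `D` modules and their optimizers; it is not the authors' released code.

```python
import torch
import torch.nn.functional as F

def cgan_step(G, D, opt_G, opt_D, x, y, lam=100.0):
    """One pix2pix-style update. x: input batch, y: target batch."""
    # Discriminator: real pairs (x, y) vs. fake pairs (x, G(x)).
    fake = G(x).detach()  # detach so D's loss does not update G
    d_real, d_fake = D(x, y), D(x, fake)
    loss_D = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator: fool D while staying close to the target under L1.
    fake = G(x)
    d_fake = D(x, fake)
    loss_G = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake)) \
           + lam * F.l1_loss(fake, y)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```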

Models Used

  • Conditional GANs
  • U-Net
  • PatchGAN (architecture sketch below)
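
The paper describes the PatchGAN as a discriminator that classifies each N×N patch of an image as real or fake (70×70 in the default configuration). A minimal PyTorch sketch of that architecture follows; the layer pattern (C64-C128-C256-C512 plus a one-channel output) matches the paper, but the original release was written in Torch/Lua, so treat this as an illustrative reimplementation.

```python
import torch
import torch.nn as nn

def block(c_in, c_out, stride, norm=True):
    """4x4 conv -> (BatchNorm) -> LeakyReLU, as in the paper's Ck notation."""
    layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(c_out))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class PatchGAN(nn.Module):
    def __init__(self, in_channels=6):  # input + output image, 3 + 3 channels
        super().__init__()
        self.net = nn.Sequential(
            *block(in_channels, 64, stride=2, norm=False),          # C64
            *block(64, 128, stride=2),                              # C128
            *block(128, 256, stride=2),                             # C256
            *block(256, 512, stride=1),                             # C512
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # logit map
        )

    def forward(self, x, y):
        # Each spatial logit judges one ~70x70 receptive-field patch;
        # the loss averages over all patch responses.
        return self.net(torch.cat([x, y], dim=1))
```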

Datasets

The following datasets were used in this research:

  • Cityscapes
  • ImageNet
  • CMP Facades
  • Google Maps
  • UT Zappos50K

Evaluation Metrics

  • FCN-score (sketched after this list)
  • AMT perceptual studies
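
The FCN-score evaluates synthesized photos by running a pretrained semantic segmentation network (FCN-8s in the paper) on them and scoring its predictions against the label maps the images were generated from. Below is a hedged sketch of the per-pixel accuracy variant, with `pretrained_fcn` as a stand-in for the actual model, not the paper's exact pipeline.

```python
import torch

def fcn_score(pretrained_fcn, generated_images, true_label_maps):
    """Per-pixel accuracy of a segmentation net on generated images.

    A readable sketch of the idea only; the paper also reports
    per-class accuracy and class IoU.
    """
    with torch.no_grad():
        logits = pretrained_fcn(generated_images)  # (N, C, H, W) class scores
        pred = logits.argmax(dim=1)                # (N, H, W) predicted labels
    return (pred == true_label_maps).float().mean().item()
```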

Results

  • Conditional GANs produce reasonable results on a wide variety of image-to-image translation problems.
  • U-Net architecture with skip connections improves image generation quality compared to standard encoder-decoder models.
  • The PatchGAN discriminator, which penalizes structure only at the scale of local image patches, produces sharp local detail while remaining small and fast.

Limitations

The authors identified the following limitations:

  • The presented cGANs can produce visual artifacts in some outputs.
  • For some vision tasks, such as mapping photos to semantic labels, conditional GANs do not outperform simpler losses such as L1 regression.

Technical Requirements

  • Number of GPUs: 1
  • GPU Type: Pascal Titan X

Keywords

image-to-image translation, conditional GANs, Pix2Pix, U-Net, PatchGAN
