← ML Research Wiki / 2310.02386

ScaleNet: An Unsupervised Representation Learning Method for Limited Information

Huili Huang and M. Mahdi, School of Computational Science and Engineering, Georgia Institute of Technology, 756 W Peachtree St NW, Atlanta, GA 30308, USA (2023)

Paper Information

arXiv ID: 2310.02386
Venue: German Conference on Pattern Recognition
Domain: Computer vision

Abstract

Although large-scale labeled data are essential for deep convolutional neural networks (ConvNets) to learn high-level semantic visual representations, it is time-consuming and impractical to collect and annotate large-scale datasets. A simple and efficient unsupervised representation learning method named ScaleNet based on multi-scale images is proposed in this study to enhance the performance of ConvNets when limited information is available. The input images are first resized to a smaller size and fed to the ConvNet to recognize the rotation degree. Next, the ConvNet learns the rotation-prediction task for the original size images based on the parameters transferred from the previous model. The CIFAR-10 and ImageNet datasets are examined on different architectures such as AlexNet and ResNet50 in this study. The current study demonstrates that specific image features, such as Harris corner information, play a critical role in the efficiency of the rotation-prediction task. The ScaleNet supersedes the RotNet by ≈7% in the limited CIFAR-10 dataset. The transferred parameters from a ScaleNet model with limited data improve the ImageNet classification task by about 6% compared to the RotNet model. This study shows the capability of the ScaleNet method to improve other cutting-edge models such as SimCLR by learning effective features for classification tasks.
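The rotation-prediction pretext task the abstract builds on can be sketched as follows. This is a minimal illustration of the four-way rotation labeling used by RotNet-style methods, not the authors' code; the function name `make_rotation_batch` is ours:

```python
import numpy as np

def make_rotation_batch(image):
    """Create the four rotated copies of an image and their pretext labels.

    Labels: 0 -> 0 deg, 1 -> 90 deg, 2 -> 180 deg, 3 -> 270 deg.
    `image` is an (H, W, C) array; np.rot90 rotates counter-clockwise
    in the spatial (H, W) plane.
    """
    rotations = [np.rot90(image, k=k, axes=(0, 1)) for k in range(4)]
    labels = np.arange(4)
    return rotations, labels

# Toy 4x4 single-channel "image"
img = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
rots, labels = make_rotation_batch(img)
```

A ConvNet is then trained to predict the label from the rotated image alone, which forces it to learn object-level features without any human annotation.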

Summary

This study proposes a novel unsupervised representation learning method called ScaleNet, designed for situations with limited information. The approach enhances the performance of convolutional neural networks (ConvNets) through a multi-scale framework that trains the network to predict image rotations, first on downscaled inputs and then on the original-size images using the transferred parameters. The paper evaluates ScaleNet on the CIFAR-10 and ImageNet datasets using architectures such as AlexNet and ResNet50. The results indicate that ScaleNet outperforms existing methods like RotNet and complements methods like SimCLR, particularly in limited-data conditions, by effectively learning high-level visual representations. Specifically, ScaleNet demonstrates a ≈7% improvement over RotNet on CIFAR-10 and a ≈6% boost in ImageNet classification performance. Key findings suggest that specific image features, such as Harris corner information, are crucial for the efficiency of the rotation-prediction task, demonstrating the potential of ScaleNet for advancing self-supervised learning in data-scarce environments.
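The two-stage, coarse-to-fine training described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the paper uses AlexNet/ResNet50 backbones, while `TinyConvNet`, the 16-pixel downscale size, the SGD settings, and the random data are all stand-ins of ours. The key idea it shows is that stage 2 reuses the parameters learned in stage 1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvNet(nn.Module):
    """Illustrative stand-in for the paper's AlexNet/ResNet50 backbones."""
    def __init__(self, num_rotations=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # accepts any spatial input size
        )
        self.classifier = nn.Linear(32, num_rotations)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_step(model, optimizer, images, labels):
    """One rotation-prediction update; returns the scalar loss."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

model = TinyConvNet()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
imgs = torch.randn(8, 3, 32, 32)          # fake batch of rotated images
labels = torch.randint(0, 4, (8,))        # their rotation labels

# Stage 1: rotation prediction on downscaled inputs.
small = F.interpolate(imgs, size=16, mode="bilinear", align_corners=False)
loss_small = train_step(model, opt, small, labels)

# Stage 2: same task at the original scale, reusing the stage-1 parameters
# (this parameter transfer is the core of ScaleNet).
loss_full = train_step(model, opt, imgs, labels)
```

The `AdaptiveAvgPool2d` layer is what lets the same network consume both scales; the actual scale schedule and number of stages follow the paper's experiments.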

Methods

This paper employs the following methods:

  • Self-supervised Learning
  • Representation Learning

Models Used

  • ScaleNet
  • RotNet
  • SimCLR
  • AlexNet
  • ResNet50

Datasets

The following datasets were used in this research:

  • CIFAR-10
  • ImageNet

Evaluation Metrics

  • None specified

Results

  • ScaleNet outperforms RotNet by ≈ 7% on CIFAR-10
  • ScaleNet improves ImageNet classification by 6% compared to RotNet
  • SimCLR's performance is enhanced by ∼4% when combined with ScaleNet

Limitations

The authors identified the following limitations:

  • Limited availability of labeled data
  • Challenges in capturing high-level representations with small datasets
  • Dependence on specific features like corner information
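The last limitation refers to the paper's finding that Harris corner information matters for the rotation-prediction task. As a point of reference, the Harris response can be computed with plain numpy; this is a textbook sketch (box smoothing instead of a Gaussian window, illustrative constants), not the authors' feature-extraction code:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2 for a grayscale image."""
    # Image gradients via central differences.
    Iy, Ix = np.gradient(img.astype(np.float64))
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box_blur(a, r=1):
        # Simple (2r+1)x(2r+1) box window as a stand-in for a Gaussian.
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out / (2 * r + 1) ** 2

    Sxx, Syy, Sxy = box_blur(Ixx), box_blur(Iyy), box_blur(Ixy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# A white square on a black background: its corners score highest.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
R = harris_response(img)
```

Images whose content yields weak corner responses (e.g. smooth or highly symmetric scenes) give the rotation-prediction pretext task less signal to learn from, which is the dependence the authors flag.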

Technical Requirements

  • Number of GPUs: 2
  • GPU Type: NVIDIA K80, RTX 2070

Keywords

ScaleNet, unsupervised learning, self-supervised learning, rotation prediction, multi-scale images
