Venue
German Conference on Pattern Recognition
Although large-scale labeled data are essential for deep convolutional neural networks (ConvNets) to learn high-level semantic visual representations, collecting and annotating large-scale datasets is time-consuming and often impractical. This study proposes ScaleNet, a simple and efficient unsupervised representation learning method based on multi-scale images, to enhance the performance of ConvNets when limited information is available. The input images are first resized to a smaller size and fed to the ConvNet to recognize the rotation degree. Next, the ConvNet learns the rotation-prediction task for the original-size images based on the parameters transferred from the previous model. The CIFAR-10 and ImageNet datasets are examined on different architectures such as AlexNet and ResNet50. The study demonstrates that specific image features, such as Harris corner information, play a critical role in the efficiency of the rotation-prediction task. ScaleNet outperforms RotNet by ≈7% on the limited CIFAR-10 dataset. The parameters transferred from a ScaleNet model trained with limited data improve the ImageNet classification task by about 6% compared to the RotNet model. This study also shows that ScaleNet can improve other cutting-edge models, such as SimCLR, by learning effective features for classification tasks.
This study proposes a novel unsupervised representation learning method called ScaleNet, optimized for situations with limited information. The approach enhances the performance of convolutional neural networks (ConvNets) by utilizing a multi-scale image framework that trains the network to recognize and predict the rotation of images. The paper evaluates ScaleNet on the CIFAR-10 and ImageNet datasets using architectures like AlexNet and ResNet50. The results indicate that ScaleNet outperforms existing methods like RotNet and SimCLR, particularly in limited data conditions, by effectively learning high-level visual representations. Specifically, the ScaleNet demonstrates a 7% improvement over RotNet on CIFAR-10 and a 6% boost in performance for ImageNet classification tasks. Key findings suggest that the inclusion of specific features, such as Harris corner information, is crucial for improving efficiency in rotation-prediction tasks, demonstrating the potential of ScaleNet for advancing self-supervised learning methodologies in data-scarce environments.
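The two-stage procedure described above (predict rotations on downscaled images first, then transfer to the original scale) can be sketched with a minimal pretext-task data pipeline. This is an illustrative sketch only, not the authors' implementation: the `make_rotation_batch` and `downscale` helpers are hypothetical names, and the average-pool downscaling stands in for whatever resizing the paper actually uses.

```python
import numpy as np

def make_rotation_batch(image):
    """Create the four rotated copies (0°, 90°, 180°, 270°) of an image,
    paired with the rotation-class labels used as self-supervised targets."""
    rotations = [np.rot90(image, k=k) for k in range(4)]
    labels = np.arange(4)  # 0 -> 0°, 1 -> 90°, 2 -> 180°, 3 -> 270°
    return np.stack(rotations), labels

def downscale(image, factor=2):
    """Naive average-pool downscaling to build the smaller-scale input
    for the first ScaleNet training stage (illustrative only)."""
    h, w, c = image.shape
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

# Example: a dummy 32x32 RGB image (CIFAR-10 sized).
img = np.random.rand(32, 32, 3)
small = downscale(img)                 # 16x16 image for stage one
batch, labels = make_rotation_batch(small)
print(batch.shape, labels.tolist())    # (4, 16, 16, 3) [0, 1, 2, 3]
```

In the full method, a ConvNet would first be trained to classify these four rotation labels on the downscaled batch, and its weights would then initialize a second rotation-prediction run on the original-size images.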
This paper employs the following methods:
- Self-supervised Learning
- Representation Learning
- ScaleNet
- RotNet
- SimCLR
- AlexNet
- ResNet50
The following datasets were used in this research:
- CIFAR-10
- ImageNet
The key results are as follows:
- ScaleNet outperforms RotNet by ≈ 7% on CIFAR-10
- ScaleNet improves ImageNet classification by 6% compared to RotNet
- SimCLR's performance enhanced by ∼4% when combined with ScaleNet
The authors identified the following limitations:
- Limited availability of labeled data
- Challenges in capturing high-level representations with small datasets
- Dependence on specific features like corner information
The following hardware was used:
- Number of GPUs: 2
- GPU Type: NVIDIA K80, RTX 2070
Keywords:
- ScaleNet
- unsupervised learning
- self-supervised learning
- rotation prediction
- multi-scale images