Karen Simonyan and Andrew Zisserman, Visual Geometry Group, University of Oxford (2014)
In this paper, the authors investigate how the depth of a convolutional network (ConvNet) affects its accuracy in large-scale image recognition. Their architecture uses very small (3×3) convolution filters, and they show that pushing the depth to 16–19 weight layers yields a significant improvement over prior configurations. These findings formed the basis of their ImageNet Challenge 2014 submission, which took first place in the localization track and second place in the classification track. The authors highlight two architectures, configurations "D" and "E" (16 and 19 weight layers respectively), which achieve state-of-the-art image classification results on the ILSVRC-2012 dataset. They also show that the learned representations generalize well to other datasets, and they released their two best-performing models publicly to facilitate further research on deep visual representations. The paper describes the architectures, the training and evaluation methodology, and how the design improves on earlier ConvNets.
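To make the "16 and 19 weight layers" concrete, the sketch below counts weight layers and parameters for configurations D and E. It assumes the layout described in the paper: 3×3 convolutions with stride 1 and padding 1 (so spatial size is preserved), five 2×2 max-pooling layers with stride 2, and three fully connected layers of 4096, 4096 and 1000 units on a 224×224 RGB input. The list encoding and the `count_vgg` helper are hypothetical illustrations, not the authors' code; pooling layers carry no weights, so only convolutional and fully connected layers are counted.

```python
# Hypothetical encoding of VGG configurations D and E: each integer is the
# number of output channels of a 3x3 conv layer (stride 1, padding 1),
# and "M" is a 2x2 max-pooling layer with stride 2.
CONFIGS = {
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

# Three fully connected layers: 4096, 4096, and a 1000-way ILSVRC classifier.
FC_SIZES = [4096, 4096, 1000]


def count_vgg(cfg, in_channels=3, input_size=224, fc_sizes=FC_SIZES):
    """Return (weight_layers, parameters) for a VGG-style configuration.

    Parameters include weights and biases; pooling layers have none.
    """
    layers, params = 0, 0
    channels, size = in_channels, input_size
    for item in cfg:
        if item == "M":
            size //= 2  # 2x2 max pooling with stride 2 halves the spatial size
        else:
            params += 3 * 3 * channels * item + item  # 3x3 kernel weights + bias
            channels = item
            layers += 1
    features = channels * size * size  # flattened input to the first FC layer
    for out in fc_sizes:
        params += features * out + out  # dense weights + bias
        features = out
        layers += 1
    return layers, params


for name, cfg in CONFIGS.items():
    n_layers, n_params = count_vgg(cfg)
    print(f"{name}: {n_layers} weight layers, {n_params:,} parameters")
```

Under these assumptions the script prints 16 weight layers and 138,357,544 parameters for D, and 19 weight layers and 143,667,240 parameters for E, in line with the roughly 138 and 144 million parameters the paper reports. Most of the parameters sit in the first fully connected layer (7×7×512 inputs to 4096 units), so the extra depth of E adds comparatively few weights.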