Geoffrey Hinton [email protected], Oriol Vinyals [email protected], Jeff Dean. Google Inc., Mountain View (2015)
The paper introduces 'distillation', a method for transferring the knowledge in a large, cumbersome neural network, or an ensemble of such networks, into a smaller model that is cheaper to deploy. Building on earlier work on model compression, the authors, Hinton, Vinyals, and Dean, train the small model on soft targets: the class probabilities produced by the cumbersome model at a raised softmax temperature. They demonstrate the effectiveness of distillation on MNIST and on an automatic speech recognition task, where the distilled model recovers most of the improvement obtained by an ensemble while being far cheaper to run. They also introduce ensembles of specialist models that are trained alongside a generalist model to handle datasets with very many classes, with soft targets helping to keep the specialists from overfitting. The paper concludes by emphasizing that distillation can bridge the gap between training large, complex models and deploying simpler ones effectively.
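As a concrete illustration of the soft-target objective summarized above, here is a minimal sketch in PyTorch. The function name, the temperature T=2.0, and the mixing weight alpha are illustrative assumptions, not settings reported in the paper; the sketch only shows the general form of combining a temperature-softened term with the usual hard-label cross-entropy.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and the ordinary hard-label loss.

    T and alpha are illustrative values, not the paper's experimental settings.
    """
    # Soft targets: the cumbersome (teacher) model's probabilities at temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    # The small (student) model's log-probabilities at the same temperature.
    log_student = F.log_softmax(student_logits / T, dim=1)
    # Divergence from the soft targets; the paper notes that the gradients of the
    # soft term scale as 1/T^2, so multiplying by T^2 keeps the soft and hard
    # terms on a comparable scale as the temperature is varied.
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy against the true labels (hard targets).
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Toy usage: random logits for a batch of 4 examples over 10 classes.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```

In practice the teacher logits would come from the trained cumbersome model or ensemble, and the student would be optimized on this combined loss.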
This paper employs the following methods: knowledge distillation with temperature-softened targets, and ensembles of specialist models trained alongside a generalist model.
The following datasets were used in this research: MNIST and an automatic speech recognition training set.