Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean; Google Inc., Mountain View (2013)
This paper presents the continuous Skip-gram model for learning high-quality distributed vector representations of words and phrases, together with several extensions that improve both representation quality and training speed. The key enhancements are subsampling of frequent words, which accelerates training and improves the representations of rare words, and negative sampling, a simplified alternative to the hierarchical softmax objective. The paper also shows how idiomatic phrases can be detected in the training data and treated as single tokens, making the resulting vectors more expressive. Empirical results on analogical reasoning tasks indicate that negative sampling outperforms hierarchical softmax, particularly when combined with subsampling of frequent words. The paper further highlights the linear compositional structure of the learned vectors, which supports analogical reasoning through simple vector arithmetic (for example, vec("Madrid") - vec("Spain") + vec("France") is closest to vec("Paris")). Finally, a comparison with previously published word representation models attributes the Skip-gram vectors' markedly higher quality largely to training on a much larger corpus.
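For reference, the two training refinements summarized above (negative sampling and subsampling of frequent words) can be written out as follows. This is a restatement in the original paper's notation rather than text from the summary itself: v_w and v'_w are the input and output vectors of a word, w_I the input word, w_O the context word to predict, k the number of negative samples drawn from a noise distribution P_n(w), f(w_i) the corpus frequency of word w_i, and t a chosen threshold (around 10^-5 in the paper).

```latex
% Negative-sampling objective for one (input word, context word) pair:
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
    \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]

% Probability of discarding a training word w_i under frequent-word subsampling:
P(w_i) = 1 - \sqrt{\tfrac{t}{f(w_i)}}
```

Maximizing the first expression raises the dot product of observed (word, context) pairs while lowering it for sampled noise pairs, which is what makes negative sampling a cheap stand-in for the full softmax.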
This paper employs the following methods:
- Continuous Skip-gram model for learning word and phrase vectors
- Hierarchical softmax as a baseline training objective
- Negative sampling as a simplified training objective
- Subsampling of frequent words during training
- Data-driven phrase detection, representing idiomatic phrases as single tokens
- Evaluation via analogical reasoning with vector arithmetic (sketched in the code below)
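To make the vector-arithmetic evaluation concrete, here is a minimal sketch of the analogy procedure. The tiny `embeddings` table, its three dimensions, and its values are purely hypothetical stand-ins for the vectors a trained Skip-gram model would produce.

```python
import numpy as np

# Hypothetical toy embedding table; in practice these vectors would come
# from a trained Skip-gram model (e.g. 300-dimensional, trained on a large corpus).
embeddings = {
    "madrid": np.array([0.9, 0.1, 0.0]),
    "spain":  np.array([0.8, 0.0, 0.1]),
    "france": np.array([0.7, 0.0, 0.6]),
    "paris":  np.array([0.8, 0.1, 0.5]),
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vocab):
    """Return the word whose vector is closest to vec(b) - vec(a) + vec(c)."""
    target = vocab[b] - vocab[a] + vocab[c]
    # Exclude the query words themselves, as is standard in this evaluation.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

# "spain is to madrid as france is to ?"  -> expected answer: "paris"
print(analogy("spain", "madrid", "france", embeddings))
```

The same maximum-cosine-similarity search, run over a full vocabulary, is how analogy questions of the form "Spain is to Madrid as France is to ?" are typically scored.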
The following datasets were used in this research:
The authors identified the following limitations: