← ML Research Wiki / 1201.0490

Scikit-learn: Machine Learning in Python

Authors: Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay

Affiliations: INRIA Saclay, Neurospin, Bât 145, CEA Saclay, 91191 Gif sur Yvette, France; Nuxeo, 20 rue Soleillet, 75020 Paris, France; Dept. of EE & CS, Kobe University, 1-1 Rokkodai, Nada, Kobe 657-8501, Japan; University of Liège, Liège, Belgium; Bauhaus-Universität Weimar, Bauhausstr. 11, 99421 Weimar, Germany; Google Inc, 76 Ninth Avenue, New York, NY 10011, USA; Clermont Université, IFMA, EA 3867, LaMI, BP 10448, 63000 Clermont-Ferrand, France; Astronomy Department, University of Washington, Box 351580, Seattle, WA 98195, USA; IESL Lab, UMass Amherst, Amherst, MA 01002, USA; Cambridge CB3 0FA, UK; Total SA, CSTJF, avenue Larribau, 64000 Pau, France; LNAO, Neurospin, Bât 145, CEA Saclay, 91191 Gif sur Yvette, France

Editor: Mikio Braun (2011)

Paper Information
  • arXiv ID: 1201.0490
  • Venue: Journal of Machine Learning Research
  • Domain: Computer science / Data science
  • SOTA Claim: Yes
  • Code: Available
  • Reproducibility: 8/10

Abstract

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.org.

Summary

Scikit-learn is a Python module that integrates a variety of machine learning algorithms suitable for medium-scale supervised and unsupervised problems. Its design emphasizes ease of use, performance, thorough documentation, and API consistency, which makes it accessible to non-specialists in many fields. The project also stresses code quality, community-driven development, minimal dependencies, and rich documentation that includes user guides and examples. Scikit-learn builds on NumPy and SciPy and supports model selection and evaluation through cross-validation. The library continues to evolve, with planned enhancements that include online learning.
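
The API consistency and cross-validation support mentioned in the summary can be illustrated with a minimal sketch. The iris dataset, the two estimators, their hyperparameters, and the 5-fold setting are illustrative assumptions, not choices taken from the paper. The calls follow the current scikit-learn API, which may differ from the 2011 release the paper describes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Illustrative dataset (an assumption; not the data used in the paper).
X, y = load_iris(return_X_y=True)

# Estimators share the same fit/predict interface, so swapping
# algorithms only changes the constructor call.
estimators = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=1.0),
}

scores = {}
for name, estimator in estimators.items():
    # 5-fold cross-validation returns one accuracy score per fold.
    fold_scores = cross_val_score(estimator, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

Because both objects are evaluated by the same helper without any algorithm-specific code, the loop would work unchanged for other classifiers in the library.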

Methods

This paper employs the following methods:

  • SVM
  • PCA
  • kNN
  • k-means
  • Elastic Net
  • LARS
  • GridSearchCV
  • Pipeline
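
Several of the listed components are designed to be combined. The following is a minimal sketch, assuming the current scikit-learn API, of a Pipeline that chains PCA with an SVM and is tuned with GridSearchCV. The digits dataset, the parameter grid, the 3-fold setting, and the step names "pca" and "svm" are illustrative assumptions and do not come from the paper.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative dataset (an assumption; not the data used in the paper).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Chain dimensionality reduction and classification into one estimator.
pipeline = Pipeline([
    ("pca", PCA(random_state=0)),
    ("svm", SVC(kernel="rbf", gamma="scale")),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {
    "pca__n_components": [16, 32],
    "svm__C": [1.0, 10.0],
}

# Exhaustive search over the grid with 3-fold cross-validation;
# the best combination is refit on the full training split.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Wrapping the PCA step inside the pipeline means it is refit on each training fold during the search, so the held-out fold does not influence the projection.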

Models Used

  • None specified

Datasets

The following datasets were used in this research:

  • Madelon

Evaluation Metrics

  • None specified

Results

  • State-of-the-art implementations of machine learning algorithms
  • Ease of use for non-specialists
  • High code quality, with run times benchmarked on the Madelon dataset against other Python machine learning libraries (MLPy, PyBrain, PyMVPA, MDP, Shogun)

Limitations

The authors identified the following limitations:

  • Performance on very large problems limited for certain algorithms

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Python machine learning scikit-learn supervised learning unsupervised learning
