
KAN: Kolmogorov-Arnold Networks

Ziming Liu (Massachusetts Institute of Technology; The NSF Institute for Artificial Intelligence and Fundamental Interactions), Yixuan Wang (California Institute of Technology), Sachin Vaidya (Massachusetts Institute of Technology), Fabian Ruehle (Northeastern University; The NSF Institute for Artificial Intelligence and Fundamental Interactions), James Halverson (Northeastern University; The NSF Institute for Artificial Intelligence and Fundamental Interactions), Marin Soljačić (Massachusetts Institute of Technology; The NSF Institute for Artificial Intelligence and Fundamental Interactions), Thomas Y. Hou (California Institute of Technology), Max Tegmark (Massachusetts Institute of Technology; The NSF Institute for Artificial Intelligence and Fundamental Interactions) (2024)

Paper Information
arXiv ID
2404.19756
Venue
arXiv.org
Domain
Artificial Intelligence, Deep Learning, Scientific Computing
SOTA Claim
Yes
Reproducibility
8/10

Abstract

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all: every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability on small-scale AI + Science tasks. For accuracy, smaller KANs can achieve comparable or better accuracy than larger MLPs in function fitting tasks. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful "collaborators" helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
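
To make the edge-activation idea concrete, the sketch below shows one possible forward pass of a single KAN layer in NumPy/SciPy. This is an illustrative sketch, not the authors' pykan implementation; the class name, parameter names, and hyperparameters are invented for this example. Each edge carries its own learnable univariate function, parameterized here as a cubic B-spline plus a SiLU base term, and each output node simply sums its incoming edge activations.

```python
# Minimal sketch of a KAN layer (illustrative only, not the authors' pykan code).
# Every edge (input i -> output j) has its own learnable 1-D function phi_{j,i}(x),
# built from a cubic B-spline plus a SiLU "base" term. Nodes just sum incoming edges.

import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    return x / (1.0 + np.exp(-x))

class KANLayerSketch:
    def __init__(self, n_in, n_out, grid_size=5, degree=3, x_range=(-1.0, 1.0)):
        # One shared clamped knot vector; each edge has its own spline coefficients.
        inner = np.linspace(*x_range, grid_size + 1)
        self.knots = np.concatenate([
            np.full(degree, inner[0]), inner, np.full(degree, inner[-1])
        ])
        self.degree = degree
        n_coef = len(self.knots) - degree - 1
        rng = np.random.default_rng(0)
        # Learnable parameters (a real implementation updates these by gradient descent):
        self.coef = 0.1 * rng.standard_normal((n_out, n_in, n_coef))  # spline coefficients
        self.w_base = np.ones((n_out, n_in))      # weight on the SiLU base term
        self.w_spline = np.ones((n_out, n_in))    # weight on the spline term

    def forward(self, x):
        # x: (batch, n_in) -> (batch, n_out)
        out = np.zeros((x.shape[0], self.coef.shape[0]))
        for j in range(self.coef.shape[0]):       # output node j
            for i in range(x.shape[1]):           # input node i
                spline = BSpline(self.knots, self.coef[j, i], self.degree,
                                 extrapolate=True)
                phi = (self.w_base[j, i] * silu(x[:, i])
                       + self.w_spline[j, i] * spline(x[:, i]))
                out[:, j] += phi                  # nodes only sum incoming edge outputs
        return out

layer = KANLayerSketch(n_in=2, n_out=3)
print(layer.forward(np.random.rand(4, 2)).shape)  # (4, 3)
```

A real implementation would train the spline coefficients and edge weights by gradient descent and refine the spline grid adaptively; this sketch only illustrates the forward pass that replaces an MLP's linear-weights-plus-fixed-activation structure.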

Summary

This paper proposes Kolmogorov-Arnold Networks (KANs) as an innovative alternative to traditional Multi-Layer Perceptrons (MLPs). Inspired by the Kolmogorov-Arnold representation theorem, KANs employ learnable activation functions situated on edges, in lieu of fixed activation functions at nodes as seen in MLPs. This modification allows KANs to outperform MLPs in both accuracy and interpretability for small-scale AI + Science tasks. The authors demonstrate that KANs can represent functions more efficiently than MLPs, especially in scenarios involving high-dimensional data. Extensive numerical experiments showcase the theoretical and empirical benefits of KANs, elucidating their potential in scientific inquiries across mathematics and physics. The paper details KAN architecture, scaling laws, interpretability features, and applications in solving partial differential equations and scientific discovery, establishing KANs as a promising tool in the intersection of AI and scientific research.
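
For reference, the Kolmogorov-Arnold representation theorem that motivates the architecture (stated here in its standard textbook form, not quoted from this page) says that any continuous multivariate function on a bounded domain can be written as a finite sum of compositions of continuous univariate functions and addition:

$$ f(x_1, \dots, x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right) $$

where each $\phi_{q,p}$ and $\Phi_q$ is a continuous univariate function. KANs generalize this fixed two-layer structure to arbitrary width and depth, learning each univariate function as a spline.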

Methods

This paper employs the following methods:

  • KAN
  • MLP

Models Used

  • KAN
  • MLP

Datasets

The following datasets were used in this research:

  • Feynman_no_units
  • Knot Theory
  • Special Functions

Evaluation Metrics

  • Accuracy
  • RMSE
  • Pareto Frontier

Results

  • KANs outperform MLPs in accuracy and interpretability on small-scale AI + Science tasks.
  • KANs possess faster (better) neural scaling laws than MLPs; see the note after this list.
  • KANs can facilitate rediscovery of mathematical and physical laws.
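
Note on the scaling-law claim: per the paper's approximation theory (as summarized here, with $N$ the parameter count and $k$ the spline order), test loss is predicted to fall as

$$ \ell \;\propto\; N^{-(k+1)} \;=\; N^{-4} \quad \text{for cubic splines } (k = 3), $$

a steeper exponent than those typically predicted or observed for MLPs, and the paper reports empirical curves approaching this rate on its function-fitting examples.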

Limitations

The authors identified the following limitations:

  • KANs are slower to train than MLPs.
  • Current implementations are not yet efficient, particularly on larger or more complex problems.

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Kolmogorov-Arnold Networks, KAN, interpretability, scientific discovery, symbolic regression, neural scaling laws

Papers Using Similar Methods

External Resources