Most modern learning problems are over-parameterized: the number of learnable parameters far exceeds the number of training data points. In this regime, the training loss typically has infinitely many global optima, all of which perfectly interpolate the training data yet can generalize very differently. Which global optimum we converge to is determined by the implicit bias of the optimization algorithm. The question we address in this paper is: what is the implicit bias that leads to the best generalization performance? To answer it, we provide a precise asymptotic analysis of the generalization performance of interpolators obtained by minimizing convex functions (potentials), for over-parameterized linear regression with non-isotropic Gaussian data. In particular, we obtain a tight lower bound on the best generalization error achievable by this class of interpolators, in terms of the over-parameterization ratio, the variance of the label noise, the eigenspectrum of the data covariance, and the distribution of the underlying parameter to be estimated. Finally, we identify the optimal convex implicit bias that achieves this lower bound, under sufficient conditions involving the log-concavity of the distribution of a Gaussian convolved with the prior of the true underlying parameter.
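To make the setup concrete, here is a minimal sketch (not the paper's code) of over-parameterized linear regression with non-isotropic Gaussian data; the spectrum, noise level, and draw of the true parameter are illustrative assumptions. It computes the minimum-ℓ2-norm interpolator, one particular implicit bias, and evaluates its excess risk under the data covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                                   # over-parameterized: d > n
eigs = np.linspace(0.1, 2.0, d)                  # assumed covariance spectrum
X = rng.standard_normal((n, d)) * np.sqrt(eigs)  # rows ~ N(0, diag(eigs))
w_star = rng.standard_normal(d) / np.sqrt(d)     # hypothetical true parameter
sigma = 0.5                                      # assumed label-noise std
y = X @ w_star + sigma * rng.standard_normal(n)

# Minimum-l2-norm interpolator: the implicit bias of gradient descent
# initialized at zero on the least-squares loss.
w_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ w_hat, y)                 # exact interpolation of the data

# Generalization (excess) risk under Sigma = diag(eigs):
#   (w_hat - w_star)^T Sigma (w_hat - w_star)
excess_risk = np.sum(eigs * (w_hat - w_star) ** 2)
print(f"excess risk of the min-norm interpolator: {excess_risk:.4f}")
```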
The paper studies optimal implicit bias in over-parameterized linear regression, where the training loss has infinitely many global optima with varying generalization performance. The authors give a precise asymptotic analysis of the generalization performance of interpolators obtained by minimizing convex potentials under specific conditions. They establish a tight lower bound on the attainable generalization error, depending on the over-parameterization ratio, the noise variance, the data covariance structure, and the distribution of the underlying parameter. The study culminates in identifying the optimal convex implicit bias that achieves this lower bound.
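For reference, the objects this summary refers to can be written in standard notation (a sketch of the setup; the symbols Σ, σ², ψ, and w⋆ are notational assumptions, not quoted from the paper):

```latex
% Over-parameterized linear regression: n samples, d parameters, d/n > 1.
y = X w^\star + \varepsilon, \qquad
x_i \sim \mathcal{N}(0, \Sigma), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)
% Interpolator induced by a convex potential \psi:
\hat{w}_\psi \in \arg\min_{w \,:\, Xw = y} \psi(w)
% Its generalization error under the covariance \Sigma:
R(\hat{w}_\psi)
  = \mathbb{E}_x\!\left[(x^\top \hat{w}_\psi - x^\top w^\star)^2\right]
  = (\hat{w}_\psi - w^\star)^\top \Sigma\, (\hat{w}_\psi - w^\star)
```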
This paper employs the following methods:
- Asymptotic analysis
- Empirical Risk Minimization (ERM)
- Convex optimization
- Stochastic mirror descent (SMD); see the sketch after this list
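As a concrete illustration of the last method, below is a minimal sketch of stochastic mirror descent on the squared loss, assuming the potential ψ(w) = ‖w‖_p^p / p. For strictly convex potentials and interpolating models, SMD initialized at the minimizer of ψ is known to converge to the interpolator minimizing ψ, which is the sense in which the potential encodes the implicit bias. The hyperparameters here are illustrative, not values from the paper.

```python
import numpy as np

def smd_interpolate(X, y, p=1.5, lr=0.01, epochs=2000, seed=0):
    """Stochastic mirror descent for 0.5*(x_i @ w - y_i)**2 with potential
    psi(w) = ||w||_p^p / p, p > 1.  All hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    v = np.zeros(d)                   # dual iterate v = grad psi(w); w starts at argmin psi
    q = 1.0 / (p - 1.0)               # exponent of the inverse mirror map
    for _ in range(epochs):
        for i in rng.permutation(n):
            w = np.sign(v) * np.abs(v) ** q       # primal iterate: w = (grad psi)^{-1}(v)
            v -= lr * (X[i] @ w - y[i]) * X[i]    # dual step along the loss gradient
    return np.sign(v) * np.abs(v) ** q

# In the over-parameterized regime (d > n), the returned w interpolates the
# training data while approximately minimizing psi over all interpolators.
```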
No external datasets were used; the analysis and experiments rely on synthetic non-isotropic Gaussian data. The paper reports the following metrics:
- Generalization error
- Excess risk
Its main results are:
- A tight lower bound on the achievable generalization error
- Identification of the optimal convex implicit bias
- A characterization of the asymptotic behavior of linear regression interpolators
- Number of GPUs: None specified
- GPU Type: None specified
- Compute Requirements: None specified