Modern machine learning methods such as multi-layer neural networks often have millions of parameters and achieve near-zero training error. Nevertheless, they maintain strong generalization capabilities, challenging traditional statistical theories based on the uniform law of large numbers. Motivated by this phenomenon, we consider high-dimensional binary classification with linearly separable data. For Gaussian covariates, we characterize linear classification problems for which the minimum-norm interpolating prediction rule, namely max-margin classification, has near-optimal generalization error.
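As an illustration (not part of the talk's results), the following minimal sketch approximates the max-margin, minimum-norm interpolating rule by a hard-margin SVM on synthetic separable Gaussian data; the dimensions, the noiseless label model, and the use of scikit-learn's LinearSVC with a large regularization constant are assumptions made here for concreteness.

    # Minimal sketch: max-margin classification of linearly separable Gaussian data,
    # approximated by an (almost) hard-margin linear SVM. All settings are illustrative.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n, d = 200, 500                       # high-dimensional regime: d > n
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)        # ground-truth direction

    X = rng.standard_normal((n, d))       # Gaussian covariates
    y = np.sign(X @ theta)                # noiseless labels => linearly separable

    # A very large C approximates the hard-margin (interpolating) solution.
    clf = LinearSVC(C=1e6, max_iter=100_000).fit(X, y)

    X_test = rng.standard_normal((2000, d))
    y_test = np.sign(X_test @ theta)
    print("train error:", np.mean(clf.predict(X) != y))
    print("test error :", np.mean(clf.predict(X_test) != y_test))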
In the second part of the talk, we consider max-margin classification with non-Gaussian covariates. In particular, we leverage universality arguments to characterize the generalization error of the non-linear random features model, a two-layer neural network with random first-layer weights. In the wide-network limit, where the number of neurons tends to infinity, we show how non-linear max-margin classification with random features collapses to a linear classifier with a soft-margin objective.
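Again as an illustration only, a minimal sketch of the random features model follows: a two-layer network whose first-layer weights W are drawn at random and kept fixed, with only the second layer trained by max-margin classification on the features. The ReLU activation, the scaling by sqrt(d), and the sequence of widths are assumptions chosen for the sketch, not the talk's setup.

    # Minimal sketch: max-margin classification on random ReLU features,
    # varying the network width N. All settings are illustrative.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(1)
    n, d = 300, 50
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)

    X = rng.standard_normal((n, d))
    y = np.sign(X @ theta)
    X_test = rng.standard_normal((2000, d))
    y_test = np.sign(X_test @ theta)

    def random_features(X, W):
        """ReLU features of a two-layer net with frozen random first layer W."""
        return np.maximum(X @ W.T / np.sqrt(X.shape[1]), 0.0)

    for N in [10, 100, 1000, 10000]:       # number of neurons (width)
        W = rng.standard_normal((N, d))    # random first-layer weights, kept fixed
        clf = LinearSVC(C=1e6, max_iter=100_000).fit(random_features(X, W), y)
        err = np.mean(clf.predict(random_features(X_test, W)) != y_test)
        print(f"width N={N:6d}  test error={err:.3f}")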