While deep learning has many remarkable success stories, finding a satisfactory mathematical explanation of why it is so effective is still considered an open challenge. One recent promising direction for this challenge is to analyse the mathematical properties of neural networks in the limit where the widths of the hidden layers go to infinity. Researchers have been able to prove highly nontrivial properties of such infinitely-wide neural networks, for instance that gradient-based training achieves zero training error (so that it finds a global optimum), and that under the typical random initialisation these infinitely-wide networks become so-called Gaussian processes, which are well-studied random objects in machine learning, statistics, and probability theory. These theoretical findings have also led to new algorithms based on so-called kernels, which sometimes outperform existing kernel-based algorithms.
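As a rough illustration of the Gaussian-process claim above (not part of the talk itself), the following sketch samples the output of a randomly initialised one-hidden-layer ReLU network at a fixed input, for many independent initialisations and increasing widths; the choice of ReLU, the 1/sqrt(width) scaling, and the helper name sample_outputs are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

# Illustrative sketch: f(x) = (1/sqrt(n)) * sum_i v_i * relu(w_i . x),
# with i.i.d. standard-normal weights, evaluated at one fixed input x
# over many random initialisations. As the width n grows, the
# distribution of f(x) approaches a Gaussian, the single-input view of
# the Gaussian-process limit mentioned in the abstract.

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0])   # a fixed input
d = x.shape[0]

def sample_outputs(width, n_samples=20000):
    """Draw f(x) for n_samples independent random initialisations."""
    W = rng.normal(size=(n_samples, width, d))   # hidden-layer weights
    v = rng.normal(size=(n_samples, width))      # output-layer weights
    hidden = np.maximum(W @ x, 0.0)              # ReLU activations
    return (v * hidden).sum(axis=1) / np.sqrt(width)

for n in (1, 10, 1000):
    f = sample_outputs(n)
    # Excess kurtosis tending to 0 is one simple signature of Gaussianity.
    kurt = np.mean((f - f.mean()) ** 4) / f.var() ** 2 - 3.0
    print(f"width={n:5d}  mean={f.mean():+.3f}  var={f.var():.3f}  "
          f"excess kurtosis={kurt:+.3f}")
```

At width 1 the output distribution is visibly non-Gaussian, while at large widths the excess kurtosis shrinks towards zero, consistent with the infinite-width Gaussian-process limit.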
The purpose of this talk is to explain these recent theoretical results on infinitely wide neural networks. If time permits, I will briefly describe my work in this domain, which aims to develop a new neural-network architecture with several desirable theoretical properties in the infinite-width limit. This is joint work with Fadhel Ayed, Francois Caron, Paul Jung, Hoil Lee, and Juho Lee.