Deep neural networks perform remarkably well on many complicated tasks, yet theoretical explanations of why they do so remain elusive. To work toward a satisfactory mathematical explanation, one recently developed theory considers an idealized network that has infinitely many nodes in each layer and an infinitesimal learning rate. This simplifies the stochastic behavior of the whole network at initialization and during training, making it possible to explain, at least in part, why the initialization and training of such a network succeed at particular tasks, in terms of previously developed statistical tools. In this talk, we consider the limiting behavior of a deep feed-forward network and its training dynamics as the width tends to infinity. We then see that these limiting behaviors can be related to Bayesian posterior inference and kernel methods. If time allows, we will also introduce a particular way to encode heavy-tailed behavior into the network, motivated by empirical evidence that some neural networks exhibit heavy-tailed distributions.
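As a small illustration of the infinite-width limit mentioned above, the following sketch checks numerically that the output of a randomly initialized one-hidden-layer ReLU network at a fixed input has the variance predicted by the limiting Gaussian-process (kernel) description. The architecture, the standard-normal initialization, and the 1/sqrt(n) output scaling are standard assumptions for this kind of calculation, not details taken from the talk itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(x, n, rng):
    """Output of a width-n, one-hidden-layer ReLU network with
    standard-normal weights and 1/sqrt(n) output scaling."""
    d = x.shape[0]
    W = rng.standard_normal((n, d))   # input-to-hidden weights
    v = rng.standard_normal(n)        # hidden-to-output weights
    return v @ np.maximum(W @ x, 0.0) / np.sqrt(n)

x = np.array([1.0, 0.0, 0.0])         # a unit-norm input

# Sample the output over many independent initializations.
samples = np.array([random_relu_net(x, 500, rng) for _ in range(5000)])

# For a unit-norm input, the limiting (arc-cosine) kernel predicts
# Var f(x) = E[ReLU(z)^2] = 1/2 for z ~ N(0, 1).
print(samples.var())
```

The empirical variance lands close to the predicted value of 0.5; repeating the same comparison across pairs of inputs recovers the full limiting kernel, which is the object connecting wide networks to kernel methods and Gaussian-process (Bayesian) inference.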