2021-11-12 / 10:00 ~ 11:00
by 양홍석 (Hongseok Yang, KAIST School of Computing)
Deep neural networks have brought remarkable progress in a wide range of applications, but a satisfactory mathematical answer to why they are so effective has yet to come. One promising direction, with a large amount of recent research activity, is to analyse neural networks in an idealised setting where the networks have infinite widths and the so-called step size becomes infinitesimal. In this idealised setting, seemingly intractable questions can be answered. For instance, it has been shown that as the widths of deep neural networks tend to infinity, the networks converge to Gaussian processes, both before and after training, if their weights are initialised with i.i.d. samples from the Gaussian distribution and normalised appropriately. Furthermore, in this setting, the training of a deep neural network has been shown to achieve zero training error, and the analytic form of a fully-trained network with zero error has been identified. These results, in turn, enable the use of tools from stochastic processes and differential equations for analysing deep neural networks in a novel way.

In this talk, I will explain our efforts to extend the above analysis to a new type of neural network that arises from recent studies on Bayesian deep neural networks, network pruning, and the design of effective learning rates. In these networks, each node is equipped with its own scale parameter, which is initialised randomly and independently but is not updated during training. The scale parameter of a node determines the scale, at initialisation, of the weights on the network edges going out of the node, thereby introducing dependency among those weights; also, its square becomes the learning rate of those weights. I will show that the outputs of these networks at given inputs become infinitely divisible random variables in the infinite-width limit, and describe how this characterisation at the infinite-width limit can help us understand the behaviour of these neural networks.

This is joint work with Hoil Lee, Juho Lee, and Paul Jung at KAIST, Francois Caron at Oxford, and Fadhel Ayed at Huawei Technologies.
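The Gaussian behaviour at initialisation mentioned above can be illustrated with a minimal numerical sketch (not part of the talk; the one-hidden-layer ReLU architecture, the 1/sqrt(width) scaling, and all names are my own illustrative assumptions): with i.i.d. Gaussian weights normalised by the fan-in, the output of a randomly initialised network at a fixed input looks increasingly Gaussian as the width grows.

```python
import numpy as np

def random_net_output(x, width, rng):
    """Output of a one-hidden-layer ReLU network at input x,
    with i.i.d. N(0, 1) weights normalised by 1/sqrt(fan-in)."""
    d = x.shape[0]
    W1 = rng.standard_normal((width, d)) / np.sqrt(d)
    W2 = rng.standard_normal(width) / np.sqrt(width)
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# Sample the output over many random initialisations; as the width grows,
# the empirical distribution of the output approaches a Gaussian.
for width in (10, 100, 10_000):
    samples = np.array([random_net_output(x, width, rng) for _ in range(2000)])
    print(width, round(samples.mean(), 3), round(samples.std(), 3))
```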
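The per-node scale parameters described in the abstract can be pictured roughly as follows. This is a hypothetical sketch of the setup only: the distribution of the scales, the single-layer architecture, and the plain SGD update are my own assumptions, not the construction studied in the talk. Each node draws a random scale that is fixed throughout training; the scale sets the initial standard deviation of the node's outgoing weights (making weights that share a source node dependent), and its square is used as the learning rate for those weights.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, width = 5, 100

# Hypothetical per-node scale parameters: sampled once, never updated.
# (The actual scale distribution used in the work is not specified here.)
scales = rng.gamma(shape=2.0, scale=0.5, size=d_in)

# Outgoing weights of input node j (column j) are initialised with a standard
# deviation proportional to scales[j], so weights sharing a source node are
# dependent through that common scale.
W = rng.standard_normal((width, d_in)) * scales[None, :] / np.sqrt(d_in)

# The square of a node's scale acts as the learning rate of its outgoing weights.
lr = scales[None, :] ** 2

def sgd_step(W, grad):
    """One gradient step with per-weight learning rates; the scales stay fixed."""
    return W - lr * grad

# Example: one step on a dummy gradient.
grad = rng.standard_normal(W.shape)
W = sgd_step(W, grad)
print(W.shape)
```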