Department Seminars & Colloquia
Deep generative models have received much attention recently since they can generate very realistic synthetic images. There are two main regimes for the estimation of deep generative models: the generative adversarial network (GAN) and the variational autoencoder (VAE). Even though GAN is known to generate cleaner synthetic images, it suffers from numerical instability and mode-collapse problems. VAE is a useful alternative to GAN, and an important advantage of VAE is that representation learning (i.e., learning latent variables) is possible.
In this talk, I explain my recent studies on VAE. The first topic is computation. Typically, the estimation of VAE is done by maximizing the ELBO, a lower bound of the marginal likelihood. However, it is known that the ELBO maximizer is inferior to the maximum likelihood estimator (MLE). I propose an efficient EM algorithm for VAE which directly finds the maximizer of the likelihood, i.e., the MLE.
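For reference, the ELBO is the following lower bound on the log marginal likelihood (standard VAE notation with encoder q_phi and decoder p_theta; this is textbook background rather than a formula taken from the talk):

```latex
\log p_\theta(x) \;=\; \log \int p_\theta(x \mid z)\, p(z)\, dz
\;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
\;=:\; \mathrm{ELBO}(\theta, \phi; x),
```

with equality exactly when q_phi(z|x) equals the true posterior p_theta(z|x); the gap between the two sides is why maximizing the ELBO can fall short of the MLE.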
The second topic is theory. I explain how the MLE of VAE behaves asymptotically. I derive a convergence rate which depends on the noise level as well as the complexity of the deep architecture. A surprising observation is that the convergence rate of the MLE becomes slower when the noise level is too low. A new technique to modify the MLE when the noise level is small is proposed and shown to outperform the original MLE in real-data analyses.
At the Data Science Group, we aim to provide computational models for challenging real-world problems.
This talk will introduce two such problems that can benefit from collaboration with mathematicians and theorists. One is customs fraud detection, where the goal is to determine a small set of fraudulent transactions that, when caught, will maximize the tax revenue. We previously presented a state-of-the-art deep learning model for this task in collaboration with the World Customs Organization [KDD2020].
The next challenge is to consider semi-supervised (i.e., using very few labels) and unsupervised (i.e., no label information) settings that better suit developing countries' conditions. Another research problem is poverty mapping, where the goal is to infer economic indices from high-dimensional visual features learned from satellite images. Several innovative algorithms have been proposed for this task [Science2016, AAAI2020, KDD2020]. I will introduce how we approach this problem under extreme conditions with little validation data, as in North Korea.
Overparametrized neural networks have infinitely many solutions that achieve zero training loss, but gradient-based optimization methods succeed in finding solutions that generalize well. It is conjectured that the optimization algorithm and the network architecture induce an implicit bias towards favorable solutions, and understanding such a bias has become a popular topic. We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) applied on linear neural network training. We consider separable classification and underdetermined linear regression problems where there exist many solutions that achieve zero training error, and characterize how the network architecture and initialization affect the final solution found by gradient flow. Our results apply to a general tensor formulation of neural networks that includes linear fully-connected networks and linear convolutional networks as special cases, while removing convergence assumptions required by prior research. We also provide experiments that corroborate our theoretical analysis.
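As a toy illustration of this kind of implicit bias (a minimal sketch, not the speaker's tensor formulation: a single linear layer on an underdetermined regression problem, where gradient descent from zero initialization is known to select the minimum-norm interpolant):

```python
import numpy as np

# Underdetermined regression (n < d): infinitely many weight vectors achieve zero training error.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Gradient descent on 0.5 * ||Xw - y||^2 starting from zero initialization.
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size below 2/L for stability
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)

# The minimum-l2-norm interpolant, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# Gradient descent implicitly selects the minimum-norm solution.
print(np.max(np.abs(w - w_min_norm)))  # close to 0
```

Deeper linear networks and convolutional parametrizations change which solution is selected, which is exactly the architecture dependence characterized in the talk.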
Bose-Einstein condensation (BEC) is one of the most famous phenomena that cannot be explained by classical mechanics. Here, we discuss the time evolution of BEC in the mean-field limit. First, we briefly review quantum mechanics and formulate the problem in a mathematically rigorous way. Then, we get a taste of the idea of the proof, which uses coherent states and the Fock space. Finally, some recent developments will be presented.
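For orientation, a standard formulation of the mean-field problem reads as follows (notation assumed for illustration; the talk's exact setting may differ): N interacting bosons evolve under the Hamiltonian on the left, and in the limit N -> infinity, starting from a condensate \varphi^{\otimes N}, the one-particle dynamics is governed by the Hartree equation on the right:

```latex
H_N = \sum_{j=1}^{N} \left(-\Delta_{x_j}\right)
    + \frac{1}{N} \sum_{1 \le j < k \le N} v(x_j - x_k),
\qquad
i\,\partial_t \varphi_t = -\Delta \varphi_t + \big(v * |\varphi_t|^2\big)\,\varphi_t .
```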
Many types of diffusion equations have been used to describe diverse natural phenomena. The classical heat equation describes heat propagation in homogeneous media, and the heat equation with a fractional time derivative describes anomalous diffusion, especially sub-diffusion, caused by particle sticking and trapping effects. On the other hand, space-fractional diffusion equations are related to the diffusion of particles with long-range jumps.
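For concreteness, the three equations alluded to above can be written as follows (a standard presentation assumed here; \partial_t^\alpha denotes the Caputo derivative and (-\Delta)^{\beta/2} the fractional Laplacian):

```latex
\partial_t u = \Delta u,
\qquad
\partial_t^{\alpha} u = \Delta u \;\; (0 < \alpha < 1),
\qquad
\partial_t u = -(-\Delta)^{\beta/2} u \;\; (0 < \beta < 2),
```

corresponding, respectively, to classical heat flow, sub-diffusion with trapping, and diffusion driven by long-range (stable) jumps.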
In this talk, I will introduce the following:
1. Elementary notion of stochastic parabolic equations
2. Stochastic processes with jumps and their related PDEs and Stochastic PDEs
3. Some regularity results of PDEs and Stochastic PDEs with non-local operators
Deep neural networks have shown amazing success in various domains of artificial intelligence (e.g., vision, speech, language, medicine, and game playing). However, classical tools for analyzing these models and their learning algorithms are not sufficient to explain such success. Recently, the infinite-width limit of neural networks has become one of the key breakthroughs in our understanding of deep learning. This limit is unique in giving an exact theoretical description of large-scale neural networks. Because of this, we believe it will continue to play a transformative role in deep learning theory.
In this talk, we will first review some of the interesting theoretical questions in the deep learning community. Then we will review recent progress in the study of the infinite-width limit of neural networks, focusing on the Neural Network Gaussian Process (NNGP) and the Neural Tangent Kernel (NTK). This correspondence allows us to understand wide neural networks as kernel-based machine learning models and provides 1) exact Bayesian inference without ever initializing or training a network and 2) a closed-form solution for the network function under gradient descent training. We will discuss recent advances, applications, and remaining challenges of the infinite-width limit of neural networks.
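As a minimal illustration of the NNGP side of this correspondence (a sketch assuming a fully connected ReLU network; the layer-wise Gaussian expectation uses the closed-form arc-cosine formula of Cho & Saul, and none of this code is from the talk):

```python
import numpy as np

def nngp_kernel(x1, x2, depth=3, sigma_w=1.4, sigma_b=0.1):
    """NNGP kernel between inputs x1, x2 for an infinitely wide fully connected ReLU network."""
    d = len(x1)
    # Layer-0 covariances induced by the first (random) affine layer.
    k12 = sigma_w ** 2 * (x1 @ x2) / d + sigma_b ** 2
    k11 = sigma_w ** 2 * (x1 @ x1) / d + sigma_b ** 2
    k22 = sigma_w ** 2 * (x2 @ x2) / d + sigma_b ** 2
    for _ in range(depth):
        # E[relu(u) relu(v)] for (u, v) ~ N(0, [[k11, k12], [k12, k22]]) in closed form.
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        e12 = np.sqrt(k11 * k22) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        e11, e22 = k11 / 2, k22 / 2
        # Propagate through the next affine layer.
        k12 = sigma_w ** 2 * e12 + sigma_b ** 2
        k11 = sigma_w ** 2 * e11 + sigma_b ** 2
        k22 = sigma_w ** 2 * e22 + sigma_b ** 2
    return k12

x, y = np.random.randn(10), np.random.randn(10)
print(nngp_kernel(x, y))
```

With such a kernel in hand, exact Bayesian inference for the corresponding infinitely wide network reduces to standard Gaussian process regression.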
In this talk, I start by giving a brief overview of practice in deep learning with a focus on learning (optimization) and model selection (hyperparameter optimization). In particular, I will describe and discuss black-box optimization approaches to model selection, followed by a discussion of how these two stages in deep learning can be collapsed into a single optimization problem, often referred to as bilevel optimization. This allows us to extend the applicability of gradient-based optimization to model selection, although existing gradient-based model selection, or hyperparameter optimization, approaches have been limited because they require an extensive number of so-called roll-outs. I will then explain how we can view gradient-based optimization as a recurrent network and how this enables us to view hyperparameter optimization as training a recurrent network. This insight leads to a novel paradigm of online hyperparameter optimization which does not require any simulated roll-outs.
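A minimal sketch of the hyper-gradient idea (a toy ridge-regression inner problem with a hand-derived forward-mode derivative carried through the unrolled gradient-descent roll-out; all names and sizes are illustrative, not the speaker's method):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 20
X_tr, X_val = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n)
y_val = X_val @ w_true + 0.1 * rng.normal(size=n)

def val_loss_and_hypergrad(lam, steps=200, lr=0.1):
    """Unroll gradient descent on the ridge training loss and carry dw/dlam alongside it."""
    w, dw = np.zeros(d), np.zeros(d)                    # weights and their derivative w.r.t. lam
    for _ in range(steps):
        g = X_tr.T @ (X_tr @ w - y_tr) / n + lam * w    # training-loss gradient
        dg = X_tr.T @ (X_tr @ dw) / n + w + lam * dw    # its derivative w.r.t. lam
        w, dw = w - lr * g, dw - lr * dg
    r = X_val @ w - y_val
    val_loss = 0.5 * np.mean(r ** 2)
    hypergrad = (X_val.T @ r / n) @ dw                  # chain rule: d(val loss)/d(lam)
    return val_loss, hypergrad

# One "outer" gradient step on the hyperparameter lam.
lam = 0.5
loss, g = val_loss_and_hypergrad(lam)
lam -= 0.1 * g
print(loss, lam)
```

Each evaluation of the hyper-gradient here requires a full inner roll-out, which is the cost that the online formulation described in the talk is designed to avoid.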
The large deviation problem for the spectrum of random matrices has attracted immense interest. It was first studied for GUE and GOE, which are exactly solvable, and subsequently studied for Wigner matrices with general distributions. Once sparsity is induced (i.e., each entry is multiplied by an independent Bernoulli variable, Ber(p)), eigenvalues can exhibit drastically different behavior. For a large class of Wigner matrices, including Gaussian ensembles and the adjacency matrix of Erdos-Renyi graphs, the dense behavior ceases to hold near the constant-average-degree level of sparsity, p ~ 1/n (up to a poly-logarithmic factor). In this talk, I will discuss the spectral large deviations for Gaussian ensembles with sparsity p = 1/n. This is joint work with Shirshendu Ganguly.
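Concretely, the sparse model described above can be written as follows (notation assumed for illustration):

```latex
A_{ij} = b_{ij} W_{ij}, \qquad b_{ij} \overset{\text{i.i.d.}}{\sim} \mathrm{Ber}(p), \quad 1 \le i \le j \le n,
```

where (W_{ij}) is a Wigner matrix (symmetric, with i.i.d. entries up to the symmetry constraint) and A is extended symmetrically.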
Zoom meeting info: link: https://zoom.us/j/94727585394?pwd=QlBSRUNTQi9UWXNLSTlPOTgrRnhhUT09, Meeting ID: 947 2758 5394, Password: saarc