2022-04-29 / 10:00 ~ 11:00
by 안정연 (KAIST, Department of Industrial and Systems Engineering)
Asymptotics for ultra-high-dimensional data must consider an increasing number of variables, i.e., dimensions, rather than a growing number of observations. High-dimensional asymptotic studies have revealed some unexpected characteristics of data with an exceedingly large number of variables, such as gene expression data. In the context of binary classification, i.e., supervised learning with dichotomous labels, data piling refers to the phenomenon in which the training data vectors from each class project to a single point on the discriminant direction. This interesting phenomenon has been a key to understanding many distinctive properties of high-dimensional discrimination. In this talk, the high-dimensional asymptotics of data piling is investigated under an equal covariance assumption, which reveals a close connection to the well-known ridged linear classifier. In particular, we show that a negatively ridged discriminant vector can asymptotically achieve data piling of independent test data, essentially yielding perfect classification. Double data piling is then generalized to heterogeneous covariance models, and we propose a data-splitting approach to estimate the direction for the second data piling of test data.
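To make the data piling phenomenon concrete, here is a minimal numerical sketch (not code from the talk; the two-spherical-Gaussian data model and all parameter values are illustrative assumptions). It simulates two classes in the high-dimension, low-sample-size regime and projects the training data onto the direction obtained by removing from the mean-difference vector its within-class scatter components, which forces each class to pile at a single point:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2000, 20                       # dimension >> per-class sample size (HDLSS)
mu = np.zeros(d)
mu[0] = 3.0                           # classes differ only in the first coordinate

X1 = rng.normal(size=(n, d))          # class 1 ~ N(0, I_d)
X2 = rng.normal(size=(n, d)) + mu     # class 2 ~ N(mu, I_d)

# Within-class centered vectors: any direction orthogonal to their span
# projects each training class to a single point (complete data piling).
W = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])   # (2n, d)
diff = X1.mean(axis=0) - X2.mean(axis=0)

# Component of the mean difference orthogonal to the row space of W,
# computed via an SVD basis of that row space.
_, s, Vt = np.linalg.svd(W, full_matrices=False)
basis = Vt[s > 1e-8 * s[0]]           # orthonormal rows spanning row space of W
w = diff - basis.T @ (basis @ diff)
w /= np.linalg.norm(w)

p1, p2 = X1 @ w, X2 @ w
print("class 1 projection spread:", p1.std())   # ~ machine precision: piled
print("class 2 projection spread:", p2.std())   # ~ machine precision: piled
print("distance between the piles:", abs(p1.mean() - p2.mean()))
```

Note that this piling holds for the training sample by construction; the talk's result concerns the stronger phenomenon of a (negatively) ridged discriminant achieving a second piling for independent test data, which this toy construction does not demonstrate.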