Research


1. Research field:

(a) Large-scale graphical modeling.

       I have been interested in log-linear modeling for a large number of categorical variables where the model structures are representable by undirected graphs. Initially, I tried a conditional modeling method, based on the fact that conditional models disclose parts of the joint model (Fienberg and Kim (1999)).

       I then turned to a method that uses the information hidden in marginal models to search for the joint model. This approach is currently being applied to fMRI data to analyze effective connectivity in the brain. It can also be useful for structure learning when data are not available for all the variables of interest but information on parts of the model structure is provided through marginal models.

 

(b) Exploratory structure learning.

       When a graphical model involves a large number of variables, it is desirable to have an initial structure for the model. If the initial structure, M0, is close to the true model, model building based on M0 would lead to an acceptable final model. From this point of view, finding a "good" initial structure is crucial to successful model building.

      Assuming that the model is graphical, non-parametric regression methods, mutual information methods, and variations of these methods would be useful for an exploratory search of initial model structures.
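      As a minimal sketch of what a mutual information screen could look like for categorical data, the Python code below computes the empirical mutual information for every pair of variables and keeps the pairs above a cut-off as candidate edges of an initial structure. The function names, the data layout (a dictionary of value lists), and the threshold are assumptions made for illustration only. Pairwise mutual information measures marginal association rather than conditional independence, so the result is at best a starting point such as M0, not a fitted model.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def screen_edges(data, threshold):
    """Return candidate edges (pairs of variable names) whose empirical
    mutual information exceeds the threshold.

    data: dict mapping a variable name to a list of categorical values
          (hypothetical layout chosen for this sketch).
    """
    edges = set()
    for a, b in combinations(sorted(data), 2):
        if mutual_information(data[a], data[b]) > threshold:
            edges.add((a, b))
    return edges


# Toy data: B copies A exactly, C is balanced against both.
toy = {
    "A": [0, 0, 1, 1] * 5,
    "B": [0, 0, 1, 1] * 5,
    "C": [0, 1, 0, 1] * 5,
}
initial_edges = screen_edges(toy, threshold=0.1)
```

      In the toy data only the pair (A, B) is associated, so it is the only candidate edge retained.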

 

(c) Model evaluation with sparse data.

      Large-scale modeling often comes with sparse-data issues. It seems unlikely that we can evaluate a large model with high confidence when the data size is not large enough. A bootstrap-like method has been proposed to address this issue (Kim, Choi, and Lee (2009)). A reasonable evaluation approach may be to split a large model and evaluate its parts rather than handling the whole model at once. The issue then is how to split the model.
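      To illustrate the general idea of resampling-based evaluation of one small part of a model, the sketch below applies the ordinary nonparametric bootstrap to a single pair of variables and reports how often their empirical mutual information exceeds a cut-off. This is not the method of Kim, Choi, and Lee (2009); the function names, the threshold, the number of resamples, and the seed are assumptions chosen for the example.

```python
import random
from collections import Counter
from math import log


def mutual_information(pairs):
    """Empirical mutual information (in nats) from a list of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def edge_stability(pairs, threshold, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which the mutual information of
    the pair exceeds the threshold (a rough stability score in [0, 1])."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pairs)
    hits = 0
    for _ in range(n_boot):
        # Resample n observations with replacement.
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        if mutual_information(resample) > threshold:
            hits += 1
    return hits / n_boot


dependent = [(0, 0), (1, 1)] * 20                    # y always equals x
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10  # balanced, no association
stable = edge_stability(dependent, threshold=0.1)
unstable = edge_stability(independent, threshold=0.1)
```

      A score near one suggests that the association is reproduced in almost every resample, while a score near zero suggests that it is not supported by the data at hand.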


2. Applications:

Student diagnosis of knowledge states, structure learning for large-scale modeling, statistical modeling of brain functions, spatio-temporal data analysis.


3. Some keywords of my research:

Conditional model structure, graphical model, marginal model, Markovian subgraph, directed acyclic graph, undirected graph, graphical combination, structure learning.


4. Long-term research goal:

(a) Model combining algorithm. 

      Suppose that we have a number of models developed from different sets of data. If we can regard them as marginal models of some large model, then we may be able to find the large model by combining those marginal models using properties of graphical models. An algorithm for building large graphical models will be developed along this line of work, in which the notion of graph separateness plays a crucial role.
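      A much simplified sketch of the combining idea is given below in Python. It is not the algorithm under development, and it relies on one stated assumption: the graph of a marginal model contains every edge of the large graph whose two end nodes both belong to that marginal model (it may also contain extra edges created by marginalizing out intermediate nodes). Under that assumption, an edge that is missing from any marginal graph containing both of its end nodes cannot belong to the large graph. The function name and the data layout are hypothetical.

```python
from itertools import combinations


def combine_marginal_graphs(marginals):
    """Return candidate edges of the large graph.

    marginals: list of (nodes, edges) pairs, where nodes is a set of
               variable names and edges is a set of frozenset({u, v}).
    An edge is kept if it appears in at least one marginal graph and in
    every marginal graph that contains both of its end nodes.
    """
    candidates = set()
    for _, edges in marginals:
        candidates |= set(edges)
    for nodes, edges in marginals:
        for u, v in combinations(sorted(nodes), 2):
            pair = frozenset((u, v))
            if pair not in edges:
                # u and v are observed together here but are not adjacent,
                # so the pair is ruled out as an edge of the large graph.
                candidates.discard(pair)
    return candidates


# Toy example: the large graph is the chain A - B - C - D.
m1 = ({"A", "B", "C"}, {frozenset("AB"), frozenset("BC")})
m2 = ({"B", "C", "D"}, {frozenset("BC"), frozenset("CD")})
# B is marginalized out here, which creates the extra edge A - C.
m3 = ({"A", "C", "D"}, {frozenset("AC"), frozenset("CD")})

combined = combine_marginal_graphs([m1, m2, m3])
```

      In the toy example the extra edge A - C from the third marginal graph is removed because A and C are not adjacent in the first one, and the chain is recovered. Pairs of nodes that never appear together in any marginal model remain undetermined by this rule.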

 

(b) Mathematical modeling of brain functions and their interactive relationship.

       The graphical model method is a good tool for modeling brain functioning and the inter-connections between neurons. A connection can be directional or symmetric, and causal or associative. I intend to develop statistical methods for graphical modeling of these functional inter-relationships among neurons. The model-combining approach is expected to help us build large and complicated models of neuro-functioning. This research is funded by KOSEF (Korea Science and Engineering Foundation) from September 2007 to August 2010.

 

(c) Graphical modeling for spatio-temporal data. 

       Spatio-temporal data are commonplace in bio- and neuro-sciences, financial science, and social science, among others. Statistical models for such data become complicated because they involve both location and time. Each location usually yields time series data. When we are interested in the inter-relationships among locations or variables, we may have to deal with a model structure of (location, time)-variables. Vector time series models are a typical class of models for this sort of data, and a mixture of a structural equations model and a vector time series model would be a better fit to the data. I am working on an approach to structure learning for spatio-temporal data using marginal model structures.
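       One common way to write such a mixture is the structural vector autoregression sketched below. The notation and the restriction to a single lag are assumptions made for illustration and are not taken from a specific model in this research.

```latex
\begin{align*}
X_t &= B\,X_t + A\,X_{t-1} + \varepsilon_t,
  \qquad \varepsilon_t \sim (0, \Sigma), \\
X_t &= (X_{1t}, \dots, X_{pt})^{\top},
  \qquad B_{ii} = 0 \ \text{for } i = 1, \dots, p.
\end{align*}
```

       Here X_{it} denotes the variable at location i and time t. A nonzero entry B_{ij} corresponds to a contemporaneous arrow from (j, t) to (i, t), which is the structural equations part, and a nonzero entry A_{ij} corresponds to a lagged arrow from (j, t-1) to (i, t), which is the vector time series part. The zero patterns of B and A then describe a graph on the (location, time)-variables.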