We have introduced, and continue to develop, a collection of techniques based on Granger causality to discover and model causal relationships in multivariate time series data. Granger causality is an operational definition of causality introduced by Nobel laureate Clive Granger and used extensively in econometrics: a time series X is said to Granger-cause a time series Y if past values of X improve the prediction of Y beyond what past values of Y alone provide. We have combined it with sparse structure learning for graphical models via penalized regression, and developed several further models that capture the dynamics, spatial constraints, nonlinearity, and relational information in time series data.
KDD 2007; KDD 2009(a); KDD 2009(b); KDD 2009(c); ICML 2010; AAAI 2010; SDM 2012; ICML 2012; SDM 2013; KDD 2013
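To illustrate how Granger causality combines with penalized regression, here is a minimal Python sketch of a lasso-based Granger test. It assumes a linear vector autoregressive model with a fixed lag length. Each target variable is regressed on the lagged values of all variables under an L1 penalty, and another variable is read as a Granger cause of the target if any of its lag coefficients survives the penalty. The lasso is only one common choice of penalized regression. The function names (`lasso_granger`, `lasso_cd`, `soft_threshold`), the plain coordinate-descent solver, the synthetic data, and the penalty value are hypothetical illustrations, not the group's published implementation.

```python
import random


def soft_threshold(value, lam):
    """Proximal operator of the L1 penalty: shrink value toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with d features; columns of X and y are assumed centered.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(row[k] ** 2 for row in X) / n for k in range(d)]
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(d):
            if col_sq[k] == 0.0:
                continue
            # Correlation of feature k with the partial residual that excludes feature k.
            rho = sum(X[i][k] * (resid[i] + X[i][k] * beta[k]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[k]
            delta = new_b - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][k] * delta
                beta[k] = new_b
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_granger(series, target, lag, lam):
    """Return indices of variables (other than target) whose lags get nonzero weight.

    series: list of T time points, each a list of p variable values.
    """
    T, p = len(series), len(series[0])
    rows, y = [], []
    for t in range(lag, T):
        # Feature order: variable 0 at lags 1..lag, then variable 1 at lags 1..lag, ...
        rows.append([series[t - l][i] for i in range(p) for l in range(1, lag + 1)])
        y.append(series[t][target])
    n, d = len(rows), len(rows[0])
    # Center features and response so the regression needs no intercept.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    X = [[r[k] - means[k] for k in range(d)] for r in rows]
    y_mean = sum(y) / n
    beta = lasso_cd(X, [v - y_mean for v in y], lam)
    causes = set()
    for i in range(p):
        block = beta[i * lag:(i + 1) * lag]
        if i != target and any(abs(b) > 1e-10 for b in block):
            causes.add(i)
    return causes


# Synthetic example: x1 is driven by the previous value of x0; x2 is independent noise.
rng = random.Random(0)
T = 300
x0 = [rng.gauss(0, 1) for _ in range(T)]
x2 = [rng.gauss(0, 1) for _ in range(T)]
x1 = [0.0] + [0.8 * x0[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, T)]
data = [[x0[t], x1[t], x2[t]] for t in range(T)]
print(lasso_granger(data, target=1, lag=2, lam=0.1))  # → {0}
```

On this synthetic data the sketch identifies x0 as the only Granger cause of x1. Pure Python keeps the example dependency-free. A practical version would use an optimized solver and would choose the penalty and lag length by cross-validation or an information criterion.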
We have developed graphical models and graph-based algorithms that capture the dependencies or constraints between input and output variables in structured data. We introduced segmentation conditional random fields (SCRFs), which infer the segmentation of a sequence by modeling dependencies between segments, and kernel conditional random fields (kernel CRFs), which permit the use of implicit feature spaces through Mercer kernels. We also relaxed the common assumption that the output space must be fixed beforehand, and developed reversible jump MCMC algorithms for fast inference in this setting.
RECOMB 2005; ICML 2005; IJCAI 2007; NIPS 2011
We have developed discriminative latent graphical models to uncover the hidden semantics of text and video data by leveraging labels or dependency information between instances. We developed topic-link LDA models to uncover both the topics and community structure by leveraging link information between documents. We also developed the discriminative harmonium model to better uncover the latent topics in video data by leveraging label information.
ICML 2009; SDM 2007; KDD 2012; ICML 2012
We are investigating graph-based models that capture the dependencies among features or instances for more effective anomaly detection. We developed a graph-based algorithm that uses a global similarity matrix motivated by manifold ranking, which yields more compact clusters for the minority classes.
ICDM 2008; SDM 2009; KDD 2009; ICDM 2012
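The manifold ranking idea that motivates this work can be sketched as follows. This is the generic manifold ranking iteration of Zhou et al., not the group's anomaly-detection algorithm. Affinities W are Gaussian with bandwidth sigma. S = D^{-1/2} W D^{-1/2} is the normalized similarity, and scores spread from a set of query points via F <- alpha S F + (1 - alpha) Y. The fixed point (1 - alpha)(I - alpha S)^{-1} Y depends on the whole graph, which is one way to read the "global similarity matrix" mentioned above. The function names, the Gaussian kernel, and the values of alpha and sigma are hypothetical illustrations.

```python
import math


def affinity_matrix(points, sigma):
    """Gaussian affinities W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with W_ii = 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def manifold_rank(points, query, alpha=0.9, sigma=1.0, n_iter=200):
    """Score every point by iterating F <- alpha * S F + (1 - alpha) * Y,
    where S = D^{-1/2} W D^{-1/2} and Y is 1 on the query points, 0 elsewhere."""
    W = affinity_matrix(points, sigma)
    n = len(points)
    deg = [sum(row) for row in W]
    # Symmetric normalization; isolated points (zero degree) get no edges.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0 if i in query else 0.0 for i in range(n)]
    f = list(y)
    for _ in range(n_iter):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Two well-separated clusters; point 0 is the query (e.g. a known minority example).
pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
scores = manifold_rank(pts, query={0})
# The query's cluster-mates (points 1 and 2) receive much higher scores than points 3-5.
print([round(s, 4) for s in scores])
```

Because the eigenvalues of S lie in [-1, 1], each update contracts the error by a factor of alpha, so a few hundred iterations are enough here. Larger graphs would call for sparse matrices and a linear solver instead of dense loops.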
We have developed transfer learning algorithms that solve target tasks by leveraging abundant data from related source tasks. We are especially interested in the most challenging cases, where the observations in the target domain are short, unstructured texts carrying very limited information. We have also explored the potent combination of transfer learning and active learning, where the learner queries labels for target examples to improve the quality of the transfer.
CIKM 2009; KDD 2011(b,c); AAAI 2011; ICDM 2012; ICDM 2013
Global warming is one of the most critical socio-technological issues that mankind faces in the 21st century. We aim to contribute to solutions by better understanding and quantifying the causal effects of climate and climate-forcing agents. We have developed a data-centric approach, namely temporal causal models and their extensions (group lasso, graph Laplacian, and hidden Markov random fields), that captures the spatial and temporal information in climate data.
KDD 2007; KDD 2009(a); KDD 2009(b); ICML 2010; AAAI 2010; ICML 2012; KDD 2013
We proposed graphical model solutions to a variety of problems in computational biology, proteomics, and gene microarray analysis.
Bioinformatics 2004; RECOMB 2005; Journal of Computational Biology 2008; ISMB 2009; CSB 2010
We are investigating the synergy between textual analysis and social network analysis. We are developing latent variable models that uncover topics and author communities from blog data, as well as transfer learning algorithms for sentiment prediction. We also analyze social networks and topic flows in posts from a corporate discussion forum to examine the factors that lead to innovative ideas.
ICML 2003; ECML 2003; CIKM 2002; IJCAI 2007; ICML 2009; CIKM 2009; KDD SNA Workshop 2007; WSDM 2013; VLDB 2013
We use Granger graphical models to uncover temporal dependencies and detect anomalies in high-dimensional time series data, motivated by applications such as oil drilling, semiconductor fabrication, and railroad operation.
This CAREER project aims to develop novel machine learning models based on Granger causality that uncover complex dependence structures in high-dimensional time series.