The website jointggm.org introduces updates of a suite of graphical model tools we have developed for estimating relationships (in the form of graphs) among variables from heterogeneous data sets. Feel free to submit pull requests when you find my typos.

Blog Posts


DIFFEE to identify Sparse Changes in High-Dimensional Gaussian Graphical Model Structure

Tool DIFFEE: Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure

Paper: @Arxiv | To appear at AISTATS 2018

Poster @ NIPS 2017 workshop for Advances in Modeling and Learning Interactions from Complex Data.

Abstract

We focus on the problem of estimating the change in the dependency structures of two p-dimensional Gaussian Graphical Models (GGMs). Previous studies for sparse change estimation in GGMs involve expensive and difficult non-smooth optimization. We propose a novel method, DIFFEE, for estimating DIFFerential networks via an Elementary Estimator under a high-dimensional situation. DIFFEE is solved through a faster, closed-form solution that enables it to work in large-scale settings. We conduct a rigorous statistical analysis showing that, surprisingly, DIFFEE achieves the same asymptotic convergence rates as the state-of-the-art estimators that are much more difficult to compute. Our experimental results on multiple synthetic datasets and one real-world dataset on brain connectivity show strong performance improvements over baselines, as well as significant computational benefits.
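
To make the closed-form idea concrete, here is a minimal Python sketch of an elementary-estimator-style computation. It assumes the estimator soft-thresholds the off-diagonal entries of each sample covariance, inverts the results, takes the difference, and soft-thresholds that difference. This form, the function names, and the parameters `v` and `lam` are assumptions made for illustration, not the released implementation; please check the paper for the exact definition.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def threshold_offdiag(mat, v):
    """Soft-threshold off-diagonal entries, keep the diagonal unchanged."""
    n = len(mat)
    return [[mat[i][j] if i == j else soft_threshold(mat[i][j], v)
             for j in range(n)] for i in range(n)]


def invert(mat):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def diffee_sketch(cov_c, cov_d, v, lam):
    """Assumed form: soft-threshold the difference of two thresholded-covariance inverses."""
    inv_c = invert(threshold_offdiag(cov_c, v))
    inv_d = invert(threshold_offdiag(cov_d, v))
    n = len(cov_c)
    return [[soft_threshold(inv_d[i][j] - inv_c[i][j], lam)
             for j in range(n)] for i in range(n)]


# Toy example: the two conditions differ only in the (0, 1) covariance.
cov_c = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov_d = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
delta = diffee_sketch(cov_c, cov_d, v=0.1, lam=0.2)
```

In this toy example the only nonzero entries of `delta` are the (0, 1) and (1, 0) positions, which is the single changed dependency. No iterative optimization is involved: the whole computation is two matrix inversions plus entry-wise thresholding.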

Citations

@article{DBLP:journals/corr/abs-1710-11223,
  author    = {Beilun Wang and
               Arshdeep Sekhon and
               Yanjun Qi},
  title     = {Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian
               Graphical Model Structure},
  journal   = {CoRR},
  volume    = {abs/1710.11223},
  year      = {2017},
  url       = {http://arxiv.org/abs/1710.11223},
  archivePrefix = {arXiv},
  eprint    = {1710.11223},
  timestamp = {Thu, 02 Nov 2017 14:25:36 +0100},
  biburl    = {http://dblp.org/rec/bib/journals/corr/abs-1710-11223},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}

Support or Contact

Having trouble with our tools? Please contact Beilun and we’ll help you sort it out.

W-SIMULE

Tool W-SIMULE: A Constrained, Weighted-L1 Minimization Approach for Joint Discovery of Heterogeneous Neural Connectivity Graphs

We are updating the R package simule with one more function, W-SIMULE:

install.packages("simule")
library(simule)
demo(wsimuleDemo)

Package Manual

GitHub

Paper: @Arxiv @ NIPS 2017 workshop for Advances in Modeling and Learning Interactions from Complex Data.

Presentation: @Slides

Poster: @PDF

Abstract

Determining functional brain connectivity is crucial to understanding the brain and neural differences underlying disorders such as autism. Recent studies have used Gaussian graphical models to learn brain connectivity via statistical dependencies across brain regions from neuroimaging. However, previous studies often fail to properly incorporate priors tailored to neuroscience, such as preferring shorter connections. To remedy this problem, the paper here introduces a novel, weighted-ℓ1, multi-task graphical model (W-SIMULE). This model elegantly incorporates a flexible prior, along with a parallelizable formulation. Additionally, W-SIMULE extends the often-used Gaussian assumption, leading to considerable performance increases. Here, applications to fMRI data show that W-SIMULE succeeds in determining functional connectivity in terms of (1) log-likelihood, (2) finding edges that differentiate groups, and (3) classifying different groups based on their connectivity, achieving 58.6% accuracy on the ABIDE dataset. Having established W-SIMULE’s effectiveness, it links four key areas to autism, all of which are consistent with the literature. Due to its elegant domain adaptivity, W-SIMULE can be readily applied to various data types to effectively estimate connectivity.
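
As a rough illustration of how a weighted-ℓ1 penalty can encode a preference for shorter connections, the Python sketch below builds a weight matrix from pairwise distances between region coordinates and evaluates the weighted penalty on a candidate connectivity matrix. The choice of Euclidean distance as the weight, the function names, and the toy coordinates are all hypothetical; the actual weighting used by W-SIMULE is described in the paper.

```python
import math


def distance_weights(coords):
    """Hypothetical prior: W[j][k] is the Euclidean distance between regions j and k."""
    p = len(coords)
    return [[math.dist(coords[j], coords[k]) for k in range(p)] for j in range(p)]


def weighted_l1(omega, weights):
    """Sum of weights[j][k] * |omega[j][k]| over off-diagonal entries."""
    p = len(omega)
    return sum(weights[j][k] * abs(omega[j][k])
               for j in range(p) for k in range(p) if j != k)


# Three regions on a line: 0 and 1 are close, 2 is far from both.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
weights = distance_weights(coords)

# Same edge strength on a short edge (0, 1) and a long edge (0, 2).
omega = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
short_edge_cost = 2 * weights[0][1] * abs(omega[0][1])
long_edge_cost = 2 * weights[0][2] * abs(omega[0][2])
total_penalty = weighted_l1(omega, weights)
```

With equal edge strengths, the long edge contributes five times as much to the penalty as the short one, so an estimator minimizing this penalty is pushed toward short connections unless the data strongly support a long one.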

Citations

@article{singh2017constrained,
  title={A Constrained, Weighted-L1 Minimization Approach for Joint Discovery of Heterogeneous Neural Connectivity Graphs},
  author={Singh, Chandan and Wang, Beilun and Qi, Yanjun},
  journal={arXiv preprint arXiv:1709.04090},
  year={2017}
}

FASJEM R package is released!

R package: fasjem

install.packages("fasjem")
library(fasjem)
demo(fasjem)

Package Manual

Paper: @AISTATS17 | @Arxiv

GitHub

Poster

Abstract

Estimating multiple sparse Gaussian Graphical Models (sGGMs) jointly for many related tasks (large $K$) under a high-dimensional (large $p$) situation is an important task. Most previous studies for the joint estimation of multiple sGGMs rely on penalized log-likelihood estimators that involve expensive and difficult non-smooth optimizations. We propose a novel approach, FASJEM for fast and scalable joint structure-estimation of multiple sGGMs at a large scale. As the first study of joint sGGM using the M-estimator framework, our work has three major contributions: (1) We solve FASJEM through an entry-wise manner which is parallelizable. (2) We choose a proximal algorithm to optimize FASJEM. This improves the computational efficiency from $O(Kp^3)$ to $O(Kp^2)$ and reduces the memory requirement from $O(Kp^2)$ to $O(K)$. (3) We theoretically prove that FASJEM achieves a consistent estimation with a convergence rate of $O(\log(Kp)/n_{tot})$. On several synthetic and four real-world datasets, FASJEM shows significant improvements over baselines on accuracy, computational complexity and memory costs.

Citations

@inproceedings{wang2017fast,
  title={A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models},
  author={Wang, Beilun and Gao, Ji and Qi, Yanjun},
  booktitle={Proceedings of the 20th International Conference on Artificial Intelligence and Statistics},
  series={Proceedings of Machine Learning Research},
  volume={54},
  pages={1168--1177},
  year={2017}
}

SIMULE R package is released!

Tool SIMULE: A constrained l1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models

R package: simule

install.packages("simule")
library(simule)
demo(simuleDemo)

Package Manual

GitHub

Paper: @Arxiv | @Mach Learning

Talk

Abstract

Identifying context-specific entity networks from aggregated data is an important task, arising often in bioinformatics and neuroimaging. Computationally, this task can be formulated as jointly estimating multiple different, but related, sparse Undirected Graphical Models (UGM) from aggregated samples across several contexts. Previous joint-UGM studies have mostly focused on sparse Gaussian Graphical Models (sGGMs) and can’t identify context-specific edge patterns directly. We, therefore, propose a novel approach, SIMULE (detecting Shared and Individual parts of MULtiple graphs Explicitly) to learn multi-UGM via a constrained L1 minimization. SIMULE automatically infers both specific edge patterns that are unique to each context and shared interactions preserved among all the contexts. Through the L1 constrained formulation, this problem is cast as multiple independent subtasks of linear programming that can be solved efficiently in parallel. In addition to Gaussian data, SIMULE can also handle multivariate Nonparanormal data that greatly relaxes the normality assumption that many real-world applications do not follow. We provide a novel theoretical proof showing that SIMULE achieves a consistent result at the rate $O(\log(Kp)/n_{tot})$. On multiple synthetic datasets and two biomedical datasets, SIMULE shows significant improvement over state-of-the-art multi-sGGM and single-UGM baselines.
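
To give a feel for what "shared and individual parts" under a constrained L1 minimization can look like, here is a sketch of such a program in LaTeX. It assumes each context's graph decomposes as $\Omega^{(i)} = \Omega_I^{(i)} + \Omega_S$, with $\hat\Sigma^{(i)}$ the sample covariance (or a nonparanormal correlation estimate) of context $i$. The symbols $\epsilon$ and $\lambda_n$ and the exact weighting of the shared term are assumptions for illustration; the paper gives the precise formulation.

```latex
\begin{aligned}
\min_{\Omega_S,\ \Omega_I^{(1)},\dots,\Omega_I^{(K)}} \quad
  & \sum_{i=1}^{K} \lVert \Omega_I^{(i)} \rVert_1 + \epsilon K \lVert \Omega_S \rVert_1 \\
\text{subject to} \quad
  & \lVert \hat\Sigma^{(i)} \bigl( \Omega_I^{(i)} + \Omega_S \bigr) - I \rVert_\infty \le \lambda_n,
  \qquad i = 1, \dots, K
\end{aligned}
```

Both the objective and the constraints are linear in the unknowns once absolute values are split into positive and negative parts, and the constraints involve each column of the unknown matrices separately, which is why a program of this kind can be broken into independent linear programs and solved in parallel.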

Citations

@article{wang2016constrained,
  title={A constrained l1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models},
  author={Wang, Beilun and Singh, Ritambhara and Qi, Yanjun},
  journal={arXiv preprint arXiv:1605.03468},
  year={2016}
}

JointGGM.org is up and running!

The website JointGGM.org introduces a suite of tools we have developed for learning the structure of multiple sparse Gaussian graphical models jointly.

Background: Sparse Gaussian Graphical Model (sGGM)

The sparse Gaussian Graphical Model (sGGM) assumes data samples are independently and identically drawn from a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. The graph structure $G$ among $p$ features is encoded by the sparsity pattern of the inverse covariance matrix, also called the precision matrix, $\Omega$.

In $G$, no edge connects the $j$-th node and the $k$-th node (i.e., the two variables are conditionally independent given all the others) if and only if $\Omega_{jk} = 0$. sGGM imposes a sparsity-inducing L1 penalty on $\Omega$.
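
The correspondence between zeros of $\Omega$ and missing edges can be shown in a few lines of Python. This is a minimal sketch: the function name, the tolerance `tol`, and the toy matrix are introduced here for illustration only.

```python
def edges_from_precision(omega, tol=1e-8):
    """Return undirected edges (j, k) with j < k where |omega[j][k]| exceeds tol."""
    p = len(omega)
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(omega[j][k]) > tol]


# Precision matrix of a chain graph 0 - 1 - 2.
# omega[0][2] == 0, so nodes 0 and 2 are conditionally independent given node 1.
omega = [
    [1.0, -0.4, 0.0],
    [-0.4, 1.2, -0.4],
    [0.0, -0.4, 1.0],
]
edges = edges_from_precision(omega)
```

Here `edges` contains the pairs (0, 1) and (1, 2) but not (0, 2). In practice an estimated $\Omega$ is rarely exactly zero without a sparsity penalty, which is the role the L1 term plays.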

This website: Joint learning of Multiple Sparse Gaussian Graphical Model (multi-sGGM)

Modern multi-context molecular datasets are high-dimensional, heterogeneous and noisy. For such heterogeneous data samples, rather than estimating the sGGM of each condition separately, a multi-task formulation that jointly estimates $K$ different but related sGGMs can lead to better generalization.
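
As a point of reference, one generic way to write such a joint estimator is the penalized log-likelihood template below. It is a sketch of the common form used in the multi-sGGM literature, not the formulation of any specific tool on this site: $S^{(i)}$ (the sample covariance of context $i$), $\lambda$ (a tuning parameter), and $P$ (a penalty coupling the $K$ precision matrices) are symbols introduced here for illustration.

```latex
\min_{\Omega^{(1)}, \dots, \Omega^{(K)} \succ 0} \;
\sum_{i=1}^{K} \Bigl[ -\log\det \Omega^{(i)} + \operatorname{tr}\bigl( S^{(i)} \Omega^{(i)} \bigr) \Bigr]
\; + \; \lambda \, P\bigl( \Omega^{(1)}, \dots, \Omega^{(K)} \bigr)
```

Setting $P$ to a plain sum of L1 norms decouples the problem into $K$ separate sGGMs; the gain from joint estimation comes from choosing $P$ so that the graphs share information across contexts.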

We have designed a suite of novel and robust machine-learning algorithms to identify context-specific interaction graphs from such data.

So far, we have released the following four tools:

| No. | Tool Name | Short Description |
| --- | --- | --- |
| 1 | SIMULE | A constrained l1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models |
| 2 | FASJEM | A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models |
| 3 | W-SIMULE | A Constrained, Weighted-L1 Minimization Approach for Joint Discovery of Heterogeneous Neural Connectivity Graphs |
| 4 | DIFFEE | Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure |

Possible Applications

By helping researchers effectively translate aggregated data into knowledge that takes the form of graphs, this suite of toolboxes can have important biomedical applications, such as investigating molecular signatures corresponding to different drug treatments. We expect it to impact other domains as well, for instance, identifying condition-specific functional networks of human brain connectivity.

Contact

Have questions or suggestions? Feel free to ask me on Twitter or email me.

Thanks for reading!