Genome-DeepMatching (Index of Posts):


Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification

Prototype Matching Networks: A novel deep learning architecture for Large-Scale Multi-label Genomic Sequence Classification

Paper: @Arxiv

Abstract

One of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes known as “motifs” and (2) interactions among TFs known as co-binding effects. In this paper, we propose a novel deep architecture, the Prototype Matching Network (PMN), to mimic the TF binding mechanisms. Our PMN model automatically extracts prototypes (“motif”-like features) for each TF through a novel prototype-matching loss. Borrowing ideas from few-shot matching models, we use the notion of a support set of prototypes and an LSTM to learn how TFs interact and bind to genomic sequences. On a reference TFBS dataset with 2.1 million genomic sequences, PMN significantly outperforms baselines and validates our design choices empirically. To our knowledge, this is the first deep learning architecture that introduces prototype learning and considers TF-TF interactions for large-scale TFBS prediction. Not only is the proposed architecture accurate, but it also models the underlying biology.
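To make the idea of matching a sequence against one prototype per TF more concrete, here is a minimal sketch in plain Python. It is not the PMN model: the k-mer count encoder stands in for a learned sequence encoder, the prototypes are built by hand rather than learned through a prototype-matching loss, and the LSTM that models TF-TF interactions is omitted entirely. The names `embed`, `cosine`, `predict_labels`, `TF_A`, `TF_B`, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from itertools import product

K = 2  # k-mer length, chosen only to keep the example small
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def embed(seq):
    """Hypothetical stand-in encoder: a vector of k-mer counts."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in KMER_INDEX:  # skip k-mers containing non-ACGT symbols
            counts[KMER_INDEX[kmer]] += 1.0
    return counts


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_labels(seq, prototypes, threshold=0.5):
    """Multi-label prediction: one independent yes/no decision per TF."""
    x = embed(seq)
    return {tf: cosine(x, proto) >= threshold for tf, proto in prototypes.items()}


# Hand-made prototypes for two hypothetical TFs (a real model would learn these).
prototypes = {
    "TF_A": embed("ACGTACGT"),
    "TF_B": embed("GGGGCCCC"),
}

labels = predict_labels("ACGTACGTAC", prototypes)
print(labels)
```

Each TF gets its own decision, so a sequence can be assigned zero, one, or several labels, which is what makes the task multi-label rather than multi-class.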

Citations

@article{lanchantin2017prototype,
  title={Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification},
  author={Lanchantin, Jack and Sekhon, Arshdeep and Singh, Ritambhara and Qi, Yanjun},
  journal={arXiv preprint arXiv:1710.11238},
  year={2017}
}

Support or Contact

Having trouble with our tools? Please contact Jack and we’ll help you sort it out.


Memory Matching Networks for Genomic Sequence Classification

Tool: Memory Matching Networks for Genomic Sequence Classification

Paper: @Arxiv

GitHub

Poster

Abstract

When analyzing the genome, researchers have discovered that proteins bind to DNA based on certain patterns of the DNA sequence known as “motifs”. However, it is difficult to manually construct motifs due to their complexity. Recently, externally learned memory models have proven to be effective methods for reasoning over inputs and supporting sets. In this work, we present memory matching networks (MMN) for classifying DNA sequences as protein binding sites. Our model learns a memory bank of encoded motifs, which are dynamic memory modules, and then matches a new test sequence to each of the motifs to classify the sequence as a binding or nonbinding site.
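The matching step described above can be sketched roughly as follows, in plain Python. This is an illustration under strong assumptions, not the MMN implementation: the memory bank here holds hand-written motif strings instead of learned encodings, each slot carries a hypothetical binding (1.0) or nonbinding (0.0) label, and matching is a simple sliding-window identity score followed by softmax attention over the slots. The names `match_score`, `softmax`, `classify`, and the `temperature` parameter are invented for this example.

```python
import math


def match_score(seq, motif):
    """Best fraction of matching positions over all windows of the sequence."""
    width = len(motif)
    best = 0.0
    for i in range(len(seq) - width + 1):
        hits = sum(1 for a, b in zip(seq[i:i + width], motif) if a == b)
        best = max(best, hits / width)
    return best


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(seq, memory, temperature=0.1):
    """Attend over memory slots and return an estimated binding probability."""
    scores = [match_score(seq, motif) for motif, _ in memory]
    weights = softmax([s / temperature for s in scores])
    return sum(w * label for w, (_, label) in zip(weights, memory))


# Hypothetical memory bank: (motif string, 1.0 = binding, 0.0 = nonbinding).
memory = [
    ("TGACTCA", 1.0),
    ("GATAAG", 1.0),
    ("AAAAAAA", 0.0),
]

p_site = classify("CCTGACTCAGG", memory)
p_background = classify("AAAAAAAAAA", memory)
print(p_site > 0.5, p_background > 0.5)
```

A lower `temperature` makes the attention concentrate on the best-matching slot; a higher value averages over more of the memory bank.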


Citations

@article{lanchantin2017memory,
  title={Memory Matching Networks for Genomic Sequence Classification},
  author={Lanchantin, Jack and Singh, Ritambhara and Qi, Yanjun},
  journal={arXiv preprint arXiv:1702.06760},
  year={2017}
}
