Tagging-Protein (Index of Posts):

This category of tools aims to assign local segments of protein sequences to functional classes.
This includes:


MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction

Tool MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction

Paper

GitHub

Talk Slides

Abstract

Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window based multilayer perceptron. Taking inspiration from the image classification domain we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position predictions on protein sequences. Our model is significantly simpler than the state-of-the-art, yet achieves better results. By combining MUST and the efficient convolution operation, we can consider far more parameters while retaining very fast prediction speeds. We beat the state-of-the-art performance on two large protein property prediction datasets.
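The shift-and-stitch idea mentioned in the abstract can be illustrated with a toy, single-layer sketch. It is not the MUST-CNN implementation: the function names (`max_pool`, `shift_and_stitch`, `sliding_max`) and the use of plain max pooling as the stand-in "network" are assumptions made for illustration. Pooling with stride p returns only one output per p input positions; running the same pooling on p shifted copies of the input and interleaving the results recovers one output per position.

```python
def max_pool(xs, p):
    """Non-overlapping max pooling with window and stride p.

    The last window may be shorter than p; no elements are dropped.
    """
    return [max(xs[i:i + p]) for i in range(0, len(xs), p)]


def shift_and_stitch(xs, p):
    """Recover a dense, per-position output from a stride-p pooling step.

    For each shift s in 0..p-1, pool the input starting at offset s, then
    write the i-th pooled value back to position s + i * p.
    """
    dense = [None] * len(xs)
    for s in range(p):
        pooled = max_pool(xs[s:], p)
        for i, value in enumerate(pooled):
            dense[s + i * p] = value
    return dense


def sliding_max(xs, p):
    """Reference: stride-1 max over a window of up to p elements."""
    return [max(xs[t:t + p]) for t in range(len(xs))]


if __name__ == "__main__":
    scores = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical per-residue scores
    print(max_pool(scores, 2))          # 4 values: one per 2 positions
    print(shift_and_stitch(scores, 2))  # 8 values: one per position
```

In this toy setting the stitched result equals a stride-1 sliding maximum, which is why the technique yields fully dense predictions without giving up pooling. A multilayer version would repeat the same shifting at every pooling layer.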

Citations

@inproceedings{lin2016must,
  title={MUST-CNN: a multilayer shift-and-stitch deep convolutional architecture for sequence-based protein structure prediction},
  author={Lin, Zeming and Lanchantin, Jack and Qi, Yanjun},
  booktitle={Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
  pages={27--34},
  year={2016},
  organization={AAAI Press}
}

Support or Contact

Having trouble with our tools? Please contact Jack and we’ll help you sort it out.


Deep Learning for Character-based Information Extraction on Chinese and Protein Sequence

Title: Deep Learning for Character-based Information Extraction on Chinese and Protein Sequence

  • authors: Yanjun Qi, Sujatha Das, Ronan Collobert, Jason Weston

Paper ECIR

Supplementary Here

Talk: Slide

Abstract

In this paper we introduce a deep neural network architecture to perform information extraction on character-based sequences, e.g. named-entity recognition on Chinese text or secondary-structure detection on protein sequences. With a task-independent architecture, the deep network relies only on simple character-based features, which obviates the need for task-specific feature engineering. The proposed discriminative framework includes three important strategies: (1) a deep learning module mapping characters to vector representations is included to capture the semantic relationship between characters; (2) abundant online sequences (unlabeled) are utilized to improve the vector representation through semi-supervised learning; and (3) the constraints of spatial dependency among output labels are modeled explicitly in the deep architecture. Experiments on four benchmark datasets demonstrate that the proposed architecture consistently achieves state-of-the-art performance.
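Strategy (3), modeling dependencies among output labels, can be sketched with Viterbi decoding over per-position label scores plus label-to-label transition scores. This is one common way to enforce such constraints and is an assumption here; the abstract does not say which decoding procedure the paper uses. The function name `viterbi` and the example scores are hypothetical.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label path.

    emissions[t][k]   : score of label k at position t
    transitions[j][k] : score of moving from label j to label k
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each label
    back = []                   # back-pointers for positions 1..T-1
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_labels):
            best_j = max(range(n_labels),
                         key=lambda j: score[j] + transitions[j][k])
            pointers.append(best_j)
            new_score.append(score[best_j] + transitions[best_j][k]
                             + emissions[t][k])
        score = new_score
        back.append(pointers)

    # Trace the best final label back to the first position.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Two hypothetical labels: 0 = helix, 1 = coil.
    emissions = [[2.0, 0.0], [0.0, 0.5], [2.0, 0.0]]
    transitions = [[1.0, -2.0], [-2.0, 1.0]]  # staying in a label is rewarded
    print(viterbi(emissions, transitions))
```

Picking the best label at each position independently would give helix, coil, helix for these scores. With the transition scores included, the isolated coil is smoothed away because switching labels twice costs more than the small emission gain.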

Citations

@inproceedings{qi2014deep,
  title={Deep learning for character-based information extraction},
  author={Qi, Yanjun and Das, Sujatha G and Collobert, Ronan and Weston, Jason},
  booktitle={European Conference on Information Retrieval},
  pages={668--674},
  year={2014},
  organization={Springer}
}

Support or Contact

Having trouble with our tools? Please contact Yanjun Qi and we’ll help you sort it out.


A unified multitask architecture for predicting local structural properties on proteins

Tool Multitask-ProteinTagging: A unified multitask architecture for predicting local protein properties

Paper

GitHub

Abstract

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.
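The semi-supervised task described above, distinguishing natural from synthetic local patterns, needs labeled pairs of windows. The sketch below builds such pairs under a stated assumption: a synthetic window is made by replacing the center residue of a natural window with a different random amino acid. The abstract does not spell out how synthetic sequences are generated, so this corruption rule, along with the names `natural_windows`, `corrupt_center` and `make_examples`, is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def natural_windows(sequence, width):
    """All contiguous windows of the given width from a real sequence."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def corrupt_center(window, rng):
    """Replace the center residue with a different, randomly chosen one."""
    mid = len(window) // 2
    choices = [a for a in AMINO_ACIDS if a != window[mid]]
    return window[:mid] + rng.choice(choices) + window[mid + 1:]


def make_examples(sequence, width, seed=0):
    """Pair each natural window (label 1) with a synthetic one (label 0)."""
    rng = random.Random(seed)  # seeded so the examples are reproducible
    examples = []
    for window in natural_windows(sequence, width):
        examples.append((window, 1))
        examples.append((corrupt_center(window, rng), 0))
    return examples


if __name__ == "__main__":
    for window, label in make_examples("MKTAYIAKQR", width=5):
        print(window, label)
```

Because the windows come from unlabeled sequences, a classifier trained on these pairs can use far more data than the supervised labeling tasks provide, which is the role the auxiliary task plays in the joint model.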

Citations

@article{qi12plosone,
    author = {Qi, Yanjun and Oja, Merja and Weston, Jason and Noble, William Stafford},
    journal = {PLoS ONE},
    publisher = {Public Library of Science},
    title = {A Unified Multitask Architecture for Predicting Local Protein Properties},
    year = {2012},
    month = {03},
    volume = {7},
    url = {http://dx.doi.org/10.1371%2Fjournal.pone.0032235},
    pages = {e32235},
    number = {3},
    doi = {10.1371/journal.pone.0032235}
}

Support or Contact

Having trouble with our tools? Please contact Jack and we’ll help you sort it out.