<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://qdata.github.io//qdata-page/feed.xml" rel="self" type="application/atom+xml" /><link href="https://qdata.github.io//qdata-page/" rel="alternate" type="text/html" /><updated>2025-12-06T16:34:29-05:00</updated><id>https://qdata.github.io//qdata-page/feed.xml</id><title type="html">Qdata Team Research Blog Site</title><subtitle>Research Blogs of Qdata Team in the University of Virginia Computer Science Department.</subtitle><author><name>Qdata</name></author><entry><title type="html">Zhe’s PhD Defense - Toward Out-Of-Distribution Generalization Of Deep Learning Models</title><link href="https://qdata.github.io//qdata-page/aiself/zhe-defense/" rel="alternate" type="text/html" title="Zhe’s PhD Defense - Toward Out-Of-Distribution Generalization Of Deep Learning Models" /><published>2024-04-02T00:00:00-04:00</published><updated>2024-04-02T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aiself/zhe-defense</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aiself/zhe-defense/"><![CDATA[<h2 id="phd-dissertation-defense-by-zhe-wang">Ph.D. Dissertation Defense by Zhe Wang</h2>
<ul>
  <li>Tues., 04/02/24, at 12:00PM (ET)</li>
</ul>

<h4 id="committee">Committee:</h4>

<p>Matthew Dwyer, Committee Chair  (CS/SEAS/UVA)
Yanjun Qi, Advisor (CS/SEAS, SDS, SM/UVA)
Miaomiao Zhang (ECE/CS/SEAS/UVA)
Jianhui Zhou (Statistics/College/UVA)
Vicente Ordonez (CS/Rice University)</p>

<h4 id="title-toward-out-of-distribution-generalization-of-deep-learning-models">Title: Toward Out-Of-Distribution Generalization Of Deep Learning Models</h4>

<ul>
  <li>Abstract: Deep learning models, especially deep neural networks (DNNs), perform extremely well when the testing and training distributions align. However, real-world scenarios often involve shifts in the data distribution across domains and tasks, over time, and under adversarial attacks. Such shifts from the training to the testing distribution degrade DNN performance. The varied testing distributions encountered by diverse users underscore the urgent need to understand out-of-distribution (OOD) problems and to design methods that mitigate OOD generalization challenges. Therefore, this dissertation develops methods and strategies to enhance DNNs’ ability to generalize to unseen distributions. First, we focus on generalizing DNNs to unknown domains, in which no prior information about testing domains is available during training. We propose a novel optimization approach that learns principal gradients from eigenvectors of training optimization trajectories. This robust gradient design forces the training to ignore domain-dependent noise signals and updates all training domains with a robust direction covering the main components of parameter dynamics. Second, we focus on designing strategies to generalize DNNs to unseen tasks (i.e., meta-learning), for instance, a new, unknown reinforcement learning (RL) task with only a few demonstration trajectories. The main challenge is to infer the identity of a new task from a limited number of annotated samples. We propose modeling a new task’s identity as a stochastic variable and encoding it with a stochastic neural network. This task-identity design helps meta-learning adapt shared training knowledge to the current new task. To solve similar task-generalization issues in offline RL, we further propose learning from the transition dynamics and reward function to capture a task’s identity. Third, deep learning models should perform well not only on the clean, legitimate data distribution but also on data subjected to adversarial attacks. Entering the era of large foundation models, we focus on techniques for crafting adversarial attacks that jailbreak pretrained large language models (LLMs), given their recent widespread adoption. We design a new objective that learns adversarial suffixes with far fewer queries and a higher attack success rate. The learned suffixes also demonstrate higher transferability across LLMs. In this thesis, we validate the effectiveness of our methods across image classification and completion, wealth index regression from satellite images, robotic control, real-world temperature forecasting, and natural language generation.</li>
</ul>]]></content><author><name>Qdata</name></author><category term="AIself" /><category term="Tutorials" /><category term="Graph-discriminative" /><category term="Semi-SelfLabel" /><category term="RL-Generalization" /><category term="multi-transfer" /><summary type="html"><![CDATA[Ph.D. Dissertation Defense by Zhe Wang, Tues., 04/02/24, at 12:00PM (ET) Committee:]]></summary></entry><entry><title type="html">Arsh’s PhD Defense - Relational Structure Discovery for Deep Learning</title><link href="https://qdata.github.io//qdata-page/aiself/Arsh-defense/" rel="alternate" type="text/html" title="Arsh’s PhD Defense - Relational Structure Discovery for Deep Learning" /><published>2022-06-29T00:00:00-04:00</published><updated>2022-06-29T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aiself/Arsh-defense</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aiself/Arsh-defense/"><![CDATA[<h2 id="arshdeep-sekhons-phd-defense">Arshdeep Sekhon’s PhD Defense</h2>
<ul>
  <li>June 29, 2022.</li>
</ul>

<h4 id="committee">Committee:</h4>
<p>Yanjun Qi (Advisor)(CS/SEAS/UVA)
Matthew Dwyer (CS/SEAS/UVA)
Yangfeng Ji (CS/SEAS/UVA)
Vicente Ordonez (CS/SEAS/UVA)
Jianhui Zhou (Department of Statistics/UVA)</p>

<h4 id="title-relational-structure-discovery-for-deep-learning">Title: Relational Structure Discovery for Deep Learning</h4>

<ul>
  <li>Abstract: 
Graph structure is ubiquitous: from physical relationships to biological interactions to social networks, and many more. Not only is the world around us rich in relational structure, but our mental model of the world is also structured: we think, reason, and communicate in terms of entities and their relations. Such a graph-structured real world calls for artificial intelligence methods that think like humans and hence employ this structure for decision making. Realizing such a framework requires a known structure/graph and models that can ingest these non-linear graphical inputs. When the graph structure is latent and unknown, state-of-the-art deep learning models either focus on task-agnostic statistical dependency learning or diverge from explicit feature dependencies during prediction. We bridge this gap and introduce methods for jointly learning and incorporating graph-based relational knowledge into state-of-the-art deep learning models to help improve (1) predictions, (2) interpretability, (3) post-hoc interpretations, and (4) test dataset selection. Specifically, we contribute methods that enable learning graphical relationships from data without a ground-truth graph. Furthermore, we introduce plug-and-play methods that bias deep learning models to include the learned graph explicitly, improving the aforementioned downstream tasks. We demonstrate our methods’ capabilities on simulated, tabular, NLP, and vision tasks.</li>
</ul>]]></content><author><name>Qdata</name></author><category term="AIself" /><category term="Tutorials" /><category term="Graph-discriminative" /><category term="Graph-generative" /><category term="multi-Graph-generative" /><summary type="html"><![CDATA[Arshdeep Sekhon’s PhD Defense June 29, 2022.]]></summary></entry><entry><title type="html">JackL’s PhD Defense - Modeling interactions with Deep Learning</title><link href="https://qdata.github.io//qdata-page/aiself/JackL-defense/" rel="alternate" type="text/html" title="JackL’s PhD Defense - Modeling interactions with Deep Learning" /><published>2021-07-20T00:00:00-04:00</published><updated>2021-07-20T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aiself/JackL-defense</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aiself/JackL-defense/"><![CDATA[<h2 id="phd-dissertation-defense-by-jack-lanchantin">Ph.D. Dissertation Defense by Jack Lanchantin</h2>
<ul>
  <li>Tuesday, July 20th, 2021 at 2:00 PM (ET), via Zoom.</li>
</ul>

<h4 id="committee">Committee:</h4>
<ul>
  <li>Vicente Ordóñez Román, Committee Chair, (CS/SEAS/UVA)</li>
  <li>Yanjun Qi, advisor, (CS/SEAS/UVA)</li>
  <li>Yangfeng Ji (CS/SEAS/UVA)</li>
  <li>Clint Miller (Public Health Sciences/SOM/UVA)</li>
  <li>Casey Greene (Biochemistry &amp; Molecular Genetics/SOM/University of Colorado)</li>
</ul>

<h3 id="title-modeling-interactions-with-deep-learning">Title: Modeling interactions with Deep Learning</h3>

<ul>
  <li>Abstract: Interacting systems are highly prevalent in many real-world settings, including genomics, proteomics, and images. The dynamics of complex systems are often explained as a composition of entities and their interaction graphs. In this dissertation, we design state-of-the-art deep neural networks for interaction-oriented representation learning. Learning such structure representations from data can provide semantic clarity, ease of reasoning for generating new knowledge, and potentially causal interpretation. We consider three different types of interactions: 1) interactions within a particular input sample, 2) interactions between multiple input samples, and 3) interactions between output labels. For each type of interaction, we design novel models to tackle a real-world problem and validate our results both quantitatively and visually.</li>
</ul>]]></content><author><name>Qdata</name></author><category term="AIself" /><category term="EpiGenome" /><category term="DeepSequence" /><category term="6-Interpret" /><category term="Tutorials" /><category term="Graph-discriminative" /><category term="Semi-SelfLabel" /><summary type="html"><![CDATA[Ph.D. Dissertation Defense by Jack Lanchantin Tuesday, July 20th, 2021 at 2:00 PM (ET), via Zoom. Committee: Vicente Ordóñez Román, Committee Chair, (CS/SEAS/UVA) Yanjun Qi, advisor, (CS/SEAS/UVA) Yangfeng Ji (CS/SEAS/UVA) Clint Miller (Public Health Sciences/SOM/UVA) Casey Greene (Biochemistry &amp; Molecular Genetics/SOM/University of Colorado) Title: Modeling interactions with Deep Learning]]></summary></entry><entry><title type="html">DrQi’s tutorial talk on “Make Deep Learning Interpretable for Sequential Data Analysis in Biomedicine” (Including our work on DeepChrome - AttentiveChrome - GCNChrome - DeepMotif - DeepVHPPI - MotifTransformer)</title><link href="https://qdata.github.io//qdata-page/aibiomed/deep-bioSeq-tutorial/" rel="alternate" type="text/html" title="DrQi’s tutorial talk on “Make Deep Learning Interpretable for Sequential Data Analysis in Biomedicine” (Including our work on DeepChrome - AttentiveChrome - GCNChrome - DeepMotif - DeepVHPPI - MotifTransformer)" /><published>2021-07-01T00:00:00-04:00</published><updated>2021-07-01T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aibiomed/deep-bioSeq-tutorial</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aibiomed/deep-bioSeq-tutorial/"><![CDATA[<p>I gave a tutorial talk  at UVA-VADC Seminar Series 2021 and at monthly NIH Data Science Showcase seminar.</p>

<h1 id="title-make-deep-learning-interpretable-for-sequential-data-analysis-in-biomedicine">Title: Make Deep Learning Interpretable for Sequential Data Analysis in Biomedicine</h1>

<h2 id="slide-pdf"><a href="/qdata-page//talk/2021620-VADC-deepGenomics.pdf">Slide PDF</a></h2>

<hr />

<p>This tutorial includes four of our recent papers:</p>

<h2 id="tool-deepchrome-deep-learning-for-predicting-gene-expression-from-histone-modifications">Tool DeepChrome: deep-learning for predicting gene expression from histone modifications</h2>

<ul>
  <li>
    <p>Paper:  <a href="https://academic.oup.com/bioinformatics/article-abstract/32/17/i639/2450757/DeepChrome-deep-learning-for-predicting-gene">@Bioinformatics</a></p>
  </li>
  <li>
    <p><a href="https://github.com/QData/DeepChrome">GitHub</a></p>
  </li>
</ul>

<h3 id="tool-attentivechrome-attend-and-predict-using-deep-attention-model-to-understand-gene-regulation-by-selective-attention-on-chromatin">Tool AttentiveChrome: Attend and Predict: Using Deep Attention Model to Understand Gene Regulation by Selective Attention on Chromatin</h3>

<ul>
  <li>
    <p>Paper:  Published at <a href="https://papers.nips.cc/paper/7255-attend-and-predict-understanding-gene-regulation-by-selective-attention-on-chromatin.pdf">NeurIPS2017</a></p>
  </li>
  <li>
    <p><a href="https://github.com/QData/AttentiveChrome">GitHub</a></p>
  </li>
</ul>

<h3 id="tool-gcnchrome-graph-convolutional-networks-for-epigenetic-state-prediction-using-both-sequence-and-3d-genome-data">Tool: GCNChrome: Graph Convolutional Networks for Epigenetic State Prediction Using Both Sequence and 3D Genome Data</h3>

<ul>
  <li>
    <p>Paper @<a href="https://academic.oup.com/bioinformatics/article/36/Supplement_2/i659/6055904">Bioinformatics</a></p>
  </li>
  <li>
    <p>GitHub: <a href="https://github.com/QData/ChromeGCN">https://github.com/QData/ChromeGCN</a></p>
  </li>
</ul>

<h3 id="tool-transfer-learning-for-predicting-virus-host-protein-interactions-for-novel-virus-sequences">Tool: Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences</h3>

<ul>
  <li>
    <p>PDF @ <a href="https://www.biorxiv.org/content/10.1101/2020.12.14.422772v2">ACM BCB21</a></p>
  </li>
  <li>
    <p>GitHub <a href="https://github.com/QData/DeepVHPPI">https://github.com/QData/DeepVHPPI</a></p>
  </li>
</ul>

<p><br /></p>

<p>Thanks for reading!</p>]]></content><author><name>Qdata</name></author><category term="AIbiomed" /><category term="Tutorials" /><category term="6-Interpret" /><category term="Protein" /><category term="DeepSequence" /><category term="BioGraph" /><category term="ToolKit" /><summary type="html"><![CDATA[I gave a tutorial talk at UVA-VADC Seminar Series 2021 and at monthly NIH Data Science Showcase seminar.]]></summary></entry><entry><title type="html">ACM BCB - Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences</title><link href="https://qdata.github.io//qdata-page/aibiomed/aiself/TransformerPPI/" rel="alternate" type="text/html" title="ACM BCB - Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences" /><published>2021-06-11T00:00:00-04:00</published><updated>2021-06-11T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aibiomed/aiself/TransformerPPI</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aibiomed/aiself/TransformerPPI/"><![CDATA[<p><a name="semippi"></a></p>

<h3 id="title-transfer-learning-for-predicting-virus-host-protein-interactions-for-novel-virus-sequences">Title: Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences</h3>

<ul>
  <li>authors: Jack Lanchantin, Tom Weingarten, Arshdeep Sekhon, Clint Miller, Yanjun Qi</li>
  <li>2021 ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB)</li>
</ul>

<h3 id="pdf--bioarxiv">PDF @ <a href="https://www.biorxiv.org/content/10.1101/2020.12.14.422772v2">BioArxiv</a></h3>

<ul>
  <li>GitHub <a href="https://github.com/QData/DeepVHPPI">https://github.com/QData/DeepVHPPI</a></li>
</ul>

<h3 id="talk-slide">Talk: <a href="/qdata-page//pic/semiPPI.pdf">Slide</a></h3>

<h3 id="abstract">Abstract</h3>

<p>Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large scale experiments are noisy, and small scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments, hoping to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations change protein interactions with the host proteins.</p>

<p>We propose DeepVHPPI, a novel deep learning framework combining a self-attention-based transformer architecture and a transfer learning training strategy to predict interactions between human proteins and virus proteins that have novel sequence patterns. We show that our approach outperforms the state-of-the-art methods significantly in predicting Virus–Human protein interactions for SARS-CoV-2, H1N1, and Ebola. In addition, we demonstrate how our framework can be used to predict and interpret the interactions of mutated SARS-CoV-2 Spike protein sequences.</p>
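<p>As a rough illustration of the architecture described above, the sketch below assumes a shared transformer encoder over amino-acid tokens whose pooled embeddings for a virus protein and a candidate human protein are concatenated and scored. It is a minimal, hypothetical sketch, not the released DeepVHPPI code; all class names, dimensions, and hyperparameters are illustrative.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch (NOT the released DeepVHPPI code); names and sizes are illustrative.
import torch
import torch.nn as nn

class PairInteractionModel(nn.Module):
    def __init__(self, vocab_size=26, d_model=128, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # amino-acid token embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # in the transfer-learning setup, this encoder would be pretrained first, then fine-tuned
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def encode(self, tokens):
        # mean-pool encoder outputs into a single vector per protein sequence
        return self.encoder(self.embed(tokens)).mean(dim=1)

    def forward(self, virus_tokens, human_tokens):
        pair = torch.cat([self.encode(virus_tokens), self.encode(human_tokens)], dim=-1)
        return self.classifier(pair).squeeze(-1)  # one interaction logit per pair

model = PairInteractionModel()
virus = torch.randint(0, 26, (4, 200))  # a batch of tokenized virus protein sequences
human = torch.randint(0, 26, (4, 350))  # candidate human partner sequences
logits = model(virus, human)            # higher logit = predicted interaction
</code></pre></div></div>

<p>In the actual framework, the transfer-learning step corresponds to pretraining the sequence encoder before fine-tuning it on interaction labels.</p>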

<p>We make all of our data and code available on GitHub <a href="https://github.com/QData/DeepVHPPI">https://github.com/QData/DeepVHPPI</a>.</p>

<p><img src="/qdata-page//pic/deepVH2.png" alt="demo1" class="img-responsive" />
<img src="/qdata-page//pic/deepVH3.png" alt="demo1" class="img-responsive" />
<img src="/qdata-page//pic/deepVH4.png" alt="demo1" class="img-responsive" /></p>

<h3 id="citations">Citations</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article {Lanchantin2020.12.14.422772,
	author = {Lanchantin, Jack and Weingarten, Tom and Sekhon, Arshdeep and Miller, Clint and Qi, Yanjun},
	title = {Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences},
	elocation-id = {2020.12.14.422772},
	year = {2021},
	doi = {10.1101/2020.12.14.422772},
	publisher = {Cold Spring Harbor Laboratory},
	abstract = {Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large scale experiments are noisy, and small scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments, hoping to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations change protein interactions with the host proteins.We propose DeepVHPPI, a novel deep learning framework combining a self-attention-based transformer architecture and a transfer learning training strategy to predict interactions between human proteins and virus proteins that have novel sequence patterns. We show that our approach outperforms the state-of-the-art methods significantly in predicting Virus{\textendash}Human protein interactions for SARS-CoV-2, H1N1, and Ebola. In addition, we demonstrate how our framework can be used to predict and interpret the interactions of mutated SARS-CoV-2 Spike protein sequences.Availability We make all of our data and code available on GitHub https://github.com/QData/DeepVHPPI.ACM Reference Format Jack Lanchantin, Tom Weingarten, Arshdeep Sekhon, Clint Miller, and Yanjun Qi. 2021. Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences. In Proceedings of ACM Conference (ACM-BCB). ACM, New York, NY, USA, 10 pages. https://doi.org/??Competing Interest StatementThe authors have declared no competing interest.},
	URL = {https://www.biorxiv.org/content/early/2021/06/08/2020.12.14.422772},
	eprint = {https://www.biorxiv.org/content/early/2021/06/08/2020.12.14.422772.full.pdf},
	journal = {bioRxiv}
}

</code></pre></div></div>

<h3 id="support-or-contact">Support or Contact</h3>

<p>Having trouble with our tools? Please <a href="mailto:jacklanchantin@gmail.com">contact Jack</a> and we’ll help you sort it out.</p>]]></content><author><name>Qdata</name></author><category term="AIbiomed" /><category term="AIself" /><category term="Protein" /><category term="DeepSequence" /><category term="Semi-SelfLabel" /><category term="Predict-edges" /><category term="BioGraph" /><category term="Represent-ngram" /><category term="multi-transfer" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Dr Qi’s Invited Talks on textattack</title><link href="https://qdata.github.io//qdata-page/aitrust/textAttackHMI/" rel="alternate" type="text/html" title="Dr Qi’s Invited Talks on textattack" /><published>2021-04-13T00:00:00-04:00</published><updated>2021-04-13T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aitrust/textAttackHMI</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aitrust/textAttackHMI/"><![CDATA[<h3 id="on-june-24th-2021-i-gave-an-invited-talk-at-the-science-academy-machine-learning-summer-school-on-textattack-generalizing-adversarial-examples-to">On June 24th, 2021, I gave an invited talk at the Science Academy Machine Learning Summer School on “TextAttack: Generalizing Adversarial Examples to</h3>
<p>Natural Language Processing”</p>

<ul>
  <li><a href="/qdata-page/pic/20210625-TML-textAttack-v3-main.pdf">TalkSlide</a></li>
</ul>

<h3 id="previous-version-of-the-tutorial-on-april-14-2021-i-gave-an-invited-talk-at-the-uva-human-and-machine-intelligence-seminar">Previous version of the tutorial: On April 14 2021, I gave an invited talk at the UVA Human and Machine Intelligence Seminar:</h3>

<ul>
  <li><a href="/qdata-page/pic/20210414-HMI-textAttack.pdf">TalkSlide</a></li>
</ul>]]></content><author><name>Qdata</name></author><category term="AItrust" /><category term="1-Evasion" /><category term="ToolKit" /><category term="Tutorials" /><category term="textattack" /><summary type="html"><![CDATA[On June 24th, 2021, I gave an invited talk at the Science Academy Machine Learning Summer School on “TextAttack: Generalizing Adversarial Examples to Natural Language Processing”]]></summary></entry><entry><title type="html">CVPR - General Multi-label Image Classification with Transformers</title><link href="https://qdata.github.io//qdata-page/aifastconnectome/aiself/Ctrans/" rel="alternate" type="text/html" title="CVPR - General Multi-label Image Classification with Transformers" /><published>2021-03-01T00:00:00-05:00</published><updated>2021-03-01T00:00:00-05:00</updated><id>https://qdata.github.io//qdata-page/aifastconnectome/aiself/Ctrans</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aifastconnectome/aiself/Ctrans/"><![CDATA[<h3 id="title-general-multi-label-image-classification-with-transformers">Title: General Multi-label Image Classification with Transformers</h3>

<h3 id="paper-arxivversion">Paper <a href="https://arxiv.org/abs/2011.14027">ArxivVersion</a></h3>

<h3 id="github-httpsgithubcomqdatac-tran">GitHub: <a href="https://github.com/QData/C-Tran">https://github.com/QData/C-Tran</a></h3>

<h3 id="abstract">Abstract</h3>
<p>Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels, and visual features from a convolutional neural network. A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels as positive, negative, or unknown during training. Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome. Moreover, because our model explicitly represents the uncertainty of labels during training, it is more general by allowing us to produce improved results for images with partial or extra label annotations during inference. We demonstrate this additional capability in the COCO, Visual Genome, News500, and CUB image datasets.</p>
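<p>To make the label mask training objective concrete, here is a minimal sketch, assuming hypothetical dimensions and a simple random masking scheme; it is not the released C-Tran code. Known labels are encoded as positive or negative state embeddings, a random subset is hidden as “unknown,” and the transformer predicts the hidden labels from the visual feature tokens plus the remaining label states.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch of label-mask training with a ternary label-state encoding
# (NOT the released C-Tran code); dimensions, names, and the masking ratio are illustrative.
import torch
import torch.nn as nn

class LabelMaskTransformer(nn.Module):
    def __init__(self, num_labels=80, d_model=128):
        super().__init__()
        self.num_labels = num_labels
        self.label_emb = nn.Embedding(num_labels, d_model)  # one token per label
        self.state_emb = nn.Embedding(3, d_model)           # 0=unknown, 1=negative, 2=positive
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, 1)

    def forward(self, vis_tokens, label_states):
        # vis_tokens: (B, T, d_model) CNN feature tokens; label_states: (B, num_labels) in {0, 1, 2}
        b = vis_tokens.size(0)
        label_ids = torch.arange(self.num_labels, device=vis_tokens.device).expand(b, -1)
        label_tokens = self.label_emb(label_ids) + self.state_emb(label_states)
        out = self.encoder(torch.cat([vis_tokens, label_tokens], dim=1))
        return self.head(out[:, -self.num_labels:, :]).squeeze(-1)  # one logit per label

# one label-mask training step: hide a random subset of known labels and predict them
model = LabelMaskTransformer()
vis = torch.randn(2, 49, 128)                   # stand-in for CNN feature-map tokens
targets = torch.randint(0, 2, (2, 80)).float()  # ground-truth binary labels
mask = torch.rand(2, 80).lt(0.5)                # labels hidden from the model this step
states = torch.where(mask, torch.zeros_like(targets), targets + 1).long()  # 0 if hidden, else 1/2
logits = model(vis, states)
loss = nn.functional.binary_cross_entropy_with_logits(logits[mask], targets[mask])
loss.backward()
</code></pre></div></div>

<p>Because the unknown state is part of the encoding, the same model can accept partial or extra label annotations at inference time simply by setting the corresponding label states.</p>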

<h3 id="citations">Citations</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{lanchantin2020general,
      title={General Multi-label Image Classification with Transformers}, 
      author={Jack Lanchantin and Tianlu Wang and Vicente Ordonez and Yanjun Qi},
      year={2020},
      eprint={2011.14027},
      archivePrefix={arXiv, CVPR2021},
      primaryClass={cs.CV}
}
</code></pre></div></div>

<h3 id="support-or-contact">Support or Contact</h3>

<p>Having trouble with our tools? Please <a href="mailto:jjl5sw@virginia.edu">contact Jack Lanchantin</a> and we’ll help you sort it out.</p>]]></content><author><name>Qdata</name></author><category term="AIfastConnectome" /><category term="AIself" /><category term="Graph-discriminative" /><category term="Semi-SelfLabel" /><category term="ToolKit" /><summary type="html"><![CDATA[Title: General Multi-label Image Classification with Transformers]]></summary></entry><entry><title type="html">AAAI - Curriculum Labeling- Self-paced Pseudo-Labeling for Semi-Supervised Learning</title><link href="https://qdata.github.io//qdata-page/aiself/pseudolabel/" rel="alternate" type="text/html" title="AAAI - Curriculum Labeling- Self-paced Pseudo-Labeling for Semi-Supervised Learning" /><published>2021-01-11T00:00:00-05:00</published><updated>2021-01-11T00:00:00-05:00</updated><id>https://qdata.github.io//qdata-page/aiself/pseudolabel</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aiself/pseudolabel/"><![CDATA[<h3 id="title-curriculum-labeling--self-paced-pseudo-labeling-for-semi-supervised-learning">Title: Curriculum Labeling- Self-paced Pseudo-Labeling for Semi-Supervised Learning”</h3>

<h4 id="at-the-thirty-fifth-aaai-conference-on-artificial-intelligence-aaai-21-acceptance-rate-21">at the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) (acceptance rate: 21%))</h4>

<h4 id="authors-paola-cascante-bonilla-fuwen-tan-yanjun-qi-vicente-ordonez">authors: Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez</h4>

<h3 id="paper-arxiv">Paper <a href="https://arxiv.org/abs/2001.06001">Arxiv</a></h3>

<h3 id="abstract">Abstract</h3>
<p>In this paper we revisit the idea of pseudo-labeling in the context of semi-supervised learning where a learning algorithm has access to a small set of labeled samples and a large set of unlabeled samples. Pseudo-labeling works by applying pseudo-labels to samples in the unlabeled set by using a model trained on the combination of the labeled samples and any previously pseudo-labeled samples, and iteratively repeating this process in a self-training cycle. Current methods seem to have abandoned this approach in favor of consistency regularization methods that train models under a combination of different styles of self-supervised losses on the unlabeled samples and standard supervised losses on the labeled samples. We empirically demonstrate that pseudo-labeling can in fact be competitive with the state-of-the-art, while being more resilient to out-of-distribution samples in the unlabeled set. We identify two key factors that allow pseudo-labeling to achieve such remarkable results: (1) applying curriculum learning principles and (2) avoiding concept drift by restarting model parameters before each self-training cycle. We obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled samples, and 68.87% top-1 accuracy on Imagenet-ILSVRC using only 10% of the labeled samples. The code is available at the link below.</p>
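<p>The self-training cycle described above can be sketched in a few lines. The snippet below is a toy illustration with a scikit-learn classifier and synthetic data, not the released implementation, but it shows the two key factors: restarting the model before each cycle and admitting pseudo-labels by a confidence percentile that is relaxed over cycles.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy sketch of curriculum pseudo-labeling (NOT the paper's released code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = np.arange(100)           # small labeled set
unlabeled = np.arange(100, 2000)   # large unlabeled pool
pseudo_x, pseudo_y = np.empty((0, 20)), np.empty(0, dtype=int)

for cycle, keep_pct in enumerate([80, 60, 40, 20, 0]):
    model = LogisticRegression(max_iter=1000)  # restart parameters each cycle (avoids concept drift)
    model.fit(np.vstack([X[labeled], pseudo_x]),
              np.concatenate([y[labeled], pseudo_y]))

    probs = model.predict_proba(X[unlabeled])
    conf = probs.max(axis=1)
    threshold = np.percentile(conf, keep_pct)  # curriculum: admit the most confident samples first
    admit = conf >= threshold
    pseudo_x = X[unlabeled][admit]
    pseudo_y = probs.argmax(axis=1)[admit]     # pseudo-labels from the current model
    print(f"cycle {cycle}: pseudo-labeled {admit.sum()} of {unlabeled.size} unlabeled samples")
</code></pre></div></div>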

<h4 id="code"><a href="https://github.com/uvavision/Curriculum-Labeling">code</a></h4>

<h3 id="citations">Citations</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@misc{grigsby2020measuring,
      title={Measuring Visual Generalization in Continuous Control from Pixels}, 
      author={Jake Grigsby and Yanjun Qi},
      year={2020},
      eprint={2010.06740},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
</code></pre></div></div>

<h3 id="support-or-contact">Support or Contact</h3>

<p>Having trouble with our tools? Please <a href="mailto:jcg6dn@virginia.edu">contact Jake</a> and we’ll help you sort it out.</p>]]></content><author><name>Qdata</name></author><category term="AIself" /><category term="Semi-SelfLabel" /><summary type="html"><![CDATA[Title: Curriculum Labeling- Self-paced Pseudo-Labeling for Semi-Supervised Learning”]]></summary></entry><entry><title type="html">NeurIPS - Measuring Visual Generalization in Continuous Control from Pixels</title><link href="https://qdata.github.io//qdata-page/aiself/DMCmaster/" rel="alternate" type="text/html" title="NeurIPS - Measuring Visual Generalization in Continuous Control from Pixels" /><published>2020-12-11T00:00:00-05:00</published><updated>2020-12-11T00:00:00-05:00</updated><id>https://qdata.github.io//qdata-page/aiself/DMCmaster</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aiself/DMCmaster/"><![CDATA[<p><a name="RL-Generalization"></a></p>

<h3 id="title-measuring-visual-generalization-in-continuous-control-from-pixels">Title: Measuring Visual Generalization in Continuous Control from Pixels</h3>

<ul>
  <li>authors: Jake Grigsby, Yanjun Qi</li>
</ul>

<h3 id="paper-arxiv">Paper <a href="https://arxiv.org/abs/2010.06740">Arxiv</a></h3>

<h3 id="code-here">Code <a href="https://github.com/QData/dmc_remastered">Here</a></h3>

<h3 id="abstract">Abstract</h3>
<p>Self-supervised learning and data augmentation have significantly reduced the performance gap between state and image-based reinforcement learning agents in continuous control tasks. However, it is still unclear whether current techniques can face a variety of visual conditions required by real-world environments. We propose a challenging benchmark that tests agents’ visual generalization by adding graphical variety to existing continuous control domains. Our empirical analysis shows that current methods struggle to generalize across a diverse set of visual changes, and we examine the specific factors of variation that make these tasks difficult. We find that data augmentation techniques outperform self-supervised learning approaches and that more significant image transformations provide better visual generalization. (The benchmark and our augmented actor-critic implementation are open-sourced at the code link above.)</p>
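<p>As a rough illustration of the kind of image transformations found helpful here, the hypothetical sketch below applies a random shift (pad-and-crop) and a light color jitter to a batch of pixel observations before they reach the agent; it is not the released benchmark or actor-critic code, and all parameters are illustrative.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical augmentation sketch for pixel-based control (NOT the released code).
import torch
import torch.nn.functional as F

def random_shift(obs, pad=4):
    # obs: (B, C, H, W) pixels in [0, 1]; pad with edge values, then crop back at a random offset
    b, c, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(obs)
    for i in range(b):
        dx, dy = torch.randint(0, 2 * pad + 1, (2,)).tolist()
        out[i] = padded[i, :, dy:dy + h, dx:dx + w]
    return out

def color_jitter(obs, strength=0.1):
    # per-image random brightness/contrast perturbation
    b = obs.shape[0]
    scale = 1.0 + strength * (2 * torch.rand(b, 1, 1, 1) - 1)
    shift = strength * (2 * torch.rand(b, 1, 1, 1) - 1)
    return (obs * scale + shift).clamp(0.0, 1.0)

obs = torch.rand(8, 3, 84, 84)         # a batch of stand-in observations
aug = color_jitter(random_shift(obs))  # augmented views fed to the actor-critic update
</code></pre></div></div>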

<h3 id="citations">Citations</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@misc{grigsby2020measuring,
      title={Measuring Visual Generalization in Continuous Control from Pixels}, 
      author={Jake Grigsby and Yanjun Qi},
      year={2020},
      eprint={2010.06740},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
</code></pre></div></div>

<h3 id="support-or-contact">Support or Contact</h3>

<p>Having trouble with our tools? Please <a href="mailto:jcg6dn@virginia.edu">contact Jake</a> and we’ll help you sort it out.</p>]]></content><author><name>Qdata</name></author><category term="AIself" /><category term="RL-Generalization" /><category term="multi-transfer" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">EMNLP - Benchmarking Search Algorithms for Generating NLP Adversarial Examples</title><link href="https://qdata.github.io//qdata-page/aitrust/aiself/benchmark-search/" rel="alternate" type="text/html" title="EMNLP - Benchmarking Search Algorithms for Generating NLP Adversarial Examples" /><published>2020-10-01T00:00:00-04:00</published><updated>2020-10-01T00:00:00-04:00</updated><id>https://qdata.github.io//qdata-page/aitrust/aiself/benchmark-search</id><content type="html" xml:base="https://qdata.github.io//qdata-page/aitrust/aiself/benchmark-search/"><![CDATA[<h3 id="title-searching-for-a-search-method-benchmarking-search-algorithms-for-generating-nlp-adversarial-examples">Title: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples</h3>

<ul>
  <li>
    <p>Abstract: We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search methods are proposed in past work, the attack search space is often modified alongside the search method. Without ablation studies benchmarking the search algorithm change with the search space held constant, an increase in attack success rate could come from an improved search method or a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms’ run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise. (A toy sketch of the recommended greedy strategy appears after this list.)</p>
  </li>
  <li>
    <p>Citations:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@misc{yoo2020searching,
    title={Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples}, 
    author={Jin Yong Yoo and John X. Morris and Eli Lifland and Yanjun Qi},
    year={2020},
    eprint={2009.06368},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
</code></pre></div>    </div>
  </li>
</ul>
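<p>The sketch below is a toy illustration of the greedy strategy recommended in the abstract above: rank word positions by how much deleting each word lowers the model score, then greedily substitute at each position in that order. The scoring function and synonym table are made-up stand-ins, not TextAttack components.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy sketch of greedy search with word importance ranking (stand-in score and synonyms).
def importance(words, score):
    # rank positions by how much deleting each word lowers the model score
    base = score(words)
    drops = [(base - score(words[:i] + words[i + 1:]), i) for i in range(len(words))]
    return [i for _, i in sorted(drops, reverse=True)]

def greedy_attack(words, score, synonyms):
    words = list(words)
    for i in importance(words, score):          # visit positions in importance order
        best_word, best_score = words[i], score(words)
        for sub in synonyms.get(words[i], []):  # greedily keep the best substitute here
            cand = words[:i] + [sub] + words[i + 1:]
            if best_score > score(cand):
                best_word, best_score = sub, score(cand)
        words[i] = best_word
        if 0.5 > best_score:                    # stop once the score drops below the decision boundary
            break
    return words

def toy_score(words):
    # stand-in "positive-class confidence": fraction of words drawn from a small cue list
    cues = {"good", "great", "enjoyable"}
    return sum(1 for w in words if w in cues) / max(len(words), 1)

synonyms = {"good": ["decent", "passable"], "great": ["okay"]}
print(greedy_attack("good great enjoyable film".split(), toy_score, synonyms))
</code></pre></div></div>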

<h3 id="our-paper-in-emnlp-blackboxnlp">Our Paper in <a href="https://arxiv.org/abs/2009.06368">EMNLP BlackBoxNLP</a>.</h3>

<h3 id="our-search-benchmarking-result-github---httpsgithubcomqdatatextattack-search-benchmark">Our search benchmarking result Github :  <a href="https://github.com/QData/TextAttack-Search-Benchmark">https://github.com/QData/TextAttack-Search-Benchmark</a></h3>

<h3 id="benchmarking-attack-recipes">Benchmarking Attack Recipes</h3>

<ul>
  <li>
    <p>As we emphasized in the above paper, we don’t recommend directly comparing attack recipes out of the box.</p>
  </li>
  <li>
    <p>This is because attack recipes in the recent literature use different methods or thresholds when setting up their constraints. Without the constraint space held constant, an increase in attack success rate could come from an improved search or transformation method or a less restrictive search space.</p>
  </li>
</ul>]]></content><author><name>Qdata</name></author><category term="AItrust" /><category term="AIself" /><category term="1-Evasion" /><category term="4-VisualizeBench" /><category term="Generate-Text" /><category term="ToolKit" /><category term="textattack" /><summary type="html"><![CDATA[Title: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples]]></summary></entry></feed>