Reliable17-Testing and Machine Learning Basics


Presenter	Papers	Paper URL	Our Slides
GaoJi	A few useful things to know about machine learning	PDF	PDF
GaoJi	A few papers related to testing learning systems, e.g., Understanding Black-box Predictions via Influence Functions	PDF	PDF
GaoJi	Automated White-box Testing of Deep Learning Systems ¹	PDF	PDF
GaoJi	Testing and Validating Machine Learning Classifiers by Metamorphic Testing ²	PDF	PDF
GaoJi	Software testing: a research travelogue (2000–2014)	PDF	PDF

¹ DeepXplore: Automated Whitebox Testing of Deep Learning Systems / Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana / Published in SOSP'17 / Abstract: Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
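The neuron coverage idea in the DeepXplore abstract above can be illustrated with a small sketch. This is not the authors' implementation. It assumes that a neuron counts as "covered" once its activation exceeds a threshold on at least one test input, and it uses a hypothetical two-layer ReLU network (`TOY_LAYERS`) with hand-picked weights, written in plain Python so it runs without any DL framework.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, Sequence[float]]  # (weights, biases); hypothetical layout


def relu_layer(weights: Matrix, biases: Sequence[float], x: Sequence[float]) -> List[float]:
    """Apply one fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def neuron_coverage(layers: Sequence[Layer], inputs: Sequence[Sequence[float]],
                    threshold: float = 0.0) -> float:
    """Fraction of neurons whose activation exceeds `threshold` on some input.

    Assumption: "covered" means activation > threshold for at least one input.
    """
    covered: Set[Tuple[int, int]] = set()
    total = sum(len(biases) for _, biases in layers)
    for x in inputs:
        activation = list(x)
        for layer_idx, (weights, biases) in enumerate(layers):
            activation = relu_layer(weights, biases, activation)
            for neuron_idx, value in enumerate(activation):
                if value > threshold:
                    covered.add((layer_idx, neuron_idx))
    return len(covered) / total


# Toy network: 2 inputs -> 3 hidden neurons -> 2 output neurons (5 neurons total).
TOY_LAYERS: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0]),
]

if __name__ == "__main__":
    tests = [[1.0, 0.0]]
    print(neuron_coverage(TOY_LAYERS, tests))
    tests.append([-1.0, -1.0])  # a "corner case" input that reaches other neurons
    print(neuron_coverage(TOY_LAYERS, tests))
```

In this toy setting, adding an input from a different region of the input space raises coverage, which is the intuition behind using coverage as a signal when searching for new test inputs.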
² Testing and Validating Machine Learning Classifiers by Metamorphic Testing / 2011 / Abstract: Machine learning algorithms have provided core functionality to many application domains - such as bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications because often there is no "test oracle" to verify the correctness of the computed outputs. To help address the software quality, in this paper we present a technique for testing the implementations of machine learning classification algorithms which support such applications. Our approach is based on the technique "metamorphic testing", which has been shown to be effective to alleviate the oracle problem. Also presented include a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has high effectiveness in killing mutants, and that observing expected cross-validation result alone is not sufficiently effective to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by the detection of real faults in a popular open-source classification program.
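Metamorphic testing, as described in the abstract above, checks relations between outputs on related inputs instead of checking each output against a known answer. The sketch below is illustrative only: the two relations (reordering the training set, and shifting every feature by a constant) are assumptions chosen because they should hold for a 1-nearest-neighbor classifier with Euclidean distance, and are not claimed to be the exact relations used in the paper. All function names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]
Predictor = Callable[[List[Sample], Sequence[float]], str]


def predict_1nn(train: List[Sample], x: Sequence[float]) -> str:
    """Return the label of the training sample closest to x."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda sample: sq_dist(sample[0], x))
    return label


def buggy_predict_1nn(train: List[Sample], x: Sequence[float]) -> str:
    """Seeded fault for illustration: silently ignores the last training sample."""
    return predict_1nn(train[:-1], x)


def holds_under_permutation(predict: Predictor, train: List[Sample],
                            tests: Sequence[Sequence[float]]) -> bool:
    """Relation 1: reordering the training data must not change predictions."""
    reordered = list(reversed(train))
    return all(predict(train, x) == predict(reordered, x) for x in tests)


def holds_under_translation(predict: Predictor, train: List[Sample],
                            tests: Sequence[Sequence[float]], shift: float = 10.0) -> bool:
    """Relation 2: shifting all features by a constant must not change predictions."""
    moved = [([v + shift for v in feats], label) for feats, label in train]
    return all(
        predict(train, x) == predict(moved, [v + shift for v in x]) for x in tests
    )


# Toy data with no distance ties, so the expected relations are unambiguous.
TRAIN: List[Sample] = [([0.0, 0.0], "a"), ([1.0, 0.0], "a"), ([5.0, 5.0], "b")]
TESTS = [[0.0, 1.0], [5.0, 4.0], [2.0, 2.0]]

if __name__ == "__main__":
    print(holds_under_permutation(predict_1nn, TRAIN, TESTS))
    print(holds_under_translation(predict_1nn, TRAIN, TESTS))
    print(holds_under_permutation(buggy_predict_1nn, TRAIN, TESTS))
```

Neither check needs the true label of any test point, which is how this style of testing sidesteps the missing oracle. In this sketch the seeded fault violates the reordering relation but still satisfies the translation relation, so a single relation is generally not enough.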


Dr. Yanjun Qi
