Robust Learning under Label Noise with Deep Neural Networks (PhD Dissertation)


Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. However, the quality of data labels is a concern because high-quality labels are lacking in many real-world scenarios. In the presence of noisy labels, the generalization performance of deep neural networks degrades drastically owing to their high capacity to overfit any noisy labels. This overfitting issue persists even with various conventional regularization techniques, such as dropout and batch normalization. Therefore, learning from noisy labels (robust training) has recently become one of the most active research topics in the machine learning community. In the first part, we provide the problem statement for supervised learning with noisy labels, followed by a thorough survey of recent advances in deep learning techniques for overcoming noisy labels; we surveyed recent studies by recursively tracking relevant bibliographies in papers published at premier research conferences. Throughout this survey, we note that the main research effort has been devoted to answering the following two questions: (1) how can the negative influence of false-labeled samples be minimized by adjusting their loss values, and (2) how can true-labeled samples be identified from noisy data? These questions have been well explored by two research directions, namely loss adjustment and sample selection, respectively. In the second part, we mainly focus on understanding the pros and cons of the aforementioned research directions and, subsequently, propose a hybrid learning approach called SELFIE that takes advantage of both loss adjustment and sample selection. For the hybrid approach, a new concept of a refurbishable sample is introduced to characterize a sample whose loss can be correctly adjusted with high precision.
The loss of refurbishable samples is adjusted first and then combined with that of the samples chosen by a representative sample selection criterion called the small-loss trick. To validate the superiority of SELFIE, we conducted extensive experiments using both real-world and synthetic noisy datasets. The results empirically verify that SELFIE significantly outperforms state-of-the-art methods in test error by up to 10.5 percentage points. In the third part, we take a closer look at the small-loss trick adopted by SELFIE for sample selection. We argue that the trick misclassifies many false-labeled samples as clean samples under realistic noise. Hence, we present a new sample selection method called Prestopping, which derives a collection of true-labeled samples by using the early stopping mechanism. Prestopping obtains an initial safe set by stopping its learning process before the network begins to rapidly memorize false-labeled samples and, subsequently, resumes training to gradually improve the quality and quantity of the set. Compared with state-of-the-art methods including SELFIE, Prestopping further improves the test error by up to 18.1 percentage points on four real-world and synthetic noisy datasets. The main technical challenge in Prestopping is determining the best stop point for its phase transition, which we call the best transition point. In Prestopping, a clean validation set or a known true noise rate is used for supervision, but both are usually hard to acquire in practice. In the last part, we introduce a novel self-transitional learning approach called MORPH, which automatically switches its learning phase at the best transition point without any supervision. Extensive experiments using five benchmark datasets demonstrate that only MORPH succeeds in constructing a collection of almost exclusively true-labeled samples across a wide range of noise types. We leave the integration of SELFIE with MORPH as future work.
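
As an illustration of the small-loss trick discussed above, the following is a minimal sketch rather than the dissertation's implementation. It assumes that per-sample losses for a mini-batch are already computed and that the noise rate is known; the function name `select_small_loss` and all numbers are hypothetical.

```python
def select_small_loss(losses, noise_rate):
    """Return the indices of samples treated as clean by the small-loss trick.

    Assumption: a (1 - noise_rate) fraction of the batch is kept, which is a
    common choice when the noise rate is known; the source does not fix it.
    """
    if not 0.0 <= noise_rate < 1.0:
        raise ValueError("noise_rate must be in [0, 1)")
    # Number of samples to keep, rounded to the nearest integer.
    num_keep = round((1.0 - noise_rate) * len(losses))
    # Rank sample indices by ascending loss; small loss suggests a true label.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:num_keep])


# Hypothetical per-sample losses for a batch of five samples.
batch_losses = [0.12, 2.31, 0.08, 1.75, 0.20]
kept = select_small_loss(batch_losses, noise_rate=0.4)
print(kept)  # indices of the three smallest losses
```

Only the selected indices would contribute to the parameter update. Because the criterion depends solely on loss magnitude, a false-labeled sample that the network has already memorized also receives a small loss and is selected, which is the weakness that motivates Prestopping.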

Korea Advanced Institute of Science and Technology