Classification With Noisy Labels
Abstract: This work addresses the problem of learning a probabilistic classification model when the labels given in the training stage are unreliable.
We concentrate on the logistic regression classifier and define a noisy channel model that describes the relation between the unobserved true labels and the observed noisy labels. Parameter estimation is formulated as an instance of the maximum-likelihood principle. Since we view the correct labels as hidden random variables, it is natural to apply the EM algorithm to learn both the classifier parameters and the noise process parameters.
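One way to write the objective described here is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: x denotes the features, y the hidden true label, z the observed noisy label, w the classifier parameters, and \theta the noise parameters. The sketch further assumes that, given the true label, the noisy label does not depend on the features.

```latex
% Noisy channel: probability of observing label j when the true label is i
\theta(i, j) = p(z = j \mid y = i)

% Marginal probability of the observed noisy label, summing over the hidden true label
p(z \mid x; w, \theta) = \sum_{i} p(y = i \mid x; w)\, \theta(i, z)

% Log-likelihood of n training pairs (x_t, z_t), maximized over w and \theta
L(w, \theta) = \sum_{t=1}^{n} \log p(z_t \mid x_t; w, \theta)
```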
In the E-step we estimate the unknown labels given the features and the noisy labels based on the current model parameters. In the M-step we update the model parameters. The noise parameters can be updated by a closed-form formula. The updated classifier parameters can be found by a soft version of the standard algorithm for training a logistic classifier.
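The two steps can be illustrated with a minimal NumPy sketch for the binary case. It is a sketch under stated assumptions, not the implementation used in this work: the function name `em_noisy_logreg`, the symmetric initial noise matrix, the warm start on the noisy labels, the plain gradient-ascent inner loop, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def em_noisy_logreg(X, z, n_em=50, n_grad=100, lr=0.5, init_flip=0.2):
    """EM sketch for binary logistic regression with noisy labels.

    X: (n, d) float features; z: (n,) integer noisy labels in {0, 1}.
    Returns (w, theta): w has d weights plus a bias as its last entry,
    theta[i, j] estimates p(noisy label = j | true label = i).
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    w = np.zeros(d + 1)
    # Assumed initial noise model: symmetric flips with probability init_flip.
    theta = np.array([[1 - init_flip, init_flip],
                      [init_flip, 1 - init_flip]])

    # Assumed warm start: fit the classifier to the noisy labels directly.
    for _ in range(n_grad):
        w += lr * Xb.T @ (z - sigmoid(Xb @ w)) / n

    for _ in range(n_em):
        # E-step: posterior of the true label given features and noisy label,
        # c[t, i] proportional to p(y=i | x_t; w) * theta[i, z_t].
        p1 = sigmoid(Xb @ w)
        prior = np.column_stack([1 - p1, p1])      # shape (n, 2)
        joint = prior * theta[:, z].T              # shape (n, 2)
        c = joint / joint.sum(axis=1, keepdims=True)

        # M-step (noise): closed-form update, soft counts normalized per row.
        onehot = np.eye(2)[z]                      # shape (n, 2)
        theta = c.T @ onehot
        theta /= theta.sum(axis=1, keepdims=True)

        # M-step (classifier): logistic regression on the soft labels c[:, 1],
        # here by a fixed number of gradient-ascent steps.
        for _ in range(n_grad):
            w += lr * Xb.T @ (c[:, 1] - sigmoid(Xb @ w)) / n

    return w, theta
```

The only difference from ordinary logistic regression training in the classifier update is that the hard 0/1 targets are replaced by the posteriors computed in the E-step, which is what makes it a soft version of the standard procedure. Any other optimizer for the weighted logistic objective could be substituted for the gradient loop.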
We apply the proposed algorithm to two problems. The first problem, from the area of computational genetics, is determining whether an intron of unknown functionality might in fact be functional.
The second problem, from the area of medical imaging, is differentiating between malignant and benign tumors based on their appearance in the CC and MLO mammography views. We show that in both cases the proposed algorithm helps solve the task and yields improved results.
* This work was carried out under the supervision of Prof. Jacob Goldberger, Faculty of Engineering, Bar-Ilan University, as part of the research for my