A Novel Approach to Semi-Supervised Statistical Machine Learning (2023–2026)
Abstract:
Recent successes in constructing classifiers for diagnosis and prediction are due in part to their
use of large amounts of data labelled with respect to class of origin. Typically, however, labelled data are scarce
while unlabelled data are plentiful. The goal of semi-supervised learning (SSL) is to leverage large amounts of
unlabelled data to improve the performance of classifiers trained on only small labelled datasets, so SSL is of paramount importance to
applications where it is expensive or impractical to obtain much labelled data. The project is to develop a novel
SSL approach that adopts a missingness mechanism for the missing labels to build a classifier whose accuracy
not only improves but can even exceed the accuracy attainable if the missing labels were known.