(Bin Li presenting on June 30 at 1:00 PM and 7:00 PM Eastern Time)
Whole slide imaging is an essential tool in modern histopathology and presents a great opportunity for machine learning applications. Whole slide images (WSIs) have very high resolutions and usually lack localized patch-level annotations. Consequently, WSI classification is often cast as a multiple instance learning (MIL) problem, in which a patch-based classifier is trained using only slide-level labels. We propose a MIL-based method for WSI classification and tumor detection that does not require localized annotations. First, we introduce a novel MIL aggregator that models the relations among instances in a dual-stream architecture with a trainable distance measurement. Second, since WSIs can produce large or unbalanced bags that hinder the training of MIL models, we use self-supervised contrastive learning to extract robust representations for MIL without the need for patch-level labels, alleviating the prohibitive memory cost of training deep models on large bags. Our model offers significant improvements in classification accuracy on both unbalanced and balanced WSI datasets, under both binary and multi-class MIL settings. Additional benchmark results on standard MIL datasets further demonstrate the superior performance of our MIL aggregator on general MIL problems. Collaborating with hardware and computer vision groups, we demonstrate WSI classification as a driving use case for developing deep learning accelerators and efficient training paradigms for related problems such as video and language analysis.
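To make the dual-stream idea concrete, here is a minimal NumPy sketch of one plausible form of such an aggregator. It is an illustration, not the presented model: one stream scores instances and max-pools over the highest-scoring ("critical") instance, while the other computes attention weights from a trainable similarity (inner product of learned queries) between each instance and the critical one, then pools instance values into a bag embedding. All weight names (`w_score`, `w_query`, `w_value`) are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_stream_aggregate(instances, w_score, w_query, w_value):
    """Aggregate a bag of instance embeddings into a bag-level summary.

    Sketch only (assumed architecture, not the authors' exact model):
      stream 1 -- per-instance scores; the max identifies a critical instance;
      stream 2 -- attention weights from a trainable distance (query inner
                  product) to the critical instance, pooling instance values.
    """
    scores = instances @ w_score        # (N,) instance-level scores
    crit = int(np.argmax(scores))       # index of the critical instance
    queries = instances @ w_query       # (N, d_q) learned queries
    sims = queries @ queries[crit]      # trainable distance to critical instance
    attn = softmax(sims)                # (N,) attention weights, sum to 1
    values = instances @ w_value        # (N, d_v) learned values
    bag_embedding = attn @ values       # attention-pooled bag embedding
    return scores[crit], bag_embedding, attn
```

In a full model, the critical-instance score and a classifier on the bag embedding would be combined into the slide-level prediction, and the weights trained end-to-end from slide labels only.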