CRISP Blog: Variational Auto-encoder for synthetic training data generation

Bin Li of the University of Wisconsin-Madison will present on Wednesday, November 16, 2022, at 1:00 PM and 7:00 PM EST

The scale and variety of training data are crucial to the generalizability of deep neural networks. However, obtaining labeled training data can be time-consuming and difficult in specialized domains such as bioimaging and medical imaging. We propose a synthetic training data generation framework for data enrichment and augmentation based on deep generative models. The framework consists of a variational autoencoder (VAE) and a conditional generative adversarial network (cGAN). We demonstrate it on an important bioimage analysis task, collagen fiber tracking. The VAE was trained on a limited number of manually labeled collagen fiber images and was then used to generate synthetic collagen fiber centerlines with greater variety. The cGAN was trained to map the synthetic centerlines to realistic-looking collagen fiber images, yielding a synthetic training dataset of image-centerline pairs. Finally, we trained a U-Net on the enriched set of image-centerline pairs for collagen fiber centerline tracking. Evaluations on collagen images collected from pancreas, liver, and breast cancer samples show that our pipeline achieves better centerline tracking than several popular fiber centerline tracking tools. The generalizability of the network increases further when synthetic data is included in training.
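To make the data flow in the abstract concrete, here is a minimal Python sketch of how synthetic image-centerline pairs could be assembled and merged with the real labeled set. It is an illustration under stated assumptions, not the presenter's implementation: the function names (build_synthetic_pairs, enrich), the 8x8 image size, and the toy_decode and toy_render stand-ins are all hypothetical. In the actual framework, the decoder would be the trained VAE and the renderer would be the trained cGAN generator.

```python
import random
from typing import Callable, List, Tuple

Centerline = List[List[int]]   # binary mask, 1 marks a fiber centerline pixel
Image = List[List[float]]      # grayscale intensities in [0, 1]
Pair = Tuple[Image, Centerline]


def sample_latent(dim: int, rng: random.Random) -> List[float]:
    """Draw z from a standard normal prior, as is usual when sampling a VAE."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def build_synthetic_pairs(
    decode: Callable[[List[float]], Centerline],
    render: Callable[[Centerline], Image],
    n_pairs: int,
    latent_dim: int,
    seed: int = 0,
) -> List[Pair]:
    """Generate (image, centerline) pairs: latent -> centerline -> image."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        z = sample_latent(latent_dim, rng)
        centerline = decode(z)        # role of the VAE decoder
        image = render(centerline)    # role of the cGAN generator
        pairs.append((image, centerline))
    return pairs


def enrich(real_pairs: List[Pair], synthetic_pairs: List[Pair]) -> List[Pair]:
    """Combine real and synthetic pairs into one training set for the tracker."""
    return list(real_pairs) + list(synthetic_pairs)


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_decode(z: List[float], size: int = 8) -> Centerline:
    """Draw one horizontal 'fiber' whose row is chosen by the first latent value."""
    row = min(size - 1, max(0, int((z[0] + 3.0) / 6.0 * size)))
    return [[1 if r == row else 0 for _ in range(size)] for r in range(size)]


def toy_render(centerline: Centerline) -> Image:
    """Paint centerline pixels bright on a dark background."""
    return [[0.9 if px else 0.1 for px in row] for row in centerline]


synthetic = build_synthetic_pairs(toy_decode, toy_render, n_pairs=5, latent_dim=4)
training_set = enrich(real_pairs=[], synthetic_pairs=synthetic)
```

Because each centerline is generated first and the image is rendered from it, every synthetic image comes with its label for free, which is what lets the enriched set be used directly to train a segmentation network such as a U-Net.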

Monday, November 14, 2022
