scMUSCL: Multi-Source Transfer Learning for Clustering scRNA-seq Data
scRNA-seq analysis relies heavily on single-cell clustering to perform many downstream functions. Several machine learning methods have been proposed to improve the clustering of single cells, yet most of these methods are fully unsupervised and ignore the wealth of publicly available annotated datasets from single-cell experiments. Cells are high-dimensional entities, and unsupervised clustering might find clusters without biological meaning. Exploiting relevant annotated scRNA-seq dataset as the learning reference can provide an algorithm with the knowledge that guides it to better estimate the number of clusters and find meaningful clusters in the target dataset. Results: In this paper, we propose Single Cell MUlti-Source CLustering, scMUSCL, a novel transfer learning method for finding clusters of cells in a target dataset by transferring knowledge from multiple annotated source (reference) datasets. scMUSCL relies on a deep neural network to extract domain and batch invariant cell