Batch correction methods used in single cell RNA-sequencing analyses are often poorly calibrated
As the number of experiments that employ single-cell RNA-sequencing (scRNA-seq) grows it opens up the possibility of combining results across experiments or processing cells from the same experiment assayed in separate sequencing runs. The gain in the number of cells that can be compared comes at the cost of batch effects that may be present. Several methods have been proposed to combat this for scRNA-seq datasets. We compared seven widely used method used for batch correction of scRNA-seq datasets. We present a novel approach to measure the degree to which the methods alter the data in the process of batch correction, both at the fine scale comparing distances between cells as well as measuring effects observed across clusters of cells. We demonstrate that many of the published method are poorly calibrated in the sense that the process of correction creates measurable artifacts in the data. In particular, MNN, SCVI and LIGER performed poorly in our tests, often altering the data consi