Deep feature extraction of single-cell transcriptomes by generative adversarial network

scGAN

Abstract

Single-cell RNA-sequencing (scRNA-seq) offers the opportunity to dissect heterogeneous cellular compositions and interrogate cell-type-specific gene expression patterns across diverse conditions. However, batch effects arising from differences in laboratory conditions and from individual variability hinder its use in cross-condition designs. Here, we present a single-cell Generative Adversarial Network (scGAN) that learns patterns from raw data while minimizing the confounding effects of technical artifacts and other factors inherent to the data. Specifically, scGAN models the data likelihood of the raw scRNA-seq counts by projecting each cell onto a latent embedding. At the same time, scGAN attempts to minimize the correlation between the latent embeddings and the batch labels across all cells. We demonstrate scGAN on three public scRNA-seq datasets and show that our method outperforms state-of-the-art methods in forming clusters of known cell types and in identifying known psychiatric genes associated with major depressive disorder. The scGAN code and the information for the public scRNA-seq datasets are available here.
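The two goals described in the abstract (fitting the raw counts while keeping the embedding uninformative about batch) can be illustrated with a toy per-cell objective. This is only a minimal sketch under stated assumptions, not the scGAN implementation: the Poisson likelihood, the cross-entropy batch discriminator, the weight `lam`, and all function names below are hypothetical choices made for illustration and are not taken from the abstract.

```python
import math

def poisson_nll(counts, rates):
    """Negative log-likelihood of raw counts under Poisson rates.

    Assumption: a Poisson count model; the abstract does not name the likelihood.
    """
    return -sum(
        k * math.log(r) - r - math.lgamma(k + 1)
        for k, r in zip(counts, rates)
    )

def batch_cross_entropy(batch_probs, batch_label):
    """Cross-entropy of a (hypothetical) discriminator that predicts the
    batch of a cell from its latent embedding."""
    return -math.log(batch_probs[batch_label])

def encoder_objective(counts, rates, batch_probs, batch_label, lam=1.0):
    """Quantity the encoder/decoder would minimize in this sketch:
    fit the counts well, and make the batch discriminator do badly."""
    return poisson_nll(counts, rates) - lam * batch_cross_entropy(batch_probs, batch_label)

# Toy cell: three genes, decoder rates, true batch label 0.
counts = [0, 3, 1]
rates = [0.5, 2.5, 1.0]

# The embedding leaks batch identity: the discriminator is confident and correct.
leaky = encoder_objective(counts, rates, [0.9, 0.1], 0)

# The embedding hides batch identity: the discriminator can only guess.
mixed = encoder_objective(counts, rates, [0.5, 0.5], 0)

# With identical reconstruction quality, the batch-mixed embedding scores lower.
print(mixed < leaky)
```

In an adversarial setup of this kind, the discriminator would be trained in alternation to minimize its own cross-entropy, while the encoder is pushed toward embeddings from which batch cannot be predicted.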

Publication
Bioinformatics
Mojtaba Bahrami
PhD Student in Machine Learning and Computational Biology

My research interests include machine learning, computational biology, and single-cell genomics.