From single-cell differential expression analysis to differential transcriptome analysis with kernel-based testing.

Speaker
Anthony Ozier-Lafontaine
Speaker's institution
LMJL
Date and time of the talk
Location of the talk
Salle des Séminaires

Single-cell RNA sequencing (scRNAseq) is a high-throughput technology that quantifies gene expression at the single-cell level, for thousands of cells and tens of thousands of genes. A major statistical challenge in scRNAseq data analysis is to distinguish biological information from technical noise in order to compare conditions or tissues. Differential Expression Analysis (DEA) is usually performed with univariate two-sample tests, and thus does not account for the multivariate nature of scRNAseq data, which carries information about gene dependencies and the underlying regulatory networks and pathways. Applying multivariate two-sample tests instead makes it possible to perform Differential Transcriptome Analysis (DTA), that is, to assess the global similarity of the compared datasets.

We propose a kernel-based two-sample test that can be used for DEA as well as for DTA. The Maximum Mean Discrepancy (MMD) test is the best-known kernel two-sample test [1]. It is based on the distance between the mean embeddings of the empirical distributions in a high-dimensional feature space, obtained through a non-linear embedding called the feature map. Our package implements a normalized version of the MMD test derived from the non-linear classification method KFDA (Kernel Fisher Discriminant Analysis) [2], then regularized by a kernel PCA-like dimension reduction [3]. Besides reaching state-of-the-art performance in DEA at a competitive computational cost, the non-linear discriminant transformation obtained from the KFDA approach offers visualization tools that highlight the main differences between the two conditions at the level of cells, making it possible to identify condition-specific sub-populations.
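To make the idea of comparing mean embeddings concrete, here is a minimal sketch of the plain (unnormalized) MMD two-sample test from [1], calibrated by permutation. It is not the normalized, KFDA-based statistic implemented in the package described above. The Gaussian kernel, the median-heuristic bandwidth, the permutation calibration, and all function names (`squared_distances`, `mmd2_from_gram`, `mmd_permutation_test`) are assumptions chosen for illustration.

```python
import numpy as np


def squared_distances(a, b):
    """Pairwise squared Euclidean distances between the rows of a and b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding


def mmd2_from_gram(gram, is_x):
    """Biased squared MMD, computed from the pooled Gram matrix.

    is_x is a boolean mask marking which pooled rows belong to the first sample.
    With weights +1/n_x and -1/n_y, w' K w is the squared distance between
    the two empirical mean embeddings in the feature space.
    """
    n_x = is_x.sum()
    n_y = (~is_x).sum()
    w = np.where(is_x, 1.0 / n_x, -1.0 / n_y)
    return float(w @ gram @ w)


def mmd_permutation_test(x, y, bandwidth=None, n_permutations=500, seed=0):
    """Return (squared MMD, permutation p-value) for samples x and y.

    x and y are (cells x genes) arrays; rows are observations.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    sq = squared_distances(pooled, pooled)
    if bandwidth is None:
        # Median heuristic (an assumption, not prescribed by the abstract).
        bandwidth = np.sqrt(np.median(sq[np.triu_indices_from(sq, k=1)]))
    gram = np.exp(-sq / (2.0 * bandwidth ** 2))  # Gaussian kernel

    is_x = np.arange(len(pooled)) < len(x)
    observed = mmd2_from_gram(gram, is_x)
    # Null distribution: shuffle the sample labels, keep the Gram matrix fixed.
    null = np.array(
        [mmd2_from_gram(gram, rng.permutation(is_x)) for _ in range(n_permutations)]
    )
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    cells_a = rng.normal(0.0, 1.0, size=(50, 10))  # toy "condition A"
    cells_b = rng.normal(0.5, 1.0, size=(50, 10))  # toy "condition B", shifted mean
    stat, p = mmd_permutation_test(cells_a, cells_b)
    print(f"MMD^2 = {stat:.4f}, permutation p-value = {p:.4f}")
```

Passing a single gene (one column) gives a univariate test in the spirit of DEA, while passing the full expression matrix gives a multivariate test in the spirit of DTA. The toy Gaussian data stands in for real expression matrices.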

[1] Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. A Kernel Method for the Two-Sample-Problem. page 8, 2007.

[2] Zaid Harchaoui, Francis Bach, and Eric Moulines. Testing for Homogeneity with Kernel Fisher Discriminant Analysis. arXiv:0804.1026 [stat], April 2008.

[3] Zaid Harchaoui, Felicien Vallet, Alexandre Lung-Yut-Fong, and Olivier Cappe. A regularized kernel-based approach to unsupervised audio segmentation. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1665–1668, Taipei, Taiwan, April 2009. IEEE.

For kernel methods:

[4] Course by Jean-Philippe Vert and Julien Mairal: https://members.cbio.mines-paristech.fr/~jvert/svn/kernelcourse/course/2021mva/index.html

[5] Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel Mean Embedding of Distributions: A Review and Beyond. Foundations and Trends in Machine Learning, 10(1-2):1–141, 2017. arXiv:1605.09522.