Metrics

This package contains all the metrics used for benchmarking scRNA-seq data integration performance. The metrics can be classified into biological conservation and batch removal metrics. For a detailed description of the metrics implemented in this package, please see our publication.

Biological Conservation Metrics

Biological conservation metrics either assess the clustering results of the integration output or measure the difference between the feature spaces of integrated and unintegrated data. Each metric is scaled to a value ranging from 0 to 1 by default, where larger scores represent better conservation of the biological aspect that the metric addresses.

`hvg_overlap`(adata_pre, adata_post, batch[, ...])	Highly variable gene overlap
`silhouette`(adata, group_key, embed[, ...])	Average silhouette width (ASW)
`isolated_labels`(adata, label_key, batch_key, ...)	Isolated label score
`nmi`(adata, group1, group2[, method, nmi_dir])	Normalized mutual information
`ari`(adata, group1, group2[, implementation])	Adjusted Rand Index
`cell_cycle`(adata_pre, adata_post, batch_key)	Cell cycle conservation score
`trajectory_conservation`(adata_pre, ...[, ...])	Trajectory conservation score
`clisi_graph`(adata, batch_key, label_key[, ...])	Cell-type LISI (cLISI) score
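
To illustrate what a label-based conservation metric such as the Adjusted Rand Index compares, the following is a minimal standard-library sketch of the usual ARI formula computed from a contingency table. It is not the scib implementation; the function name `adjusted_rand_index` and the toy labels are assumptions made for this example only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two label assignments of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    # Contingency counts: number of cells sharing each (label_a, label_b) pair.
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    # Agreement expected by chance, and the maximum achievable agreement.
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. a single cluster in both partitions.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Toy data (hypothetical): annotated cell types vs. clusters found after integration.
cell_types = ["B", "B", "T", "T", "NK", "NK"]
clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(cell_types, clusters))
```

The score depends only on which cells are grouped together, not on the names of the groups, so relabelled but otherwise identical partitions reach the maximum value.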

Batch Correction Metrics

By default, batch correction metric values are scaled between 0 and 1, where larger scores represent better batch removal.

`graph_connectivity`(adata, label_key)	Graph Connectivity
`silhouette_batch`(adata, batch_key, ...[, ...])	Batch ASW
`pcr_comparison`(adata_pre, adata_post, covariate)	Principal component regression score
`kBET`(adata, batch_key, label_key[, scaled, ...])	kBET score
`ilisi_graph`(adata, batch_key[, k0, type_, ...])	Integration LISI (iLISI) score
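
As a rough illustration of the idea behind a graph connectivity score, the sketch below restricts a neighbor graph to the cells of each label, finds the largest connected component, and averages the fraction of cells it covers. This assumes an undirected graph given as a plain adjacency dictionary; the function name `graph_connectivity_sketch` and the toy graph are hypothetical, and the actual `graph_connectivity` function operates on an AnnData object and may differ in detail.

```python
from collections import deque


def graph_connectivity_sketch(neighbors, labels):
    """Mean fraction of each label's cells that lie in the largest connected
    component of the label-restricted neighbor graph."""
    scores = []
    for label in dict.fromkeys(labels):  # unique labels, in first-seen order
        members = {i for i, lab in enumerate(labels) if lab == label}
        unseen = set(members)
        largest = 0
        while unseen:
            # Breadth-first search over edges that stay within this label.
            start = unseen.pop()
            queue = deque([start])
            size = 1
            while queue:
                node = queue.popleft()
                for nb in neighbors[node]:
                    if nb in unseen:
                        unseen.remove(nb)
                        queue.append(nb)
                        size += 1
            largest = max(largest, size)
        scores.append(largest / len(members))
    return sum(scores) / len(scores)


# Toy graph (hypothetical): the T cells form one component, while B cell 5
# connects only to a T cell and is therefore isolated among the B cells.
neighbors = {0: [1, 5], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: [0]}
labels = ["T", "T", "T", "B", "B", "B"]
print(graph_connectivity_sketch(neighbors, labels))
```

In this toy case the T cells contribute 1 and the B cells contribute 2/3, so a label whose cells stay fragmented after integration pulls the average down.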

Metrics Wrapper Functions

For convenience, scib provides wrapper functions that, given integrated and unintegrated adata objects, apply multiple metrics and return all the results in a pandas.DataFrame. The main function is metrics(), which provides all the parameters for the different metrics.

scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)

The remaining functions wrap metrics() with preconfigured subsets of metrics based on expected computation time:

metrics_fast() only computes metrics that require little preprocessing
metrics_slim() includes all functions of metrics_fast() and adds clustering-based metrics
metrics_all() includes all metrics

`metrics`(adata, adata_int, batch_key, label_key)	Master metrics function
`metrics_fast`(adata, adata_int, batch_key, ...)	Only metrics with minimal preprocessing and runtime
`metrics_slim`(adata, adata_int, batch_key, ...)	All metrics apart from kBET and LISI scores
`metrics_all`(adata, adata_int, batch_key, ...)	All metrics
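
The relationship between a master function with one flag per metric and its preconfigured wrappers can be sketched as follows. Everything here is hypothetical: `run_metrics`, `run_metrics_fast`, `run_metrics_all`, and the three toy "metrics" are stand-ins chosen for illustration, the result is a plain dictionary rather than a pandas.DataFrame, and none of it reflects the actual scib code.

```python
# Hypothetical stand-ins for real metrics; each maps a list of numbers to a score.
METRIC_FUNCS = {
    "mean": lambda xs: sum(xs) / len(xs),
    "spread": lambda xs: max(xs) - min(xs),
    "n_unique": lambda xs: len(set(xs)),
}


def run_metrics(values, *, mean=False, spread=False, n_unique=False):
    """Master function: compute only the metrics whose flag is switched on."""
    flags = {"mean": mean, "spread": spread, "n_unique": n_unique}
    # Disabled metrics are reported as None so the result always has the same keys.
    return {
        name: (METRIC_FUNCS[name](values) if enabled else None)
        for name, enabled in flags.items()
    }


def run_metrics_fast(values):
    """Preconfigured subset: only the cheapest metric."""
    return run_metrics(values, mean=True)


def run_metrics_all(values):
    """Preconfigured subset: every available metric."""
    return run_metrics(values, mean=True, spread=True, n_unique=True)


print(run_metrics_fast([1, 2, 2, 5]))
print(run_metrics_all([1, 2, 2, 5]))
```

Keeping the flags in one master function means each wrapper is only a choice of defaults, so adding a metric requires a change in a single place.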

Auxiliary Functions

`lisi_graph`(adata, batch_key, label_key, **kwargs)	cLISI and iLISI scores
`pcr`(adata, covariate[, embed, n_comps, ...])	Principal component regression for anndata object
`pc_regression`(data, covariate[, pca_var, ...])	Principal component regression