Metrics

This package contains all the metrics used for benchmarking scRNA-seq data integration performance. The metrics can be classified into biological conservation and batch removal metrics. For a detailed description of the metrics implemented in this package, please see our publication.

Biological Conservation Metrics

Biological conservation metrics quantify either how well the cluster structure is preserved, using clustering results of the integration output, or the difference between the feature spaces of the integrated and unintegrated data. By default, each metric is scaled to a value between 0 and 1, where larger scores represent better conservation of the biological aspect that the metric addresses.

hvg_overlap(adata_pre, adata_post, batch[, ...])

Highly variable gene overlap
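The listing only names this score, so here is a minimal sketch of the underlying idea. It assumes the score is the per-batch overlap coefficient |A ∩ B| / min(|A|, |B|) between the highly variable genes (HVGs) selected before and after integration, averaged over batches. HVG selection is assumed to have happened already, the helper names are hypothetical, and the gene lists are toy data; this is not the scib implementation.

```python
from statistics import mean


def overlap_coefficient(genes_pre, genes_post):
    """Overlap coefficient |A & B| / min(|A|, |B|) of two gene sets (hypothetical helper)."""
    a, b = set(genes_pre), set(genes_post)
    if not a or not b:
        raise ValueError("both HVG sets must be non-empty")
    return len(a & b) / min(len(a), len(b))


def hvg_overlap_sketch(hvgs_pre, hvgs_post):
    """Mean per-batch overlap; both arguments map batch name -> list of HVG names."""
    return mean(overlap_coefficient(hvgs_pre[b], hvgs_post[b]) for b in hvgs_pre)


# Toy HVG selections per batch, before and after integration
pre = {"b1": ["CD3E", "MS4A1", "LYZ", "NKG7"], "b2": ["CD3E", "GNLY", "LYZ", "PPBP"]}
post = {"b1": ["CD3E", "MS4A1", "LYZ", "FCGR3A"], "b2": ["CD3E", "GNLY", "HBB", "S100A8"]}
print(hvg_overlap_sketch(pre, post))  # 0.625 (b1: 3/4, b2: 2/4)
```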

silhouette(adata, group_key, embed[, ...])

Average silhouette width (ASW)
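A silhouette width lies in [-1, 1]. The minimal NumPy sketch below shows how a label ASW can be brought into the 0-to-1 range described above, assuming Euclidean distances on an embedding and the rescaling (ASW + 1) / 2. silhouette_samples_np and scaled_asw are hypothetical helpers standing in for a library silhouette routine; singleton clusters are scored 0 by convention.

```python
import numpy as np


def silhouette_samples_np(X, labels):
    """Per-cell silhouette width using Euclidean distances (needs >= 2 distinct labels)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # other cells with the same label
        if n_same == 0:
            continue  # singleton cluster: silhouette left at 0
        a = dist[i, same].sum() / n_same  # mean intra-cluster distance
        # mean distance to the nearest other cluster
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s


def scaled_asw(X, labels):
    """Mean silhouette width rescaled from [-1, 1] to [0, 1]."""
    return float((silhouette_samples_np(X, labels).mean() + 1) / 2)


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(scaled_asw(X, ["a", "a", "b", "b"]))  # separated labels: close to 1
print(scaled_asw(X, ["a", "b", "a", "b"]))  # mixed labels: below 0.5
```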

isolated_labels(adata, label_key, batch_key, ...)

Isolated label score
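A simplified sketch of the isolated label idea, assuming that "isolated" means the labels present in the fewest batches, and scoring each such label by the best F1 between membership in that label and membership in any single cluster of a fixed clustering. The function names are hypothetical, and scib's own function may optimize the clustering or use an ASW-based score instead, so treat this only as an illustration.

```python
from collections import defaultdict


def isolated_label_set(labels, batches):
    """Labels that occur in the fewest batches (hypothetical helper)."""
    batches_per_label = defaultdict(set)
    for lab, bat in zip(labels, batches):
        batches_per_label[lab].add(bat)
    fewest = min(len(b) for b in batches_per_label.values())
    return {lab for lab, b in batches_per_label.items() if len(b) == fewest}


def best_cluster_f1(labels, clusters, target):
    """Highest F1 between membership in `target` and membership in any one cluster."""
    is_target = [lab == target for lab in labels]
    best = 0.0
    for c in set(clusters):
        in_c = [cl == c for cl in clusters]
        tp = sum(t and p for t, p in zip(is_target, in_c))
        if tp == 0:
            continue
        precision = tp / sum(in_c)
        recall = tp / sum(is_target)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def isolated_label_f1_sketch(labels, batches, clusters):
    """Mean best-cluster F1 over all isolated labels."""
    iso = isolated_label_set(labels, batches)
    return sum(best_cluster_f1(labels, clusters, lab) for lab in iso) / len(iso)


labels = ["T", "T", "B", "B", "Mk", "Mk"]
batches = ["b1", "b2", "b1", "b2", "b1", "b1"]  # "Mk" only occurs in b1
print(isolated_label_set(labels, batches))  # {'Mk'}
print(isolated_label_f1_sketch(labels, batches, [0, 0, 1, 1, 2, 2]))  # 1.0
```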

nmi(adata, group1, group2[, method, nmi_dir])

Normalized mutual information
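A minimal sketch of NMI between two labelings (for example, cell-type labels and cluster assignments), assuming arithmetic-mean normalization, I(A;B) / ((H(A) + H(B)) / 2), which is also the default normalization of scikit-learn's normalized_mutual_info_score. nmi_arithmetic is a hypothetical helper, not the scib function.

```python
import numpy as np


def _entropy(counts):
    """Shannon entropy (natural log) of a vector of counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def nmi_arithmetic(labels_a, labels_b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # contingency table of the two labelings
    h_a, h_b = _entropy(table.sum(axis=1)), _entropy(table.sum(axis=0))
    mutual_info = h_a + h_b - _entropy(table.ravel())
    denom = (h_a + h_b) / 2
    return 1.0 if denom == 0 else mutual_info / denom


print(nmi_arithmetic([0, 0, 1, 1], ["x", "x", "y", "y"]))  # identical partitions
print(nmi_arithmetic([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```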

ari(adata, group1, group2[, implementation])

Adjusted Rand Index
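A minimal sketch of the ARI computed from the contingency table of two labelings, using the standard pair-counting formula (index minus expected index, divided by maximum index minus expected index). ari_sketch is a hypothetical name; it does not reproduce the implementation argument of the scib function.

```python
from math import comb

import numpy as np


def _pairs(counts):
    """Number of unordered pairs, summed over a collection of counts."""
    return sum(comb(int(c), 2) for c in np.ravel(counts))


def ari_sketch(labels_a, labels_b):
    """Adjusted Rand Index via the pair-counting formula on the contingency table."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)
    index = _pairs(table)
    sum_a, sum_b = _pairs(table.sum(axis=1)), _pairs(table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (index - expected) / (max_index - expected)


print(ari_sketch([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, renamed labels
print(ari_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```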

cell_cycle(adata_pre, adata_post, batch_key)

Cell cycle conservation score
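The cell cycle score compares how much variance is explained by cell-cycle scores before and after integration. A minimal sketch, assuming the per-batch variance contributions have already been computed (for example by principal component regression on S and G2M scores) and that each batch is scored as 1 - |after - before| / before, clipped at 0 and averaged over batches. The function name, inputs, and values are hypothetical.

```python
def cell_cycle_score_sketch(var_before, var_after):
    """Mean over batches of 1 - |after - before| / before, clipped at 0.

    Both arguments map batch name -> variance contribution of cell-cycle scores.
    """
    scores = []
    for batch, before in var_before.items():
        score = 1 - abs(var_after[batch] - before) / before
        scores.append(max(score, 0.0))  # contribution more than doubled -> 0
    return sum(scores) / len(scores)


# Toy variance contributions (hypothetical values)
before = {"b1": 0.20, "b2": 0.10}
after = {"b1": 0.15, "b2": 0.25}
print(cell_cycle_score_sketch(before, after))  # about 0.375 (b1: 0.75, b2: clipped to 0)
```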

trajectory_conservation(adata_pre, ...[, ...])

Trajectory conservation score
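A minimal sketch of the trajectory conservation idea, assuming pseudotime values for the same cells are available before and after integration and that the score is their Spearman rank correlation rescaled from [-1, 1] to [0, 1] via (r + 1) / 2. Computing the pseudotime itself (e.g. diffusion pseudotime) is out of scope, and the helper names are hypothetical.

```python
import numpy as np


def rankdata_avg(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks


def trajectory_score_sketch(pseudotime_pre, pseudotime_post):
    """Spearman correlation of two pseudotime vectors, rescaled to [0, 1]."""
    r = np.corrcoef(rankdata_avg(pseudotime_pre), rankdata_avg(pseudotime_post))[0, 1]
    return float((r + 1) / 2)


print(trajectory_score_sketch([0.1, 0.2, 0.3, 0.4], [1, 2, 3, 4]))  # same ordering
print(trajectory_score_sketch([0.1, 0.2, 0.3, 0.4], [4, 3, 2, 1]))  # reversed ordering
```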

clisi_graph(adata, batch_key, label_key[, ...])

Cell-type LISI (cLISI) score

Batch Correction Metrics

Batch correction metric values are scaled between 0 and 1 by default, where larger scores represent better batch removal.

graph_connectivity(adata, label_key)

Graph connectivity
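A minimal sketch of graph connectivity, assuming the score is, for each cell label, the fraction of that label's cells lying in the largest connected component of the neighborhood graph restricted to that label, averaged over labels. The graph is given here as a plain list of edges between cell indices rather than the kNN graph stored on the AnnData object, and the function names are hypothetical.

```python
def largest_component_fraction(nodes, edges):
    """Fraction of `nodes` in the largest connected component of the induced subgraph."""
    parent = {v: v for v in nodes}

    def find(v):
        # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in parent and v in parent:  # ignore edges leaving the subgraph
            parent[find(u)] = find(v)
    sizes = {}
    for v in nodes:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(nodes)


def graph_connectivity_sketch(labels, edges):
    """Mean over labels of the largest-component fraction of each label's subgraph."""
    by_label = {}
    for cell, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(cell)
    fractions = [largest_component_fraction(cells, edges) for cells in by_label.values()]
    return sum(fractions) / len(fractions)


labels = ["T", "T", "T", "B", "B"]
edges = [(0, 1), (3, 4), (2, 3)]  # cell 2 is only linked to a B cell
print(graph_connectivity_sketch(labels, edges))  # (2/3 + 1) / 2
```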

silhouette_batch(adata, batch_key, ...[, ...])

Batch ASW
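A minimal NumPy sketch of a batch ASW, assuming that within each cell label the silhouette is computed with batches as the groups, each cell is scored as 1 - |s| so that 1 means well-mixed batches, and scores are averaged per label and then across labels. Labels observed in only one batch are skipped here. All names are hypothetical, and the silhouette is re-implemented inline so the block runs on its own.

```python
import numpy as np


def silhouette_per_cell(X, groups):
    """Per-cell silhouette width with Euclidean distances; singletons get 0."""
    X, groups = np.asarray(X, dtype=float), np.asarray(groups)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = groups == groups[i]
        if same.sum() == 1:
            continue
        a = dist[i, same].sum() / (same.sum() - 1)
        b = min(dist[i, groups == g].mean() for g in np.unique(groups) if g != groups[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s


def batch_asw_sketch(X, labels, batches):
    """Mean over labels of mean(1 - |silhouette by batch|) within each label."""
    X, labels, batches = np.asarray(X, dtype=float), np.asarray(labels), np.asarray(batches)
    per_label = []
    for lab in np.unique(labels):
        mask = labels == lab
        if len(np.unique(batches[mask])) < 2:
            continue  # silhouette needs at least two batches
        s = silhouette_per_cell(X[mask], batches[mask])
        per_label.append(np.mean(1 - np.abs(s)))
    return float(np.mean(per_label))


mixed = [[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1]]
separated = [[0, 0], [0, 0.1], [10, 0], [10, 0.1]]
labels = ["T"] * 4
print(batch_asw_sketch(mixed, labels, ["b1", "b2", "b1", "b2"]))  # well mixed: high
print(batch_asw_sketch(separated, labels, ["b1", "b1", "b2", "b2"]))  # separated: near 0
```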

pcr_comparison(adata_pre, adata_post, covariate)

Principal component regression score
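A minimal sketch of principal component regression and the comparison score, assuming a categorical covariate (one-hot encoded), that each principal component is regressed on it, that the resulting R² values are weighted by each component's variance, and that the comparison score is (before - after) / before, clipped at 0. The function names are hypothetical stand-ins for pc_regression and pcr_comparison, not their actual implementations.

```python
import numpy as np


def pc_regression_sketch(X, covariate, n_comps=None):
    """Variance-weighted R^2 of regressing each principal component on a categorical covariate."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center features before PCA
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    keep = S > 1e-10 * S.max()  # drop numerically zero components
    U, S = U[:, keep][:, :n_comps], S[keep][:n_comps]
    pcs = U * S  # PC scores, one column per component
    pc_var = S ** 2 / (len(X) - 1)  # variance of each component
    _, codes = np.unique(covariate, return_inverse=True)
    design = np.eye(codes.max() + 1)[codes]  # one-hot; columns sum to the intercept
    r2 = np.empty(len(S))
    for k in range(len(S)):
        y = pcs[:, k]  # already mean-centered
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2[k] = 1 - (resid @ resid) / (y @ y)
    return float((pc_var * r2).sum() / pc_var.sum())


def pcr_comparison_sketch(X_pre, X_post, covariate):
    """Relative reduction of the covariate's variance contribution, clipped at 0."""
    before = pc_regression_sketch(X_pre, covariate)
    after = pc_regression_sketch(X_post, covariate)
    return max((before - after) / before, 0.0)


rng = np.random.default_rng(0)
batch = np.repeat([0, 1], 5)
noise = rng.normal(size=(10, 3))
means = np.array([noise[batch == b].mean(axis=0) for b in (0, 1)])
X_post = noise - means[batch]  # batch means removed: no batch effect left
X_pre = X_post.copy()
X_pre[:, 0] += 10 * batch  # strong batch shift in the first feature
print(pcr_comparison_sketch(X_pre, X_post, batch))  # close to 1
```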

kBET(adata, batch_key, label_key[, scaled, ...])

kBET score

ilisi_graph(adata, batch_key[, k0, type_, ...])

Integration LISI (iLISI) score

Metrics Wrapper Functions

For convenience, scib provides wrapper functions that, given integrated and unintegrated AnnData objects, apply multiple metrics and return all the results in a pandas.DataFrame. The main function is metrics(), which exposes the parameters of all the different metrics.

scib.metrics.metrics(adata, adata_int, batch_key, label_key, ari_=True, nmi_=True)

The remaining functions wrap metrics() and call it with preconfigured subsets of metrics, chosen according to their expected computation time:

metrics(adata, adata_int, batch_key, label_key)

Master metrics function

metrics_fast(adata, adata_int, batch_key, ...)

Only metrics with minimal preprocessing and runtime

metrics_slim(adata, adata_int, batch_key, ...)

All metrics apart from kBET and LISI scores

metrics_all(adata, adata_int, batch_key, ...)

All metrics

Auxiliary Functions

lisi_graph(adata, batch_key, label_key, **kwargs)

cLISI and iLISI scores
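LISI scores are based on an inverse Simpson's index of the label or batch composition in each cell's neighborhood. A minimal sketch, assuming uniform weights over the k nearest neighbors (the graph-based LISI in scib uses its own neighborhood weighting) and the rescalings (median iLISI - 1) / (n_batches - 1) and (n_labels - median cLISI) / (n_labels - 1), so that larger values are better for both. All function names are hypothetical.

```python
import numpy as np


def inverse_simpson_knn(X, groups, k=5):
    """Per-cell inverse Simpson's index of `groups` among the k nearest neighbors."""
    X, groups = np.asarray(X, dtype=float), np.asarray(groups)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    scores = np.empty(len(X))
    for i, nn in enumerate(neighbors):
        _, counts = np.unique(groups[nn], return_counts=True)
        p = counts / k  # uniform neighbor weights
        scores[i] = 1.0 / np.sum(p ** 2)
    return scores


def ilisi_scaled(X, batches, k=5):
    """(median iLISI - 1) / (n_batches - 1): 0 = no mixing, 1 = perfect mixing."""
    n_batches = len(np.unique(batches))
    return float((np.median(inverse_simpson_knn(X, batches, k)) - 1) / (n_batches - 1))


def clisi_scaled(X, labels, k=5):
    """(n_labels - median cLISI) / (n_labels - 1): 1 = pure neighborhoods."""
    n_labels = len(np.unique(labels))
    return float((n_labels - np.median(inverse_simpson_knn(X, labels, k))) / (n_labels - 1))


grid = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
X = np.vstack([grid, grid + [100, 0]])  # two well-separated cell types
labels = ["A"] * 6 + ["B"] * 6
batches = ["b1", "b2"] * 6  # batches interleaved within each cell type
print(clisi_scaled(X, labels), ilisi_scaled(X, batches))
```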

pcr(adata, covariate[, embed, n_comps, ...])

Principal component regression for anndata object

pc_regression(data, covariate[, pca_var, ...])

Principal component regression