scib.metrics.nmi

scib.metrics.nmi(adata, cluster_key, label_key, implementation='arithmetic', nmi_dir=None)

Normalized mutual information

The normalized mutual information (NMI) is the mutual information between the clustering and the ground-truth labels (e.g. cell type), normalized by the entropies of the two. The score ranges from 0 to 1, where 0 means the clustering and the annotated cell labels share no information and 1 means they share all of it.
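The definition above can be sketched in plain Python. This is an illustration only, not scib's implementation: the helper names (entropy, mutual_information, nmi_arithmetic) are hypothetical, natural logarithms are assumed, and the normalizer is the arithmetic mean of the two entropies, matching the default 'arithmetic' option.

```python
from collections import Counter
from math import log


def entropy(assignments):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(assignments)
    return -sum((c / n) * log(c / n) for c in Counter(assignments).values())


def mutual_information(clusters, labels):
    """Mutual information between two labelings of the same cells."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    n_cluster = Counter(clusters)
    n_label = Counter(labels)
    mi = 0.0
    for (c, l), n_cl in joint.items():
        # p(c, l) * log( p(c, l) / (p(c) * p(l)) )
        mi += (n_cl / n) * log(n_cl * n / (n_cluster[c] * n_label[l]))
    return mi


def nmi_arithmetic(clusters, labels):
    """NMI normalized by the arithmetic mean of the two entropies."""
    h_c, h_l = entropy(clusters), entropy(labels)
    normalizer = (h_c + h_l) / 2
    if normalizer == 0:
        # Both labelings have a single class; returning 1.0 is an
        # assumed convention for this degenerate case.
        return 1.0
    return mutual_information(clusters, labels) / normalizer


# Clusters that match the labels up to renaming vs. clusters independent of them.
matching = nmi_arithmetic(["a", "a", "b", "b"], ["T", "T", "B", "B"])
independent = nmi_arithmetic(["a", "b", "a", "b"], ["T", "T", "B", "B"])
```

Here matching evaluates to approximately 1 and independent to 0, the two ends of the score range.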

Parameters:

adata – anndata object with cluster assignments in adata.obs[cluster_key]
cluster_key – string of column in adata.obs containing cluster assignments
label_key – string of column in adata.obs containing labels
implementation – NMI implementation, one of:
    'max': scikit-learn method with average_method='max'
    'min': scikit-learn method with average_method='min'
    'geometric': scikit-learn method with average_method='geometric'
    'arithmetic': scikit-learn method with average_method='arithmetic' (default)
    'Lancichinetti': implementation by A. Lancichinetti et al. (2009), https://sites.google.com/site/andrealancichinetti/mutual
    'ONMI': implementation by Aaron F. McDaid et al., https://github.com/aaronmcdaid/Overlapping-NMI
nmi_dir – directory of the compiled C code, used when 'Lancichinetti' or 'ONMI' is specified as implementation. These packages need to be compiled as described in their respective READMEs.
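The four scikit-learn options differ only in how the two entropies are combined into the normalizer that divides the mutual information. A minimal sketch of that choice follows; the function name entropy_normalizer is hypothetical and the entropies are passed in as plain numbers.

```python
from math import sqrt


def entropy_normalizer(h_cluster, h_label, average_method="arithmetic"):
    """Combine two entropies into the NMI denominator (illustrative only)."""
    if average_method == "min":
        return min(h_cluster, h_label)
    if average_method == "geometric":
        return sqrt(h_cluster * h_label)
    if average_method == "arithmetic":
        return (h_cluster + h_label) / 2
    if average_method == "max":
        return max(h_cluster, h_label)
    raise ValueError(f"unknown average_method: {average_method!r}")


# With unequal entropies the normalizers are ordered
# min <= geometric <= arithmetic <= max.
normalizers = {
    m: entropy_normalizer(1.0, 4.0, m)
    for m in ("min", "geometric", "arithmetic", "max")
}
```

Because the mutual information is divided by the normalizer, 'min' gives the largest score and 'max' the smallest for the same clustering, so scores computed with different options are not directly comparable.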

Returns:

Normalized mutual information (NMI) value

This function can be applied to all integration output types. The adata must contain cluster assignments based on the kNN graph that is either given directly or derived from the integration method output. Before computing this metric, run every step needed for clustering. See the User Guide for more information on preprocessing.

Examples

import scanpy as sc
import scib

# feature output
scib.pp.reduce_data(
    adata, n_top_genes=2000, batch_key="batch", pca=True, neighbors=True
)
scib.me.cluster_optimal_resolution(adata, cluster_key="cluster", label_key="celltype")
scib.me.nmi(adata, cluster_key="cluster", label_key="celltype")

# embedding output
sc.pp.neighbors(adata, use_rep="X_emb")
scib.me.cluster_optimal_resolution(adata, cluster_key="cluster", label_key="celltype")
scib.me.nmi(adata, cluster_key="cluster", label_key="celltype")

# knn output
scib.me.cluster_optimal_resolution(adata, cluster_key="cluster", label_key="celltype")
scib.me.nmi(adata, cluster_key="cluster", label_key="celltype")