scib.metrics.nmi
- scib.metrics.nmi(adata, cluster_key, label_key, implementation='arithmetic', nmi_dir=None)
Normalized mutual information
The normalized mutual information is a version of the mutual information corrected by the entropy of clustering and ground truth labels (e.g. cell type). The score ranges between 0 and 1, with 0 representing no sharing and 1 representing perfect sharing of information between clustering and annotated cell labels.
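To make the definition concrete, here is a minimal sketch of NMI with arithmetic-mean normalization, written with the standard library only. The helper names (`entropy`, `mutual_information`, `nmi_arithmetic`) are hypothetical and are not part of scib. The handling of the zero-entropy edge case is an assumption and may differ from what scib or scikit-learn do.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same cells."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ij / n) * math.log(n * n_ij / (count_a[x] * count_b[y]))
        for (x, y), n_ij in joint.items()
    )


def nmi_arithmetic(a, b):
    """MI divided by the arithmetic mean of the two entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        # Assumption: two trivial (single-group) labelings count as identical.
        return 1.0
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Clusters that match the labels up to renaming share all information.
labels = ["T cell", "T cell", "B cell", "B cell"]
matching_clusters = [0, 0, 1, 1]
# Clusters that are independent of the labels share none.
independent_clusters = [0, 1, 0, 1]

score_matching = nmi_arithmetic(matching_clusters, labels)
score_independent = nmi_arithmetic(independent_clusters, labels)
```

In this toy case the matching clustering scores 1 and the independent clustering scores 0, which are the two ends of the range described above.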
- Parameters:
adata – anndata object with cluster assignments in adata.obs[cluster_key]
cluster_key – name of the column in adata.obs containing cluster assignments
label_key – name of the column in adata.obs containing labels
implementation – NMI implementation. One of:
'max': scikit-learn method with average_method='max'
'min': scikit-learn method with average_method='min'
'geometric': scikit-learn method with average_method='geometric'
'arithmetic': scikit-learn method with average_method='arithmetic'
'Lancichinetti': implementation by A. Lancichinetti et al. (2009), https://sites.google.com/site/andrealancichinetti/mutual
'ONMI': implementation by Aaron F. McDaid et al., https://github.com/aaronmcdaid/Overlapping-NMI
nmi_dir – directory of the compiled C code if 'Lancichinetti' or 'ONMI' is specified as implementation. These packages need to be compiled as specified in the corresponding READMEs.
- Returns:
Normalized mutual information (NMI) value
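The four scikit-learn based options for implementation differ only in how the two entropies are combined into the normalizer. The sketch below illustrates this with a hypothetical helper, `normalize_mi`, which is not part of scib. The entropy and mutual information values are worked out by hand for a toy case in which four clusters split two equally sized cell types in half.

```python
import math


def normalize_mi(mi, h_clusters, h_labels, average_method="arithmetic"):
    """Divide mutual information by the chosen combination of the entropies."""
    if average_method == "min":
        normalizer = min(h_clusters, h_labels)
    elif average_method == "max":
        normalizer = max(h_clusters, h_labels)
    elif average_method == "geometric":
        normalizer = math.sqrt(h_clusters * h_labels)
    elif average_method == "arithmetic":
        normalizer = (h_clusters + h_labels) / 2
    else:
        raise ValueError(f"unknown average_method: {average_method!r}")
    return mi / normalizer


# Toy case: labels [0, 0, 1, 1], clusters [0, 1, 2, 3].
h_labels = math.log(2)    # two equally sized cell types
h_clusters = math.log(4)  # four singleton clusters
mi = math.log(2)          # clusters fully determine labels, so MI = H(labels)

scores = {
    method: normalize_mi(mi, h_clusters, h_labels, method)
    for method in ("min", "geometric", "arithmetic", "max")
}
```

For this over-clustered example the score falls from 1.0 with 'min' to 0.5 with 'max', with 'geometric' and 'arithmetic' in between, so the choice of normalizer determines how strongly a mismatch in the number of groups is penalized.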
This function can be applied to all integration output types. The adata must contain cluster assignments that are based on the kNN graph given or derived from the integration method output. For this metric you need to include all steps that are needed for clustering. See the User Guide for more information on preprocessing.
Examples
# feature output
scib.pp.reduce_data(
    adata, n_top_genes=2000, batch_key="batch", pca=True, neighbors=True
)
scib.me.cluster_optimal_resolution(adata, cluster_key="cluster", label_key="celltype")
scib.me.nmi(adata, cluster_key="cluster", label_key="celltype")

# embedding output
sc.pp.neighbors(adata, use_rep="X_emb")
scib.me.cluster_optimal_resolution(adata, cluster_key="cluster", label_key="celltype")
scib.me.nmi(adata, cluster_key="cluster", label_key="celltype")

# knn output
scib.me.cluster_optimal_resolution(adata, cluster_key="cluster", label_key="celltype")
scib.me.nmi(adata, cluster_key="cluster", label_key="celltype")