scib.metrics.metrics

scib.metrics.metrics(adata, adata_int, batch_key, label_key, embed='X_pca', cluster_key='cluster', cluster_nmi=None, ari_=False, nmi_=False, nmi_method='arithmetic', nmi_dir=None, silhouette_=False, si_metric='euclidean', pcr_=False, cell_cycle_=False, organism='mouse', hvg_score_=False, isolated_labels_=False, isolated_labels_f1_=False, isolated_labels_asw_=False, n_isolated=None, graph_conn_=False, trajectory_=False, kBET_=False, lisi_graph_=False, ilisi_=False, clisi_=False, subsample=0.5, n_cores=1, type_=None, verbose=False)

Master metrics function

Wrapper for all metrics used in the study. Computes all metrics given the unintegrated and integrated anndata objects.

Parameters:
  • adata – unintegrated, preprocessed anndata object

  • adata_int – integrated anndata object

  • batch_key – name of batch column in adata.obs and adata_int.obs

  • label_key – name of biological label (cell type) column in adata.obs and adata_int.obs

  • embed

    embedding representation of adata_int

    Used for:

    • silhouette scores (label ASW, batch ASW),

    • PC regression,

    • cell cycle conservation,

    • isolated label scores, and

    • kBET

  • cluster_key – name of column to store cluster assignments. Will be overwritten if it exists

  • cluster_nmi – where to save cluster resolutions and NMI for optimal clustering. If None, these results will not be saved

  • ari_ – whether to compute ARI using ari()

  • nmi_ – whether to compute NMI using nmi()

  • nmi_method – which implementation of NMI to use

  • nmi_dir – directory of NMI code for some implementations of NMI

  • silhouette_ – whether to compute the average silhouette width scores for labels and batch using silhouette() and silhouette_batch()

  • si_metric – which distance metric to use for silhouette scores

  • pcr_ – whether to compute principal component regression using pc_comparison()

  • cell_cycle_ – whether to compute cell cycle score conservation using cell_cycle()

  • organism – organism of the datasets, used for computing cell cycle scores on gene names

  • hvg_score_ – whether to compute highly variable gene conservation using hvg_overlap()

  • isolated_labels_ – whether to compute both isolated label scores using isolated_labels()

  • isolated_labels_f1_ – whether to compute isolated label score based on F1 score of clusters vs labels using isolated_labels()

  • isolated_labels_asw_ – whether to compute isolated label score based on ASW (average silhouette width) using isolated_labels()

  • n_isolated – maximum number of batches per label for a label to be considered isolated

  • graph_conn_ – whether to compute graph connectivity score using graph_connectivity()

  • trajectory_ – whether to compute trajectory score using trajectory_conservation()

  • kBET_ – whether to compute kBET score using kBET()

  • lisi_graph_ – whether to compute both cLISI and iLISI using lisi_graph()

  • clisi_ – whether to compute cLISI using clisi_graph()

  • ilisi_ – whether to compute iLISI using ilisi_graph()

  • subsample – subsample fraction for LISI scores

  • n_cores – number of cores to be used for LISI functions

  • type_ – one of ‘full’, ‘embed’ or ‘knn’ (used for kBET and LISI scores)
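
Because every metric is switched on through its own boolean argument, it can help to assemble the keyword arguments in one place before calling the function. The following is a minimal sketch, not part of scib: `build_metric_kwargs` and `run_metrics` are hypothetical helpers, the flag names are copied from the signature above, and the check on `type_` simply mirrors the three values listed in the parameter description. It assumes scib is installed only when `run_metrics` is actually called.

```python
# Boolean switches copied from the documented signature of scib.metrics.metrics.
METRIC_FLAGS = (
    "ari_", "nmi_", "silhouette_", "pcr_", "cell_cycle_", "hvg_score_",
    "isolated_labels_", "isolated_labels_f1_", "isolated_labels_asw_",
    "graph_conn_", "trajectory_", "kBET_", "lisi_graph_", "ilisi_", "clisi_",
)
# Values listed for `type_` in the parameter description.
VALID_TYPES = ("full", "embed", "knn")


def build_metric_kwargs(enabled, type_=None, **extra):
    """Hypothetical helper: turn a collection of flag names into keyword arguments."""
    enabled = set(enabled)
    unknown = enabled - set(METRIC_FLAGS)
    if unknown:
        raise ValueError(f"unknown metric flags: {sorted(unknown)}")
    if type_ is not None and type_ not in VALID_TYPES:
        raise ValueError(f"type_ must be one of {VALID_TYPES} or None")
    # Every flag is set explicitly: True if requested, False otherwise.
    kwargs = {flag: flag in enabled for flag in METRIC_FLAGS}
    kwargs["type_"] = type_
    # Remaining documented parameters (embed, organism, n_cores, ...) pass through.
    kwargs.update(extra)
    return kwargs


def run_metrics(adata, adata_int, batch_key, label_key, **kwargs):
    """Hypothetical wrapper: forwards everything to scib.metrics.metrics."""
    from scib.metrics import metrics  # imported lazily; requires scib

    return metrics(adata, adata_int, batch_key, label_key, **kwargs)


if __name__ == "__main__":
    kwargs = build_metric_kwargs(
        ["ari_", "nmi_", "silhouette_"], type_="embed", embed="X_pca"
    )
    print(kwargs)
```

With two prepared anndata objects, the call would then look like `run_metrics(adata, adata_int, "batch", "cell_type", **kwargs)`, where `"batch"` and `"cell_type"` are placeholder column names for `batch_key` and `label_key`.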