scib.metrics.kBET

scib.metrics.kBET(adata, batch_key, label_key, type_, embed=None, scaled=True, return_df=False, verbose=False)

kBET score

Compute the average of k-nearest neighbour batch effect test (kBET) score per label. This is a wrapper function of the implementation by Büttner et al. 2019. kBET measures the bias of a batch variable in the kNN graph. Specifically, kBET is quantified as the average rejection rate of Chi-squared tests of local vs global batch label distributions. This means that smaller values indicate better batch mixing. By default the original kBET score is scaled between 0 and 1 so that larger scores are associated with better batch mixing.

Parameters:
  • adata – anndata object to compute kBET on

  • batch_key – name of batch column in adata.obs

  • label_key – name of cell identity labels column in adata.obs

  • type_ – type of data integration, one of ‘knn’, ‘embed’ or ‘full’

  • embed – embedding key in adata.obsm for embedding and feature input

  • scaled – whether to scale between 0 and 1 with 0 meaning low batch mixing and 1 meaning optimal batch mixing if scaled=False, 0 means optimal batch mixing and 1 means low batch mixing

Returns:

kBET score (average of kBET per label) based on observed rejection rate. If return_df=True, also return a pd.DataFrame with kBET observed rejection rate per cluster

This function can be applied to all integration output types and recomputes the kNN graph for feature and embedding output with specific parameters. Thus, no preprocessing is required, but the correct output type must be specified in type_.

Examples

# full feature integration output or unintegrated data
scib.me.kBET(
    adata, batch_key="batch", label_key="celltype", type_="full", embed="X_pca"
)

# embedding output
scib.me.kBET(
    adata, batch_key="batch", label_key="celltype", type_="embed", embed="X_emb"
)

# kNN output
scib.me.kBET(adata, batch_key="batch", label_key="celltype", type_="knn")