scib.metrics.silhouette_batch

scib.metrics.silhouette_batch(adata, batch_key, label_key, embed, metric='euclidean', return_all=False, scale=True, verbose=True)

Batch ASW

Modified average silhouette width (ASW) of batch labels

This metric computes silhouette widths with the batch labels as clusters, separately within each cell type. It assumes that a silhouette width close to 0 indicates perfect overlap of the batches, so the absolute value of the silhouette width is used to measure how well batches are mixed. For all cells \(i\) of a cell type \(C_j\), the batch ASW of that cell type is:

\[batch \, ASW_j = \frac{1}{|C_j|} \sum_{i \in C_j} |silhouette(i)|\]
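
To make the per-cell-type formula concrete, here is a minimal pure-Python sketch. It assumes Euclidean distance (the function's default metric) and treats the batch labels within one cell type as the clusters. The helper names silhouette_widths and batch_asw_for_cell_type are hypothetical; this is an illustration of the formula, not scib's implementation.

```python
import math
from collections import defaultdict


def silhouette_widths(points, labels):
    """Hypothetical helper: Euclidean silhouette width s(i) for every point.

    s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance
    from i to the other points with the same label (here: the same batch) and
    b(i) is the smallest mean distance from i to the points of another label.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    if len(members) < 2:
        raise ValueError("need at least two batches within the cell type")

    widths = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [math.dist(p, points[j]) for j in members[own] if j != i]
        if not same:
            # Convention for a batch with a single cell: s(i) = 0
            widths.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, points[j]) for j in idxs) / len(idxs)
            for lab, idxs in members.items()
            if lab != own
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def batch_asw_for_cell_type(points, batches):
    """Hypothetical helper: unscaled batch ASW_j = mean |s(i)| over one cell type."""
    s = silhouette_widths(points, batches)
    return sum(abs(x) for x in s) / len(s)


# Two batches that are far apart in the embedding (poor mixing)
separated = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(batch_asw_for_cell_type(separated, ["a", "a", "b", "b"]))  # ≈ 0.90

# Two batches occupying the same locations (good mixing):
# each batch has 10 cells at x = 0 and 10 cells at x = 1
mixed = [(float(x), 0.0) for x in (0, 1) for _ in range(10)] * 2
print(batch_asw_for_cell_type(mixed, ["a"] * 20 + ["b"] * 20))  # ≈ 0.05
```

Well-separated batches yield a value near 1, while overlapping batches yield a value near 0, which is why the absolute silhouette width serves as a mixing measure.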

The final score is the mean of the per-cell-type values \(batch \, ASW_j\) over the set of cell types \(M\):

\[batch \, ASW = \frac{1}{|M|} \sum_{j \in M} batch \, ASW_j\]
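
The aggregation step can be sketched as follows, assuming the silhouette widths of each cell type have already been computed (the helper name batch_asw and the input values are hypothetical).

```python
def batch_asw(per_type_silhouettes):
    """Hypothetical helper: unscaled batch ASW from per-cell silhouette widths.

    per_type_silhouettes maps each cell type j in M to the silhouette widths
    s(i) of its cells i in C_j (computed within that cell type on batch labels).
    Returns the final score and the per-cell-type values batch ASW_j.
    """
    per_type = {
        cell_type: sum(abs(s) for s in widths) / len(widths)  # batch ASW_j
        for cell_type, widths in per_type_silhouettes.items()
    }
    # Unweighted mean over cell types: every j in M counts equally
    final = sum(per_type.values()) / len(per_type)
    return final, per_type


# Made-up silhouette widths for two cell types
score, per_type = batch_asw({"T cell": [0.1, -0.3], "B cell": [0.5, 0.5, -0.2]})
print(per_type)  # T cell ≈ 0.2, B cell ≈ 0.4
print(score)     # ≈ 0.3
```

Because the outer average runs over cell types rather than cells, a small cell type influences the score as much as a large one; pooling all cells into a single average would give a different number whenever the cell types differ in size.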

For a scaled metric (which is the default), the absolute silhouette width of each cell is subtracted from 1 before averaging, so that 0 indicates poor batch mixing and 1 indicates optimal batch mixing:

\[batch \, ASW_j = \frac{1}{|C_j|} \sum_{i \in C_j} \left(1 - |silhouette(i)|\right)\]
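
Because \(1 - |silhouette(i)|\) is an affine function of \(|silhouette(i)|\), the scaled score equals one minus the unscaled score, provided both are averaged over the same cells and cell types:

\[batch \, ASW^{scaled} = \frac{1}{|M|} \sum_{j \in M} \frac{1}{|C_j|} \sum_{i \in C_j} \left(1 - |silhouette(i)|\right) = 1 - \frac{1}{|M|} \sum_{j \in M} \frac{1}{|C_j|} \sum_{i \in C_j} |silhouette(i)| = 1 - batch \, ASW^{unscaled}\]
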
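
Parameters:
  • adata – AnnData object with the embedding in adata.obsm and the batch and group labels in adata.obs

  • batch_key – key in adata.obs of the batch labels to be compared against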

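  • label_key – key in adata.obs of the group labels (e.g. cell type) by which the data are subset; silhouette widths are computed separately within each group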

  • embed – key of the embedding in adata.obsm

  • metric – distance metric used to compute the silhouette widths; see the scikit-learn silhouette score documentation for accepted values

  • scale – if True (default), use 1 - |silhouette| per cell so that the score lies between 0 and 1, with 1 indicating optimal batch mixing

  • return_all – if True, additionally return the mean silhouette per group and all silhouette scores; if False (default), return only the batch ASW

  • verbose – if True (default), print the mean silhouette score per group

Returns:

  • Batch ASW (always)

  • Mean silhouette per group, as a pd.DataFrame (additionally, if return_all=True)

  • Absolute silhouette scores per group label (additionally, if return_all=True)

The function requires an embedding to be stored in adata.obsm and can only be applied to feature and embedding integration outputs. Note that the metric cannot be used to evaluate kNN graph outputs. See the User Guide for more information on preprocessing.

Examples

import scib

# feature output: reduce the corrected features to a PCA embedding (adata.obsm["X_pca"])
scib.pp.reduce_data(
    adata, n_top_genes=2000, batch_key="batch", pca=True, neighbors=False
)
scib.me.silhouette_batch(adata, batch_key="batch", label_key="celltype", embed="X_pca")

# embedding output: the integrated embedding is stored in adata.obsm["X_emb"]
scib.me.silhouette_batch(adata, batch_key="batch", label_key="celltype", embed="X_emb")