scib.metrics.silhouette_batch
- scib.metrics.silhouette_batch(adata, batch_key, label_key, embed, metric='euclidean', return_all=False, scale=True, verbose=True)
Batch ASW
Modified average silhouette width (ASW) of batch
This metric measures the silhouette width of cells with respect to their batch assignment. It assumes that a silhouette width close to 0 represents perfect overlap of the batches, so the absolute value of the silhouette width is used to measure how well batches are mixed. For all cells \(i\) of a cell type \(C_j\), the batch ASW of that cell type is:
\[batch \, ASW_j = \frac{1}{|C_j|} \sum_{i \in C_j} |silhouette(i)|\]
The final score is the average of the per-cell-type batch ASW values over the set of unique cell types \(M\):
\[batch \, ASW = \frac{1}{|M|} \sum_{j \in M} batch \, ASW_j\]
For a scaled metric (which is the default), the absolute silhouette width of each cell is subtracted from 1 before averaging, so that 0 indicates strongly separated batches and 1 indicates optimal batch mixing:
\[batch \, ASW_j = \frac{1}{|C_j|} \sum_{i \in C_j} \left(1 - |silhouette(i)|\right)\]
- Parameters:
batch_key – batch labels to be compared against
label_key – group labels to be subset by e.g. cell type
embed – key of the embedding in adata.obsm
metric – see sklearn silhouette score
scale – if True, scale between 0 and 1
return_all – if True, return all silhouette scores and label means; if False (default), return only the average silhouette width (ASW)
verbose – print silhouette score per group
- Returns:
Batch ASW (always)
Mean silhouette per group in pd.DataFrame (additionally, if return_all=True)
Absolute silhouette scores per group label (additionally, if return_all=True)
The function requires an embedding to be stored in adata.obsm and can only be applied to feature and embedding integration outputs. Note that the metric cannot be used to evaluate kNN graph outputs. See User Guide for more information on preprocessing.
Examples
import scib

# feature output
scib.pp.reduce_data(
    adata, n_top_genes=2000, batch_key="batch", pca=True, neighbors=False
)
scib.me.silhouette_batch(adata, batch_key="batch", label_key="celltype", embed="X_pca")

# embedding output
scib.me.silhouette_batch(adata, batch_key="batch", label_key="celltype", embed="X_emb")
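To make the formulas above concrete, the following is a minimal, dependency-free sketch of the batch ASW arithmetic on toy coordinates. It is not the library's implementation. The helper names `silhouette_width` and `batch_asw`, the toy data, the Euclidean distance, and the handling of edge cases (singleton batches score 0, cell types present in only one batch are skipped) are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def silhouette_width(i, points, groups):
    """Silhouette width of point i, where `groups` gives the batch of each point."""
    by_group = defaultdict(list)
    for k, (point, group) in enumerate(zip(points, groups)):
        if k != i:
            by_group[group].append(math.dist(points[i], point))
    own = by_group.pop(groups[i], [])
    if not own or not by_group:
        return 0.0  # assumption: singletons and single-batch subsets score 0
    a = mean(own)  # mean distance to the other cells of the same batch
    b = min(mean(d) for d in by_group.values())  # mean distance to nearest other batch
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)


def batch_asw(points, batches, labels, scale=True):
    """Average the per-cell-type batch ASW over all cell types (hypothetical helper)."""
    per_label = []
    for label in sorted(set(labels)):
        idx = [k for k, lab in enumerate(labels) if lab == label]
        sub_points = [points[k] for k in idx]
        sub_batches = [batches[k] for k in idx]
        if len(set(sub_batches)) < 2:
            continue  # assumption: skip cell types observed in a single batch
        widths = [
            abs(silhouette_width(i, sub_points, sub_batches))
            for i in range(len(idx))
        ]
        if scale:
            widths = [1 - w for w in widths]  # 1 - |silhouette(i)|
        per_label.append(mean(widths))
    return mean(per_label)


# Toy embedding: cell type "A" has interleaved batches, "B" has separated batches.
points = [
    (0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0),    # cell type A
    (0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0),  # cell type B
]
batches = ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

scaled = batch_asw(points, batches, labels, scale=True)
unscaled = batch_asw(points, batches, labels, scale=False)
print(f"scaled batch ASW:   {scaled:.3f}")
print(f"unscaled batch ASW: {unscaled:.3f}")
```

In this toy data the interleaved cell type A contributes a scaled value of about 0.71 and the separated cell type B about 0.10, so the scaled score is about 0.40. Because averaging is linear, the scaled and unscaled scores of this sketch always sum to 1.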