API
===

Preprocessing
-------------

.. currentmodule:: scib.preprocessing

Preprocessing functions are relevant both for preparing the data for integration and for postprocessing the integration output.
The most relevant preprocessing steps are:

+ Normalization
+ Scaling, batch-aware
+ Highly variable gene selection, batch-aware
+ Cell cycle scoring
+ Principal component analysis (PCA)
+ k-nearest neighbor graph (kNN graph)
+ UMAP
+ Clustering

Note that some preprocessing steps depend on each other.
Please refer to the `Single Cell Best Practices Book`_ for more details.

.. _Single Cell Best Practices Book: https://www.sc-best-practices.org

.. autosummary::
    :toctree: api/

    normalize
    scale_batch
    hvg_intersect
    hvg_batch
    score_cell_cycle
    get_cell_cycle_genes
    reduce_data


Integration
-----------

Integration method functions require the preprocessed ``anndata`` object (here ``adata``) and the name of the batch column in ``adata.obs`` (here ``'batch'``).
The methods can be called as follows, where ``integration_method`` is the name of the integration method.

.. code-block:: python

    scib.ig.integration_method(adata, batch="batch")


For example, to run Scanorama on a dataset, call:

.. code-block:: python

    scib.ig.scanorama(adata, batch="batch")

.. warning::

    The following notation is deprecated.

    .. code-block:: python

        scib.integration.runIntegrationMethod(adata, batch="batch")

    Please use the snake_case naming without the ``run`` prefix.

Some integration methods (e.g. :func:`~scib.integration.scgen`, :func:`~scib.integration.scanvi`) also use cell type labels as input.
For these, you additionally need to provide the corresponding label column of ``adata.obs`` (here ``cell_type``).

.. code-block:: python

    scib.ig.scgen(adata, batch="batch", cell_type="cell_type")
    scib.ig.scanvi(adata, batch="batch", labels="cell_type")


.. automodapi:: scib.integration
    :no-heading:
    :skip: runBBKNN
    :skip: runCombat
    :skip: runMNN
    :skip: runDESC
    :skip: runSaucie
    :skip: runScanorama
    :skip: runScanvi
    :skip: runScGen
    :skip: runScvi
    :skip: runTrVae
    :skip: runTrVaep
    :skip: issparse


Clustering
----------

.. currentmodule:: scib.metrics

After integration, one of the first ways to assess the quality of the integration is to cluster the integrated data and compare the clusters to the original annotations.
This is exactly what some of the metrics do.

.. autosummary::
    :toctree: api/

    cluster_optimal_resolution
    get_resolutions
    opt_louvain


Metrics
-------

.. currentmodule:: scib.metrics

This package contains all the metrics used for benchmarking scRNA-seq data integration performance.
They can be applied to the integrated as well as the unintegrated data and fall into two groups: biological conservation metrics and batch removal metrics.
For a detailed description of the metrics implemented in this package, please see our `publication`_.

.. _publication: https://doi.org/10.1038/s41592-021-01336-8

Most metrics require specific, preprocessed inputs; the required preprocessing is described in detail under :ref:`preprocessing`.


Biological Conservation Metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Biological conservation metrics are either cluster-based, computed on clustering results of the integration output, or quantify the difference between the feature spaces of integrated and unintegrated data.
Each metric is scaled to a value ranging from 0 to 1 by default, where larger scores represent better conservation of the biological aspect that the metric addresses.

.. autosummary::
    :toctree: api/

    ari
    cell_cycle
    clisi_graph
    hvg_overlap
    isolated_labels_asw
    isolated_labels_f1
    nmi
    silhouette
    trajectory_conservation


Batch Correction Metrics
^^^^^^^^^^^^^^^^^^^^^^^^

Batch correction metric values are scaled to a range from 0 to 1 by default, where larger scores represent better batch removal.

.. autosummary::
    :toctree: api/

    graph_connectivity
    ilisi_graph
    kBET
    pcr_comparison
    silhouette_batch


Metrics Wrapper Functions
^^^^^^^^^^^^^^^^^^^^^^^^^

For convenience, ``scib`` provides wrapper functions that, given integrated and unintegrated adata objects, apply multiple metrics and return all the results in a ``pandas.DataFrame``.
The main function is :func:`~scib.metrics.metrics`, which provides all the parameters for the different metrics.

.. code-block:: python

    scib.metrics.metrics(adata, adata_int, ari=True, nmi=True)


Furthermore, :func:`~scib.metrics.metrics()` is wrapped by convenience functions with preconfigured subsets of metrics based on expected computation time:

+ :func:`~scib.metrics.metrics_fast()` only computes metrics that require little preprocessing
+ :func:`~scib.metrics.metrics_slim()` includes all functions of :func:`~scib.metrics.metrics_fast()` and adds clustering-based metrics
+ :func:`~scib.metrics.metrics_all()` includes all metrics


.. autosummary::
    :toctree: api/

    metrics
    metrics_fast
    metrics_slim
    metrics_all


Auxiliary Functions
^^^^^^^^^^^^^^^^^^^

Some parts of the metrics can be used individually; these are listed below.

.. autosummary::
    :toctree: api/

    cluster_optimal_resolution
    get_resolutions
    lisi_graph
    pcr
    pc_regression


PCR Regression Backends
^^^^^^^^^^^^^^^^^^^^^^^

The principal component regression metric can use multiple linear regression backends.
These helpers are exposed here for advanced usage and benchmarking.

.. currentmodule:: scib.metrics.pcr

.. autosummary::
    :toctree: api/

    linreg_sklearn
    linreg_multiple_sklearn
    linreg_multiple_np
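
The idea behind principal component regression can be illustrated with a minimal NumPy sketch.
It is not the ``scib`` implementation: the function name ``pc_regression_sketch``, its parameters, and the toy data are hypothetical, and the sketch assumes a categorical covariate that is one-hot encoded before an ordinary least-squares fit.
Each principal component is regressed on the covariate, and the resulting R² values are averaged with weights proportional to the variance each component explains.

.. code-block:: python

    import numpy as np


    def pc_regression_sketch(X, batch, n_comps=10):
        """Variance-weighted R^2 of principal components regressed on a label.

        Hypothetical helper for illustration only; not part of scib.
        """
        X = np.asarray(X, dtype=float)
        batch = np.asarray(batch)

        # Centre the features and obtain principal components via SVD.
        Xc = X - X.mean(axis=0)
        U, S, _ = np.linalg.svd(Xc, full_matrices=False)
        n_comps = min(n_comps, S.size)
        pcs = U[:, :n_comps] * S[:n_comps]            # one column per PC
        pc_var = S[:n_comps] ** 2 / (X.shape[0] - 1)  # variance of each PC

        # One-hot encode the categorical covariate. With one column per
        # category the columns sum to one, so they already span the intercept.
        categories = np.unique(batch)
        design = (batch[:, None] == categories[None, :]).astype(float)

        # Least-squares fit of all PCs on the design matrix at once.
        coef, *_ = np.linalg.lstsq(design, pcs, rcond=None)
        residual = pcs - design @ coef

        # R^2 per PC; the PCs are centred, so the total sum of squares
        # is simply the sum of squared scores.
        ss_res = (residual ** 2).sum(axis=0)
        ss_tot = (pcs ** 2).sum(axis=0)
        safe_tot = np.where(ss_tot > 0, ss_tot, 1.0)  # avoid division by zero
        r2 = 1.0 - ss_res / safe_tot

        # Weight each R^2 by the share of variance its PC explains.
        return float((r2 * pc_var).sum() / pc_var.sum())


    # Toy data: pure noise versus the same noise with a strong batch shift.
    rng = np.random.default_rng(0)
    batch = np.repeat(["a", "b"], 50)
    noise = rng.normal(size=(100, 20))
    shifted = noise + 5.0 * (batch == "b")[:, None]

    score_noise = pc_regression_sketch(noise, batch)
    score_shift = pc_regression_sketch(shifted, batch)
    print(f"no batch effect: {score_noise:.3f}, batch effect: {score_shift:.3f}")

On the toy data, the shifted matrix should score much closer to 1 than the pure-noise matrix, because its leading principal component is dominated by the batch difference.
Fitting all components in a single least-squares call is one possible design; fitting one component at a time would give the same R² values at a higher cost.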