scib.metrics.pc_regression

scib.metrics.pc_regression(data, covariate, pca_var=None, n_comps=50, svd_solver='arpack', linreg_method='sklearn', verbose=False, n_threads=1)

Principal component regression

Compute the overall variance contribution given a covariate according to the following formula:

\[Var(C|B) = \sum^G_{i=1} Var(C|PC_i) \cdot R^2(PC_i|B)\]

for \(G\) principal components (\(PC_i\)), where \(Var(C|PC_i)\) is the variance of the data matrix \(C\) explained by the i-th principal component, and \(R^2(PC_i|B)\) is the \(R^2\) of the i-th principal component regressed against a covariate \(B\).
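The formula can be illustrated with a minimal NumPy sketch. This is not scib's implementation: the function name `pc_regression_sketch` is hypothetical, the covariate is assumed to be categorical, and \(Var(C|PC_i)\) is assumed to be expressed as a fraction of the variance across the retained components, so the result lies between 0 and 1.

```python
import numpy as np


def pc_regression_sketch(X, batch, n_comps=None):
    """Illustrative version of Var(C|B); hypothetical helper, not scib code."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each feature

    # PCA via SVD: columns of U * S are the PC scores
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = S > 1e-12 * S.max()                    # drop numerically empty PCs
    U, S = U[:, keep], S[keep]
    if n_comps is not None:
        U, S = U[:, :n_comps], S[:n_comps]
    pcs = U * S

    # Var(C|PC_i), assumed here to be a fraction of the retained variance
    pc_var = S**2 / (S**2).sum()

    # One-hot design matrix for the categorical covariate B
    _, codes = np.unique(np.asarray(batch), return_inverse=True)
    D = np.eye(codes.max() + 1)[codes]

    # R^2(PC_i|B): least-squares fit of all PCs on the design matrix
    coef, *_ = np.linalg.lstsq(D, pcs, rcond=None)
    ss_res = ((pcs - D @ coef) ** 2).sum(axis=0)
    ss_tot = ((pcs - pcs.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / ss_tot

    # Weighted sum over the G retained components
    return float((pc_var * r2).sum())


rng = np.random.default_rng(0)
batch = np.repeat(["a", "b"], 30)
shift = np.where(batch == "a", 0.0, 10.0)[:, None]
with_batch_effect = rng.normal(size=(60, 5)) + shift
without_batch_effect = rng.normal(size=(60, 5))

print(pc_regression_sketch(with_batch_effect, batch))
print(pc_regression_sketch(without_batch_effect, batch))
```

Under these assumptions, data with a strong batch shift should score close to 1 and data with no batch structure should score close to 0.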

Parameters:
  • data – Expression or PC matrix. Assumed to be a PC matrix if pca_var is given.

  • covariate – Series or list of batch assignments.

  • n_comps – Number of PCA components to compute; used only when pca_var is not given. If pca_var is not given and n_comps=None, PCA is computed without reducing the data.

  • pca_var – Iterable of variances for n_comps components. If pca_var is not None, data is assumed to contain PCs; otherwise PCA is computed on data.

  • linreg_method – Regression backend. One of 'sklearn', 'numpy', or 'sequential'.

    • 'sequential' calls linreg_sklearn(), the original implementation that fits one model per PC and is typically much slower.

    • 'sklearn' calls linreg_multiple_sklearn(), a multi-output linear regression backend.

    • 'numpy' calls linreg_multiple_np(), a vectorized numpy backend with a categorical one-way ANOVA shortcut.

  • svd_solver

  • n_threads – Number of threads passed to the selected regression backend.

  • verbose

Returns:

Variance contribution of regression
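
The linreg_method description mentions a one-way ANOVA shortcut for categorical covariates. The sketch below shows why such a shortcut can work: when a PC is regressed on a one-hot encoded category, \(R^2\) equals the between-group sum of squares divided by the total sum of squares, which can be computed for all PCs at once without fitting a model. The helpers `r2_sequential` and `r2_anova` are hypothetical names for illustration and are not scib's linreg_sklearn() or linreg_multiple_np().

```python
import numpy as np


def r2_sequential(pcs, codes):
    """Reference approach: one least-squares fit per PC."""
    D = np.eye(codes.max() + 1)[codes]            # one-hot design matrix
    out = []
    for j in range(pcs.shape[1]):
        y = pcs[:, j]
        beta, *_ = np.linalg.lstsq(D, y, rcond=None)
        ss_res = ((y - D @ beta) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        out.append(1.0 - ss_res / ss_tot)
    return np.array(out)


def r2_anova(pcs, codes):
    """Vectorized approach: R^2 = between-group SS / total SS for every PC."""
    n_groups = codes.max() + 1
    counts = np.bincount(codes, minlength=n_groups)
    sums = np.zeros((n_groups, pcs.shape[1]))
    np.add.at(sums, codes, pcs)                   # per-group column sums
    means = sums / counts[:, None]
    grand = pcs.mean(axis=0)
    ss_between = (counts[:, None] * (means - grand) ** 2).sum(axis=0)
    ss_tot = ((pcs - grand) ** 2).sum(axis=0)
    return ss_between / ss_tot


rng = np.random.default_rng(0)
pcs = rng.normal(size=(40, 3))                    # stand-in for PC scores
codes = np.arange(40) % 4                         # four batches, all non-empty

print(np.allclose(r2_sequential(pcs, codes), r2_anova(pcs, codes)))
```

The two functions agree up to floating-point error; the vectorized version avoids the per-PC loop, which is in line with the documented note that the sequential backend is typically much slower.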