sliced.save.SlicedAverageVarianceEstimation

class sliced.save.SlicedAverageVarianceEstimation(n_directions='auto', n_slices=10, copy=True)

Sliced Average Variance Estimation (SAVE) [1]

Linear dimensionality reduction using the conditional covariance, Cov(X|y), to identify the directions defining the central subspace of the data.

The algorithm performs a weighted principal component analysis on a transformation of the within-slice covariance matrices of the whitened data, where the slices are formed after sorting the data with respect to the target, y.
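
As a rough illustration of that procedure, the NumPy sketch below whitens the data, sorts it by y, splits it into slices, accumulates the weighted sum of (I - Cov(Z | slice))^2, and eigendecomposes the result. It is a simplified sketch of the idea (equal-sized slices and a well-conditioned covariance are assumed), not the library's implementation.

import numpy as np

def save_sketch(X, y, n_slices=10):
    """Illustrative SAVE computation; not the library's implementation."""
    n_samples, n_features = X.shape

    # Whiten: center X and transform it so its sample covariance is identity.
    X_centered = X - X.mean(axis=0)
    sigma = np.cov(X_centered, rowvar=False)
    L = np.linalg.cholesky(sigma)        # sigma = L @ L.T
    whiten = np.linalg.inv(L).T          # X_centered @ whiten has identity covariance
    Z = X_centered @ whiten

    # Sort the whitened data with respect to the target and split it into slices.
    sorted_idx = np.argsort(y)
    slices = np.array_split(sorted_idx, n_slices)

    # Weighted sum of (I - Cov(Z | slice))^2 over the slices.
    M = np.zeros((n_features, n_features))
    for idx in slices:
        V = np.cov(Z[idx], rowvar=False)
        R = np.eye(n_features) - V
        M += (len(idx) / n_samples) * (R @ R)

    # Principal directions of M; larger eigenvalues mark stronger candidate
    # directions of the central subspace.
    evals, evecs = np.linalg.eigh(M)
    rank = np.argsort(evals)[::-1]

    # Map the directions back to the original (unwhitened) feature space.
    directions = (whiten @ evecs[:, rank]).T
    return evals[rank], directions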

Since SAVE looks at second moment information, it may miss first-moment information. In particular, it may miss linear trends. See sliced.sir.SlicedInverseRegression, which is able to detect linear trends but may fail in other situations. If possible, both SIR and SAVE should be used when analyzing a dataset.
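
As a quick illustration of that advice, the snippet below fits both estimators on the same data and compares the leading direction each recovers. It assumes SlicedInverseRegression accepts the same n_directions argument and exposes the same directions_ attribute as this class.

import numpy as np
from sliced import SlicedAverageVarianceEstimation
from sliced.sir import SlicedInverseRegression
from sliced.datasets import make_quadratic

X, y = make_quadratic(random_state=123)

# Fit both estimators on the same data, keeping one direction each.
save = SlicedAverageVarianceEstimation(n_directions=1).fit(X, y)
sir = SlicedInverseRegression(n_directions=1).fit(X, y)

# Compare the recovered directions via the absolute cosine between them
# (a value near 1 means the two methods agree on the subspace).
d_save = save.directions_[0] / np.linalg.norm(save.directions_[0])
d_sir = sir.directions_[0] / np.linalg.norm(sir.directions_[0])
print(np.abs(d_save @ d_sir))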

Parameters:
n_directions : int, str or None (default='auto')

Number of directions to keep. Corresponds to the dimension of the central subspace. If n_directions=='auto', the number of directions is chosen by finding the maximum gap in the ordered eigenvalues of the var(X|y) matrix and choosing the directions before this gap (a sketch of this rule follows the parameter list). If n_directions==None, the number of directions equals the number of features.

n_slices : int (default=10)

The number of slices used when calculating the inverse regression curve. Truncated to at most the number of unique values of y.

copy : bool (default=True)

If False, data passed to fit are overwritten and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.
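
The following is a minimal sketch of the kind of eigenvalue-gap rule described for n_directions='auto' above: keep the directions that come before the largest drop in the decreasingly sorted eigenvalues. It illustrates the idea only and is not necessarily the exact rule used internally.

import numpy as np

def n_directions_by_gap(eigenvalues):
    # Sort eigenvalues in decreasing order and look at consecutive drops.
    evals = np.sort(eigenvalues)[::-1]
    gaps = evals[:-1] - evals[1:]
    # Keep every direction that comes before the largest drop.
    return int(np.argmax(gaps)) + 1

# A clear gap after the second eigenvalue -> keep 2 directions.
print(n_directions_by_gap([4.1, 3.7, 0.4, 0.2]))   # 2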

References

[1] Shao, Y., Cook, R. D., and Weisberg, S. (2007).
“Marginal Tests with Sliced Average Variance Estimation”, Biometrika, 94, 285-296.

Examples

>>> import numpy as np
>>> from sliced import SlicedAverageVarianceEstimation
>>> from sliced.datasets import make_quadratic
>>> X, y = make_quadratic(random_state=123)
>>> save = SlicedAverageVarianceEstimation(n_directions=2)
>>> save.fit(X, y)
SlicedAverageVarianceEstimation(copy=True, n_directions=2, n_slices=10)
>>> X_save = save.transform(X)

Attributes:
directions_ : array, shape (n_directions, n_features)

The directions in feature space, representing the central subspace which is sufficient to describe the conditional distribution y|X. The directions are sorted by eigenvalues_.

eigenvalues_ : array, shape (n_directions,)

The eigenvalues corresponding to each of the selected directions, i.e. the eigenvalues of the weighted matrix built from the within-slice covariances described above. Larger eigenvalues indicate more prevalent directions (see the usage sketch after this list).
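
For example, building on the doctest above, the fitted attributes can be inspected directly; the shapes in the comments are the ones implied by the attribute descriptions.

from sliced import SlicedAverageVarianceEstimation
from sliced.datasets import make_quadratic

X, y = make_quadratic(random_state=123)
save = SlicedAverageVarianceEstimation(n_directions=2).fit(X, y)

# Rows of directions_ span the estimated central subspace; eigenvalues_
# gives the weight associated with each row (larger = more prevalent).
print(save.directions_.shape)   # (n_directions, n_features)
print(save.eigenvalues_.shape)  # (n_directions,)

# transform() projects the data onto these directions.
X_save = save.transform(X)
print(X_save.shape)             # (n_samples, n_directions)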

Methods

fit(X, y) Fit the model with X and y.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Apply dimension reduction on X.
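
To make the method table concrete, here is the usual estimator workflow, including the fit_transform shortcut recommended when copy=False. This is a usage sketch following the scikit-learn estimator API rather than additional documented behaviour.

from sliced import SlicedAverageVarianceEstimation
from sliced.datasets import make_quadratic

X, y = make_quadratic(random_state=123)

# Two-step workflow: estimate the directions, then project.
save = SlicedAverageVarianceEstimation(n_directions=2)
X_save = save.fit(X, y).transform(X)

# One-step equivalent; prefer this if copy=False, since fit may overwrite X.
X_save_2 = SlicedAverageVarianceEstimation(n_directions=2).fit_transform(X, y)

# Hyper-parameters follow the usual get_params / set_params protocol.
print(save.get_params())
save.set_params(n_slices=5)
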
__init__(n_directions='auto', n_slices=10, copy=True)

Initialize self. See help(type(self)) for accurate signature.
