sliced.sir.SlicedInverseRegression

class sliced.sir.SlicedInverseRegression(n_directions='auto', n_slices=10, alpha=None, copy=True)[source]

Sliced Inverse Regression (SIR) [1]

Linear dimensionality reduction using the inverse regression curve, E[X|y], to identify the directions defining the central subspace of the data.

The inverse comes from the fact that X and y are reversed with respect to the standard regression framework (estimating E[y|X]).

The algorithm performs a weighted principal component analysis on slices of the whitened data, which has been sorted with respect to the target, y.
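These steps can be sketched directly in NumPy. The following is an illustrative re-implementation under the description above, not the library's actual code; the helper name `sir_directions` is hypothetical:

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=2):
    """Minimal sketch of SIR: weighted PCA on slice means of whitened X."""
    n, p = X.shape
    # Center and whiten X using the inverse square root of its covariance.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ whiten
    # Sort the whitened data with respect to the target and split into slices.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Weighted outer products of the slice means estimate cov(E[Z | y]).
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Principal directions of M, mapped back to the original feature space.
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:n_directions]
    directions = (whiten @ vecs[:, top]).T  # shape (n_directions, n_features)
    return directions, vals[top]
```

On data with a strong linear trend, the leading direction recovered by this sketch aligns closely with the true coefficient vector.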

For a binary target the directions found correspond to those found with Fisher’s Linear Discriminant Analysis (LDA).

Note that SIR may fail to estimate the directions if the conditional density X|y is symmetric, so that E[X|y] = 0. See sliced.save.SlicedAverageVarianceEstimation, which is able to overcome this limitation but may fail to pick up on linear trends. If possible, both SIR and SAVE should be used when analyzing a dataset.
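The symmetric failure mode can be seen numerically. The following sketch (illustrative only, not library code) builds a target that depends on the features only through a square, so the conditional density X|y is symmetric and the slice means vanish:

```python
import numpy as np

# When y = x1 ** 2, X|y is symmetric about the origin, so E[X|y] = 0
# and the slice means carry no directional information for SIR.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = X[:, 0] ** 2  # symmetric link

order = np.argsort(y)
slice_means = np.array([X[idx].mean(axis=0)
                        for idx in np.array_split(order, 10)])
# Every slice mean stays near the origin, so the weighted outer-product
# matrix built from them is close to zero and no direction stands out.
max_mean = np.max(np.abs(slice_means))
```

SAVE, which works with the slice variances rather than the slice means, does not lose the signal in this situation.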

Parameters:
n_directions : int, str or None (default='auto')

Number of directions to keep. Corresponds to the dimension of the central subspace. If n_directions=='auto', the number of directions is chosen by finding the largest gap in the ordered eigenvalues of the var(X|y) matrix and keeping the directions before this gap. If n_directions==None, the number of directions equals the number of features.

n_slices : int (default=10)

The number of slices used when calculating the inverse regression curve. Truncated to at most the number of unique values of y.

alpha : float or None (default=None)

Significance level for the two-sided t-test used to check for non-zero coefficients. Must be a number between 0 and 1. If not None, the non-zero components of each direction are determined from an asymptotic normal approximation. Useful if one desires that the directions are sparse in the number of features.

copy : bool (default=True)

If False, data passed to fit are overwritten, and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.
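The 'auto' gap heuristic for n_directions can be sketched in a few lines. This is an illustration of the rule described above; the library's exact implementation and tie-breaking may differ:

```python
import numpy as np

# Eigenvalues of the var(X|y) matrix, sorted in descending order
# (stand-in values for illustration).
eigenvalues = np.array([0.95, 0.80, 0.04, 0.02, 0.01])
# Find the largest gap between consecutive eigenvalues and keep
# every direction that comes before it.
gaps = eigenvalues[:-1] - eigenvalues[1:]
n_directions = int(np.argmax(gaps)) + 1
```

Here the largest gap sits between the second and third eigenvalue, so two directions are kept.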

References

[1] Li, K.C. (1991). "Sliced Inverse Regression for Dimension Reduction (with discussion)." Journal of the American Statistical Association, 86, 316-342.
[2] Chen, C.H. and Li, K.C. (1998). "Can SIR Be as Popular as Multiple Linear Regression?" Statistica Sinica, 8, 289-316.

Examples

>>> import numpy as np
>>> from sliced import SlicedInverseRegression
>>> from sliced.datasets import make_cubic
>>> X, y = make_cubic(random_state=123)
>>> sir = SlicedInverseRegression(n_directions=2)
>>> sir.fit(X, y)
SlicedInverseRegression(alpha=None, copy=True, n_directions=2, n_slices=10)
>>> X_sir = sir.transform(X)

Attributes:
directions_ : array, shape (n_directions, n_features)

The directions in feature space, representing the central subspace which is sufficient to describe the conditional distribution y|X. The directions are sorted by eigenvalues_.

eigenvalues_ : array, shape (n_directions,)

The eigenvalues corresponding to each of the selected directions. These are the eigenvalues of the covariance matrix of the inverse regression curve. Larger eigenvalues indicate more prevalent directions.
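Given the shape convention above, the dimension reduction applied by transform amounts to a linear projection onto the fitted directions. A minimal sketch with stand-in arrays (not output of the library):

```python
import numpy as np

# directions_ has shape (n_directions, n_features), so projecting X
# onto the central subspace is a matrix product with its transpose.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))         # (n_samples, n_features)
directions = rng.normal(size=(2, 4))  # stand-in for directions_
X_proj = X @ directions.T             # (n_samples, n_directions)
```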

Methods

fit(X, y) Fit the model with X and y.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Apply dimension reduction on X.
__init__(n_directions='auto', n_slices=10, alpha=None, copy=True)[source]

Initialize self. See help(type(self)) for accurate signature.
