sliced.sir.SlicedInverseRegression

class sliced.sir.SlicedInverseRegression(n_directions='auto', n_slices=10, alpha=None, copy=True)[source]

Sliced Inverse Regression (SIR) [1]

Linear dimensionality reduction using the inverse regression curve, E[X|y], to identify the directions defining the central subspace of the data.

The inverse comes from the fact that X and y are reversed with respect to the standard regression framework (estimating E[y|X]).

The algorithm performs a weighted principal component analysis on slices of the whitened data, which has been sorted with respect to the target, y.
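These steps can be sketched directly in NumPy. The following is an illustrative re-implementation under the description above, not the library's actual code; the helper name `sir_directions` is hypothetical:

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=2):
    """Minimal sketch of SIR: weighted PCA on slice means of whitened X."""
    n, p = X.shape
    # Center and whiten X using the inverse square root of its covariance.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ whiten
    # Sort the whitened data with respect to the target and split into slices.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Weighted outer products of the slice means estimate cov(E[Z | y]).
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Principal directions of M, mapped back to the original feature space.
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:n_directions]
    directions = (whiten @ vecs[:, top]).T  # shape (n_directions, n_features)
    return directions, vals[top]
```

On data with a strong linear trend, the leading direction recovered by this sketch aligns closely with the true coefficient vector.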

For a binary target the directions found correspond to those found with Fisher’s Linear Discriminant Analysis (LDA).

Note that SIR may fail to estimate the directions if the conditional density X|y is symmetric, so that E[X|y] = 0. See sliced.save.SlicedAverageVarianceEstimation, which is able to overcome this limitation but may fail to pick up on linear trends. If possible, both SIR and SAVE should be used when analyzing a dataset.
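The symmetric failure mode can be seen numerically. The following sketch (illustrative only, not library code) builds a target that depends on the features only through a square, so the conditional density X|y is symmetric and the slice means vanish:

```python
import numpy as np

# When y = x1 ** 2, X|y is symmetric about the origin, so E[X|y] = 0
# and the slice means carry no directional information for SIR.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = X[:, 0] ** 2  # symmetric link

order = np.argsort(y)
slice_means = np.array([X[idx].mean(axis=0)
                        for idx in np.array_split(order, 10)])
# Every slice mean stays near the origin, so the weighted outer-product
# matrix built from them is close to zero and no direction stands out.
max_mean = np.max(np.abs(slice_means))
```

SAVE, which works with the slice variances rather than the slice means, does not lose the signal in this situation.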

Parameters:
n_directions : int, str or None (default='auto')

Number of directions to keep. Corresponds to the dimension of the central subspace. If n_directions=='auto', the number of directions is chosen by finding the largest gap in the ordered eigenvalues of the var(X|y) matrix and keeping the directions before this gap. If n_directions==None, the number of directions equals the number of features.

n_slices : int (default=10)

The number of slices used when calculating the inverse regression curve. Truncated to at most the number of unique values of y.

alpha : float or None (default=None)

Significance level for the two-sided t-test used to check for non-zero coefficients. Must be a number between 0 and 1. If not None, the non-zero components of each direction are determined from an asymptotic normal approximation. Useful if one desires that the directions are sparse in the number of features.

copy : bool (default=True)

If False, data passed to fit are overwritten, and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.
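The 'auto' gap heuristic for n_directions can be sketched in a few lines. This is an illustration of the rule described above; the library's exact implementation and tie-breaking may differ:

```python
import numpy as np

# Eigenvalues of the var(X|y) matrix, sorted in descending order
# (stand-in values for illustration).
eigenvalues = np.array([0.95, 0.80, 0.04, 0.02, 0.01])
# Find the largest gap between consecutive eigenvalues and keep
# every direction that comes before it.
gaps = eigenvalues[:-1] - eigenvalues[1:]
n_directions = int(np.argmax(gaps)) + 1
```

Here the largest gap sits between the second and third eigenvalue, so two directions are kept.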

References

[1] Li, K.C. (1991). "Sliced Inverse Regression for Dimension Reduction (with discussion)." Journal of the American Statistical Association, 86, 316-342.
[2] Chen, C.H. and Li, K.C. (1998). "Can SIR Be as Popular as Multiple Linear Regression?" Statistica Sinica, 8, 289-316.

Examples

>>> import numpy as np
>>> from sliced import SlicedInverseRegression
>>> from sliced.datasets import make_cubic
>>> X, y = make_cubic(random_state=123)
>>> sir = SlicedInverseRegression(n_directions=2)
>>> sir.fit(X, y)
SlicedInverseRegression(alpha=None, copy=True, n_directions=2, n_slices=10)
>>> X_sir = sir.transform(X)

Attributes:
directions_ : array, shape (n_directions, n_features)

The directions in feature space, representing the central subspace which is sufficient to describe the conditional distribution y|X. The directions are sorted by eigenvalues_.

eigenvalues_ : array, shape (n_directions,)

The eigenvalues corresponding to each of the selected directions. These are the eigenvalues of the covariance matrix of the inverse regression curve. Larger eigenvalues indicate more prevalent directions.
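Given the shape convention above, the dimension reduction applied by transform amounts to a linear projection onto the fitted directions. A minimal sketch with stand-in arrays (not output of the library):

```python
import numpy as np

# directions_ has shape (n_directions, n_features), so projecting X
# onto the central subspace is a matrix product with its transpose.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))         # (n_samples, n_features)
directions = rng.normal(size=(2, 4))  # stand-in for directions_
X_proj = X @ directions.T             # (n_samples, n_directions)
```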

Methods

fit(X, y) Fit the model with X and y.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(X) Apply dimension reduction on X.
__init__(n_directions='auto', n_slices=10, alpha=None, copy=True)[source]

Initialize self. See help(type(self)) for accurate signature.
