A reproducible example to ground the discussion:
from sklearn.linear_model import RidgeCV
from sklearn.datasets import load_boston
from sklearn.preprocessing import scale
import numpy as np

boston = scale(load_boston().data)
target = load_boston().target

alphas = np.linspace(1.0, 200.0, 5)
fit0 = RidgeCV(alphas=alphas, store_cv_values=True, gcv_mode='eigen').fit(boston, target)
fit0.alpha_
fit0.cv_values_[:,0]
The question: what formula is used to compute fit0.cv_values_?
Edit:
@Abhinav Arora's answer below seems to suggest that fit0.cv_values_[:,0][0], the first entry of fit0.cv_values_[:,0], would be

(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2

where fit1 is a ridge regression with alpha = 1.0, fitted to the dataset from which observation 0 was removed.
Let's see:
1) create a new dataset with the first row of the original dataset removed:
from sklearn.linear_model import Ridge

boston1 = np.delete(boston, (0), axis=0)
target1 = np.delete(target, (0), axis=0)
2) fit a ridge model with alpha = 1.0 on this truncated dataset:
fit1 = Ridge(alpha=1.0).fit(boston1, target1)
3) check the MSE of that model on the first data-point:
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
It is array([ 37.64650853]), which is not the same as the corresponding entry of fit0.cv_values_[:,0], namely fit0.cv_values_[:,0][0], which is 37.495629960571137.
What gives?
2 Answers
Answer 1
Quoting from the Sklearn documentation:
Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).
Since you have not provided a scoring function in the constructor, and have also left the cv argument at its default, this attribute stores the mean squared error for each sample, computed with leave-one-out cross-validation. The general formula for the mean squared error is

\[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2 \]

where \(\hat{Y}_i\) (the Y with the cap) is the prediction of your regressor and \(Y_i\) is the true value.
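For concreteness, here is a tiny hand-worked instance of that formula (my own numbers, purely for illustration):

import numpy as np

# MSE for three points:
# ((2.5 - 3.0)**2 + (5.0 - 5.0)**2 + (3.0 - 2.5)**2) / 3 = 0.1666...
y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 3.0])
mse = np.mean((y_pred - y_true) ** 2)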
In your case, you are doing leave-one-out cross-validation, so every fold contains exactly one test point and thus n = 1. Therefore, fit0.cv_values_[:,0] simply gives you the squared error for every point in your training data set, computed when that point was the held-out test fold and the value of alpha was 1.0.
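As a sanity check you can also do the leave-one-out refit by hand for every sample and compare (this is my own sketch, not RidgeCV internals; as the question notes, the numbers come out close but not identical, apparently because RidgeCV computes these errors with a closed-form GCV shortcut rather than by literally refitting on every split):

import numpy as np
from sklearn.linear_model import Ridge

# refit Ridge with alpha = 1.0 on each leave-one-out split and record the
# squared error on the held-out sample
naive_loo = np.empty(len(target))
for i in range(len(target)):
    X_i = np.delete(boston, i, axis=0)
    y_i = np.delete(target, i, axis=0)
    pred = Ridge(alpha=1.0).fit(X_i, y_i).predict(boston[i].reshape(1, -1))[0]
    naive_loo[i] = (pred - target[i]) ** 2

# compare with the stored values; they are close but not exactly equal
print(naive_loo[0], fit0.cv_values_[0, 0])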
Hope that helps.
Answer 2
Let's look - it's open source, after all.
The first call to fit makes a call upwards to its parent, _BaseRidgeCV (line 997 in that implementation). We haven't provided a cross-validation generator, so we make another call upwards to _RidgeGCV.fit. There's plenty of math in the documentation of this function, but we're so close to the source that I'll let you go and read about it.
Here's the actual source
v, Q, QT_y = _pre_compute(X, y)
n_y = 1 if len(y.shape) == 1 else y.shape[1]
cv_values = np.zeros((n_samples * n_y, len(self.alphas)))
C = []

scorer = check_scoring(self, scoring=self.scoring, allow_none=True)
error = scorer is None

for i, alpha in enumerate(self.alphas):
    weighted_alpha = (sample_weight * alpha
                      if sample_weight is not None
                      else alpha)
    if error:
        out, c = _errors(weighted_alpha, y, v, Q, QT_y)
    else:
        out, c = _values(weighted_alpha, y, v, Q, QT_y)
    cv_values[:, i] = out.ravel()
    C.append(c)
Note the unexciting _pre_compute function:
def _pre_compute(self, X, y):
    # even if X is very sparse, K is usually very dense
    K = safe_sparse_dot(X, X.T, dense_output=True)
    v, Q = linalg.eigh(K)
    QT_y = np.dot(Q.T, y)
    return v, Q, QT_y
Abhinav has explained what's going on at a mathematical level - it's simply accumulating the weighted mean squared error. The details of their implementation, and where it differs from your manual computation, can be evaluated step by step from the code.
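To make that last step concrete, here is a rough NumPy reconstruction of the closed-form leave-one-out identity that _errors appears to implement for a single alpha (this is my own sketch under stated assumptions, not library code; in particular it assumes the intercept is handled by centering the targets once on the full data set, which is also a plausible source of the small gap from the per-split refit in the question):

import numpy as np
from scipy import linalg

alpha = 1.0
y_c = target - target.mean()      # assumed: intercept handled by centering y once
K = boston.dot(boston.T)          # kernel matrix; boston is already column-scaled
v, Q = linalg.eigh(K)             # eigendecomposition, as in _pre_compute
QT_y = Q.T.dot(y_c)

w = 1.0 / (v + alpha)
c = Q.dot(w * QT_y)               # c = (K + alpha*I)^-1 y
G_diag = (Q ** 2).dot(w)          # diagonal of (K + alpha*I)^-1
loo_squared_errors = (c / G_diag) ** 2

print(loo_squared_errors[0])      # compare with fit0.cv_values_[0, 0]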