Sunday, May 13, 2018

Using boosting trees to generate features in sklearn


For context, I am referring to this link: Feature Transformation using tree ensembles.

Specifically, in the part of the code below from the linked sample, method (1), using a boosting tree to generate features and then training a logistic regression (LR) on them, outperforms method (2), using the boosting tree by itself. My questions:

  1. Is it true in the general case that using a boosting tree to generate features (and another classifier to classify) is better than using the boosting tree to do the classification itself?
  2. And why does using a boosting tree to generate features and then training an LR on them outperform using the boosting tree by itself?

    # From the linked scikit-learn example: train the trees and the encoder
    # on one half of the training data, the logistic regression on the other.
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import OneHotEncoder

    n_estimator = 10  # number of boosting rounds, as in the example

    grd = GradientBoostingClassifier(n_estimators=n_estimator)
    grd_enc = OneHotEncoder()
    grd_lm = LogisticRegression()

    # Fit the boosted trees on the first half (X_train, y_train).
    grd.fit(X_train, y_train)
    # apply() returns the leaf index each sample reaches in every tree;
    # fit the one-hot encoder on those leaf indices.
    grd_enc.fit(grd.apply(X_train)[:, :, 0])
    # Train the LR on the encoded leaves of the second half (X_train_lr).
    grd_lm.fit(grd_enc.transform(grd.apply(X_train_lr)[:, :, 0]), y_train_lr)
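For reference, prediction with the stacked model follows the same path as in the linked example: apply the trees, one-hot encode the leaf indices, then feed them to the LR. A minimal sketch, assuming X_test is a held-out test set:

    # Trees -> leaf indices -> one-hot encoding -> logistic regression.
    # Column 1 of predict_proba is the positive-class probability.
    y_pred_grd_lm = grd_lm.predict_proba(
        grd_enc.transform(grd.apply(X_test)[:, :, 0]))[:, 1]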

1 Answer

Answer 1

Interesting sources are paper_1 and paper_2, along with the additional references they contain.

So to answer your questions:

  1. That is a very general claim; looking at the experimental results in the papers above, there are some exceptions. Most of the time, however, the transformation does improve the score.
  2. The main idea is to map the features into a space where the samples are linearly separable: the leaf indices of the boosted trees act as a learned, non-linear feature transformation. If that really is the case, then linear classifiers shine; see the sketch after this list.
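To make the comparison concrete, here is a minimal, self-contained sketch on synthetic data (the dataset, split sizes, and hyperparameters are my own assumptions, not from the question) that scores the boosted trees alone against the trees-as-encoder + LR stack:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import OneHotEncoder

    # Synthetic binary classification problem (assumed, for illustration).
    X, y = make_classification(n_samples=8000, random_state=0)

    # Hold out a test set, then split the rest between the trees and the LR,
    # mirroring the two-way training split in the scikit-learn example.
    X_full, X_test, y_full, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    X_tree, X_lr, y_tree, y_lr = train_test_split(
        X_full, y_full, test_size=0.5, random_state=0)

    gbt = GradientBoostingClassifier(n_estimators=100, random_state=0)
    gbt.fit(X_tree, y_tree)

    # One-hot encode the leaf indices; every leaf is reached by at least one
    # training sample, but handle_unknown='ignore' keeps transform() safe.
    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit(gbt.apply(X_tree)[:, :, 0])

    lr = LogisticRegression(max_iter=1000)
    lr.fit(enc.transform(gbt.apply(X_lr)[:, :, 0]), y_lr)

    # Compare ROC AUC: boosted trees alone vs. trees-as-features + LR.
    auc_gbt = roc_auc_score(y_test, gbt.predict_proba(X_test)[:, 1])
    auc_stack = roc_auc_score(
        y_test,
        lr.predict_proba(enc.transform(gbt.apply(X_test)[:, :, 0]))[:, 1])
    print(f"GBT alone: {auc_gbt:.4f}  GBT leaves + LR: {auc_stack:.4f}")

Which model wins depends on the data and the hyperparameters; the point of the sketch is only to show the mechanics of the comparison.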
