Monday, May 14, 2018

How to extract decision rules (feature splits) from an xgboost model in Python 3?


I need to extract the decision rules from my fitted xgboost model in Python. I am using version 0.6a2 of the xgboost library, and my Python version is 3.5.2.

My ultimate goal is to use those splits to bin variables according to them.

I have not come across any property of the model in this version that exposes the splits.

plot_tree gives me something similar, but it is a visualization of the tree rather than the raw split values.

I need something like https://stackoverflow.com/a/39772170/4559070, but for an xgboost model.

3 Answers

Answer 1

It is possible, but not easy. I would recommend using GradientBoostingClassifier from scikit-learn, which is similar to xgboost but has native access to the built trees.
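To illustrate the scikit-learn route, here is a minimal sketch of reading splits directly from a fitted GradientBoostingClassifier: each fitted tree exposes its structure through the tree_ attribute, so no text parsing is needed. (The model and data here are illustrative, chosen to mirror the xgboost example below.)

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
model = GradientBoostingClassifier(max_depth=2, n_estimators=2).fit(X, y)

splits = []
for stage in model.estimators_:      # one row of trees per boosting stage
    for tree in stage:               # one tree per class
        t = tree.tree_
        for node in range(t.node_count):
            # internal nodes have a left child; leaves are marked with -1
            if t.children_left[node] != -1:
                splits.append((t.feature[node], t.threshold[node]))
print(splits)
```

Each tuple is (feature_index, threshold), already numeric, which is convenient if the goal is binning.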

With xgboost, however, it is possible to get a textual representation of the model and then parse it:

```python
from sklearn.datasets import load_iris
from xgboost import XGBClassifier

# build a very simple model
X, y = load_iris(return_X_y=True)
model = XGBClassifier(max_depth=2, n_estimators=2)
model.fit(X, y)

# dump it to a text file
model.get_booster().dump_model('xgb_model.txt', with_stats=True)

# read the contents of the file
with open('xgb_model.txt', 'r') as f:
    txt_model = f.read()
print(txt_model)
```

This prints a textual description of 6 trees (2 boosting rounds, each consisting of 3 trees, one per class), which starts like this:

```
booster[0]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=72.2968,cover=66.6667
    1:leaf=0.143541,cover=22.2222
    2:leaf=-0.0733496,cover=44.4444
booster[1]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=18.0742,cover=66.6667
    1:leaf=-0.0717703,cover=22.2222
    2:[f3<1.75] yes=3,no=4,missing=3,gain=41.9078,cover=44.4444
        3:leaf=0.124,cover=24
        4:leaf=-0.0668394,cover=20.4444
...
```

Now you can, for example, extract all splits from this description:

```python
import re

# extract all patterns like "[f2<2.45]"; note the raw string and the
# escaped dot, so the backslashes reach the regex engine intact
splits = re.findall(r'\[f([0-9]+)<([0-9]+\.[0-9]+)\]', txt_model)
print(splits)
```

This yields a list of (feature_id, split_value) tuples, like:

```
[('2', '2.45'),
 ('2', '2.45'),
 ('3', '1.75'),
 ('3', '1.65'),
 ('2', '4.95'),
 ('2', '2.45'),
 ('2', '2.45'),
 ('3', '1.75'),
 ('3', '1.65'),
 ('2', '4.95')]
```

You can further process this list as you wish.
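Since the asker's stated goal is binning, here is a minimal standard-library sketch of one way to process the list: deduplicate the thresholds per feature into sorted bin edges, then assign a bin index with bisect. The splits list below is a hypothetical excerpt of the regex output above.

```python
from bisect import bisect_right
from collections import defaultdict

# hypothetical excerpt of the (feature_id, split_value) tuples above
splits = [('2', '2.45'), ('2', '2.45'), ('3', '1.75'),
          ('3', '1.65'), ('2', '4.95')]

# collect deduplicated, sorted thresholds per feature
edges = defaultdict(set)
for feature, value in splits:
    edges[int(feature)].add(float(value))
edges = {f: sorted(v) for f, v in edges.items()}
print(edges)   # {2: [2.45, 4.95], 3: [1.65, 1.75]}

def bin_value(feature, x):
    """Return the bin index of x for a feature (0 = below all splits)."""
    return bisect_right(edges[feature], x)

print(bin_value(2, 3.0))   # 1: between 2.45 and 4.95
```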

Answer 2

Generally, it is not practical to do by hand, as there can be hundreds or even thousands of trees in an xgboost model. Credit: Jason Brownlee

Answer 3

You need to know the name of your tree; once you have it, you can refer to it in your code.
