Showing posts with label xgboost. Show all posts

Monday, May 14, 2018

How to extract decision rules (feature splits) from an xgboost model in Python 3?


I need to extract the decision rules from my fitted xgboost model in Python. I use version 0.6a2 of the xgboost library with Python 3.5.2.

My ultimate goal is to use those splits to bin variables (according to the splits).

I did not come across any property of the model in this version that gives me the splits.

plot_tree gives me something similar, but it is only a visualization of the tree.

I need something like https://stackoverflow.com/a/39772170/4559070 for an xgboost model.

3 Answers

Answers 1

It is possible, but not easy. I would recommend using GradientBoostingClassifier from scikit-learn, which is similar to xgboost but gives native access to the built trees.

With xgboost, however, it is possible to get a textual representation of the model and then parse it:

from sklearn.datasets import load_iris
from xgboost import XGBClassifier
# build a very simple model
X, y = load_iris(return_X_y=True)
model = XGBClassifier(max_depth=2, n_estimators=2)
model.fit(X, y)
# dump it to a text file
model.get_booster().dump_model('xgb_model.txt', with_stats=True)
# read the contents of the file
with open('xgb_model.txt', 'r') as f:
    txt_model = f.read()
print(txt_model)

This prints a textual description of 6 trees (2 estimators, each consisting of 3 trees, one per class), which starts like this:

booster[0]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=72.2968,cover=66.6667
    1:leaf=0.143541,cover=22.2222
    2:leaf=-0.0733496,cover=44.4444
booster[1]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=18.0742,cover=66.6667
    1:leaf=-0.0717703,cover=22.2222
    2:[f3<1.75] yes=3,no=4,missing=3,gain=41.9078,cover=44.4444
        3:leaf=0.124,cover=24
        4:leaf=-0.0668394,cover=20.4444
...
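If you also want to know which tree and node each split belongs to, the dump can be walked line by line. The following is only a sketch: it assumes the default feature names (f0, f1, ...) and a dump produced with with_stats=True, and the helper name parse_splits is made up for this illustration. The sample text is copied from the output above.

```python
import re

# Sample copied from the dump shown above (first two trees only).
txt_model = """booster[0]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=72.2968,cover=66.6667
    1:leaf=0.143541,cover=22.2222
    2:leaf=-0.0733496,cover=44.4444
booster[1]:
0:[f2<2.45] yes=1,no=2,missing=1,gain=18.0742,cover=66.6667
    1:leaf=-0.0717703,cover=22.2222
    2:[f3<1.75] yes=3,no=4,missing=3,gain=41.9078,cover=44.4444
        3:leaf=0.124,cover=24
        4:leaf=-0.0668394,cover=20.4444
"""

TREE_RE = re.compile(r'booster\[(\d+)\]')
# Assumes default feature names of the form f<index> and that gain is present.
SPLIT_RE = re.compile(r'(\d+):\[f(\d+)<([^\]]+)\].*?gain=([^,\s]+)')


def parse_splits(dump_text):
    """Return one dict per split node; leaf lines are skipped (hypothetical helper)."""
    records = []
    tree = None
    for line in dump_text.splitlines():
        line = line.strip()
        tree_match = TREE_RE.match(line)
        if tree_match:
            tree = int(tree_match.group(1))
            continue
        split_match = SPLIT_RE.match(line)
        if split_match:
            node, feature, threshold, gain = split_match.groups()
            records.append({
                "tree": tree,
                "node": int(node),
                "feature": int(feature),
                "threshold": float(threshold),
                "gain": float(gain),
            })
    return records


for record in parse_splits(txt_model):
    print(record)
```

Because the threshold is captured as "everything up to the closing bracket" and converted with float(), negative values and scientific notation should be handled as well.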

Now you can, for example, extract all splits from this description:

import re
# extract all patterns like "[f2<2.45]" (raw string, with the decimal point escaped)
splits = re.findall(r'\[f([0-9]+)<([0-9]+\.[0-9]+)\]', txt_model)
splits

This prints a list of (feature_id, split_value) tuples, like

[('2', '2.45'),
 ('2', '2.45'),
 ('3', '1.75'),
 ('3', '1.65'),
 ('2', '4.95'),
 ('2', '2.45'),
 ('2', '2.45'),
 ('3', '1.75'),
 ('3', '1.65'),
 ('2', '4.95')]

You can further process this list as you wish.
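For the binning goal from the question, one possible way to process the list is sketched below. It uses only the standard library; the helper names bin_edges and assign_bin are made up for this illustration, and the splits literal is a shortened copy of the list above.

```python
from bisect import bisect_right
from collections import defaultdict

# (feature_id, split_value) strings, in the form returned by re.findall.
splits = [('2', '2.45'), ('2', '2.45'), ('3', '1.75'), ('3', '1.65'), ('2', '4.95')]


def bin_edges(splits):
    """Collect the distinct thresholds per feature, sorted ascending (hypothetical helper)."""
    edges = defaultdict(set)
    for feature_id, threshold in splits:
        edges[int(feature_id)].add(float(threshold))
    return {feature: sorted(values) for feature, values in edges.items()}


def assign_bin(value, edges):
    """Return the bin index of value (hypothetical helper).

    A split [f<t] sends value < t one way and value >= t the other, so a value
    equal to a threshold is placed in the upper bin, which is what bisect_right does.
    """
    return bisect_right(edges, value)


edges = bin_edges(splits)
print(edges)
print(assign_bin(1.4, edges[2]), assign_bin(2.45, edges[2]), assign_bin(5.0, edges[2]))
```

With k distinct thresholds for a feature this yields k + 1 bins, numbered from 0. Missing values are not handled here; the dump records the branch taken for them separately (missing=...).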

Answers 2

Generally, it is not possible, as there are hundreds of thousands of trees in xgboost. Credit: Jason Brownlee

Answers 3

You need to know the name of your tree, and after that, you can insert it into your code.


Wednesday, May 31, 2017

execinfo.h missing when installing xgboost in Cygwin


I followed this tutorial in order to install the xgboost Python package within Cygwin64:

https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows

But when running make in the dmlc-core directory I get the following errors:

harrison4@mypc ~/xgboost/dmlc-core
$ mingw32-make -j4
g++ -c -O3 -Wall -Wno-unknown-pragmas -Iinclude  -std=c++0x -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -msse2 -o line_split.o src/io/line_split.cc
g++ -c -O3 -Wall -Wno-unknown-pragmas -Iinclude  -std=c++0x -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -msse2 -o recordio_split.o src/io/recordio_split.cc
g++ -c -O3 -Wall -Wno-unknown-pragmas -Iinclude  -std=c++0x -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -msse2 -o input_split_base.o src/io/input_split_base.cc
g++ -c -O3 -Wall -Wno-unknown-pragmas -Iinclude  -std=c++0x -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -msse2 -o io.o src/io.cc
src/io/line_split.cc:1:0: warning: -fPIC ignored for target (all code is position independent)
 // Copyright by Contributors
 ^
src/io.cc:1:0: warning: -fPIC ignored for target (all code is position independent)
 // Copyright by Contributors
 ^
src/io/input_split_base.cc:1:0: warning: -fPIC ignored for target (all code is position independent)
 // Copyright by Contributors
 ^
src/io/recordio_split.cc:1:0: warning: -fPIC ignored for target (all code is position independent)
 // Copyright by Contributors
 ^
In file included from include/dmlc/io.h:14:0,
                 from src/io/line_split.cc:2:
include/dmlc/./logging.h:18:22: fatal error: execinfo.h: No such file or directory
compilation terminated.
Makefile:83: recipe for target 'line_split.o' failed
mingw32-make: *** [line_split.o] Error 1
mingw32-make: *** Waiting for unfinished jobs....
In file included from src/io/input_split_base.cc:2:0:
include/dmlc/logging.h:18:22: fatal error: execinfo.h: No such file or directory
compilation terminated.
In file included from include/dmlc/io.h:14:0,
                 from src/io.cc:4:
include/dmlc/./logging.h:18:22: fatal error: execinfo.h: No such file or directory
compilation terminated.
Makefile:83: recipe for target 'input_split_base.o' failed
mingw32-make: *** [input_split_base.o] Error 1
Makefile:83: recipe for target 'io.o' failed
mingw32-make: *** [io.o] Error 1
In file included from include/dmlc/./io.h:14:0,
                 from include/dmlc/recordio.h:12,
                 from src/io/recordio_split.cc:2:
include/dmlc/././logging.h:18:22: fatal error: execinfo.h: No such file or directory
compilation terminated.
Makefile:83: recipe for target 'recordio_split.o' failed
mingw32-make: *** [recordio_split.o] Error 1

Why am I getting this error? Let me know if you need more information, please.

1 Answers

Answers 1

You can put #undef DMLC_LOG_STACK_TRACE right after its definition on line 45 here. See the example in this gist.

execinfo.h is not available under MinGW (it is typically provided by the C library on Linux), and in this project it is used only for debugging, to print stack traces. There is a check for MinGW in their codebase; I don't know why it is not defined (they've disabled it, see this PR).

You should try to change those lines and run make again.
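If you would rather script the change than edit the header by hand, a rough sketch in Python follows. It assumes the macro is defined on a line of the form "#define DMLC_LOG_STACK_TRACE ..." and that the header is include/dmlc/logging.h relative to the dmlc-core directory (the path shown in the error log); the function name disable_stack_trace is made up for this illustration.

```python
from pathlib import Path


def disable_stack_trace(header_path, macro="DMLC_LOG_STACK_TRACE"):
    """Insert '#undef <macro>' after every '#define <macro>' line.

    Returns the number of lines inserted. Running it twice is harmless:
    a define that is already followed by the matching #undef is skipped.
    """
    path = Path(header_path)
    lines = path.read_text().splitlines()
    patched = []
    inserted = 0
    for index, line in enumerate(lines):
        patched.append(line)
        is_define = line.split()[:2] == ["#define", macro]
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        already_undefined = next_line.split() == ["#undef", macro]
        if is_define and not already_undefined:
            patched.append("#undef " + macro)
            inserted += 1
    path.write_text("\n".join(patched) + "\n")
    return inserted


# Assumed usage, run from the dmlc-core directory:
# disable_stack_trace("include/dmlc/logging.h")
```

Keep a copy of the original header (or rely on git) so that the change is easy to revert.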


Saturday, January 21, 2017

Confidence intervals for XGBRegressor


Is there a way to get confidence intervals for XGBRegressor predictions? Something like the forest-ci package for scikit-learn random forests? I want a solution for Python.

0 Answers


Friday, March 25, 2016

Install xgboost on Mac - ld: library not found


I am trying to install OpenMP-enabled xgboost on my Mac. I installed gcc with no problem:

brew install gcc --without-multilib 

then cloned the git repository:

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; cp make/config.mk ./config.mk

but I get an error when I do

make -j4 

Here is the error I get. I'd appreciate it if you could help fix this problem:

c++ -std=c++0x -Wall -O3 -msse2  -Wno-unknown-pragmas -funroll-loops -Iinclude   -Idmlc-core/include -Irabit/include -fPIC -fopenmp -o xgboost  build/cli_main.o build/learner.o build/logging.o build/c_api/c_api.o build/c_api/c_api_error.o build/common/common.o build/data/data.o build/data/simple_csr_source.o build/data/simple_dmatrix.o build/data/sparse_page_dmatrix.o build/data/sparse_page_raw_format.o build/data/sparse_page_source.o build/data/sparse_page_writer.o build/gbm/gblinear.o build/gbm/gbm.o build/gbm/gbtree.o build/metric/elementwise_metric.o build/metric/metric.o build/metric/multiclass_metric.o build/metric/rank_metric.o build/objective/multiclass_obj.o build/objective/objective.o build/objective/rank_obj.o build/objective/regression_obj.o build/tree/tree_model.o build/tree/tree_updater.o build/tree/updater_colmaker.o build/tree/updater_histmaker.o build/tree/updater_prune.o build/tree/updater_refresh.o build/tree/updater_skmaker.o build/tree/updater_sync.o dmlc-core/libdmlc.a rabit/lib/librabit.a  -pthread -lm  -fopenmp
c++ -std=c++0x -Wall -O3 -msse2  -Wno-unknown-pragmas -funroll-loops -Iinclude   -Idmlc-core/include -Irabit/include -fPIC -fopenmp -shared -o lib/libxgboost.so build/learner.o build/logging.o build/c_api/c_api.o build/c_api/c_api_error.o build/common/common.o build/data/data.o build/data/simple_csr_source.o build/data/simple_dmatrix.o build/data/sparse_page_dmatrix.o build/data/sparse_page_raw_format.o build/data/sparse_page_source.o build/data/sparse_page_writer.o build/gbm/gblinear.o build/gbm/gbm.o build/gbm/gbtree.o build/metric/elementwise_metric.o build/metric/metric.o build/metric/multiclass_metric.o build/metric/rank_metric.o build/objective/multiclass_obj.o build/objective/objective.o build/objective/rank_obj.o build/objective/regression_obj.o build/tree/tree_model.o build/tree/tree_updater.o build/tree/updater_colmaker.o build/tree/updater_histmaker.o build/tree/updater_prune.o build/tree/updater_refresh.o build/tree/updater_skmaker.o build/tree/updater_sync.o dmlc-core/libdmlc.a rabit/lib/librabit.a -pthread -lm  -fopenmp
clang: warning: argument unused during compilation: '-pthread'
clang: warning: argument unused during compilation: '-pthread'
ld: library not found for -lgomp
ld: library not found for -lgomp
clang: error: linker command failed with exit code 1 (use -v to see invocation)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [lib/libxgboost.so] Error 1
make: *** Waiting for unfinished jobs....
make: *** [xgboost] Error 1

2 Answers

Answers 1

Newer versions of OS X use Clang as the default C/C++ compiler, so your c++ command refers to clang++.

You should set the CC and CXX environment variables for your make command, like this: CC=gcc CXX=g++ make -j

Alternatively, you can build OpenMP for clang (OpenMPrt) and customise your shell environment (I have not tried this myself).

(I have no Mac at the moment to check this solution; just Linux.)

Answers 2

I had the same issue and solved it with:

brew install clang-omp
export CC=clang-omp
export CXX=clang-omp++
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; cp make/config.mk ./config.mk; make -j4
cd python-package
sudo python setup.py install

If you installed xgboost with pip before, first delete the previously installed xgboost from your project. Then use

pip install xgboost  

to install it again.
