I am trying to fit a TensorForestEstimator model with numerical floating-point data representing 7 features and 7 labels. That is, the shape of both features and labels is (484876, 7). I set num_classes=7 and num_features=7 in ForestHParams accordingly. The format of the data is as follows:
    f1       f2     f3    f4      f5    f6    f7    l1       l2       l3       l4       l5       l6       l7
    39000.0  120.0  65.0  1000.0  25.0  0.69  3.94  39000.0  39959.0  42099.0  46153.0  49969.0  54127.0  55911.0
    32000.0  185.0  65.0  1000.0  75.0  0.46  2.19  32000.0  37813.0  43074.0  48528.0  54273.0  60885.0  63810.0
    30000.0  185.0  65.0  1000.0  25.0  0.41  1.80  30000.0  32481.0  35409.0  39145.0  42750.0  46678.0  48595.0
When calling fit(), Python crashes with the following message:

    Python quit unexpectedly while using the _pywrap_tensorflow_internal.so plug-in.

Here is the output when enabling tf.logging.set_verbosity('INFO'):
    INFO:tensorflow:training graph for tree: 0
    INFO:tensorflow:training graph for tree: 1
    ...
    INFO:tensorflow:training graph for tree: 9998
    INFO:tensorflow:training graph for tree: 9999
    INFO:tensorflow:Create CheckpointSaverHook.
    2017-07-26 10:25:30.908894: F tensorflow/contrib/tensor_forest/kernels/count_extremely_random_stats_op.cc:404] Check failed: column < num_classes_ (39001 vs. 8)

    Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
I'm not sure what this error means. It doesn't really make sense, since num_classes=7, not 8, and as the shape of features and labels is (484876, 7), I don't know where the 39001 is coming from.
Here is the code to reproduce:
    import numpy as np
    import pandas as pd
    import os
    # TensorFlow 1.x imports (tf.contrib) needed by the names used below
    import tensorflow as tf
    from tensorflow.contrib.tensor_forest.client.random_forest import (
        TensorForestEstimator, TensorForestLossHook)
    from tensorflow.contrib.tensor_forest.python.tensor_forest import (
        ForestHParams, RandomForestGraphs)


    def get_training_data():
        training_file = "data.txt"
        data = pd.read_csv(training_file, sep='\t')

        # Features: every column except 'Result'
        X = np.array(data.drop('Result', axis=1), dtype=np.float32)

        # Labels: each entry is a string such as "[a, b, ...]"; split it into floats
        y = []
        for e in data.ResultStr:
            y.append(list(np.array(str(e).replace('[', '').replace(']', '').split(','))))
        y = np.array(y, dtype=np.float32)

        features = tf.constant(X)
        labels = tf.constant(y)
        return features, labels


    hyperparameters = ForestHParams(
        num_trees=100,
        max_nodes=10000,
        bagging_fraction=1.0,
        num_splits_to_consider=0,
        feature_bagging_fraction=1.0,
        max_fertile_nodes=0,
        split_after_samples=250,
        min_split_samples=5,
        valid_leaf_threshold=1,
        dominate_method='bootstrap',
        dominate_fraction=0.99,
        # All parameters above are default
        num_classes=7,
        num_features=7
    )

    estimator = TensorForestEstimator(
        params=hyperparameters,
        # All parameters below are default
        device_assigner=None,
        model_dir=None,
        graph_builder_class=RandomForestGraphs,
        config=None,
        weights_name=None,
        keys_name=None,
        feature_engineering_fn=None,
        early_stopping_rounds=100,
        num_trainers=1,
        trainer_id=0,
        report_feature_importances=False,
        local_eval=False
    )

    estimator.fit(
        input_fn=lambda: get_training_data(),
        max_steps=100,
        monitors=[
            TensorForestLossHook(
                early_stopping_rounds=30
            )
        ]
    )
It also doesn't work if I wrap it with SKCompat; the same error occurs. What is the cause of this crash?
1 Answer
regression=True needs to be specified in the ForestHParams, because TensorForestEstimator by default assumes that it is being used to solve a classification problem, which can only output one value.
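To see why continuous targets clash with classification mode, here is a minimal pure-Python sketch. The helper valid_class_ids is hypothetical and not part of TensorFlow; it only checks whether label values could be read as class ids in the range 0 to num_classes - 1. The failed check in the log compares 39001 against 8. One plausible reading, which is an assumption and not something stated in this thread, is that the kernel offsets both the first label value (39000.0) and num_classes (7) by one.

```python
# Hypothetical helper (not part of TensorFlow): returns True only if every
# label is a whole number that fits in the range [0, num_classes).
def valid_class_ids(labels, num_classes):
    return all(float(v).is_integer() and 0 <= v < num_classes for v in labels)


# First label row (l1..l7) from the sample data in the question.
first_row = [39000.0, 39959.0, 42099.0, 46153.0, 49969.0, 54127.0, 55911.0]

print(valid_class_ids(first_row, num_classes=7))  # False: 39000.0 is far outside 0..6
print(valid_class_ids([0, 3, 6], num_classes=7))  # True: these could be class ids
```

Labels like those in the question fail this check, which is consistent with the data being regression targets and not class ids.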
There is an implicit num_outputs variable created upon initialization of the estimator, and it is set to 1 if regression was not specified. If regression is specified, then num_outputs = num_classes and checkpoints are saved normally.
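The rule described above can be sketched in a few lines of plain Python. The function implied_num_outputs and the dictionary forest_kwargs are illustrative names, not TensorFlow's own code; they only restate the behaviour described in this answer, using the parameter values from the question plus regression=True.

```python
def implied_num_outputs(num_classes, regression=False):
    """Restates the rule from the answer; this is not TensorFlow's own code."""
    return num_classes if regression else 1


# Keyword arguments from the question, with the fix applied.
forest_kwargs = dict(
    num_trees=100,
    max_nodes=10000,
    num_classes=7,
    num_features=7,
    regression=True,  # the missing setting
)

print(implied_num_outputs(7))  # 1: classification default
print(implied_num_outputs(forest_kwargs["num_classes"],
                          forest_kwargs["regression"]))  # 7: one output per label column

# With TensorFlow 1.x installed, the same keywords would be passed as:
# hyperparameters = ForestHParams(**forest_kwargs)
```

Keeping the keywords in a dictionary is only a convenience for this sketch; passing regression=True directly to ForestHParams in the original code should have the same effect.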