Saturday, February 18, 2017

Wide & Deep learning example unable to handle large datasets


Feeding 1MM+ rows into the Wide & Deep learning model throws ValueError: GraphDef cannot be larger than 2GB:

Traceback (most recent call last):
  File "search_click.py", line 207, in <module>
    tf.app.run()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "search_click.py", line 204, in main
    train_and_eval()
  File "search_click.py", line 181, in train_and_eval
    m.fit(input_fn=lambda: input_fn(df_train), steps=FLAGS.train_steps)
  File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 182, in fit
    monitors=monitors)
  File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 458, in _train_model
    summary_writer=graph_actions.get_summary_writer(self._model_dir))
  File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 76, in get_summary_writer
    graph=ops.get_default_graph())
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/summary_io.py", line 113, in __init__
    self.add_graph(graph=graph, graph_def=graph_def)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/summary_io.py", line 204, in add_graph
    true_graph_def = graph.as_graph_def(add_shapes=True)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2117, in as_graph_def
    raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.

I defined the same input_fn as in the example:

def input_fn(df):
  """Input builder function."""
  # Creates a dictionary mapping from each continuous feature column name (k) to
  # the values of that column stored in a constant Tensor.
  continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}
  # Creates a dictionary mapping from each categorical feature column name (k)
  # to the values of that column stored in a tf.SparseTensor.
  categorical_cols = {k: tf.SparseTensor(
      indices=[[i, 0] for i in range(df[k].size)],
      values=df[k].values,
      shape=[df[k].size, 1])
                      for k in CATEGORICAL_COLUMNS}
  # Merges the two dictionaries into one.
  feature_cols = dict(continuous_cols)
  feature_cols.update(categorical_cols)
  # Converts the label column into a constant Tensor.
  label = tf.constant(df[LABEL_COLUMN].values)
  # Returns the feature columns and the label.
  return feature_cols, label
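
As far as I can tell, the problem is in this input_fn itself: tf.constant(df[k].values) copies every column's values into the graph as constant nodes, so the serialized GraphDef grows with the dataset until it hits the 2GB protobuf limit. A minimal sketch (TF 1.x graph mode, with an illustrative array size) shows a single constant already inflating the graph:

import numpy as np
import tensorflow as tf

# Each tf.constant embeds its values in the GraphDef itself.
tf.constant(np.zeros((1000, 1000), dtype=np.float32))
graph_def = tf.get_default_graph().as_graph_def()
print(graph_def.ByteSize())  # roughly 4 MB for this one constant alone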

Is there a replacement for tf.constant and tf.SparseTensor that allows feeding the data in batches and avoids the memory error?

1 Answer

Answer 1

The Wide & Deep example loads the entire dataset into memory. For a large dataset, you can instead use tf.decode_csv to read CSV-formatted input in batches, as sketched below; if your input format is self-defined, you should create a custom data reader.
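
Here is a minimal queue-based sketch for the TF 1.x contrib.learn setup shown in the traceback above; the file names, column names, and record defaults are illustrative, and depending on your feature columns the string features may still need to be converted to SparseTensors as in the original input_fn:

import tensorflow as tf

BATCH_SIZE = 512

def input_fn_csv(filenames):
  """Streams CSV rows in mini-batches instead of embedding the whole
  DataFrame in the graph as constants."""
  filename_queue = tf.train.string_input_producer(filenames)
  reader = tf.TextLineReader(skip_header_lines=1)
  _, row = reader.read(filename_queue)

  # One default per CSV column, in file order; each default's dtype
  # fixes that column's dtype ([0.0] float, [""] string, [0] int).
  record_defaults = [[0.0], [0.0], [""], [""], [0]]
  age, hours, occupation, education, label = tf.decode_csv(
      row, record_defaults=record_defaults)

  # Assembles shuffled mini-batches with a background queue.
  batched = tf.train.shuffle_batch(
      [age, hours, occupation, education, label],
      batch_size=BATCH_SIZE,
      capacity=10 * BATCH_SIZE,
      min_after_dequeue=2 * BATCH_SIZE)

  features = dict(zip(
      ["age", "hours_per_week", "occupation", "education"], batched[:-1]))
  return features, batched[-1]

Calling m.fit(input_fn=lambda: input_fn_csv(["train.csv"]), steps=...) then only materializes BATCH_SIZE rows at a time instead of the whole dataset. Newer TensorFlow versions also offer tf.estimator.inputs.pandas_input_fn and the tf.data API, which batch data without hand-written queues.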
