Saturday, August 4, 2018

Why is TensorFlow's `tf.data` package slowing down my code?


I'm just learning to use TensorFlow's tf.data API, and I've found that it slows my code down a lot, measured in time per epoch. That's the opposite of what it's supposed to do, as I understand it. I wrote a simple linear regression program to test it out.

TL;DR: With 100,000 training points, tf.data slows time per epoch down by about a factor of ten when training with a full batch, and by more when using smaller batches. With 500 training points, the opposite is true.

My question: What is going on? Is my implementation flawed? Other sources I've read have tf.data improving speeds by about 30%.

import tensorflow as tf
import numpy as np
import timeit

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.logging.set_verbosity(tf.logging.ERROR)

n_epochs = 10
input_dimensions_list = [10]

def function_to_approximate(x):
    return np.dot(x, random_covector).astype(np.float32) + np.float32(.01) * np.random.randn(1,1).astype(np.float32)

def regress_without_tfData(n_epochs, input_dimension, training_inputs, training_labels):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))

    X = tf.placeholder(tf.float32, shape=(None, input_dimension), name='X')
    Y = tf.placeholder(tf.float32, shape=(None, 1), name='Y')
    prediction = tf.matmul(X, weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for _ in range(n_epochs):
            sess.run(loss_op, feed_dict={X: training_inputs, Y: training_labels})

def regress_with_tfData(n_epochs, input_dimension, training_inputs, training_labels, batch_size):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))

    X, Y = data_set.make_one_shot_iterator().get_next()

    prediction = tf.matmul(X, weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        while True:
            try:
                sess.run(loss_op)
            except tf.errors.OutOfRangeError:
                break

for input_dimension in input_dimensions_list:
    for data_size in [500, 100000]:

        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)

        print("Not using tf.data, with data size "
        "{}, input dimension {} and training with "
        "a full batch, it took an average of "
        "{} seconds to run {} epochs.\n".
            format(
                data_size,
                input_dimension,
                timeit.timeit(
                    lambda: regress_without_tfData(
                        n_epochs, input_dimension,
                        training_inputs, training_labels
                    ),
                    number=3
                ),
                n_epochs))

for input_dimension in input_dimensions_list:
    for data_size, batch_size in [(500, 50), (500, 500), (100000, 50), (100000, 100000)]:

        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)

        data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
        data_set = data_set.repeat(n_epochs)
        data_set = data_set.batch(batch_size)

        print("Using tf.data, with data size "
        "{}, and input dimension {}, and training with "
        "batch size {}, it took an average of {} seconds "
        "to run {} epochs.\n".
            format(
                data_size,
                input_dimension,
                batch_size,
                timeit.timeit(
                    lambda: regress_with_tfData(
                        n_epochs, input_dimension,
                        training_inputs, training_labels,
                        batch_size
                    ),
                    number=3
                )/3,
                n_epochs
            ))

This outputs for me:

Not using tf.data, with data size 500, input dimension 10 and training with a full batch, it took an average of 0.20243382899980134 seconds to run 10 epochs.

Not using tf.data, with data size 100000, input dimension 10 and training with a full batch, it took an average of 0.2431719040000644 seconds to run 10 epochs.

Using tf.data, with data size 500, and input dimension 10, and training with batch size 50, it took an average of 0.09512088866661846 seconds to run 10 epochs.

Using tf.data, with data size 500, and input dimension 10, and training with batch size 500, it took an average of 0.07286913600000844 seconds to run 10 epochs.

Using tf.data, with data size 100000, and input dimension 10, and training with batch size 50, it took an average of 4.421892363666605 seconds to run 10 epochs.

Using tf.data, with data size 100000, and input dimension 10, and training with batch size 100000, it took an average of 2.2555197536667038 seconds to run 10 epochs.

Edit: Fixed an important issue that Fred Guth pointed out. It didn't much affect the results, though.

1 Answer

Answer 1

First:

You are recreating the dataset unnecessarily.

data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))

Create the dataset before the timing loop, and change the input signature of regress_with_tfData to take the dataset itself instead of training_inputs and training_labels (which the function currently ignores in favor of the global data_set).
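For instance, a minimal sketch of that refactor, reusing the question's variable names; the explicit data_set parameter is my suggestion rather than the question's code, and the rest of the function body would stay exactly as it is:

# Build the dataset once, before timing anything.
data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
data_set = data_set.repeat(n_epochs)
data_set = data_set.batch(batch_size)

def regress_with_tfData(n_epochs, input_dimension, data_set):
    tf.reset_default_graph()
    weights = tf.get_variable("weights", initializer=np.random.randn(input_dimension, 1).astype(np.float32))

    # The dataset now arrives as an explicit argument rather than a global.
    X, Y = data_set.make_one_shot_iterator().get_next()
    # ... rest of the body unchanged ...

timeit.timeit(
    lambda: regress_with_tfData(n_epochs, input_dimension, data_set),
    number=3)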

Second:

The problem here is that minibatches of size 50 or even 500 are too small to amortize the cost of building the tf.data pipeline. You should increase the minibatch size. Interestingly, you did try a minibatch of size 100000, but that may be too large (I am not certain of this; I think it would need more tests).

There are a couple of things you could try:

1) Increase the minibatch size to something like 10000 and see if you get an improvement.
2) Change your pipeline to use an iterator, for example (a fuller sketch combining both suggestions follows the snippet):

    data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
    data_set = data_set.repeat(n_epochs)
    data_set = data_set.batch(batch_size)
    iterator = data_set.make_one_shot_iterator()
    ....
    next_element = iterator.get_next()
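Putting the two suggestions together, here is a rough end-to-end sketch; the batch size of 10000 is just the value suggested in point 1, and weights, training_inputs, training_labels and n_epochs are the same objects defined in the question's code:

data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
data_set = data_set.repeat(n_epochs)
data_set = data_set.batch(10000)              # suggestion 1: a larger minibatch
iterator = data_set.make_one_shot_iterator()  # suggestion 2: an explicit iterator
X, Y = iterator.get_next()

prediction = tf.matmul(X, weights)
loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
loss_op = tf.train.AdamOptimizer(.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    while True:
        try:
            sess.run(loss_op)                 # one training step per batch
        except tf.errors.OutOfRangeError:     # raised once repeat(n_epochs) is exhausted
            break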
