Sunday, April 1, 2018

Memory error when using Keras ImageDataGenerator


I am attempting to predict features in imagery using Keras with a TensorFlow backend. Specifically, I am attempting to use a Keras ImageDataGenerator. The model is set to run for 4 epochs and runs fine until the 4th epoch, where it fails with a MemoryError.

I am running this model on an AWS g2.2xlarge instance running Ubuntu Server 16.04 LTS (HVM), SSD Volume Type.

The training images are 256x256 RGB pixel tiles (8 bit unsigned) and the training mask is 256x256 single band (8 bit unsigned) tiled data where 255 == a feature of interest and 0 == everything else.

The following 3 functions are the ones pertinent to this error.

How can I resolve this MemoryError?


def train_model():
    batch_size = 1
    training_imgs = np.lib.format.open_memmap(filename=os.path.join(data_path, 'data.npy'), mode='r+')
    training_masks = np.lib.format.open_memmap(filename=os.path.join(data_path, 'mask.npy'), mode='r+')
    dl_model = create_model()
    print(dl_model.summary())
    model_checkpoint = ModelCheckpoint(os.path.join(data_path, 'mod_weight.hdf5'), monitor='loss', verbose=1, save_best_only=True)
    dl_model.fit_generator(generator(training_imgs, training_masks, batch_size),
                           steps_per_epoch=(len(training_imgs) / batch_size),
                           epochs=4, verbose=1, callbacks=[model_checkpoint])


def generator(train_imgs, train_masks=None, batch_size=None):
    # Create empty arrays to contain the batch of features and labels
    if train_masks is not None:
        train_imgs_batch = np.zeros((batch_size, y_to_res, x_to_res, bands))
        train_masks_batch = np.zeros((batch_size, y_to_res, x_to_res, 1))
        while True:
            for i in range(batch_size):
                # choose a random index in the features
                index = random.choice(range(len(train_imgs)))
                train_imgs_batch[i] = train_imgs[index]
                train_masks_batch[i] = train_masks[index]
            yield train_imgs_batch, train_masks_batch
    else:
        rec_imgs_batch = np.zeros((batch_size, y_to_res, x_to_res, bands))
        while True:
            for i in range(batch_size):
                # choose a random index in the features
                index = random.choice(range(len(train_imgs)))
                rec_imgs_batch[i] = train_imgs[index]
            yield rec_imgs_batch


def train_generator(train_images, train_masks, batch_size):
    data_gen_args = dict(rotation_range=90., horizontal_flip=True, vertical_flip=True, rescale=1./255)
    image_datagen = ImageDataGenerator()
    mask_datagen = ImageDataGenerator()
    # Provide the same seed and keyword arguments to the fit and flow methods
    seed = 1
    image_datagen.fit(train_images, augment=True, seed=seed)
    mask_datagen.fit(train_masks, augment=True, seed=seed)
    image_generator = image_datagen.flow(train_images, batch_size=batch_size)
    mask_generator = mask_datagen.flow(train_masks, batch_size=batch_size)
    return zip(image_generator, mask_generator)

The following is the output from the model, detailing the epochs and the error message:

Epoch 00001: loss improved from inf to 0.01683, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 2/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0027 - jaccard_coef_int: 0.9983
Epoch 00002: loss improved from 0.01683 to 0.00492, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 3/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0026 - jaccard_coef_int: 0.9982
Epoch 00003: loss improved from 0.00492 to 0.00488, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 4/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0074 - binary_crossentropy: 0.0042 - jaccard_coef_int: 0.9975
Epoch 00004: loss did not improve
Traceback (most recent call last):
  File "image_rec.py", line 291, in <module>
    train_model()
  File "image_rec.py", line 208, in train_model
    dl_model.fit_generator(train_generator(training_imgs,training_masks,batch_size),steps_per_epoch=1,epochs=1,workers=1)
  File "image_rec.py", line 274, in train_generator
    image_datagen.fit(train_images, augment=True, seed=seed)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/keras/preprocessing/image.py", line 753, in fit
    x = np.copy(x)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1505, in copy
    return array(a, order=order, copy=True)
MemoryError

5 Answers

Answers 1

It seems your problem is that the data is too large to fit in memory. I can see two solutions. The first is to run your code on a distributed system, for example with Spark; I guess you do not have that infrastructure, so let us move on to the second.

The second, which I think is viable, is to slice the data and feed it to the model incrementally. We can do this with Dask. This library can slice the data and store it as objects that you can then read back from disk, fetching only the part you need.

If you have an image whose size is a 100x100 matrix, we can retrieve each chunk without needing to load all 100 arrays into memory. We can load them chunk by chunk (releasing the previous one), and each chunk becomes the input to your neural network.

To do this, you can transform your np.array into a Dask array and assign the chunks. For example:

>>> k = np.random.randn(10,10) # Matrix 10x10
>>> import dask.array as da
>>> k2 = da.from_array(k, chunks=3)
dask.array<array, shape=(10, 10), dtype=float64, chunksize=(3, 3)>
>>> k2.to_delayed()
array([[Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 0)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 1)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 2)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 3))],
       [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 0)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 1)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 2)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 3))],
       [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 0)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 1)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 2)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 3))],
       [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 0)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 1)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 2)),
        Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 3))]],
      dtype=object)

Here you can see how the data is stored as delayed objects, which you can then retrieve in parts to feed your model.

To implement this solution you must introduce a loop in your function that iterates over the partitions, loading one at a time and feeding it to the NN to get incremental training, as sketched below.
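
A rough sketch of that loop, reusing the to_delayed() idea from above (here training_imgs, training_masks and dl_model refer to the variables in the question's code, and the 32-tile chunk size is my assumption, not a value prescribed by Dask):

import dask.array as da

# Wrap the memory-mapped training arrays in Dask arrays, chunked along the
# sample axis so only a small group of tiles is materialised at a time.
imgs_da = da.from_array(training_imgs, chunks=(32, 256, 256, 3))
masks_da = da.from_array(training_masks, chunks=(32, 256, 256, 1))

for epoch in range(4):
    for img_chunk, mask_chunk in zip(imgs_da.to_delayed().ravel(),
                                     masks_da.to_delayed().ravel()):
        x = img_chunk.compute()        # load only this chunk into memory
        y = mask_chunk.compute()
        dl_model.train_on_batch(x, y)  # x and y are rebound next pass, freeing the old chunk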

For more information, see the Dask documentation.

Answers 2

Generally Keras/TensorFlow is very good with resource usage, but there is a known memory leak that has caused problems in the past. To make sure that is not what is causing your problem, try adding these two lines of code to your training script:

# load the backend
from keras import backend as K

# prevent TensorFlow memory leakage
K.clear_session()
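
If you train several models in one process (for example across repeated experiments or cross-validation folds), a typical place for the call is after each finished run, so the old graph is released before the next model is built. The tiny model and dummy data below are purely illustrative assumptions, not code from the question:

import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

x = np.random.rand(100, 8).astype('float32')    # dummy features
y = np.random.randint(0, 2, size=(100, 1))      # dummy binary labels

for run in range(3):                             # e.g. repeated training runs
    model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                        Dense(1, activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.fit(x, y, epochs=1, batch_size=32, verbose=0)
    K.clear_session()                            # drop the finished graph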

Answers 3

The code you provided is quite confusing (in my opinion), e.g. no call to train_generator is visible. I am not sure this is a problem of insufficient memory due to big data, since you use memmap for that, but let's assume for now that it is.

  • If the data is quite big, and since you are loading the images from a directory anyway, it might be worth considering ImageDataGenerator's flow_from_directory method. It would require a slight change of design, though, which might not be what you want.

You can load it in the following manner:

train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(256, 256),
        batch_size=batch_size,
        ...)  # other configurations

More on that in the Keras documentation.
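
Since your labels are mask tiles rather than class folders, the usual adaptation for segmentation is to pair two directory generators with the same seed. The directory names below ('data/train/images', 'data/train/masks') are assumptions about how the tiles might be laid out on disk; flow_from_directory also expects one level of subfolders beneath each path:

from keras.preprocessing.image import ImageDataGenerator

seed = 1
image_datagen = ImageDataGenerator(rescale=1./255)
mask_datagen = ImageDataGenerator(rescale=1./255)

image_generator = image_datagen.flow_from_directory(
        'data/train/images', target_size=(256, 256),
        class_mode=None, batch_size=1, seed=seed)
mask_generator = mask_datagen.flow_from_directory(
        'data/train/masks', target_size=(256, 256),
        color_mode='grayscale', class_mode=None, batch_size=1, seed=seed)

# Pair the two streams lazily; on Python 2 the built-in zip is eager,
# so fall back to itertools.izip there.
try:
    from itertools import izip as zip_pairs
except ImportError:
    zip_pairs = zip

train_generator = zip_pairs(image_generator, mask_generator)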

  • Also note that on a 32-bit system a memmap cannot be larger than 2 GB.

  • Do you use tensorflow-gpu, by any chance? Maybe your GPU is not sufficient; you could try this with the CPU-only tensorflow package.

I would strongly suggest trying some memory profiling to see where the larger allocations of memory happen.
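
For example, one lightweight option (the memory_profiler package is my suggestion here, not something named above) is to decorate the suspect function and read its line-by-line memory report:

# pip install memory_profiler
from memory_profiler import profile
import numpy as np

@profile                      # prints per-line memory usage when the function runs
def load_batch():
    # stand-in for one step of the question's pipeline
    imgs = np.zeros((32, 256, 256, 3))
    return imgs

if __name__ == '__main__':
    load_batch()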


If it is not a case of insufficient memory, it might be wrong handling of the data in your model; since your loss function is hardly improving, it could be miswired, for example.


Finally, one last note here: it is good practice to open the memmap of the training data as read-only, since you don't want to accidentally mess up the data.
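
For instance, using the same open_memmap calls as in the question (data_path is the directory variable from your code), mode='r' instead of mode='r+' keeps the .npy files safe from accidental writes:

import os
import numpy as np

# read-only memory maps of the training data
training_imgs = np.lib.format.open_memmap(os.path.join(data_path, 'data.npy'), mode='r')
training_masks = np.lib.format.open_memmap(os.path.join(data_path, 'mask.npy'), mode='r')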

UPDATE: I can see that you've updated the post and provided the code for the train_generator method, but there is still no call to that method in your code.

If I assume that you have a typo in the call, i.e. train_generator instead of the generator method in your dl_model.fit_generator call, then it is possible that fit_generator is not working on a batch of data but actually on the whole training_imgs array, and it copies the whole set in the np.copy(x) call.

Also, as mentioned already, there are indeed a few reported issues with Keras memory leaks when using the fit and fit_generator methods (you can find some of them on the Keras issue tracker; here is an open one, for example).

Answers 4

This is common when running 32-bit Python if the float precision is too high. Are you running 32-bit? You may also consider casting the array to a lower precision or rounding it.
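
For instance (a sketch; the array below is just a stand-in for the loaded imagery, which the answer does not show):

import numpy as np

tiles = np.random.rand(10, 256, 256, 3)                  # float64 by default
tiles32 = tiles.astype(np.float32)                       # cast: halves the memory footprint
tiles_rounded = np.round(tiles, 3).astype(np.float32)    # round first if small differences don't matter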

Answers 5

In your code, wherever you create a NumPy array of zeros, add the data type float16:

numpy.zeros(shape, dtype=numpy.float16)

By default it is float64, so this saves memory four-fold.
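
As a quick sanity check of that saving, using one 256x256 RGB tile like those in the question:

import numpy as np

tile64 = np.zeros((1, 256, 256, 3))                     # float64 by default
tile16 = np.zeros((1, 256, 256, 3), dtype=np.float16)   # explicit half precision
print(tile64.nbytes, tile16.nbytes)                     # 1572864 vs 393216 bytes: a 4x reduction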
