Sunday, June 4, 2017

Callback function ModelCheckpoint causes error in Keras


I get the error below when I use the ModelCheckpoint callback.

I read in a GitHub issue that the solution would be to make use of model.get_weights(), but I am implicitly only storing that anyway, since I am only storing the weights of the best epoch.

Keras only seems to save weights as HDF5 (.h5) files, which makes me wonder: is there any other way to store them using the Keras API? If so, how? If not, how do I store them?

I made an example that recreates the problem:

#!/usr/bin/python

import glob, os
import sys
from os import listdir
from os.path import isfile, join
import numpy as np
import warnings
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from keras.utils import np_utils
from keras import metrics
import keras
from keras import backend as K
from keras.models import Sequential
from keras.optimizers import SGD, Adam
from keras.layers.core import Dense, Activation, Lambda, Reshape, Flatten
from keras.layers import Conv1D, Conv2D, MaxPooling2D, MaxPooling1D, Reshape
#from keras.utils.visualize_util import plot
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers.merge import Concatenate, Add
import h5py
import random
import tensorflow as tf
import math
from keras.callbacks import CSVLogger
from keras.callbacks import ModelCheckpoint

if len(sys.argv) < 5:
    print "Missing Arguments!"
    print "python keras_convolutional_feature_extraction.py <workspace> <total_frames> <fbank-dim> <window-height> <batch_size>"
    print "Example:"
    print "python keras_convolutional_feature_extraction.py deltas 15 40 5 100"
    sys.exit()

total_frames = int(sys.argv[2])
total_frames_with_deltas = total_frames*3
dim = int(sys.argv[3])
window_height = int(sys.argv[4])
inserted_batch_size = int(sys.argv[5])
stride = 1
splits = ((dim - window_height)+1)/stride

#input_train_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_input"
#output_train_data ="/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_output"
#input_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_input"
#output_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_output"

#train_files = [f for f in listdir(input_train_data) if isfile(join(input_train_data, f))]
#test_files = [f for f in listdir(input_test_data) if isfile(join(input_test_data, f))]

#print len(train_files)
np.random.seed(100)
print "hallo"

def train_generator():
    while True:
#        input = random.choice(train_files)
#        h5f = h5py.File(input_train_data+'/'+input, 'r')
#        train_input = h5f['train_input'][:]
#        train_output = h5f['train_output'][:]
#        h5f.close()
        train_input = np.random.randint(100, size=((inserted_batch_size, splits*total_frames_with_deltas, window_height, 3)))
        train_list_list = []
        train_input = train_input.reshape((inserted_batch_size, splits*total_frames_with_deltas, window_height, 3))
        train_input_list = np.split(train_input, splits*total_frames_with_deltas, axis=1)
        for i in range(len(train_input_list)):
            train_input_list[i] = train_input_list[i].reshape(inserted_batch_size, window_height, 3)

        #for i in range(len(train_input_list)):
        #    train_input_list[i] = train_input_list[i].reshape(inserted_batch_size, 33, window_height, 1, 3)

        train_output = np.random.randint(5, size=(1, total_frames, 5))
        middle = int(math.ceil(total_frames/2))

        train_output = train_output[:, middle:middle+1, :].reshape((inserted_batch_size, 1, 5))

        #print train_output.shape
        #print len(train_input_list)
        #print train_input_list[0].shape
        yield (train_input_list, train_output)

print "hallo"

def test_generator():
    while True:
#        input = random.choice(test_files)
#        h5f = h5py.File(input_test_data+'/'+input, 'r')
#        test_input = h5f['test_input'][:]
#        test_output = h5f['test_output'][:]
#        h5f.close()
        test_input = np.random.randint(100, size=((inserted_batch_size, splits*total_frames_with_deltas, window_height, 3)))
        test_input = test_input.reshape((inserted_batch_size, splits*total_frames_with_deltas, window_height, 3))
        test_input_list = np.split(test_input, splits*total_frames_with_deltas, axis=1)
        #test_input_list = np.split(test_input, 45, axis=3)

        for i in range(len(test_input_list)):
            test_input_list[i] = test_input_list[i].reshape(inserted_batch_size, window_height, 3)

        #for i in range(len(test_input_list)):
        #    test_input_list[i] = test_input_list[i].reshape(inserted_batch_size, 33, window_height, 1, 3)

        test_output = np.random.randint(5, size=(1, total_frames, 5))

        middle = int(math.ceil(total_frames/2))

        test_output = test_output[:, middle:middle+1, :].reshape((inserted_batch_size, 1, 5))

        yield (test_input_list, test_output)

print "hallo"

def fws():
    #print "Inside"
    #   Params:
    #   batch, lr, decay, momentum, epochs
    #
    #   Input shape: (batch_size, 40, 45, 3)
    #   Output shape: (1, 15, 50)
    #   number of units in conv_feature_map = splits
    next(train_generator())
    model_output = []
    list_of_input = [Input(shape=(8, 3)) for i in range(splits*total_frames_with_deltas)]
    output = []

    # Conv
    skip = total_frames_with_deltas
    for steps in range(total_frames_with_deltas):
        conv = Conv1D(filters=100, kernel_size=8)
        column = 0
        for _ in range(splits):
            #print "column " + str(column) + " steps: " + str(steps)
            output.append(conv(list_of_input[(column*skip)+steps]))
            column = column + 1

    #print len(output)
    #print splits*total_frames_with_deltas

    conv = []
    for section in range(splits):
        column = 0
        skip = splits
        temp = []
        for _ in range(total_frames_with_deltas):
            temp.append(output[((column*skip)+section)])
            column = column + 1
        conv.append(Add()(temp))
        #print len(conv)

    output_conc = Concatenate()(conv)
    #print output_conc.get_shape
    output_conv = Reshape((splits, -1))(output_conc)
    #print output_conv.get_shape

    # Pool
    pooled = MaxPooling1D(pool_size=6, strides=2)(output_conv)
    reshape = Reshape((1, -1))(pooled)

    # Fc
    dense1 = Dense(units=1024, activation='relu', name="dense_1")(reshape)
    #dense2 = Dense(units=1024, activation='relu', name="dense_2")(dense1)
    dense3 = Dense(units=1024, activation='relu', name="dense_3")(dense1)
    final = Dense(units=5, activation='relu', name="final")(dense3)

    model = Model(inputs=list_of_input, outputs=final)
    sgd = SGD(lr=0.1, decay=1e-1, momentum=0.9, nesterov=True)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=['accuracy'])
    print "compiled"

    model_yaml = model.to_yaml()
    with open("model.yaml", "w") as yaml_file:
        yaml_file.write(model_yaml)

    print "Model saved!"

    log = CSVLogger('/home/carl/kaldi-trunk/dnn/experimental/yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+".csv")
    filepath = 'yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+"weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_weights_only=True, mode='max')

    print "log"
    #plot_model(model, to_file='model.png')
    print "Fit"
    hist_current = model.fit_generator(train_generator(),
                                       steps_per_epoch=444,  # len(train_files)
                                       epochs=10000,
                                       verbose=1,
                                       validation_data=test_generator(),
                                       validation_steps=44,  # len(test_files)
                                       pickle_safe=True,
                                       workers=4,
                                       callbacks=[log, checkpoint])

fws()

Execute the script with: python name_of_script.py yesno 50 40 8 1

which gives me the following full traceback:

carl@ca-ThinkPad-T420s:~/Dropbox$ python mini.py yesno 50 40 8 1
Using TensorFlow backend.
Couldn't import dot_parser, loading of dot files will not be possible.
hallo
hallo
hallo
compiled
Model saved!
log
Fit
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:2252: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
  warnings.warn('\n'.join(msg))
Epoch 1/10000
2017-05-26 13:01:45.851125: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851345: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851392: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
443/444 [============================>.] - ETA: 4s - loss: 100.1266 - acc: 0.3138
Epoch 00000: saving model to yesno_cnn_50_training_total_frames_50_dim_40_window_height_8weights-improvement-00-0.48.hdf5
Traceback (most recent call last):
  File "mini.py", line 205, in <module>
  File "mini.py", line 203, in fws
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1933, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 77, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 411, in on_epoch_end
    self.model.save_weights(filepath, overwrite=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2503, in save_weights
    save_weights_to_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2746, in save_weights_to_hdf5_group
    f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 93, in __setitem__
    self.create(name, data=value, dtype=base.guess_dtype(value))
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 183, in create
    attr = h5a.create(self._id, self._e(tempname), htype, space)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "h5py/h5a.pyx", line 47, in h5py.h5a.create (/tmp/pip-4rPeHA-build/h5py/h5a.c:1904)
RuntimeError: Unable to create attribute (Object header message is too large)

3 Answers

Answer 1

A simple solution, albeit possibly not the most elegant, could be to run a while loop with epochs = 1; a minimal sketch follows the list below.

  1. Get the weights at the end of every epoch, together with the accuracy and the loss.
  2. Save the weights to file 1 with model.get_weights().
  3. If the accuracy is greater than at the previous epoch (i.e. loop iteration), store the weights in a different file (file 2).
  4. Run the loop again, loading the weights from file 1.
  5. Break the loop by setting up a manual early stopping, so that it breaks if the loss does not improve for a certain number of loops.
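
A minimal sketch of that loop, written against the model and generators from the question; the checkpoint file names and the patience value are assumptions for illustration, not part of the original answer:

import numpy as np

best_acc = -np.inf
bad_epochs, patience = 0, 10          # patience value is an arbitrary choice

while True:
    # epochs=1 so we regain control after every single epoch
    hist = model.fit_generator(train_generator(),
                               steps_per_epoch=444,
                               epochs=1,
                               verbose=1,
                               validation_data=test_generator(),
                               validation_steps=44)
    val_acc = hist.history['val_acc'][-1]

    # step 2: always keep the latest weights ("file 1")
    np.savez('latest_weights.npz', *model.get_weights())

    # step 3: keep a separate copy whenever validation accuracy improves ("file 2")
    if val_acc > best_acc:
        best_acc = val_acc
        np.savez('best_weights.npz', *model.get_weights())
        bad_epochs = 0
    else:
        bad_epochs += 1

    # step 4: within one run the model keeps its weights in memory; to resume a
    # fresh run, load latest_weights.npz and call model.set_weights(...) first

    # step 5: manual early stopping
    if bad_epochs >= patience:
        break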

Answer 2

You can use get_weights() together with numpy.save.

It's not the best solution, because it will save several files, but it actually works.

The problem is that the optimizer's current state won't be saved along with the weights. But you can perhaps work around that by using smaller learning rates after loading.
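
For example, a minimal sketch of shrinking the learning rate after loading, assuming the model was compiled with the SGD optimizer from the question (the value 0.01 is arbitrary):

from keras import backend as K

# after rebuilding/compiling the model and loading the saved weights,
# lower the learning rate, since the optimizer state (momentum etc.) is gone
K.set_value(model.optimizer.lr, 0.01)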

Custom callback using numpy.save:

def myCallback(epoch, logs):
    global storedLoss
    # do your comparisons here using the "logs" var.
    print(logs)

    if logs['loss'] < storedLoss:
        storedLoss = logs['loss']
        for i in range(len(model.layers)):
            WandB = model.layers[i].get_weights()

            if len(WandB) > 0:  # necessary because some layers have no weights
                np.save("W" + "-" + str(i), WandB[0], False)
                np.save("B" + "-" + str(i), WandB[1], False)

    # remember that get and set weights use a list: [weights, biases]
    # it may happen (not sure) that there is no bias, and thus you may have to check it (len(WandB) == 1).

The logs var is a dictionary with named metrics, such as "loss" and, if you used it, "accuracy".

You can store the losses within the callback in a global var, and compare if each loss is better or worse than the last.
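
Note that storedLoss needs an initial value before the first epoch, for example:

import numpy as np

storedLoss = np.inf   # so the first epoch always triggers a save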

When fitting, use the lambda callback:

from keras.callbacks import LambdaCallback

model.fit(..., callbacks=[LambdaCallback(on_epoch_end=myCallback)])

In the example above, I used the LambdaCallback, which has more possibilities than just on_epoch_end.

For loading, do a similar loop:

# you have to create the model first and then set the layers
def loadModel(model):
    for i in range(len(model.layers)):
        WandBForCheck = model.layers[i].get_weights()

        if len(WandBForCheck) > 0:  # necessary because some layers have no weights
            # the file names must match the ones written in myCallback ("W-<i>.npy", "B-<i>.npy")
            W = np.load("W-" + str(i) + ".npy")
            B = np.load("B-" + str(i) + ".npy")
            model.layers[i].set_weights([W, B])

Answer 3

See follow-up at https://github.com/fchollet/keras/issues/6766 and https://github.com/farizrahman4u/keras-contrib/pull/90.

I saw the YAML, and the root cause is probably that you have so many Inputs. A few Inputs with many dimensions are preferable to many Inputs, especially if you can use scanning and batch operations to do everything efficiently.
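
As a rough illustration of that idea (this is not the author's code; the shapes follow the example arguments, and the Lambda slicing is just one possible way to replace thousands of Input layers with a single one):

from keras.layers import Input, Lambda, Conv1D

n_patches = 33 * 150                      # splits * total_frames_with_deltas for "50 40 8 1"
window_height = 8

# one Input holding all patches instead of n_patches separate Input layers
big_input = Input(shape=(n_patches, window_height, 3))

conv = Conv1D(filters=100, kernel_size=8)

# slice each patch out inside the graph; each slice plays the role of one entry
# of the old list_of_input (the graph is still large, but the model now has a
# single input tensor)
patch_outputs = [conv(Lambda(lambda x, k=k: x[:, k, :, :])(big_input))
                 for k in range(n_patches)]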

Now, ignoring that entirely, here is how you can save and load your model if it has too much stuff to save as JSON efficiently:

You can pass save_weights_only=True. That won't save optimizer weights, so it isn't a great solution.

I just put together a PR for saving model weights and optimizer weights, but not the configuration. When you want to load, first instantiate and compile the model as you did when you were going to train it, then use load_all_weights to load the model and optimizer weights into that model. I'll try to merge it soon so you can use it from the master branch.

You could use it something like this:

from keras.callbacks import LambdaCallback
from keras_contrib.utils.save_load_utils import save_all_weights, load_all_weights

# do some stuff to create and compile model

# use `save_all_weights` as a callback to checkpoint your model and optimizer weights
model.fit(..., callbacks=[LambdaCallback(on_epoch_end=lambda epoch, logs:
                                         save_all_weights(model, "checkpoint-{:05d}.h5".format(epoch)))])

# use `load_all_weights` to load model and optimizer weights into an existing model
# if the model is not compiled (no `model.optimizer`), this will just load model weights
load_all_weights(model, 'checkpoint-1337.h5')

So I don't endorse the model, but if you want to get it to save and load anyway, this should probably work for you.

As a side note, if you want to save weights in a different format, something like this would work.

import pickle

pickle.dump([K.get_value(w) for w in model.weights], open("save.p", "wb"))
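
Loading those values back is not shown in the answer, but assuming the same model has already been built and compiled, a sketch of the reverse direction could look like this:

import pickle
from keras import backend as K

# assumes `model` has already been constructed with the same architecture
saved_values = pickle.load(open("save.p", "rb"))
K.batch_set_value(zip(model.weights, saved_values))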

Cheers
