Friday, September 15, 2017

Load saved checkpoint and predict not producing same results as in training

I'm training a model based on sample code I found on the Internet. The test accuracy is at 92% and the checkpoints are saved in a directory. In parallel (the training has been running for 3 days now), I want to write my prediction code so I can learn more instead of just waiting.

This is my third day of deep learning so I probably don't know what I'm doing. Here's how I'm trying to predict:

  • Instantiate the model using the same code as in training
  • Load the last checkpoint
  • Try to predict

The code works but the results are nowhere near 90%.

Here's how I create the model:

    INPUT_LAYERS = 2
    OUTPUT_LAYERS = 2
    AMOUNT_OF_DROPOUT = 0.3
    HIDDEN_SIZE = 700
    INITIALIZATION = "he_normal"  # Gaussian initialization scaled by fan_in (He et al., 2014)
    CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

    def generate_model(output_len, chars=None):
        """Generate the model"""
        print('Build model...')
        chars = chars or CHARS
        model = Sequential()
        # "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE
        # note: in a situation where your input sequences have a variable length,
        # use input_shape=(None, nb_feature).
        for layer_number in range(INPUT_LAYERS):
            model.add(recurrent.LSTM(HIDDEN_SIZE, input_shape=(None, len(chars)), init=INITIALIZATION,
                                     return_sequences=layer_number + 1 < INPUT_LAYERS))
            model.add(Dropout(AMOUNT_OF_DROPOUT))
        # For the decoder's input, we repeat the encoded input for each time step
        model.add(RepeatVector(output_len))
        # The decoder RNN could be multiple layers stacked or a single layer
        for _ in range(OUTPUT_LAYERS):
            model.add(recurrent.LSTM(HIDDEN_SIZE, return_sequences=True, init=INITIALIZATION))
            model.add(Dropout(AMOUNT_OF_DROPOUT))

        # For each step of the output sequence, decide which character should be chosen
        model.add(TimeDistributed(Dense(len(chars), init=INITIALIZATION)))
        model.add(Activation('softmax'))

        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

In a separate file predict.py I import this method to create my model and try to predict:

    ...import code
    model = generate_model(len(question), dataset['chars'])
    model.load_weights('models/weights.204-0.20.hdf5')

    def decode(pred):
        return character_table.decode(pred, calc_argmax=False)

    x = np.zeros((1, len(question), len(dataset['chars'])))
    for t, char in enumerate(question):
        x[0, t, character_table.char_indices[char]] = 1.

    preds = model.predict_classes([x], verbose=0)[0]

    print("======================================")
    print(decode(preds))

I don't know what the problem is. I have about 90 checkpoints in my directory and I'm loading the last one based on accuracy. All of them were saved by a ModelCheckpoint:

    checkpoint = ModelCheckpoint(MODEL_CHECKPOINT_DIRECTORYNAME + '/' + MODEL_CHECKPOINT_FILENAME,
                                 save_best_only=True)

I'm stuck. What am I doing wrong?

3 Answers

Answer 1

In the repo you provided, the training and validation sentences are inverted before being fed into the model (as commonly done in seq2seq learning).

    dataset = DataSet(DATASET_FILENAME)

As you can see, the default value for inverted is True, and the questions are inverted.

    class DataSet(object):
        def __init__(self, dataset_filename, test_set_fraction=0.1, inverted=True):
            self.inverted = inverted

        ...

            question = question[::-1] if self.inverted else question
            questions.append(question)

You can try to invert the sentences during prediction. Specifically,

    x = np.zeros((1, len(question), len(dataset['chars'])))
    for t, char in enumerate(question):
        x[0, len(question) - t - 1, character_table.char_indices[char]] = 1.
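As a quick check of the indexing above, here is a minimal sketch in plain NumPy with a small made-up character table. `CHARS`, `char_indices`, and the `encode` helper are assumptions for illustration, not the repo's code. It shows that writing to position `len(question) - t - 1` gives the same array as one-hot encoding `question[::-1]`.

```python
import numpy as np

# Hypothetical character table, mirroring the CHARS list from the question.
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")
char_indices = {c: i for i, c in enumerate(CHARS)}


def encode(question, inverted=False):
    """One-hot encode a string; optionally write the characters in reverse order."""
    x = np.zeros((1, len(question), len(CHARS)))
    for t, char in enumerate(question):
        pos = len(question) - t - 1 if inverted else t
        x[0, pos, char_indices[char]] = 1.
    return x


question = "Helo world."
x_inverted = encode(question, inverted=True)   # indexing trick from the answer
x_reversed = encode(question[::-1])            # what the training pipeline does
same = np.array_equal(x_inverted, x_reversed)
print(same)
```

Either form should work at prediction time, as long as it matches what the model saw during training.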

Answer 2

When you generate the model in your predict.py file:

    model = generate_model(len(question), dataset['chars'])

is your first parameter the same as in your training file? Or is the question length dynamic? If so, you are generating a different model, and thus your saved checkpoint doesn't work.
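To see why the first argument matters, here is a NumPy-only sketch of what `RepeatVector(output_len)` does to the encoder output: it copies the encoded vector `output_len` times, so the decoder emits exactly `output_len` time steps. `repeat_vector` is a hypothetical stand-in for the Keras layer, and the sizes (40 and 11) are made up for illustration.

```python
import numpy as np


def repeat_vector(encoded, output_len):
    """Mimic RepeatVector: (batch, features) -> (batch, output_len, features)."""
    return np.repeat(encoded[:, np.newaxis, :], output_len, axis=1)


encoded = np.arange(4, dtype=float).reshape(1, 4)   # one encoded sample, 4 features

train_shape = repeat_vector(encoded, 40).shape      # assumed fixed length at training time
predict_shape = repeat_vector(encoded, 11).shape    # len(question) at prediction time
print(train_shape, predict_shape)
```

If training used a fixed length, passing `len(question)` at prediction time builds a decoder with a different number of output steps, so it is probably worth using the same constant in both files.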

Answer 3

It may be that the dimensionality of the arrays or DataFrames you pass does not match what the functions you call expect. When a method expects a one-dimensional input, try calling ravel on the array you expect to be one-dimensional.
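For instance, a minimal sketch of the ravel suggestion with NumPy (the array values are made up):

```python
import numpy as np

column = np.array([[1], [2], [3]])   # shape (3, 1): a column, not a flat vector
flat = column.ravel()                # shape (3,): flattened to one dimension
print(column.shape, flat.shape)
```

Printing the shapes of your inputs right before the call is usually the quickest way to spot this kind of mismatch.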
