I am running an LSTM network for language modeling, and I am using tf.train.Supervisor to save and restore the session.
After each epoch, I print the perplexity. The first time I restore the session, the perplexity is lower than at the start of the previous run. After that, however, although the perplexity keeps going down during further epochs, whenever I restore the model the perplexity is back to roughly what it was after the first time I saved it. I always print the global_step, so I am sure that I am loading the latest checkpoint.
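As a cross-check that does not depend on the running session, the newest checkpoint can also be read from the "checkpoint" state file that the Supervisor's Saver writes into logdir (in TensorFlow 1.x, tf.train.latest_checkpoint(SAVE_PATH) returns the same path). This is a minimal stdlib sketch, assuming the default Saver naming model.ckpt-<global_step>; the helper name latest_checkpoint_step and the sample file contents are illustrative only.

```python
import os
import re
import tempfile


def latest_checkpoint_step(logdir):
    """Return (path, global_step) of the newest checkpoint listed in logdir/checkpoint.

    Assumes the default Saver naming used by tf.train.Supervisor, where each
    checkpoint prefix ends in '-<global_step>', e.g. 'model.ckpt-830'.
    Returns (None, None) if no model_checkpoint_path entry is found.
    """
    with open(os.path.join(logdir, "checkpoint")) as f:
        for line in f:
            # The text-format state file names the newest checkpoint in this field.
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"', line)
            if match:
                path = match.group(1)
                step = re.search(r"-(\d+)$", path)
                return path, (int(step.group(1)) if step else None)
    return None, None


# Illustrative state file (hypothetical contents); a real one lives in SAVE_PATH.
with tempfile.TemporaryDirectory() as logdir:
    with open(os.path.join(logdir, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "model.ckpt-1648"\n'
                'all_model_checkpoint_paths: "model.ckpt-830"\n'
                'all_model_checkpoint_paths: "model.ckpt-1648"\n')
    print(latest_checkpoint_step(logdir))  # ('model.ckpt-1648', 1648)
```

If the step parsed from this file matches the global_step printed at the start of a run, the Supervisor is restoring from the newest checkpoint on disk.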
Perplexity values in first run:
Global step 0
Epoch: 1
Train perplexity: 1053.873
Eval perplexity: 994.486
Epoch: 2
Train perplexity: 559.507
Eval perplexity: 803.345
Epoch: 3
Train perplexity: 377.886
Eval perplexity: 606.682
Epoch: 4
Train perplexity: 282.728
Eval perplexity: 472.485
Epoch: 5
Train perplexity: 229.564
Eval perplexity: 433.604
Second run after restore:
Global step 830
Epoch: 1
Train perplexity: 394.555
Eval perplexity: 562.316
Epoch: 2
Train perplexity: 280.981
Eval perplexity: 440.451
Epoch: 3
Train perplexity: 226.292
Eval perplexity: 384.905
Epoch: 4
Train perplexity: 189.826
Eval perplexity: 340.012
Epoch: 5
Train perplexity: 166.766
Eval perplexity: 328.017
Third run after restore:
Global step 1648
Epoch: 1
Train perplexity: 374.898
Eval perplexity: 508.347
Epoch: 2
Train perplexity: 271.804
Eval perplexity: 419.742
Epoch: 3
Train perplexity: 224.735
Eval perplexity: 367.012
Epoch: 4
Train perplexity: 192.667
Eval perplexity: 336.119
Epoch: 5
Train perplexity: 170.210
Eval perplexity: 303.626
All further runs give similar results: the global_step keeps going up, but the perplexity at the start of each run stays about the same.
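The eval perplexities reported for the three runs can be lined up to make the pattern concrete: each restored run's first-epoch value is compared with the last value of the run before it. The numbers are copied from the logs above; the helper restore_gaps is hypothetical. Because epoch 1 of a restored run already includes one more epoch of training, a complete restore would be expected to give a first value roughly at or below the previous run's last value.

```python
# Eval perplexity per epoch, copied from the three runs reported above.
runs = {
    "run 1 (global step 0)": [994.486, 803.345, 606.682, 472.485, 433.604],
    "run 2 (global step 830)": [562.316, 440.451, 384.905, 340.012, 328.017],
    "run 3 (global step 1648)": [508.347, 419.742, 367.012, 336.119, 303.626],
}


def restore_gaps(runs):
    """Pair each run's last eval perplexity with the next run's first one."""
    names = list(runs)  # dicts keep insertion order (Python 3.7+)
    return [(prev, cur, runs[prev][-1], runs[cur][0])
            for prev, cur in zip(names, names[1:])]


for prev, cur, end_prev, start_cur in restore_gaps(runs):
    print(f"{prev}: last eval {end_prev:.1f} | {cur}: first eval {start_cur:.1f} "
          f"(ratio {start_cur / end_prev:.2f})")
```

On these numbers the first eval perplexity after each restore is about 1.30 and 1.55 times the previous run's final value.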
Below is a snippet of the code where I start the session.
import time

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.Supervisor)


def run_epoch(session: tf.Session, model, is_train=False, verbose=False):
    costs = 0
    costs_list = []
    iters = 0
    start_time = time.time()
    fetches = {'cost': model.cost,
               'final_state': model.final_state,
               'outputs': model.outputs,
               'states': model.states
               }
    if is_train:
        fetches.update({'train_op': model._train_op})

    # Start the epoch from the model's initial LSTM state.
    state = session.run(model.initial_state)
    for step in range(model.input.epoch_size):
        # Feed the final state of the previous step as the initial state of this one.
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h
        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]

        costs += cost
        costs_list.append(cost)
        iters += model.input.num_steps

        if verbose and step % (model.input.epoch_size // 10) == 10:
            print('mean_cost:', np.array(costs_list).mean(), 'costs:', costs, 'iters:', iters)
            print("%.3f perplexity: %.3f speed: %.0f wps" %
                  (step * 1.0 / model.input.epoch_size, np.exp(costs / iters),
                   iters * model.input.batch_size * max(1, 1) / (time.time() - start_time)))
            costs_list = []

    # Perplexity: exp of the accumulated cost divided by the time steps processed.
    return np.exp(costs / iters)


def train_model():
    config = ModelConfig()
    raw_data, id_to_word = embed_to_vocab(config, TRAIN_SOURCE)
    word_to_id = {w: i for i, w in id_to_word.items()}
    raw_eval_data, id_to_word_eval = embed_to_vocab(config, EVAL_SOURCE, word_to_id=word_to_id)

    with tf.Graph().as_default():
        with tf.name_scope('train'):
            with tf.variable_scope('Model'):
                data_input = LstmInput(config, raw_data)
                model = LstmNet(config, data_input)
                print('train model created')

        with tf.name_scope('eval'):
            # reuse=True makes the eval model share the training model's variables.
            with tf.variable_scope('Model', reuse=True):
                eval_input = LstmInput(config, raw_eval_data)
                eval_model = LstmNet(config, eval_input, is_training=False)
                print('eval model created')

        config_proto = tf.ConfigProto(allow_soft_placement=False)
        # The Supervisor restores from the latest checkpoint in SAVE_PATH (if any)
        # and writes a new checkpoint every save_model_secs seconds.
        sv = tf.train.Supervisor(logdir=SAVE_PATH, save_model_secs=7)
        with sv.managed_session(config=config_proto) as session:
            print('Global step', sv.global_step.eval(session=session))
            for i in range(EPOCHS):
                print('Epoch: %s' % (i + 1))
                train_perplexity = run_epoch(session, model, verbose=False, is_train=True)
                print('Train perplexity: {:.3f}'.format(train_perplexity))
                eval_perplexity = run_epoch(session, eval_model)
                print('Eval perplexity: {:.3f}'.format(eval_perplexity))
                print(sv.global_step.eval(session=session))
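run_epoch reports exp(costs / iters). Assuming model.cost follows the TensorFlow PTB tutorial convention (per-token loss averaged over the batch, then summed over the num_steps time steps), this is the exponential of the mean per-token cross-entropy. A NumPy sketch of that bookkeeping, where epoch_perplexity and its per-token loss arrays are hypothetical stand-ins for what session.run returns:

```python
import numpy as np


def epoch_perplexity(batch_nll):
    """Mirror run_epoch's perplexity bookkeeping.

    batch_nll: one array per session.run step, shape (batch_size, num_steps),
    holding per-token negative log-likelihoods in nats (hypothetical inputs).
    """
    costs, iters = 0.0, 0
    for nll in batch_nll:
        # Assumed definition of model.cost: average over the batch,
        # then sum over the num_steps time steps (PTB tutorial convention).
        costs += nll.mean(axis=0).sum()
        iters += nll.shape[1]  # plays the role of model.input.num_steps
    return float(np.exp(costs / iters))


# Uniform predictions over a 50-word vocabulary: every token costs log(50).
uniform = [np.full((4, 7), np.log(50.0)) for _ in range(3)]
print(epoch_perplexity(uniform))  # approximately 50
```

Because the average runs over the whole epoch while the weights are still being updated, the printed train perplexity describes the model across that epoch rather than its state at the end of it.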