Tuesday, September 18, 2018

Trying to retrain a tensorflow model, input and output nodes disappear

By Hường Hana 3:30 PM tensorflow

I am trying to retrain the TensorFlow DeepLab model using MobileNet_V2. I downloaded the checkpoint from the DeepLab model zoo, about halfway down this page: https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md (specifically, the mobilenetv2_coco_voc_trainaug one). I would like my retrained model to have the same graph as the downloaded one, but with different parameters. (Well, almost the same graph: the final tensor should probably have a different shape, because I am working with a different number of classes.)

I assembled my own images into a tfrecord, labelled with just one class for now. This is practice for a dataset with 4 classes.

I then ran the following to retrain the network, producing .pbtxt, .meta, .index and .data-00000-of-00001 files:

PATH_TO_INITIAL_CHECKPOINT=/path/to/unzipped/files/model.ckpt-30000.index
PATH_TO_TRAIN_DIR=/path/to/checkpoints/
PATH_TO_DATASET=/path/to/tfrecord

# training_number_of_steps is 900 here; the original command carried the inline note "# 90000"
python /path/to/tensorflow/models/research/deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=900 \
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=128 \
    --train_crop_size=128 \
    --train_batch_size=1 \
    --dataset="cityscapes" \
    --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET} \
    --initialize_last_layer=False \
    --last_layers_contain_logits_only=True \
    --fine_tune_batch_norm=False
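One detail in the command may be worth double-checking: the initial checkpoint is given as the .index file. TensorFlow checkpoint readers normally take the checkpoint prefix, meaning the part of the name shared by the .index and .data-* files. Whether that matters for train.py in this setup is an assumption, not something the post confirms. A minimal sketch of deriving the prefix, using a hypothetical helper named checkpoint_prefix:

```python
import re

# Suffixes that TensorFlow appends to a checkpoint prefix when saving.
_CKPT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path: str) -> str:
    """Strip a trailing .index/.meta/.data-XXXXX-of-XXXXX suffix, if any.

    Hypothetical helper for illustration; not part of TensorFlow or DeepLab.
    """
    return _CKPT_SUFFIX.sub("", path)


# The path from the command, reduced to the shared prefix.
print(checkpoint_prefix("/path/to/unzipped/files/model.ckpt-30000.index"))
```

If the prefix form is what train.py expects, PATH_TO_INITIAL_CHECKPOINT would end in model.ckpt-30000 rather than model.ckpt-30000.index.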

Running bazel's summarize_graph on the downloaded file gives:

Found 1 possible inputs: (name=ImageTensor, type=uint8(4), shape=[1,?,?,3])
No variables spotted.
Found 1 possible outputs: (name=SemanticPredictions, op=Slice)

When I scan the nodes of the .pbtxt file produced by my training run, I can't find any nodes called ImageTensor or SemanticPredictions. I have tried with TensorBoard, bazel's summarize_graph, and programmatically (following several online examples). On my file, summarize_graph reports "No inputs spotted" and "Found 664 possible outputs:".
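For reference, a dependency-free way to do the programmatic scan is sketched below. It assumes the .pbtxt is a GraphDef in protobuf text format, where each top-level node block starts with its name and op fields in that order. The function names and the tiny SAMPLE graph are made up for illustration. A more robust approach would parse the file with the protobuf text_format module instead of a regular expression.

```python
import re

# Matches top-level GraphDef nodes in text format: "node {", then name, then op.
_NODE = re.compile(
    r'^node \{\s*\n\s*name: "([^"]*)"\s*\n\s*op: "([^"]*)"', re.MULTILINE
)


def list_nodes(pbtxt_text):
    """Return (name, op) pairs for every top-level node in a GraphDef .pbtxt."""
    return _NODE.findall(pbtxt_text)


def report(pbtxt_text, wanted=("ImageTensor", "SemanticPredictions")):
    """Summarise node count, Placeholder nodes, and which wanted names are absent."""
    nodes = list_nodes(pbtxt_text)
    names = {name for name, _ in nodes}
    placeholders = [name for name, op in nodes if op == "Placeholder"]
    missing = [w for w in wanted if w not in names]
    return {"count": len(nodes), "placeholders": placeholders, "missing": missing}


# Hand-written stand-in for a real graph.pbtxt (hypothetical content).
SAMPLE = '''node {
  name: "ImageTensor"
  op: "Placeholder"
}
node {
  name: "logits"
  op: "Conv2D"
  input: "ImageTensor"
}
'''

print(report(SAMPLE))
```

To run it on a real file, read the text first, for example report(open("/path/to/checkpoints/graph.pbtxt").read()), where the file name is an assumption about what the training run wrote.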

This then leads to problems with freeze_graph.py. If I choose output_node_names from what I can see in TensorBoard, then freeze_graph.py runs and I get a frozen graph. But running that model gives me:

TypeError: Cannot interpret feed_dict key as Tensor: The name 'ImageTensor:0' refers to a Tensor which does not exist. The operation, 'ImageTensor', does not exist in the graph.

I'm definitely doing something wrong here. The question is: what? I suspect it could be the arguments I supply to train.py, but really, that's just a shot in the dark. It could be that this is not how train.py is intended to be used, or deeplab's train.py is not compatible with MobileNetV2.

Edit: After a closer look at the options available in train.py, I have updated my command. Cleaning previous failed models from the TRAIN_DIR was also helpful to avoid the error:

Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.
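The cleanup step mentioned in the edit can be sketched as a small script. The list of file patterns is an assumption about what a TensorFlow training run typically leaves in its log directory, and the function names are hypothetical. The default is a dry run, so nothing is deleted until the listing has been checked.

```python
from pathlib import Path

# File name patterns commonly written into a training directory.
# Assumed list; adjust it to what is actually present in your directory.
STALE_PATTERNS = (
    "checkpoint",
    "model.ckpt-*",
    "graph.pbtxt",
    "events.out.tfevents.*",
)


def stale_files(train_dir):
    """List leftover training artifacts in train_dir (non-recursive), sorted."""
    root = Path(train_dir)
    found = set()
    for pattern in STALE_PATTERNS:
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


def clean_train_dir(train_dir, dry_run=True):
    """Return the leftover artifacts; delete them only when dry_run is False."""
    files = stale_files(train_dir)
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Calling clean_train_dir("/path/to/checkpoints/") first shows what would go, and clean_train_dir("/path/to/checkpoints/", dry_run=False) then removes those files so the next run starts from the initial checkpoint rather than a stale one.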
