This question is related to: tensorflow modify variables in py_func (and its grad func)
I define my own op and its gradient in TensorFlow with the following function:
# define gradient of a python function
def py_func_with_grad(func, inp, Tout, stateful=True, name=None, grad=None):
    num = []
    for i in range(100):
        num.append(str(np.random.randint(0, 10)))
    rnd_name = 'PyFuncGrad' + ''.join(num)
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
I have a neural network that contains the following code snippet, in which five lines should be numerically identical (I use exactly one of them at a time). They produce drastically different results. Does anybody have a clue? Thanks!!
For example, it is weird that between (1) and (2), merely replacing x with the TF variable s_final makes such a difference. Since they are numerically the same, I thought there should not be any difference.
s_final is a non-trainable TensorFlow variable.
def _idenity_func(x, s):
    return s

def _dummy_grad(op, grad):
    return grad * 0, grad * 0

assign_op_s_final_2 = s_final.assign(x)
with tf.control_dependencies([assign_op_s_final_2]):
    x = tf.identity(x)

x = tf.stop_gradient(x)

# the following lines should be numerically identical, since s_final
# has been assigned the value of x. but...

# (1) use the following line, the network does not learn AT ALL!!
x_revised = py_func_with_grad(_idenity_func, [x, s_final], [tf.float32],
                              name=name, grad=lambda op, grad: _dummy_grad(op, grad))

# (2) use the following line, the network learns, even if x does not need
#     any gradient (since there is tf.stop_gradient)
# x_revised = py_func_with_grad(_idenity_func, [x, x], [tf.float32],
#                               name=name, grad=lambda op, grad: _dummy_grad(op, grad))

# (3) use the following line, the network learns as well as (2)
# x_revised = tf.stop_gradient(x)

# (4) use the following line, the network learns, but seemingly not as well as (2)
# x_revised = tf.stop_gradient(s_final)

# (5) use the following line, the network does not learn AT ALL!!
# x_revised = py_func_with_grad(_idenity_func, [x, tf.stop_gradient(s_final)], [tf.float32],
#                               name=name, grad=lambda op, grad: _dummy_grad(op, grad))
Code is provided (it requires TensorFlow 0.12.1; it does not work with versions >= 1 because the HyperNetworks implementation does not support TensorFlow >= 1):
https://www.dropbox.com/s/58khyqdy3mtnri7/tensorflow_clean_ver01.zip?dl=0
The lines above are in the code we provide. Change them and run the model to see the difference. Let me know if you have any questions about the code.
You can install TensorFlow 0.12.1 into a temporary folder:
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
pip install --target=$HOME/tensorflow_versions/tf-0.12.1 --upgrade $TF_BINARY_URL
The path is then added when you run the provided code. I use this approach to keep multiple versions of TensorFlow on my computer.
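For example, assuming the `--target` directory from the install command above, the per-version install can be put on `PYTHONPATH` before running the code:

```shell
# Prepend the TF 0.12.1 install directory (the --target path used above)
# so that Python resolves this version first for the current session.
export PYTHONPATH="$HOME/tensorflow_versions/tf-0.12.1:$PYTHONPATH"
echo "$PYTHONPATH"
```

This only affects the current shell session, so other projects keep using whatever TensorFlow version they normally see.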
1 Answer

Answer 1
It works fine in my experiments: I added code that uses x_revised and looked at the values of the gradients with respect to the other variables involved. The mistake must be in the code that is not posted.