In a TensorFlow optimizer (Python), the method _apply_dense gets called separately for the neuron weights (layer connections) and for the bias weights, but I would like to use both of them within a single call of this method.
def _apply_dense(self, grad, weight): ...
For example: a fully connected neural network with two hidden layers, each with two neurons and a bias.
If we take a look at layer 2, _apply_dense gets one call for the neuron weights,

X_1X_3  X_2X_3
X_1X_4  X_2X_4

and a separate call for the bias weights,

B_1X_3
B_1X_4

But I would need either both matrices in one call of _apply_dense, or a single weight matrix like this:

X_1X_3  X_2X_3  B_1X_3
X_1X_4  X_2X_4  B_1X_4

X_2X_4, B_1X_4, ... is just a notation for the weight of the connection between the two neurons. B_1X_4 is therefore only a placeholder for the weight between B_1 and X_4.
How to do this?
MWE
As a minimal working example, here is a stochastic gradient descent optimizer implementation with momentum. For every layer, the momentum of all incoming connections from other neurons is reduced to its mean (see the ndims == 2 case). What I need instead is the mean over not only the momentum values of the incoming neuron connections but also those of the incoming bias connections (as described above).
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.python.training import optimizer


class SGDmomentum(optimizer.Optimizer):
    def __init__(self, learning_rate=0.001, mu=0.9, use_locking=False, name="SGDmomentum"):
        super(SGDmomentum, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._mu = mu

        self._lr_t = None
        self._mu_t = None

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, "a", self._name)

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2:  # neuron weights
            momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)
        elif momentum.get_shape().ndims == 1:  # bias weights
            momentum_mean = momentum
        else:
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)

        weight_update = learning_rate_t * momentum_t
        weight_t = tf.assign_sub(weight, weight_update, use_locking=self._use_locking)

        return tf.group(*[weight_t, momentum_t])

    def _prepare(self):
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")
        self._mu_t = tf.convert_to_tensor(self._mu, name="momentum_term")
For a simple neural network: https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/multilayer_perceptron.py (only change the optimizer to the custom SGDmomentum optimizer)
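To plug it into that script, the only line that should need to change is the one that builds the training op. A minimal sketch, assuming the linked example's loss tensor is named cost:

# Swap the script's built-in optimizer for the custom one (cost is the example's loss tensor)
train_op = SGDmomentum(learning_rate=0.001, mu=0.9).minimize(cost)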
1 Answer
I'm not 100% clear on what you are trying to do, so I'm not sure if this really answers your question.
Let's say you have a dense layer transforming an input of size M into an output of size N. Following the convention you show, you would have an N × M weights matrix W and an N-sized bias vector B. An input vector X of size M (or a batch of inputs of size M × K) is then processed by the layer as W · X + B, followed by the activation function (in the case of a batch, the addition is a "broadcasted" operation). In TensorFlow:
X = ...  # Input batch of size M x K
W = ...  # Weights of size N x M
B = ...  # Biases of size N
Y = tf.matmul(W, X) + B[:, tf.newaxis]  # Output of size N x K
# Activation...
If you want, you can always put W and B together in a single extended weights matrix W*, basically adding B as a new column of W, so W* would be N × (M + 1). Then you just need to append one element containing a constant 1 to the input vector X (or a row of ones if it's a batch), so you would get X* with size M + 1 (or (M + 1) × K for a batch). The product W* · X* then gives you the same result as before. In TensorFlow:
X = ...  # Input batch of size M x K
W_star = ...  # Extended weights of size N x (M + 1)
# You can still have a "view" of the original W and B if you need it
W = W_star[:, :-1]  # Size N x M
B = W_star[:, -1]   # Size N
X_star = tf.concat([X, tf.ones_like(X[:1])], axis=0)  # Size (M + 1) x K
Y = tf.matmul(W_star, X_star)  # Output of size N x K
# Activation...
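Connecting this back to the MWE in the question: if each layer is parameterized by a single extended variable like W_star, then _apply_dense receives one rank-2 variable (and one momentum slot of the same shape) per layer, so the mean over incoming connections automatically covers the bias column as well. A sketch, assuming the N × (M + 1) layout above:

# Inside _apply_dense: momentum has shape N x (M + 1), i.e. M neuron columns plus one bias column,
# so averaging over axis 1 includes the bias momentum in the mean
momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)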
Now you can compute gradients and updates for the weights and biases together. A drawback of this approach is that, if you want to apply regularization, you should be careful to apply it only to the weights part of the matrix, not to the bias column.
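For example, an L2 penalty restricted to the weights part can be built from the slice of W_star that excludes the bias column. A sketch; data_loss and the 0.01 factor are placeholders:

l2_weights_only = tf.nn.l2_loss(W_star[:, :-1])  # weights only, bias column excluded
loss = data_loss + 0.01 * l2_weights_only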