Sunday, April 29, 2018

Getting the same predicted values for all inputs in a trained TensorFlow network


I have created a TensorFlow network designed to read data from this dataset (note: the information in this dataset is designed purely for test purposes and is not real):

[dataset screenshot omitted]

I am trying to build a network that essentially predicts the values in the 'Exited' column. My network takes 11 inputs, passes them through 2 hidden layers (6 neurons each) with ReLU activation, and outputs a single value through a sigmoid activation function, which I interpret as a probability. I am using a gradient-based optimizer (Adam) and a mean squared error cost function. However, after training the network on my training data and predicting on my testing data, all of my predicted values are greater than 0.5, meaning every sample is classified as likely true, and I'm not sure what the problem is:

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2, random_state=101)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)

training_epochs = 200
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1

def neuralNetwork(x, weights):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    output_layer = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
    output_layer = tf.nn.sigmoid(output_layer)
    return output_layer

weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_hidden_2, n_output]))
}

biases = {
    'b1': tf.Variable(tf.random_uniform([n_hidden_1])),
    'b2': tf.Variable(tf.random_uniform([n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_output]))
}

x = tf.placeholder('float', [None, n_input])   # [?, 11]
y = tf.placeholder('float', [None, n_output])  # [?, 1]

output = neuralNetwork(x, weights)
cost = tf.reduce_mean(tf.square(output - y))
optimizer = tf.train.AdamOptimizer().minimize(cost)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        session.run(optimizer, feed_dict={x: X_train, y: y_train.reshape((-1, 1))})
    print('Model has completed training.')
    test = session.run(output, feed_dict={x: X_test})
    predictions = (test > 0.5).astype(int)
    print(predictions)

All help is appreciated! I have been looking through questions related to my problem but none of the suggestions have seemed to help.

1 Answer

A preliminary note: I won't access data from a personal link, for security reasons. It's your responsibility to create a reproducible code snippet based solely on secure, persistent artifacts.
However, I can confirm that your problem reproduces when your code is run against keras.datasets.mnist, with a small change: each sample is labeled 1 if the digit is even and 0 if it is odd.

Short answer: the initialization is the problem. Change tf.random_uniform to tf.random_normal for the weights and set the biases to a deterministic 0.

Actual answer: ideally, you want the model to start out predicting roughly at random, close to 0.5. This prevents the sigmoid output from saturating and keeps the gradients large in the early stages of training.

The sigmoid is s(y) = 1 / (1 + e^(-y)), and s(y) = 0.5 <=> y = 0. Therefore, the layer's pre-activation y = w * x + b must start out close to 0.
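
As a quick numeric check (a minimal NumPy sketch of my own, not part of the original code):

import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

print(sigmoid(0.0))   # 0.5    -- a zero pre-activation means a "random" prediction
print(sigmoid(5.0))   # ~0.993 -- large positive pre-activations saturate near 1
print(sigmoid(-5.0))  # ~0.007 -- large negative pre-activations saturate near 0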

If you used StandardScaler, then your input features approximately follow a Gaussian distribution with mean = 0.0 and std = 1.0. Your parameters must preserve this centering! However, you've initialized your biases with tf.random_uniform, which draws values uniformly from the [0, 1) interval.

By starting your biases at 0, y will stay close to 0 in expectation, because the products of the weights with the zero-mean inputs roughly cancel out:

y = w * x + b = sum(.1 * -1, .9 * -.9, ..., .1 * 1, .9 * .9) + 0 ≈ 0
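
To see the effect numerically, here is a small sketch (my own illustration in NumPy; the shapes mirror the question's first layer):

import numpy as np

rng = np.random.RandomState(0)
x = rng.normal(size=(1000, 11))      # standardized inputs: mean ~0, std ~1
w = rng.uniform(0, 1, size=(11, 6))  # uniform weights, as in the question

pre_uniform = x @ w + rng.uniform(0, 1, size=6)  # uniform biases, mean ~0.5
pre_zero = x @ w                                 # zero biases

print(pre_uniform.mean())  # ~0.5: pre-activations are shifted away from 0
print(pre_zero.mean())     # ~0.0: the sigmoid of this stays around 0.5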

So your biases should be:

biases = {
    'b1': tf.Variable(tf.zeros([n_hidden_1])),
    'b2': tf.Variable(tf.zeros([n_hidden_2])),
    'output': tf.Variable(tf.zeros([n_output]))
}

This alone is enough for the network to also output numbers smaller than 0.5:

[1.        0.4492423 0.4492423 ... 0.4492423 0.4492423 1.       ]
predictions mean: 0.7023628
confusion matrix:
[[4370 1727]
 [1932 3971]]
accuracy: 0.6950833333333334

Further corrections:

  • Your neuralNetwork function does not take a biases parameter; it silently uses the dict defined in the enclosing scope, which looks like a mistake.

  • You should not fit the scaler on the test data: you would throw away the statistics learned from the training set, and it violates the principle that the test set is purely observational. Do this instead:

    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)
  • It's very uncommon to use MSE with a sigmoid output. Use binary cross-entropy instead (note that tf.nn.sigmoid_cross_entropy_with_logits expects the raw logits, not the sigmoid output, and returns one loss per sample, so it should be reduced to a scalar):

    logits = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
    output = tf.nn.sigmoid(logits)
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
  • It's more reliable to initialize the weights from a normal distribution:

    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
        'output': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
    }
  • You are feeding the entire training set at every epoch instead of mini-batches, which is what Keras does by default. It's therefore reasonable to expect a Keras implementation to converge faster, and the results may differ; a sketch of mini-batching follows below.
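
As a sketch of what mini-batch training could look like here (my own illustration; batch_size is a hypothetical value, and the snippet reuses x, y, optimizer and the training arrays from the script below):

import numpy as np

batch_size = 128  # hypothetical; Keras defaults to 32

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    n_samples = x_train.shape[0]
    for epoch in range(training_epochs):
        # Reshuffle once per epoch, then step through the data in chunks.
        indices = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            session.run(optimizer, feed_dict={x: x_train[batch],
                                              y: y_train[batch]})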

By making a few tweaks, I managed to achieve these results:

import tensorflow as tf
from keras.datasets.mnist import load_data
from sacred import Experiment
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

ex = Experiment('test-16')


@ex.config
def my_config():
    training_epochs = 200
    n_input = 784
    n_hidden_1 = 32
    n_hidden_2 = 32
    n_output = 1


def neuralNetwork(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    logits = tf.add(tf.matmul(layer_2, weights['output']), biases['output'])
    predictions = tf.nn.sigmoid(logits)
    return logits, predictions


@ex.automain
def main(training_epochs, n_input, n_hidden_1, n_hidden_2, n_output):
    (x_train, y_train), _ = load_data()
    x_train = x_train.reshape(x_train.shape[0], -1).astype(float)
    y_train = (y_train % 2 == 0).reshape(-1, 1).astype(float)

    x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.2, random_state=101)
    print('y samples:', y_train, y_test, sep='\n')

    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
        'output': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
    }

    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'b2': tf.Variable(tf.zeros([n_hidden_2])),
        'output': tf.Variable(tf.zeros([n_output]))
    }

    x = tf.placeholder('float', [None, n_input])   # [?, 784]
    y = tf.placeholder('float', [None, n_output])  # [?, 1]

    logits, output = neuralNetwork(x, weights, biases)
    # cost = tf.reduce_mean(tf.square(output - y))
    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        try:
            for epoch in range(training_epochs):
                print('epoch #%i' % epoch)
                session.run(optimizer, feed_dict={x: x_train, y: y_train})

        except KeyboardInterrupt:
            print('interrupted')

        print('Model has completed training.')
        p = session.run(output, feed_dict={x: x_test})
        p_labels = (p > 0.5).astype(int)

        print(p.ravel())
        print('predictions mean:', p.mean())

        print('confusion matrix:', confusion_matrix(y_test, p_labels), sep='\n')
        print('accuracy:', accuracy_score(y_test, p_labels))
[0.        1.        0.        ... 0.0302309 0.        1.       ]
predictions mean: 0.48261687
confusion matrix:
[[5212  885]
 [ 994 4909]]
accuracy: 0.8434166666666667