Wednesday, September 6, 2017

What's wrong with my implementation of Neural Networks?

Leave a Comment

enter image description hereI want to plot the learning error curve of a neural net with respect to the number of training examples. Here is the code :

import sklearn import numpy as np from sklearn.model_selection import learning_curve import matplotlib.pyplot as plt from sklearn import neural_network from sklearn import cross_validation  myList=[] myList2=[] w=[]  dataset=np.loadtxt("data", delimiter=",") X=dataset[:, 0:6] Y=dataset[:,6] clf=sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(2,3),activation='tanh')  # split the data between training and testing X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(X, Y, test_size=0.25, random_state=33)  # begin with few training datas X_eff=X_train[0:int(len(X_train)/150), : ] Y_eff=Y_train[0:int(len(Y_train)/150)]  k=int(len(X_train)/150)-1   for m in range (140) :       print (m)      w.append(k)      # train the model and store the training error     A=clf.fit(X_eff,Y_eff)     myList.append(1-A.score(X_eff,Y_eff))        # compute the testing error     myList2.append(1-A.score(X_test,Y_test))      # add some more training datas     X_eff=np.vstack((X_eff,X_train[k+1:k+101,:]))     Y_eff=np.hstack((Y_eff,Y_train[k+1:k+101]))     k=k+100  plt.figure(figsize=(8, 8)) plt.subplots_adjust() plt.title("Erreur d'entrainement et de test") plt.plot(w,myList,label="training error") plt.plot(w,myList2,label="test error") plt.legend() plt.show() 

However, I get a very strange result, with curves fluctuating, the training error very close to the testing error which does not appear to be normal. Where is the mistake? I can't understand why there are so many ups and downs and why the training error does not increase, as it would be expected to.Any help would be appreciated !

EDIT : the dataset I am using is https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29 where I got rid of the classes having less than 1000 instances. I manually re-encoded the litteral data.

3 Answers

Answers 1

I think that the reason you're seeing this kind of curve is that the performance metric you are measuring is different from the performance metric that you are optimizing.

Optimization metric

The neural network minimizes a loss function, and in the case of tanh activiations, I assume you are using a modified version of the cross entropy loss. If you were to plot the loss over time, you would see a more monotonically decreasing error function like you expect. (Not actually monotonic because neural networks are non-convex, but that's beside the point.)

Performance metric

The performance metric that you are measuring is the percent accuracy, which is different from the loss. Why are these different? The loss function tells us how much error we have in a differentiable way (which is important for fast optimization methods). The accuracy metric tells us how well we predict, which is useful for application of the neural network.

Putting it together

Because you are plotting the performance of a related metric, you can expect that the plot will look similar to that of your optimized metric. However because they are not the same, you may be introducing some unaccounted-for variance in your plot (as evidenced by the plot you posted).

There are a couple of ways to fix this.

  1. Plot the loss instead of the accuracy. This doesn't actually fix your problem if you actually need the accuracy plot, but it will give you much more smooth curves.
  2. Plot an average over multiple runs. Save the accuracy plots over 20 independent runs of your algorithm (as in train the network 20 times), then average them together and plot this. That will greatly reduce the variance.

TL;DR

Don't expect the accuracy plot to always be smooth and monotonically decreasing, it won't be.

Answers 2

Randomize training set and repeat

If you would like a fair comparison of the effect of the number of training sample on the accuracy, I suggest to randomly pick n_samples from your training set instead of adding 100 samples to the previous batch. You would also repeat the fit N_repeat times for each n_samples value.

This would give something like this (not tested):

n_samples_array = np.arange(100,len(X_train),100) N_repeat = 10  for n_samples in n_samples_array:     print(n_samples)      # repeat the fit several times and take the mean     myList_tmp, myList2_tmp = [],[]     for repeat in range(0,N_repeat):         # Randomly pick samples         selection = np.random.choice(range(0,len(X_train)),n_samples,repeat=False)          # train the model and store the training error         A=clf.fit(X_train[selection],Y_train[selection])         myList_tmp.append(1-A.score(X_train[selection],Y_train[selection]))            # compute the testing error         myList2_tmp.append(1-A.score(X_test,Y_test))      myList.append(np.mean(myList_tmp))     myList2.append(np.mean(myList2_tmp)) 

Warm start

When you use the fit function, you restart the optimization from scratch. If you would like to see the improvement on your optimization when adding a few sample to the same previously trained network, you can use the option warm_start=True

As per the documentation:

warm_start : bool, optional, default False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

Answers 3

In addition to the previous answers you should also have in mind, that you might have to tweak the learning rate (by setting learning_rate = value in the initializer) of the network. If you choose the rate to big, you will jump from on local minimum to another or circle around these points, but won't actually converge (see the image below, taken from here).

Furthermore, please also plot the loss and not just the accuracy of your network. This will give you better insights about it.

Also, keep in mind, that you have to use a lot of training and test data to get a more or less "smooth" curve, or even a representative curve; if you are using just a few (maybe a few hundred) data points, the resulting metrics will not actually be very accurate, as they contain a lot of stochastic things. To solve this error you should not train the network with the same examples every time, but rather change the orders of your training data, and maybe split it up on different mini batches. I am very confident, that you can solve or even reduce your problem by trying to mind these aspects and to implement them.

Depending on your kind of problem, you should change the activation function to something different than the tanh function. Performing a classification, a OneHotEncoder might also be useful (if your data is not already one hot encoded); the sklearn framework is offering an implementation of this, too.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment