Friday, March 31, 2017

System automatically reboots when TensorFlow model is too large


I'm using an NVIDIA GTX 1080 GPU (8 GB) to run the Inception model on TensorFlow. When I set batch_size = 16 and image_size = 400, my Ubuntu 14.04 machine automatically reboots shortly after I start the program.

1 Answer

Make sure it is not a power supply unit (PSU) problem. I was seeing occasional unexplained reboots on my development machine, and as I increased the size of the workload (larger batches, a larger network), the reboots became more frequent, since a heavier workload makes the GPU draw more power. It turned out to be a PSU problem. A quick check is to cap the GPU's power consumption and see whether the reboots stop. For instance, you can limit the power to 150 W with this command (you'll need sudo rights):

sudo nvidia-smi -pl 150 
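To make the check a bit more systematic, here is a minimal sketch that reads the power range the driver reports, clamps the requested cap into that range (the driver is expected to reject values outside it), applies the cap, and logs power draw while a training command runs. It assumes the NVIDIA driver's `nvidia-smi` is on PATH, that you have sudo rights, and that the card in question is GPU index 0; the function names `clamp_limit`, `read_limits` and `cap_and_watch` are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes: nvidia-smi on PATH, sudo rights, GPU index 0.
# Function names below are hypothetical helpers.

# Clamp a requested power limit (whole watts) into [min, max].
clamp_limit() {
  local want=$1 min=$2 max=$3
  if [ "$want" -lt "$min" ]; then
    echo "$min"
  elif [ "$want" -gt "$max" ]; then
    echo "$max"
  else
    echo "$want"
  fi
}

# Print the min and max power limits the driver reports for GPU 0
# as two whole numbers separated by a space (decimals are truncated).
read_limits() {
  nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit \
    --format=csv,noheader,nounits |
    awk -F', *' '{ printf "%d %d\n", $1, $2 }'
}

# Cap GPU 0 at the requested wattage (clamped to the reported range),
# then log power draw every 5 s to power.log while the command runs.
cap_and_watch() {
  local want=$1
  shift
  local limits min max limit watcher
  limits=$(read_limits)
  min=${limits% *}
  max=${limits#* }
  limit=$(clamp_limit "$want" "$min" "$max")
  sudo nvidia-smi -i 0 -pl "$limit"
  nvidia-smi -i 0 --query-gpu=timestamp,power.draw,power.limit \
    --format=csv -l 5 > power.log &
  watcher=$!
  "$@"
  kill "$watcher"
}
```

Usage, with `train.py` standing in for your own training script: `cap_and_watch 150 python train.py`. If the reboots stop with the cap in place but come back without it, the PSU is the likely culprit. Note that the power limit is typically not persistent, so it has to be reapplied after a reboot or driver reload.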

