Sunday, June 25, 2017

How can I speed up my TensorFlow execution on Hadoop?

The following script executes very slowly. I just want to count the total number of lines in the Twitter follower graph (a text file of ~26 GB).

I need to perform a machine learning task. This is just a test of accessing data on HDFS from TensorFlow.

    import tensorflow as tf
    import time

    filename_queue = tf.train.string_input_producer(
        ["hdfs://default/twitter/twitter_rv.net"], num_epochs=1, shuffle=False)

    def read_filename_queue(filename_queue):
        reader = tf.TextLineReader()
        _, line = reader.read(filename_queue)
        return line

    line = read_filename_queue(filename_queue)

    session_conf = tf.ConfigProto(intra_op_parallelism_threads=1500,
                                  inter_op_parallelism_threads=1500)

    with tf.Session(config=session_conf) as sess:
        sess.run(tf.initialize_local_variables())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        start = time.time()
        i = 0
        while True:
            i = i + 1
            if i % 100000 == 0:
                print(i)
                print(time.time() - start)

            try:
                sess.run([line])
            except tf.errors.OutOfRangeError:
                print('end of file')
                break
        print('total number of lines = ' + str(i))
        print(time.time() - start)

The process takes about 40 seconds for the first 100,000 lines. I tried setting intra_op_parallelism_threads and inter_op_parallelism_threads to 0, 4, 8, 40, 400 and 1500, but it did not affect the execution time significantly.

Can you help me?


System specs:

  • 16 GB RAM
  • 4 CPU cores

2 Answers

Answer 1

You can split the big file into smaller ones; it may help. Also set intra_op_parallelism_threads and inter_op_parallelism_threads to 0.
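As a rough illustration of the splitting step, here is a minimal sketch in plain Python. It assumes the file is available on a local filesystem (not read directly from HDFS), and the helper name split_lines, the part-file naming scheme and the tiny demo file are all hypothetical, chosen only for this example.

```python
import os
import tempfile


def split_lines(src_path, out_dir, lines_per_part):
    """Split a text file into parts of at most `lines_per_part` lines each.

    Returns the list of part-file paths in the order they were written.
    """
    part_paths = []
    out = None
    with open(src_path, "r") as src:
        for index, line in enumerate(src):
            # Start a new part file every `lines_per_part` lines.
            if index % lines_per_part == 0:
                if out is not None:
                    out.close()
                part_path = os.path.join(
                    out_dir, "part-%05d.txt" % len(part_paths))
                part_paths.append(part_path)
                out = open(part_path, "w")
            out.write(line)
    if out is not None:
        out.close()
    return part_paths


def demo():
    # Hypothetical stand-in for the real graph file: 10 short lines.
    work_dir = tempfile.mkdtemp()
    src_path = os.path.join(work_dir, "edges.txt")
    with open(src_path, "w") as f:
        for n in range(10):
            f.write("%d\t%d\n" % (n, n + 1))
    return split_lines(src_path, work_dir, lines_per_part=4)
```

The returned list of paths could then be passed to tf.train.string_input_producer, which already takes a list of filenames in the question's script. Whether this actually shortens the run depends on the setup, so it is worth timing both variants.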

Answer 2

Try this and it should improve your timing:

    session_conf = tf.ConfigProto(intra_op_parallelism_threads=0,
                                  inter_op_parallelism_threads=0)

It is not a good idea to set these values by hand when you do not know the optimum. A value of 0 lets TensorFlow pick an appropriate number of threads itself.
