Sunday, June 25, 2017

How to speed up my TensorFlow execution on Hadoop?

By Hường Hana, 4:00 AM. Tags: hadoop, python, tensorflow

The following script executes very slowly. I just want to count the total number of lines in the Twitter follower graph (a text file of about 26 GB).

Eventually I need to perform a machine learning task. This is just a test of reading data from HDFS with TensorFlow.

import tensorflow as tf
import time

filename_queue = tf.train.string_input_producer(["hdfs://default/twitter/twitter_rv.net"], num_epochs=1, shuffle=False)

def read_filename_queue(filename_queue):
    reader = tf.TextLineReader()
    _, line = reader.read(filename_queue)
    return line

line = read_filename_queue(filename_queue)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1500, inter_op_parallelism_threads=1500)

with tf.Session(config=session_conf) as sess:
    sess.run(tf.initialize_local_variables())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    start = time.time()
    i = 0
    while True:
        i = i + 1
        if i % 100000 == 0:
            print(i)
            print(time.time() - start)

        try:
            sess.run([line])
        except tf.errors.OutOfRangeError:
            print('end of file')
            break
    print('total number of lines = ' + str(i))
    print(time.time() - start)

The process needs about 40 seconds for the first 100000 lines. I tried setting intra_op_parallelism_threads and inter_op_parallelism_threads to 0, 4, 8, 40, 400 and 1500, but that didn't affect the execution time significantly.

Can you help me?

system specs:

16 GB RAM
4 CPU cores

2 Answers

Answer 1

You can split the big file into smaller ones; that may help. Also set intra_op_parallelism_threads and inter_op_parallelism_threads to 0.
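
As a rough illustration of the splitting idea, here is a minimal sketch in plain Python. It assumes a local copy of the file is available; the function name split_by_lines, the output directory and the part-NNNNN.txt naming are hypothetical, and on a real cluster you would more likely split the data with Hadoop tooling.

```python
import os


def split_by_lines(src_path, out_dir, lines_per_shard):
    """Split a text file into shards of at most lines_per_shard lines.

    Returns the list of shard paths in the order they were written.
    All names here are illustrative, not part of any TensorFlow API.
    """
    if lines_per_shard <= 0:
        raise ValueError("lines_per_shard must be positive")
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    out = None
    try:
        # Binary mode keeps line endings and encoding untouched.
        with open(src_path, "rb") as src:
            for line_no, line in enumerate(src):
                # Start a new shard every lines_per_shard lines.
                if line_no % lines_per_shard == 0:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, "part-%05d.txt" % len(shard_paths))
                    out = open(path, "wb")
                    shard_paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return shard_paths
```

Once the shards are uploaded to HDFS, their paths could replace the single-element list passed to tf.train.string_input_producer in the question. Whether this actually speeds up the read is something to measure, not assume.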

Answer 2

Try this and it should improve your timing:

session_conf = tf.ConfigProto(intra_op_parallelism_threads=0, inter_op_parallelism_threads=0)

It is better not to hand-tune these settings when you do not know the optimum value. A value of 0 lets TensorFlow pick an appropriate number of threads itself.
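
To check whether a given setting helps at all, it may be easier to measure throughput directly than to guess. The helper below is only a sketch: lines_per_second and read_one are hypothetical names, and read_one stands for whatever performs a single read (in the question's script that would be something like lambda: sess.run(line)).

```python
import time


def lines_per_second(read_one, num_reads):
    """Call read_one() num_reads times and return the observed reads per second."""
    start = time.perf_counter()
    for _ in range(num_reads):
        read_one()
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast calls or coarse timers.
    return num_reads / elapsed if elapsed > 0 else float("inf")


# Baseline from the question: 100000 lines in about 40 seconds.
reported_rate = 100000 / 40.0  # 2500.0 lines per second
```

Running the same fixed number of reads under each thread setting and comparing the results against the roughly 2500 lines per second reported in the question shows quickly whether the thread counts matter for this workload.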
