Wednesday, August 29, 2018

Meaning of TensorFlow operation `IsExpensive()`?


There is a method in OpKernel

    // Returns true iff this op kernel is considered "expensive". The
    // runtime may use this flag to optimize graph execution for example
    // to "inline" inexpensive kernels.
    virtual bool IsExpensive() { return expensive_; }

It seems that by default all operations on the GPU are considered inexpensive, whilst CPU and SYCL operations are flagged as expensive.

It is a bit hard to figure out the definition and effect of "expensive". There is no information in the guide.

  1. Is there any specific guideline when IsExpensive should be false, true?
  2. What's the effect if an operation is flagged as expensive? So far I can only tell that active profiling uses this just as a hint. The only place querying this property is in the scheduler, but without explaining what being inline means.
  3. In conjunction with "1." should I care about it in my custom Ops?
  4. While it makes sense, that any AsyncOp (like RemoteFusedGraphExecuteOp) is expensive, MPIAllgatherOp seems to be defined as not expensive. Isn't this a contradiction?

I am asking because IdentityOp is explicitly marked as inexpensive. I wonder whether I should override this method in my custom ops as well, since each CPU version (even any custom code) is flagged as expensive.

The entire logic of XLA seems to be about whether an instruction is expensive or not, so it might be an important part to consider. A coin toss between true and false is therefore probably not the best way to decide the return value in my custom op.

1 Answer


Before answering your questions, I think it is worth understanding how TensorFlow uses threads to get your work done. For this, I suggest you read this related and very good SO post.

You will find that TensorFlow uses a thread pool to get your work done. Expensive Ops are scheduled for execution on the thread pool, whereas cheap Ops are executed "inline", meaning by the same thread that schedules the tasks. (Sidenote: in the source file you have linked there is only one exception: when the inline_ready queue is empty, the thread can execute the last expensive Op by itself.)
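The inline-versus-pool split described above can be sketched in plain C++. This is only an illustration under stated assumptions, not TensorFlow's executor: `Node` and `ProcessReady` are hypothetical names, `std::async` stands in for the real thread pool, and the exception from the sidenote (running the last expensive Op on the scheduling thread) is left out.

```cpp
#include <deque>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a ready graph node; not a TensorFlow type.
struct Node {
  bool is_expensive;              // what the kernel's IsExpensive() would return
  std::function<void()> compute;  // the kernel's work
};

// Dispatches expensive nodes to other threads and runs cheap nodes on the
// calling thread. std::async is an assumption standing in for a thread pool.
void ProcessReady(const std::vector<Node>& ready) {
  std::deque<const Node*> inline_ready;
  std::vector<std::future<void>> in_flight;

  for (const Node& node : ready) {
    if (node.is_expensive) {
      // Pay the dispatch overhead; the calling thread stays free.
      in_flight.push_back(std::async(std::launch::async, node.compute));
    } else {
      inline_ready.push_back(&node);
    }
  }

  // Cheap nodes run "inline": the calling thread is busy until each returns.
  while (!inline_ready.empty()) {
    inline_ready.front()->compute();
    inline_ready.pop_front();
  }

  // Wait for the dispatched work before returning.
  for (std::future<void>& f : in_flight) f.get();
}
```

A cheap node passed to `ProcessReady` runs on the caller's thread, while an expensive one runs on a different thread, which is the trade-off the rest of this answer relies on.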

With this in mind, let us try to answer your questions.

  1. Is there any specific guideline when IsExpensive should be false, true?

I could not find a specific guideline in the TensorFlow manual. However, from the internals discussed above, an Op should be marked as expensive when the overhead of scheduling a task on the thread pool is negligible compared to the time the task needs to execute.
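As a rough way to make that rule of thumb concrete, one could compare an estimated compute time against an estimated dispatch overhead. The helper below is a hypothetical heuristic, not anything TensorFlow provides: the name `WorthDispatching` and the default ratio of 10 are assumptions chosen purely for illustration, and both timings would have to come from your own measurements.

```cpp
#include <chrono>

using Nanos = std::chrono::nanoseconds;

// Hypothetical heuristic: treat the Op as "expensive" only if its estimated
// compute time is at least min_ratio times the dispatch overhead.
// The default ratio of 10 is an arbitrary assumption, not a TensorFlow value.
bool WorthDispatching(Nanos estimated_compute, Nanos dispatch_overhead,
                      int min_ratio = 10) {
  return estimated_compute >= dispatch_overhead * min_ratio;
}
```

For example, with a measured overhead of 5 microseconds, a kernel that runs for 100 microseconds would qualify as expensive under this sketch, while one that runs for 1 microsecond would not.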

  2. What's the effect if an operation is flagged as expensive? So far I can only tell that active profiling uses this just as a hint. The only place querying this property is in the scheduler, but without explaining what being inline means.

The effect is the following: every time an Op's IsExpensive method returns false, the Op may be pushed to the inline_ready queue and run on the scheduling thread, which may block that thread from performing further tasks until the Op finishes, hence stalling your program. In contrast, if the Op's IsExpensive method returns true, it will be scheduled for execution on the thread pool and the scheduling thread is free to continue with its tasks in the process loop.

  3. In conjunction with "1." should I care about it in my custom Ops?

I think you should care and try to reason as much as possible about the execution time of your Op. After that, decide how to implement the IsExpensive method.
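To show the shape of such an override without pulling in TensorFlow headers, here is a minimal sketch using a stand-in base class. `OpKernelStandIn` and `MyCheapOp` are hypothetical names; the stand-in only mirrors the one method quoted in the question, and a real custom op would derive from the actual OpKernel class and also implement its compute method.

```cpp
// Stand-in for the OpKernel base class, reduced to the single virtual
// method quoted in the question. This is NOT the real TensorFlow class.
class OpKernelStandIn {
 public:
  explicit OpKernelStandIn(bool expensive) : expensive_(expensive) {}
  virtual ~OpKernelStandIn() = default;

  virtual bool IsExpensive() { return expensive_; }

 private:
  bool expensive_;
};

// Hypothetical custom op whose work is trivial (for example, forwarding
// its input), so it opts out of the "expensive" default.
class MyCheapOp : public OpKernelStandIn {
 public:
  // Assumes the CPU default described in the question (expensive = true).
  MyCheapOp() : OpKernelStandIn(/*expensive=*/true) {}

  bool IsExpensive() override { return false; }
};
```

Because the method is virtual, a caller that only holds a reference to the base class still sees the overridden answer, which is how a scheduler would query it.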

  4. While it makes sense that any AsyncOp (like RemoteFusedGraphExecuteOp) is expensive, MPIAllgatherOp seems to be defined as not expensive. Isn't this a contradiction?

No, it is not a contradiction. If you read the comment of MPIAllgatherOp you will find the following:

    // Although this op is handled asynchronously, the ComputeAsync call is
    // very inexpensive. It only sets up a CollectiveOpRecord and places it
    // in the table for the background thread to handle. Thus, we do not need
    // a TF pool thread to perform the op.

This clearly states that scheduling this task on the thread pool would be almost pure overhead. Therefore, performing it inline makes a lot of sense.
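The pattern that comment describes, a cheap enqueue on the calling thread with the slow work done by a background thread, can be sketched as follows. This is an illustration only and not the MPI op's implementation: `Record`, `BackgroundWorker`, and `Enqueue` are hypothetical names, and the real CollectiveOpRecord has different contents.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical record; the real CollectiveOpRecord looks different.
struct Record {
  std::function<void()> work;  // the slow operation
  std::function<void()> done;  // completion callback
};

class BackgroundWorker {
 public:
  BackgroundWorker() : worker_([this] { Loop(); }) {}

  ~BackgroundWorker() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // pending records are drained before the thread exits
  }

  // The cheap part, analogous to ComputeAsync in the comment above:
  // store the record and return immediately.
  void Enqueue(Record record) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      table_.push(std::move(record));
    }
    cv_.notify_one();
  }

 private:
  void Loop() {
    for (;;) {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [this] { return stop_ || !table_.empty(); });
      if (table_.empty()) return;  // stop requested and nothing left to do
      Record record = std::move(table_.front());
      table_.pop();
      lock.unlock();  // do the slow work without holding the lock
      record.work();
      record.done();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Record> table_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the members above exist first
};
```

Since `Enqueue` only takes a lock and pushes onto a queue, running it inline costs the scheduling thread very little, which matches the reasoning for marking such an op as not expensive.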
