Sunday, September 24, 2017

In order to get a speed boost for my python program, should I spawn a separate thread or a separate process for logging?

Leave a Comment

In order to get a speed boost for my python program, should I spawn a separate thread or a separate process for logging? My program uses a lot of logging and I am not sure if threading is suitable because of GIL. A lot of resources seem to suggest that it should be fine for I/O. I think that logging is I/O but I am not sure what does "should be fine" mean for most of the resources out there. I just need speed.

2 Answers

Answers 1

Before you start trying to optimize a program, there are some things you should do.

To start with, you should profile you programs. You could e.g. use line_profiler.

If it turns out that your software spends a considerable amount of time logging, there are two easy options.

  • Set the loglevel in production code so that no or few(er) messages are logged. There will still be some overhead left, but it should be much reduced.
  • Use mechanical means (like sed or grep) to completely remove the logging calls from the production code. If this doesn't improve the speed/throughput of your program, logging wasn't an issue.

If neither of those is suitable and logging is a significant fraction of your programs time you can try to implement thread or process based logging.

If you want to use threading for logging, you will besically need a list and a lock. The function that is called from the main thread to do logging grabs the lock, appends the text to log to the list and releases the lock. The second thread waits for the lock, grabs the lock, pops a couple of items from the list, releases the lock and writes the items to a file. Since the GIL makes sure that only one thread at a time is running Python bytecode, this will reduce the performance of your program somewhat; part of its time is spent running the bytecode from the logging thread.

Using multiprocessing is slightly different, in that you probably want to use e.g. a Queue to send logging messages from the main process to the logging process. The logging process takes items from the Queue and writes them to disk. This means that the time spent writing the logging actions to disk is spent in a different program. But there is some overhead associated with using a Queue as well.

You would have to measure to see which method uses less time in your program.

Answers 2

I am going by these assumptions:

  • You already determine that it is your logging that is bottlenecking your program.
  • You have a good reason why you are logging what you are logging.

The perceived slowness is most likely due to the success or failure acknowledgement from the logging action. To avoid this "command queuing" make the calls to a separate process asynchonously and skipping the callback. This may end up consuming more resources, but this will alleviate the backlog in your main program. Nodejs handles this naturally or you can roll your own python listener. Since this will be a separate process. You can redirect the logging feature of your other programs to this one. You can even have a separate machine to handle this workload.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment