Sunday, July 8, 2018

Python 3.6: sum of the short periods between timestamps


I'm doing some work with logs and need to calculate the total time a process was running without long interruptions. The maximum allowed interruption is 30 seconds. Logs are emitted every 3 seconds.

So, for example, if it was running from 10:20:00 to 10:30:00 and was interrupted from 10:24:10 to 10:27:10, the desired result is the sum of (10:24:10 - 10:20:00) and (10:30:00 - 10:27:10) = 420 (in seconds). However, calculating the time difference using datetime types does not give me a valid result. I suppose it calculates the difference without including the start/end seconds.
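As a quick check of the arithmetic in this example, the expected figure can be reproduced with plain datetime subtraction. This is only a sketch: the calendar date and the variable names are assumptions made for illustration, since the example gives times of day only.

```python
from datetime import datetime

# Arbitrary calendar date (an assumption; the example only gives times of day).
day = (2018, 7, 1)

run_start = datetime(*day, 10, 20, 0)
pause_start = datetime(*day, 10, 24, 10)
pause_end = datetime(*day, 10, 27, 10)
run_end = datetime(*day, 10, 30, 0)

# Subtracting two datetimes gives a timedelta covering the whole interval.
first_run = (pause_start - run_start).total_seconds()   # 250.0
second_run = (run_end - pause_end).total_seconds()      # 170.0

expected = first_run + second_run
print(expected)  # 420.0
```

With whole-second timestamps like these, the subtraction itself gives the expected 420 seconds, which suggests the discrepancy comes from elsewhere than the start/end seconds.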

Here is the solution I came up with (['timestamps'] is a list of datetime timestamps, normally emitted every 3 seconds):

for k, v in proc_activity.items():
    proc_activity[k]['duration'] = 0

    start, next = v['timestamps'][0], ''
    for time in v['timestamps']:
        next = time
        diff = next - start

        if diff.seconds < 30:
            proc_activity[k]['duration'] += diff.seconds
        else:
            print("diff: %s" % diff.seconds)

        start = next

    print(f"added: {proc_activity[k]['duration']}")
    diff = v['timestamps'][-1] - v['timestamps'][0]
    print(f"real: {diff.seconds}")

Output:

added: 39
real: 45
added: 39
real: 45
diff: 36
added: 155
real: 218

Any suggestions on how to fix it?

Update, sample input data:

{'service_0': {'timestamps': [datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
                              datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
                              datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
                              datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
                              datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
                              datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
                              datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
                              datetime.datetime(2018, 7, 1, 22, 35, 10, 105035),
                              datetime.datetime(2018, 7, 1, 22, 35, 13, 193428),
                              datetime.datetime(2018, 7, 1, 22, 35, 16, 210135),
                              datetime.datetime(2018, 7, 1, 22, 35, 19, 168881),
                              datetime.datetime(2018, 7, 1, 22, 35, 22, 114653),
                              datetime.datetime(2018, 7, 1, 22, 35, 25, 102365),
                              datetime.datetime(2018, 7, 1, 22, 35, 43, 46950),
                              datetime.datetime(2018, 7, 1, 22, 35, 46, 15435),
                              datetime.datetime(2018, 7, 1, 22, 35, 49, 23333),
                              datetime.datetime(2018, 7, 1, 22, 35, 52, 22164),
                              datetime.datetime(2018, 7, 1, 22, 35, 55, 78615),
                              datetime.datetime(2018, 7, 1, 22, 35, 58, 78573)]}}

2 Answers

Answer 1

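In short, I think the key thing you are missing is to use timedelta.total_seconds() rather than timedelta.seconds. The seconds attribute is only the whole-seconds field of the timedelta, so the fractional part of every interval is dropped before it is added to the sum.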
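To illustrate the difference, here is a small sketch using two consecutive timestamps from the sample data in the question, plus a made-up multi-day interval (the latter is an assumption added only to show the `days` field):

```python
from datetime import datetime, timedelta

# Two consecutive timestamps taken from the sample data in the question.
start = datetime(2018, 7, 1, 22, 33, 42, 33213)
end = datetime(2018, 7, 1, 22, 33, 44, 898234)
delta = end - start

# .seconds is just the integer seconds field; microseconds are stored separately.
print(delta.seconds)          # 2
print(delta.total_seconds())  # 2.865021

# Hypothetical longer interval: .seconds also ignores the days field.
long_delta = timedelta(days=1, seconds=5)
print(long_delta.seconds)          # 5
print(long_delta.total_seconds())  # 86405.0
```

Each interval of roughly 3 seconds therefore loses up to a second when `.seconds` is summed, which is consistent with the "added" figures in the question being lower than the "real" ones.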

This seems to work fine for me:

import datetime
from pprint import pprint


def get_duration(timestamps):
    max_interruption = 30
    starts = timestamps[:-1]
    ends = timestamps[1:]
    durations = zip(starts, ends)
    accumulated = 0
    for start, end in durations:
        delta = (end - start).total_seconds()
        if delta < max_interruption:
            accumulated += delta
    return accumulated


proc_activity = {
    'service_0': {
        'timestamps': [
            datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
            datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
            datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
            datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
            datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
            datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
            datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
            datetime.datetime(2018, 7, 1, 22, 35, 10, 105035),
            datetime.datetime(2018, 7, 1, 22, 35, 13, 193428),
            datetime.datetime(2018, 7, 1, 22, 35, 16, 210135),
            datetime.datetime(2018, 7, 1, 22, 35, 19, 168881),
            datetime.datetime(2018, 7, 1, 22, 35, 22, 114653),
            datetime.datetime(2018, 7, 1, 22, 35, 25, 102365),
            datetime.datetime(2018, 7, 1, 22, 35, 43, 46950),
            datetime.datetime(2018, 7, 1, 22, 35, 46, 15435),
            datetime.datetime(2018, 7, 1, 22, 35, 49, 23333),
            datetime.datetime(2018, 7, 1, 22, 35, 52, 22164),
            datetime.datetime(2018, 7, 1, 22, 35, 55, 78615),
            datetime.datetime(2018, 7, 1, 22, 35, 58, 78573)
        ],
    }
}

for k, v in proc_activity.items():
    proc_activity[k]['duration'] = get_duration(v['timestamps'])

pprint(proc_activity)

For the sample data, this gives a duration of 65.77183800000002 seconds.
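The trailing 00000002 is ordinary floating-point rounding from adding many float values. If that matters, one possible variation is to accumulate the timedelta objects themselves, which are stored as integer days, seconds and microseconds, and convert to seconds once at the end. The function name `get_duration_exact` and the shortened timestamp list below are assumptions made for this sketch, not part of the original answer.

```python
from datetime import datetime, timedelta


def get_duration_exact(timestamps, max_interruption=timedelta(seconds=30)):
    """Sum gaps shorter than max_interruption, accumulating as a timedelta."""
    accumulated = timedelta(0)
    # Pair each timestamp with its successor.
    for start, end in zip(timestamps, timestamps[1:]):
        delta = end - start
        if delta < max_interruption:
            accumulated += delta  # exact: integer microseconds under the hood
    # Convert to float seconds only once, at the very end.
    return accumulated.total_seconds()


# A shortened subset of the sample data, including the long gap.
timestamps = [
    datetime(2018, 7, 1, 22, 33, 39, 86170),
    datetime(2018, 7, 1, 22, 33, 42, 33213),
    datetime(2018, 7, 1, 22, 33, 44, 898234),
    datetime(2018, 7, 1, 22, 35, 7, 116182),   # more than 30 s after the previous one
    datetime(2018, 7, 1, 22, 35, 10, 105035),
]

print(get_duration_exact(timestamps))  # 8.800917
```

The logic is otherwise the same as in `get_duration` above. Only the accumulator type changes.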

Answer 2

My attempt at the problem, using a generator:

import datetime


def calculate(timestamps, largest_interrupt=30):
    begin_t, last_good_t = timestamps[0], timestamps[0]
    for current_t, previous_t in zip(timestamps[1:], timestamps):
        if (current_t - last_good_t).total_seconds() < largest_interrupt:
            last_good_t = current_t
            continue
        yield (previous_t - begin_t).total_seconds()
        last_good_t, begin_t = current_t, current_t
    yield (current_t - begin_t).total_seconds()


sample_data = {'service_0': {'timestamps': [
    datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
    datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
    datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
    datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
    datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
    datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
    datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
    datetime.datetime(2018, 7, 1, 22, 35, 10, 105035),
    datetime.datetime(2018, 7, 1, 22, 35, 13, 193428),
    datetime.datetime(2018, 7, 1, 22, 35, 16, 210135),
    datetime.datetime(2018, 7, 1, 22, 35, 19, 168881),
    datetime.datetime(2018, 7, 1, 22, 35, 22, 114653),
    datetime.datetime(2018, 7, 1, 22, 35, 25, 102365),
    datetime.datetime(2018, 7, 1, 22, 35, 43, 46950),
    datetime.datetime(2018, 7, 1, 22, 35, 46, 15435),
    datetime.datetime(2018, 7, 1, 22, 35, 49, 23333),
    datetime.datetime(2018, 7, 1, 22, 35, 52, 22164),
    datetime.datetime(2018, 7, 1, 22, 35, 55, 78615),
    datetime.datetime(2018, 7, 1, 22, 35, 58, 78573)]}}

for k, v in sample_data.items():
    s = sum(calculate(v['timestamps']))
    print(f"Service '{k}' has duration of '{s}' seconds")

The program prints:

Service 'service_0' has duration of '65.771838' seconds
