I'm working with logs and need to calculate the total time a process was running without long interruptions. The maximum allowed interruption is 30 seconds. Logs are emitted every 3 seconds.
For example, if the process ran from 10:20:00 to 10:30:00 and was interrupted from 10:24:10 to 10:27:10, the desired result is (10:24:10 - 10:20:00) + (10:30:00 - 10:27:10) = 250 + 170 = 420 seconds. However, calculating the time difference with datetime types does not give a valid result. I suppose it calculates the difference without including the start/end seconds.
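The arithmetic of this example can be checked with a minimal sketch. The helper name seconds_between and the HH:MM:SS string format are assumptions made for illustration; they are not part of the original logs.

```python
from datetime import datetime

FMT = "%H:%M:%S"  # assumed format, for illustration only


def seconds_between(start, end):
    """Return the seconds from start to end, both given as HH:MM:SS strings."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


first_run = seconds_between("10:20:00", "10:24:10")   # before the interruption
second_run = seconds_between("10:27:10", "10:30:00")  # after the interruption
total = first_run + second_run
print(first_run, second_run, total)  # 250.0 170.0 420.0
```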
Here is the solution I came up with (['timestamps'] is a list of datetime timestamps, normally emitted every 3 seconds):
    for k, v in proc_activity.items():
        proc_activity[k]['duration'] = 0
        start, next = v['timestamps'][0], ''
        for time in v['timestamps']:
            next = time
            diff = next - start
            if diff.seconds < 30:
                proc_activity[k]['duration'] += diff.seconds
            else:
                print("diff: %s" % diff.seconds)
            start = next
        print(f"added: {proc_activity[k]['duration']}")
        diff = v['timestamps'][-1] - v['timestamps'][0]
        print(f"real: {diff.seconds}")

Output:

    added: 39
    real: 45
    added: 39
    real: 45
    diff: 36
    added: 155
    real: 218

Any suggestions on how to fix it?
Update, sample input data:

    {'service_0': {'timestamps': [
        datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
        datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
        datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
        datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
        datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
        datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
        datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
        datetime.datetime(2018, 7, 1, 22, 35, 10, 105035),
        datetime.datetime(2018, 7, 1, 22, 35, 13, 193428),
        datetime.datetime(2018, 7, 1, 22, 35, 16, 210135),
        datetime.datetime(2018, 7, 1, 22, 35, 19, 168881),
        datetime.datetime(2018, 7, 1, 22, 35, 22, 114653),
        datetime.datetime(2018, 7, 1, 22, 35, 25, 102365),
        datetime.datetime(2018, 7, 1, 22, 35, 43, 46950),
        datetime.datetime(2018, 7, 1, 22, 35, 46, 15435),
        datetime.datetime(2018, 7, 1, 22, 35, 49, 23333),
        datetime.datetime(2018, 7, 1, 22, 35, 52, 22164),
        datetime.datetime(2018, 7, 1, 22, 35, 55, 78615),
        datetime.datetime(2018, 7, 1, 22, 35, 58, 78573),
    ]}}

2 Answers
Answers 1
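In short, I think the key thing you are missing is to use timedelta.total_seconds() rather than timedelta.seconds. The seconds attribute holds only the whole-seconds component of the delta: it leaves out the microseconds (and the days), so every gap of roughly 3 seconds loses its fractional part.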
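To illustrate the difference, here is a minimal sketch using two consecutive timestamps from the question's sample data. The variable names are chosen for illustration only.

```python
import datetime

# Two consecutive log timestamps taken from the sample data in the question.
earlier = datetime.datetime(2018, 7, 1, 22, 33, 42, 33213)
later = datetime.datetime(2018, 7, 1, 22, 33, 44, 898234)
delta = later - earlier

print(delta.seconds)          # 2        (whole-seconds component only)
print(delta.microseconds)     # 865021   (fractional part, stored separately)
print(delta.total_seconds())  # 2.865021 (full duration as a float)

# .seconds also ignores the days component of a longer delta.
long_delta = datetime.timedelta(days=1, seconds=5)
print(long_delta.seconds)          # 5
print(long_delta.total_seconds())  # 86405.0
```

Summing .seconds over many short gaps therefore undercounts by up to one second per gap, which would be consistent with the "added" totals in the question being lower than the "real" ones.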
This seems to work fine for me:
    import datetime
    from pprint import pprint

    def get_duration(timestamps):
        max_interruption = 30
        starts = timestamps[:-1]
        ends = timestamps[1:]
        durations = zip(starts, ends)
        accumulated = 0
        for start, end in durations:
            delta = (end - start).total_seconds()
            if delta < max_interruption:
                accumulated += delta
        return accumulated

    proc_activity = {
        'service_0': {
            'timestamps': [
                datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
                datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
                datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
                datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
                datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
                datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
                datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
                datetime.datetime(2018, 7, 1, 22, 35, 10, 105035),
                datetime.datetime(2018, 7, 1, 22, 35, 13, 193428),
                datetime.datetime(2018, 7, 1, 22, 35, 16, 210135),
                datetime.datetime(2018, 7, 1, 22, 35, 19, 168881),
                datetime.datetime(2018, 7, 1, 22, 35, 22, 114653),
                datetime.datetime(2018, 7, 1, 22, 35, 25, 102365),
                datetime.datetime(2018, 7, 1, 22, 35, 43, 46950),
                datetime.datetime(2018, 7, 1, 22, 35, 46, 15435),
                datetime.datetime(2018, 7, 1, 22, 35, 49, 23333),
                datetime.datetime(2018, 7, 1, 22, 35, 52, 22164),
                datetime.datetime(2018, 7, 1, 22, 35, 55, 78615),
                datetime.datetime(2018, 7, 1, 22, 35, 58, 78573)
            ],
        }
    }

    for k, v in proc_activity.items():
        proc_activity[k]['duration'] = get_duration(v['timestamps'])

    pprint(proc_activity)

For the sample data, service_0 has a duration of 65.77183800000002 seconds.
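The trailing ...00000002 in that result comes from adding floats. One possible variation, sketched here under the assumption that it is acceptable to accumulate timedelta objects and convert to seconds only once at the end, keeps the sum exact to the microsecond. The name get_duration_exact is hypothetical and not part of the answer above.

```python
import datetime


def get_duration_exact(timestamps, max_interruption=datetime.timedelta(seconds=30)):
    """Sum the gaps between consecutive timestamps that are shorter than
    max_interruption, returning the total as a timedelta."""
    total = datetime.timedelta()
    # zip stops at the shorter sequence, so this pairs each timestamp with its successor.
    for start, end in zip(timestamps, timestamps[1:]):
        delta = end - start
        if delta < max_interruption:
            total += delta
    return total


# First seven timestamps of the question's sample data: six regular logs,
# then one log after a gap of about 73 seconds, which is not counted.
sample = [
    datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
    datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
    datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
    datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
    datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
    datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
    datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
]

total = get_duration_exact(sample)
print(total)                  # 0:00:14.809447
print(total.total_seconds())  # 14.809447
```

Because timedelta stores whole microseconds as integers, the intermediate additions carry no floating-point rounding; only the final total_seconds() call produces a float.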
Answers 2
My attempt at the problem, using generators:
    import datetime

    def calculate(timestamps, largest_interrupt = 30):
        begin_t, last_good_t = timestamps[0], timestamps[0]
        for current_t, previous_t in zip(timestamps[1:], timestamps):
            if (current_t - last_good_t).total_seconds() < largest_interrupt:
                last_good_t = current_t
                continue
            yield (previous_t - begin_t).total_seconds()
            last_good_t, begin_t = current_t, current_t
        yield (current_t - begin_t).total_seconds()

    sample_data = {'service_0': {'timestamps': [
        datetime.datetime(2018, 7, 1, 22, 33, 39, 86170),
        datetime.datetime(2018, 7, 1, 22, 33, 42, 33213),
        datetime.datetime(2018, 7, 1, 22, 33, 44, 898234),
        datetime.datetime(2018, 7, 1, 22, 33, 47, 893731),
        datetime.datetime(2018, 7, 1, 22, 33, 50, 928946),
        datetime.datetime(2018, 7, 1, 22, 33, 53, 895617),
        datetime.datetime(2018, 7, 1, 22, 35, 7, 116182),
        datetime.datetime(2018, 7, 1, 22, 35, 10, 105035),
        datetime.datetime(2018, 7, 1, 22, 35, 13, 193428),
        datetime.datetime(2018, 7, 1, 22, 35, 16, 210135),
        datetime.datetime(2018, 7, 1, 22, 35, 19, 168881),
        datetime.datetime(2018, 7, 1, 22, 35, 22, 114653),
        datetime.datetime(2018, 7, 1, 22, 35, 25, 102365),
        datetime.datetime(2018, 7, 1, 22, 35, 43, 46950),
        datetime.datetime(2018, 7, 1, 22, 35, 46, 15435),
        datetime.datetime(2018, 7, 1, 22, 35, 49, 23333),
        datetime.datetime(2018, 7, 1, 22, 35, 52, 22164),
        datetime.datetime(2018, 7, 1, 22, 35, 55, 78615),
        datetime.datetime(2018, 7, 1, 22, 35, 58, 78573)]}}

    for k, v in sample_data.items():
        s = sum(calculate(v['timestamps']))
        print(f"Service '{k}' has duration of '{s}' seconds")

The program prints:

    Service 'service_0' has duration of '65.771838' seconds
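If it helps to see which stretches are being counted, a variation on the generator idea could yield the boundaries of each uninterrupted run instead of only its length. This is a sketch under assumptions: running_segments is a hypothetical name, and the evenly spaced timestamps are made up for illustration rather than taken from the logs.

```python
import datetime


def running_segments(timestamps, largest_interrupt=30):
    """Yield (begin, end) pairs, one per stretch in which no gap between
    consecutive timestamps reaches largest_interrupt seconds."""
    if not timestamps:
        return
    begin_t = previous_t = timestamps[0]
    for current_t in timestamps[1:]:
        if (current_t - previous_t).total_seconds() >= largest_interrupt:
            # The gap is too long: close the current stretch and start a new one.
            yield begin_t, previous_t
            begin_t = current_t
        previous_t = current_t
    # Close the final stretch.
    yield begin_t, previous_t


# Illustrative data: logs every 3 seconds, with one 74-second interruption.
base = datetime.datetime(2018, 7, 1, 22, 33, 39)
stamps = [base + datetime.timedelta(seconds=s) for s in (0, 3, 6, 80, 83, 86)]

segments = list(running_segments(stamps))
total = sum((end - begin).total_seconds() for begin, end in segments)
print(len(segments), total)  # 2 12.0
```

Unlike the generator above, this version also accepts an empty list or a single timestamp, in which case the total is 0.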