Saturday, February 24, 2018

Tracking aggregates in MongoDB in near real-time


We need to track record counts in an Accounts collection, broken down by a 'type' field. So we want to know how many Accounts are in TYPE1, how many are in TYPE2, etc... Further, we need to know the totals of an 'amount' field across the Accounts in each type.

Aggregate queries aren't going to be fast enough for us: these counts need to update in real time in the UI, and with tens of millions of records, aggregate queries that take many seconds to run aren't going to cut it. So I'm looking at having a separate totals collection with an object that tracks counters for each type.

As we change the value of the 'type' field (i.e. move the account from one type to another), we need to adjust the counts and 'amount' totals (decrement the counter for the original type, increment the counter for the new type). We can then use an update command with $inc to adjust the fields in the totals record that stores the type counts and amount sums. (This does mean that we have two database writes for every 'type' update, but I don't see a way around this unless anyone has a suggestion.)

For single record adjustments, this is pretty straightforward - we can just trap the type change in our data access layer and make a secondary update in the totals tracking object.
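As an illustration of the secondary write, the whole adjustment can be expressed as a single $inc update against the totals document. The schema here (per-type 'counts' and 'amounts' sub-documents, a totals document keyed by "_id": "totals") is an assumption for the sketch, not the post's actual layout:

```python
def build_totals_update(old_type, new_type, amount):
    """Build the $inc update spec applied to the single totals document
    when one account's 'type' moves from old_type to new_type.

    One $inc touching four fields keeps the adjustment atomic within
    the totals document itself."""
    return {
        "$inc": {
            f"counts.{old_type}": -1,        # one fewer account of the old type
            f"counts.{new_type}": 1,         # one more of the new type
            f"amounts.{old_type}": -amount,  # move its amount out of the old bucket
            f"amounts.{new_type}": amount,   # ...and into the new one
        }
    }

spec = build_totals_update("TYPE2", "TYPE1", 123)
```

The data access layer would then issue something like `totals.update_one({"_id": "totals"}, spec)` (pymongo-style; collection and filter names are hypothetical) right after the account update.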

The problem is how to track the 'amount' totals. For single record adjustments, this isn't a problem. But for bulk operations, we need to get the total of the 'amount' field for each of the adjusted records.

So far, I haven't found an easy way to get Mongo to produce the information I need.

I've got one strategy worked out: add a tagged history array to the Account object with a unique "changeId" and the 'amount' the document had at the time of the change, then run an aggregate against that history array, filtered by changeId, to get the totals. The history entries can then optionally be deleted (or cleaned up in a periodic process).

For example, if I did a bulk change, I'd generate a unique ID ('aaaaaaaa' in the following), then do an array insert for a history record as part of the bulk update that adjusts the 'type':

{
  "amount": 123,
  "type": "TYPE1",
  "history": [
    {
      "changeId": "aaaaaaaa",
      "amount": 123,
      "oldType": "TYPE2",
      "newType": "TYPE1"
    }
  ]
}
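One wrinkle with doing this in a single bulk update: a classic $push can't capture each document's own current 'amount'. On MongoDB 4.2 and later (newer than this post), a pipeline-style update can, because aggregation expressions like "$amount" are resolved per document on the server. A sketch of building such an update, shown as plain pymongo-style data (names are the post's; the pipeline form itself is my assumption):

```python
def build_bulk_type_change(change_id, old_type, new_type):
    """Return (filter, update-pipeline) for an update_many call.

    The pipeline form (MongoDB 4.2+) lets the appended history entry
    record each matched document's own 'amount' at change time."""
    history_entry = {
        "changeId": change_id,
        "amount": "$amount",    # field path, resolved per document by the server
        "oldType": old_type,
        "newType": new_type,
    }
    filter_spec = {"type": old_type}
    update_pipeline = [
        {"$set": {
            "type": new_type,
            # append to 'history', treating a missing array as empty
            "history": {"$concatArrays": [
                {"$ifNull": ["$history", []]},
                [history_entry],
            ]},
        }},
    ]
    return filter_spec, update_pipeline

flt, pipeline = build_bulk_type_change("aaaaaaaa", "TYPE2", "TYPE1")
```

On older servers, the equivalent would be iterating the matched accounts in application code, which loses the single-statement bulk update.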

Then I can do an aggregate that gives me the sum of the 'amount' for the changeId that just ran.

I think this will work, but it's clumsy - is there a better way?

1 Answer

Answer 1

I don't think you need to store much data, just a ticker tape of changes:

{ "changeID": "aaaaaaaa", "type": "TYPE2", "chgAmount": 123 }
{ "changeID": "aaaaaaaa", "type": "TYPE1", "chgAmount": -500 }

A separate cleanup process would run on an interval to apply the data in the tape to the totals. I think this is very close to what you're proposing, except it doesn't need to be stored in the account; it's a separate object written to by the account update. If the history is stored with the account, the cleanup process has to determine which accounts to clean up. The ticker tape process just sweeps everything in the object. The tape entries can be removed when processed.
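The sweep itself is simple to state. A minimal in-memory sketch, assuming the tape entries from the example above plus a 'chgCount' field (my addition, so record counts can be folded in too):

```python
def sweep_tape(tape, totals):
    """Apply every queued tape entry to the per-type totals, then clear
    the tape. In a real deployment each entry would be a document in a
    separate tape collection, and 'totals' the totals document."""
    for entry in tape:
        bucket = totals.setdefault(entry["type"], {"count": 0, "amount": 0})
        bucket["amount"] += entry["chgAmount"]
        bucket["count"] += entry.get("chgCount", 0)  # assumed extra field
    tape.clear()  # processed entries are removed, as the answer suggests
```

The trade-off is that the totals are only as fresh as the sweep interval, whereas the in-account history approach lets the caller aggregate its own changeId immediately after the bulk update.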
