Friday, February 10, 2017

Ruby MongodB - improving speed when working with multiple collections

Leave a Comment

I'm using MongoDB with Ruby using mongo gem.

I have the following scenario:

  1. for each document in a collection say coll1, look at key1 and key2
  2. search for document in another collection say coll2 with matching values for key1 and key2
  3. if there is a match, add document fetched in #2 with a new key key3 whose value be set to value of key3 in the document referenced in #1
  4. insert the updated hash into a new collection coll3

The general guideline with MongoDB has been to handle cross collection operations in application code.

So I do the following:

    client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db,                                 :server_selection_timeout => 5)     cursor = client[:coll1].find({}, { :projection => {:_id => 0} }) # exclude _id     cursor.each do |doc|         doc_coll2 = client[:coll2].find('$and' => [{:key1 => doc[:key1]}, {:key2 => doc[:key2] }]).limit(1).first # no find_one method         if(doc_coll2 && doc[:key3])             doc_coll2[:key3] = doc[:key3]             doc_coll2.delete(:_id) # remove key :_id             client[:coll3].insert_one(doc_coll2)         end     end 

This works, but it takes a lot of time to finish this job - approximately 250ms per document in collection coll1 or 3600s (1 hour) for ~15000 records, which seems a lot, which could be associated with reading the document one at a time, do the check in app code and then writing one doc at a time back to a new collection.

Is there a way to get this operation be done faster? Is the way I'm doing even the right way to do it?

Example documents

  • coll1

    {     "_id" : ObjectId("588610ead0ae360cb815e55f"),     "key1" : "115384042",     "key2" : "276209",     "key3" : "10101122317876" } 
  • coll2

    {     "_id" : ObjectId("788610ead0ae360def15e88e"),     "key1" : "115384042",     "key2" : "276209",     "key4" : 10,     "key5" : 4,     "key6" : 0,     "key7" : "false",     "key8" : 0,     "key9" : "false" } 
  • coll3

    {     "_id" : ObjectId("788610ead0ae360def15e88e"),     "key1" : "115384042",     "key2" : "276209",     "key3" : "10101122317876",     "key4" : 10,     "key5" : 4,     "key6" : 0,     "key7" : "false",     "key8" : 0,     "key9" : "false" } 

1 Answers

Answers 1

A solution would be to use aggregation instead, and do this in one single query:

  • perform a join on key1 field with $lookup
  • unwind the array with $unwind
  • keep doc where coll1.key2 == coll2.key2 with $redact
  • reformat the document with $project
  • write it to coll3 with $out

so the query would be :

db.coll1.aggregate([     { "$lookup": {          "from": "coll2",          "localField": "key1",          "foreignField": "key1",          "as": "coll2_doc"     }},      { "$unwind": "$coll2_doc" },     { "$redact": {          "$cond": [             { "$eq": [ "$key2", "$coll2_doc.key2" ] },              "$$KEEP",              "$$PRUNE"         ]     }},      {        $project: {          key1: 1,           key2: 1,           key3: 1,           key4: "$coll2_doc.key4",          key5: "$coll2_doc.key5",           key6: "$coll2_doc.key6",           key7: "$coll2_doc.key7",           key8: "$coll2_doc.key8",       key9: "$coll2_doc.key9",          }      },      {$out: "coll3"}  ], {allowDiskUse: true} ); 

and db.coll3.find() would return

{     "_id" : ObjectId("588610ead0ae360cb815e55f"),     "key1" : "115384042",     "key2" : "276209",     "key3" : "10101122317876",     "key4" : 10,     "key5" : 4,     "key6" : 0,     "key7" : "false",     "key8" : 0,     "key9" : "false" } 

Edit: MongoDB 3.4 solution

If you don't want to specify all keys in the $project stage, you can take advantage of $addFields and $replaceRoot, two new operators introduced in MongoDB 3.4

the query would become:

db.coll1.aggregate([     { "$lookup": {          "from": "coll2",          "localField": "key1",          "foreignField": "key1",          "as": "coll2_doc"     }},      { "$unwind": "$coll2_doc" },     { "$redact": {          "$cond": [             { "$eq": [ "$key2", "$coll2_doc.key2" ] },              "$$KEEP",              "$$PRUNE"         ]     }},      {$addFields: {"coll2_doc.key3": "$key3" }},     {$replaceRoot: {newRoot: "$coll2_doc"}},     {$out: "coll3"}  ], {allowDiskUse: true}) 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment