I'm using MongoDB with Ruby using mongo
gem.
I have the following scenario:
- for each document in a collection say
coll1
, look atkey1
andkey2
- search for document in another collection say
coll2
with matching values forkey1
andkey2
- if there is a match, add document fetched in #2 with a new key
key3
whose value be set to value ofkey3
in the document referenced in #1 - insert the updated hash into a new collection
coll3
The general guideline with MongoDB has been to handle cross collection operations in application code.
So I do the following:
client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db, :server_selection_timeout => 5) cursor = client[:coll1].find({}, { :projection => {:_id => 0} }) # exclude _id cursor.each do |doc| doc_coll2 = client[:coll2].find('$and' => [{:key1 => doc[:key1]}, {:key2 => doc[:key2] }]).limit(1).first # no find_one method if(doc_coll2 && doc[:key3]) doc_coll2[:key3] = doc[:key3] doc_coll2.delete(:_id) # remove key :_id client[:coll3].insert_one(doc_coll2) end end
This works, but it takes a lot of time to finish this job - approximately 250ms per document in collection coll1
or 3600s (1 hour) for ~15000 records, which seems a lot, which could be associated with reading the document one at a time, do the check in app code and then writing one doc at a time back to a new collection.
Is there a way to get this operation be done faster? Is the way I'm doing even the right way to do it?
Example documents
coll1
{ "_id" : ObjectId("588610ead0ae360cb815e55f"), "key1" : "115384042", "key2" : "276209", "key3" : "10101122317876" }
coll2
{ "_id" : ObjectId("788610ead0ae360def15e88e"), "key1" : "115384042", "key2" : "276209", "key4" : 10, "key5" : 4, "key6" : 0, "key7" : "false", "key8" : 0, "key9" : "false" }
coll3
{ "_id" : ObjectId("788610ead0ae360def15e88e"), "key1" : "115384042", "key2" : "276209", "key3" : "10101122317876", "key4" : 10, "key5" : 4, "key6" : 0, "key7" : "false", "key8" : 0, "key9" : "false" }
1 Answers
Answers 1
A solution would be to use aggregation instead, and do this in one single query:
- perform a join on key1 field with
$lookup
- unwind the array with
$unwind
- keep doc where
coll1.key2 == coll2.key2
with$redact
- reformat the document with
$project
- write it to coll3 with
$out
so the query would be :
db.coll1.aggregate([ { "$lookup": { "from": "coll2", "localField": "key1", "foreignField": "key1", "as": "coll2_doc" }}, { "$unwind": "$coll2_doc" }, { "$redact": { "$cond": [ { "$eq": [ "$key2", "$coll2_doc.key2" ] }, "$$KEEP", "$$PRUNE" ] }}, { $project: { key1: 1, key2: 1, key3: 1, key4: "$coll2_doc.key4", key5: "$coll2_doc.key5", key6: "$coll2_doc.key6", key7: "$coll2_doc.key7", key8: "$coll2_doc.key8", key9: "$coll2_doc.key9", } }, {$out: "coll3"} ], {allowDiskUse: true} );
and db.coll3.find()
would return
{ "_id" : ObjectId("588610ead0ae360cb815e55f"), "key1" : "115384042", "key2" : "276209", "key3" : "10101122317876", "key4" : 10, "key5" : 4, "key6" : 0, "key7" : "false", "key8" : 0, "key9" : "false" }
Edit: MongoDB 3.4 solution
If you don't want to specify all keys in the $project
stage, you can take advantage of $addFields
and $replaceRoot
, two new operators introduced in MongoDB 3.4
the query would become:
db.coll1.aggregate([ { "$lookup": { "from": "coll2", "localField": "key1", "foreignField": "key1", "as": "coll2_doc" }}, { "$unwind": "$coll2_doc" }, { "$redact": { "$cond": [ { "$eq": [ "$key2", "$coll2_doc.key2" ] }, "$$KEEP", "$$PRUNE" ] }}, {$addFields: {"coll2_doc.key3": "$key3" }}, {$replaceRoot: {newRoot: "$coll2_doc"}}, {$out: "coll3"} ], {allowDiskUse: true})
0 comments:
Post a Comment