Friday, June 8, 2018

PyMongo: How Do I Update A Collection Using Aggregate?


This is a continuation of this question.

I'm using the following code to find all documents from collection C_a whose text contains the word StackOverflow and store them in another collection called C_b:

    import pymongo
    from pymongo import MongoClient

    client = MongoClient('127.0.0.1')  # MongoDB running locally
    dbRead = client['C_a']             # handle to the database named C_a

    # create the required pipeline
    # (all field names and operators must be quoted in PyMongo)
    pipeline = [
        {"$match": {"$text": {"$search": "StackOverflow"}}},
        {"$out": "C_b"},
    ]
    dbRead.C_a.aggregate(pipeline)  # execute the pipeline

    print(dbRead.C_b.count_documents({}))  # verify the count of the new collection

This works great. However, if I run the same snippet for multiple keywords, the results get overwritten. For example, I want the collection C_b to contain all documents that contain any of the keywords StackOverflow, StackExchange, and Programming. To do so, I simply run the snippet once per keyword. Unfortunately, each iteration overwrites the output of the previous one.

Question: How do I update the output collection instead of overwriting it?

Plus: Is there a clever way to avoid duplicates, or do I have to check for duplicates afterwards?

1 Answer


According to the documentation, $out doesn't support updating an existing collection: it replaces the output collection each time the pipeline runs.

https://docs.mongodb.com/manual/reference/operator/aggregation/out/#pipe._S_out

So you need a two-stage operation. First, write the matches to a temporary collection:

    # all field names and operators must be quoted in PyMongo
    pipeline = [
        {"$match": {"$text": {"$search": "StackOverflow"}}},
        {"$out": "temp"},
    ]
    dbRead.C_a.aggregate(pipeline)

and then copy the temporary collection into C_b, using the approach discussed in

https://stackoverflow.com/a/37433640/2830850

    # PyMongo cursors have no toArray() (that is mongo shell syntax); use list() instead
    dbRead.C_b.insert_many(list(dbRead.temp.aggregate([])))

Before starting the run, you will need to drop the C_b collection.
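Putting the steps together for several keywords, here is a minimal sketch rather than a definitive implementation. The helper names build_pipeline and collect_keywords and their parameters are hypothetical, and db is assumed to be a PyMongo database handle such as dbRead from the question. To address the duplicates part of the question, the sketch swaps the plain insert for replace_one with upsert=True keyed on _id, so a document that matches more than one keyword is stored only once.

```python
def build_pipeline(keyword, out_collection="temp"):
    """Build the $match + $out pipeline for a single keyword."""
    return [
        {"$match": {"$text": {"$search": keyword}}},
        {"$out": out_collection},
    ]


def collect_keywords(db, keywords, source="C_a", target="C_b", temp="temp"):
    """Accumulate the matches for every keyword into the target collection.

    db is assumed to behave like a pymongo Database (db[name] returns a
    collection); the default collection names follow the question.
    """
    db[target].drop()  # start from an empty target collection
    for keyword in keywords:
        # $out replaces the temp collection on every run
        db[source].aggregate(build_pipeline(keyword, temp))
        for doc in db[temp].find():
            # Upsert keyed on _id: a document matched by several keywords
            # is written once instead of raising a duplicate key error.
            db[target].replace_one({"_id": doc["_id"]}, doc, upsert=True)
    db[temp].drop()  # clean up the scratch collection
```

It could be called as collect_keywords(dbRead, ["StackOverflow", "StackExchange", "Programming"]). Issuing one replace_one per document is simple but slow for large result sets. If all keywords are known up front, it may also be worth noting that a $text search string with several space-separated terms (for example "StackOverflow StackExchange Programming") matches documents containing any of the terms, so a single $match followed by $out into C_b may be enough.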
