Friday, August 17, 2018

Looping over Pymongo cursor returns bson.errors.InvalidBSON error after some iterations


I'm trying to make a simple query with pymongo and loop over the results.

This is the code I'm using:

data = []
tam = db.my_collection.find({'timestamp': {'$gte': start, '$lte': end}}).count()
for i, d in enumerate(table.find({'timestamp': {'$gte': start, '$lte': end}})):
    print('%s of %s' % (i, tam))
    data.append(d)

The start and end variables are Python datetime objects. Everything runs fine until I get the following output:

2987 of 12848
2988 of 12848
2989 of 12848
2990 of 12848
2991 of 12848
2992 of 12848
Traceback (most recent call last):
  File "db_extraction\extract_data.py", line 68, in <module>
    data = extract_data(yesterday,days = 1)
  File "db_extraction\extract_data.py", line 24, in extract_data
    for i,d in enumerate(table.find({'timestamp': {'$gte': start, '$lte':end}}).limit(100000)):
  File "\venv\lib\site-packages\pymongo\cursor.py", line 1169, in next
    if len(self.__data) or self._refresh():
  File "\venv\lib\site-packages\pymongo\cursor.py", line 1106, in _refresh
    self.__send_message(g)
  File "\venv\lib\site-packages\pymongo\cursor.py", line 971, in __send_message
    codec_options=self.__codec_options)
  File "\venv\lib\site-packages\pymongo\cursor.py", line 1055, in _unpack_response
    return response.unpack_response(cursor_id, codec_options)
  File "\venv\lib\site-packages\pymongo\message.py", line 945, in unpack_response
    return bson.decode_all(self.documents, codec_options)
bson.errors.InvalidBSON

The first thing I tried was changing the range of the query to check whether the problem is data related, and it's not: another range stops at 1615 of 6360 with the same error.

I've also tried list(table.find({'timestamp': {'$gte': start, '$lte': end}})) and got the same error.

Another possibly relevant detail is that the first iterations are really fast. It freezes on the last number for a while before raising the error.

So I need some help. Am I hitting limits here? Or any clue on what's going on?

This might be related to this 2013 question, but its author says that he gets no error output.

Thanks!

2 Answers

Answers 1

Your code runs fine on my computer. Since it works for your first 2992 records, I think some documents may be inconsistent. Does every document in your collection follow the same schema and format? And is your pymongo up to date?

Here is my suggestion if you want to loop through every record:

data = []
all_posts = db.my_collection.find({'timestamp': {'$gte': start, '$lte': end}})
tam = all_posts.count()
i = 0
for post in all_posts:
    i += 1
    print('%s of %s' % (i, tam))
    data.append(post)
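
A possible variant of the loop above, for what it's worth: it counts with Collection.count_documents, which PyMongo 3.7 and later provide as the replacement for the deprecated Cursor.count(), and it lets enumerate keep the 1-based counter. This is only a sketch. The helper name fetch_range is made up for illustration, and collection is assumed to be a PyMongo Collection such as db.my_collection.

```python
def fetch_range(collection, start, end, verbose=True):
    """Return every document whose 'timestamp' lies in [start, end].

    `fetch_range` is a hypothetical helper, not part of PyMongo.
    `collection` is assumed to be a pymongo Collection (PyMongo >= 3.7,
    which is where count_documents was introduced).
    """
    query = {'timestamp': {'$gte': start, '$lte': end}}
    total = collection.count_documents(query)  # stands in for cursor.count()
    data = []
    # enumerate(..., 1) yields a 1-based counter, so no manual i += 1 is needed
    for i, doc in enumerate(collection.find(query), 1):
        if verbose:
            print('%s of %s' % (i, total))
        data.append(doc)
    return data
```

It would be called as data = fetch_range(db.my_collection, start, end). It does not by itself avoid the InvalidBSON error; it only tidies the loop.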

Regards,

Answers 2

Could it be related to specific documents in the DB? Have you checked the document that might be causing the error (e.g., result number 2992 of your query above, counting from 0)?

You could also execute some queries against the DB directly (e.g., via the mongo shell) without using pymongo to see whether expected results are returned. For example, you could try db.my_collection.find({...}).skip(2992) to see the result. You could also use cursor.forEach() to print all the retrieved documents.
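
The same check can be sketched from Python: consume the cursor inside a try/except and report how many documents were read before the decode error. The helper name first_failure and its error_type parameter are assumptions made for this illustration; with PyMongo you would pass bson.errors.InvalidBSON as the error type.

```python
def first_failure(iterable, error_type=Exception):
    """Consume `iterable` and report where it breaks.

    Returns (items_read, error). `error` is None when the iterable was
    exhausted without raising `error_type`.
    `first_failure` is a hypothetical helper, not a PyMongo function.
    """
    count = 0
    iterator = iter(iterable)
    while True:
        try:
            next(iterator)
        except StopIteration:
            # Normal end of the cursor: nothing failed
            return count, None
        except error_type as exc:
            # The item after position `count` could not be produced
            return count, exc
        count += 1

# Assumed usage with PyMongo (not run here):
#   from bson.errors import InvalidBSON
#   query = {'timestamp': {'$gte': start, '$lte': end}}
#   n, err = first_failure(table.find(query), InvalidBSON)
#   # then look near position n, e.g. table.find(query).skip(n).limit(1)
```

Note that, as the traceback shows, PyMongo decodes a whole batch at once (bson.decode_all on the response), so the returned count marks the start of the failing batch rather than the exact document. Setting a small batch size on the cursor, for example table.find(query).batch_size(1), should narrow it down at the cost of more round trips.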
