Wednesday, April 6, 2016

limit() and sort() order pymongo and mongodb

Leave a Comment

Despite reading peoples answers stating that the sort is done first, evidence shows something different that the limit is done before the sort. Is there a way to force sort always first?

views = mongo.db.view_logging.find().sort([('count', 1)]).limit(10) 

Whether I use .sort().limit() or .limit().sort(), the limit takes precedence. I wonder if this is something to do with pymongo...

4 Answers

Answers 1

According to the documentation, regardless of which goes first in your chain of commands, sort() would be always applied before the limit().

You can also study the .explain() results of your query and look at the execution stages - you will find that the sorting input stage examines all of the filtered (in your case all documents in the collection) and then the limit is applied.


Let's go through an example.

Imagine there is a foo database with a test collection having 6 documents:

>>> col = db.foo.test >>> for doc in col.find(): ...     print(doc) {'time': '2016-03-28 12:12:00', '_id': ObjectId('56f9716ce4b05e6b92be87f2'), 'value': 90} {'time': '2016-03-28 12:13:00', '_id': ObjectId('56f971a3e4b05e6b92be87fc'), 'value': 82} {'time': '2016-03-28 12:14:00', '_id': ObjectId('56f971afe4b05e6b92be87fd'), 'value': 75} {'time': '2016-03-28 12:15:00', '_id': ObjectId('56f971b7e4b05e6b92be87ff'), 'value': 72} {'time': '2016-03-28 12:16:00', '_id': ObjectId('56f971c0e4b05e6b92be8803'), 'value': 81} {'time': '2016-03-28 12:17:00', '_id': ObjectId('56f971c8e4b05e6b92be8806'), 'value': 90} 

Now, let's execute queries with different order of sort() and limit() and check the results and the explain plan.

Sort and then limit:

>>> from pprint import pprint >>> cursor = col.find().sort([('time', 1)]).limit(3)   >>> sort_limit_plan = cursor.explain() >>> pprint(sort_limit_plan) {u'executionStats': {u'allPlansExecution': [],                      u'executionStages': {u'advanced': 3,                                           u'executionTimeMillisEstimate': 0,                                           u'inputStage': {u'advanced': 6,                                                           u'direction': u'forward',                                                           u'docsExamined': 6,                                                           u'executionTimeMillisEstimate': 0,                                                           u'filter': {u'$and': []},                                                           u'invalidates': 0,                                                           u'isEOF': 1,                                                           u'nReturned': 6,                                                           u'needFetch': 0,                                                           u'needTime': 1,                                                           u'restoreState': 0,                                                           u'saveState': 0,                                                           u'stage': u'COLLSCAN',                                                           u'works': 8},                                           u'invalidates': 0,                                           u'isEOF': 1,                                           u'limitAmount': 3,                                           u'memLimit': 33554432,                                           u'memUsage': 213,                                           u'nReturned': 3,                                           u'needFetch': 0,                                           u'needTime': 8,                                           u'restoreState': 0,                                           u'saveState': 0,                                           u'sortPattern': {u'time': 1},                                           u'stage': u'SORT',                                           u'works': 13},                      u'executionSuccess': True,                      u'executionTimeMillis': 0,                      u'nReturned': 3,                      u'totalDocsExamined': 6,                      u'totalKeysExamined': 0},  u'queryPlanner': {u'indexFilterSet': False,                    u'namespace': u'foo.test',                    u'parsedQuery': {u'$and': []},                    u'plannerVersion': 1,                    u'rejectedPlans': [],                    u'winningPlan': {u'inputStage': {u'direction': u'forward',                                                     u'filter': {u'$and': []},                                                     u'stage': u'COLLSCAN'},                                     u'limitAmount': 3,                                     u'sortPattern': {u'time': 1},                                     u'stage': u'SORT'}},  u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',                  u'host': u'h008742.mongolab.com',                  u'port': 53439,                  u'version': u'3.0.7'}} 

Limit and then sort:

>>> cursor = col.find().limit(3).sort([('time', 1)]) >>> limit_sort_plan = cursor.explain() >>> pprint(limit_sort_plan) {u'executionStats': {u'allPlansExecution': [],                      u'executionStages': {u'advanced': 3,                                           u'executionTimeMillisEstimate': 0,                                           u'inputStage': {u'advanced': 6,                                                           u'direction': u'forward',                                                           u'docsExamined': 6,                                                           u'executionTimeMillisEstimate': 0,                                                           u'filter': {u'$and': []},                                                           u'invalidates': 0,                                                           u'isEOF': 1,                                                           u'nReturned': 6,                                                           u'needFetch': 0,                                                           u'needTime': 1,                                                           u'restoreState': 0,                                                           u'saveState': 0,                                                           u'stage': u'COLLSCAN',                                                           u'works': 8},                                           u'invalidates': 0,                                           u'isEOF': 1,                                           u'limitAmount': 3,                                           u'memLimit': 33554432,                                           u'memUsage': 213,                                           u'nReturned': 3,                                           u'needFetch': 0,                                           u'needTime': 8,                                           u'restoreState': 0,                                           u'saveState': 0,                                           u'sortPattern': {u'time': 1},                                           u'stage': u'SORT',                                           u'works': 13},                      u'executionSuccess': True,                      u'executionTimeMillis': 0,                      u'nReturned': 3,                      u'totalDocsExamined': 6,                      u'totalKeysExamined': 0},  u'queryPlanner': {u'indexFilterSet': False,                    u'namespace': u'foo.test',                    u'parsedQuery': {u'$and': []},                    u'plannerVersion': 1,                    u'rejectedPlans': [],                    u'winningPlan': {u'inputStage': {u'direction': u'forward',                                                     u'filter': {u'$and': []},                                                     u'stage': u'COLLSCAN'},                                     u'limitAmount': 3,                                     u'sortPattern': {u'time': 1},                                     u'stage': u'SORT'}},  u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',                  u'host': u'h008742.mongolab.com',                  u'port': 53439,                  u'version': u'3.0.7'}} 

As you can see, in both cases the sort is applied first and affects all the 6 documents and then the limit limits the results to 3.

And, the execution plans are exactly the same:

>>> from copy import deepcopy  # just in case >>> cursor = col.find().sort([('time', 1)]).limit(3) >>> sort_limit_plan = deepcopy(cursor.explain()) >>> cursor = col.find().limit(3).sort([('time', 1)]) >>> limit_sort_plan = deepcopy(cursor.explain()) >>> sort_limit_plan == limit_sort_plan True 

Also see:

Answers 2

I suspect, you're passing wrong key in sort parameter. something like "$key_name" instead of just "key_name"

refer How do you tell Mongo to sort a collection before limiting the results?solution for same problem as yours

Answers 3

Logically it should be whatever comes first in pipeline, But MongoDB always sort first before limit.

In my test Sort operation does takes precedence regardless of if it's coming before skip or after. However, it appears to be very strange behavior to me.

My sample dataset is:

[     {         "_id" : ObjectId("56f845fea524b4d098e0ef81"),          "number" : 48.98052410874508     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef82"),          "number" : 50.98747461471063     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef83"),          "number" : 81.32911244349772     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef84"),          "number" : 87.95549919039071     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef85"),          "number" : 81.63582683594402     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef86"),          "number" : 43.25696270026136     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef87"),          "number" : 88.22046335409453     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef88"),          "number" : 64.00556739160076     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef89"),          "number" : 16.09353150244296     },      {         "_id" : ObjectId("56f845fea524b4d098e0ef8a"),          "number" : 17.46667776660574     } ] 

Python test code:

import pymongo  client = pymongo.MongoClient("mongodb://localhost:27017") database = client.get_database("test") collection = database.get_collection("collection")  print("----------------[limit -> sort]--------------------------") result = collection.find().limit(5).sort([("number", pymongo.ASCENDING)]) for r in result:     print(r)  print("----------------[sort -> limit]--------------------------") result = collection.find().sort([("number", pymongo.ASCENDING)]).limit(5) for r in result:     print(r) 

Result:

----------------[limit -> sort]-------------------------- {u'_id': ObjectId('56f845fea524b4d098e0ef89'), u'number': 16.09353150244296} {u'_id': ObjectId('56f845fea524b4d098e0ef8a'), u'number': 17.46667776660574} {u'_id': ObjectId('56f845fea524b4d098e0ef86'), u'number': 43.25696270026136} {u'_id': ObjectId('56f845fea524b4d098e0ef81'), u'number': 48.98052410874508} {u'_id': ObjectId('56f845fea524b4d098e0ef82'), u'number': 50.98747461471063} ----------------[sort -> limit]-------------------------- {u'_id': ObjectId('56f845fea524b4d098e0ef89'), u'number': 16.09353150244296} {u'_id': ObjectId('56f845fea524b4d098e0ef8a'), u'number': 17.46667776660574} {u'_id': ObjectId('56f845fea524b4d098e0ef86'), u'number': 43.25696270026136} {u'_id': ObjectId('56f845fea524b4d098e0ef81'), u'number': 48.98052410874508} {u'_id': ObjectId('56f845fea524b4d098e0ef82'), u'number': 50.98747461471063} 

Answers 4

The mongodb documentation states that the skip() method controls the starting point of the results set, followed by sort() and ends with the limit() method.

This is regardless the order of your code. The reason is that mongo gets all the methods for the query, then it orders the skip-sort-limit methods in that exact order, and then runs the query.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment