Wednesday, August 23, 2017

How efficient is searching by tags?

Leave a Comment

Situation:

I have 1 million users and I want to search them by tags.

Here's the schema:

const Account = new Schema({   tags: { type: [String], default: ['cats'] } }) 

Here's the first query:

const query = { tags: 'cats'} const fields = { _id: 0 } const options = { skip: 0, limit: 5 }  Account.find(query, fields, options) 

After calling find method Mongoose will start searching and return the first 5 entries it matches. The performance in this case is optimal.

Here's the second query:

const query = { tags: 'cats'} const fields = { _id: 0 } const options = { skip: 500 000, limit: 5 }  Account.find(query, fields, options) 

What going on in this case is of interest to me.

Does Mongoose match 500 000 entries first and then returns the next 5 entries?

Or does it somehow 'jump' to the 500 000 element without having to match them all beforehand?

I'm trying to understand how efficient this proccess is and whether I can improve it somehow. Do you have any advice?

2 Answers

Answers 1

Interesting. My understanding is that Findhere is very similar in query functionalities to Select, wherein it will work inefficiently by first going through 500,000 values before skipping them.

Hence, what I suggested is implementing something known as a Multilevel index. Basically create a nested indexing system, which although taking up data space, is very efficient, especially for sequential search practices where you do not wish to implement a tree outright. Create a set of outer indices which are more general, then link them to inner indices respectively, before moving onto stored data blocks.

Implementing this should not be too difficult and should drastically improve the amount of traversing being done. Hope this helps.

Answers 2

Yes, it is matching 500,000 entries first.

From the MongoDB Definitive Guide 2nd Edition:

$skip takes a number, n, and discards the first n documents from the result set. As with “normal” querying, it isn’t efficient for large skips, as it must find all of the matches that must be skipped and then discard them.

You can examine your query in detail from the Mongo shell using explain():

https://docs.mongodb.com/manual/reference/method/cursor.explain/

Ex:

// Create a dummy collection var tagTypes = ['cats', 'dogs', 'chickens']     for (i=0; i < 1000; i++) {         let tag = tagTypes[Math.floor(Math.random()*tagTypes.length)]         db.people.insert({"tags" : [tag]})     } 

then explain it

db.people.find({"tags" : "cats"}).skip(100).limit(10).explain("executionStats”) 

totalDocsExamined is showing you that it is matching everything that it's skipping

"executionStats" : { ...     "totalDocsExamined" : 420 

If you were to create an index on tags:

db.people.ensureIndex({ "tags" : 1 }) 

Running it again you'd get:

"executionStats" : { ...     "totalDocsExamined" : 110 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment