Tuesday, January 31, 2017

What are the rules to order the keywords in a MySQL boolean search?

Leave a Comment

When I change the order of the keywords in a boolean search, I get the same result but very different performance results.

The profiling on MySQL 5.6.33 with a MyISAM table, ft_min_word_len=2 and description_index as a FULLTEXT index on title and description returns this:

# Query 1 SELECT id FROM archive, topic WHERE topic.type=0 AND archive.status=2 AND MATCH(title, description) AGAINST ('+house* +tz*' IN BOOLEAN MODE) AND archive.topicId = topic.id ORDER BY archive.featured DESC, archive.submissionDate DESC LIMIT 0,20 

Result:

Total count: 12 Key_read_requests: 2384607 Creating sort index: 7.950430 sec (!) Duration: 8.851252 sec 

# Query 2 SELECT id FROM archive, topic WHERE topic.type=0 AND archive.status=2 AND MATCH(title, description) AGAINST ('+tz* +house*' IN BOOLEAN MODE) AND archive.topicId = topic.id ORDER BY archive.featured DESC, archive.submissionDate DESC LIMIT 0,20 

Result:

Total count: 12 Key_read_requests: 415 Creating sort index: 0.003449 Duration: 0.004054 sec 

Total records per keyword:

tz*: 135092 tz: 25596 house*: 12 

Explain is the same for both queries:

id | select_type | Table   | Type     | Key               | Key len | Ref             | Rows | Extra 1  | SIMPLE      | archive | fulltext | description_index | 0       |                 | 1    | Using where; Using filesort 1  | SIMPLE      | topic   | eq_ref   | PRIMARY           | 3       | archive.topicId | 1    | Using where 

Only Key_read_requests and Creating sort index are different between the 2 queries.

It seems that:

  1. the order of the keyword order is a critical performance factor
  2. the keywords are used in reverse order
  3. having the most discriminating keyword at the end improves the performance.

Questions:

  • What is the reason of this big performance difference?
  • What are the rules/best practices? (I could not find anything in the documentation of mysql).

2 Answers

Answers 1

Edit after OP comment:
I'm not sure about the exact query plan when resolving this query.
Sometimes one operation is more expensive than another therefore doing the less expensive operation first can sort out many rows that then don't have to go through the more expensive operation which leads to a reduced running time.
(In your example one of the matching-operations could be more expensive than the other which increases and reduces running time by changing the order of the strings to match against).

Answers 2

Did you run both queries against a fresh started instance? Its possible you are just gaining perfomance plus due to filled caches on the second run.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment