The table structure is:
CREATE TABLE `test` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `from` int(10) unsigned NOT NULL, `to` int(10) unsigned NOT NULL, `message` text NOT NULL, `sent` int(10) unsigned NOT NULL DEFAULT '0', `read` tinyint(1) unsigned NOT NULL DEFAULT '0', `direction` tinyint(1) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `one` (`to`,`direction`,`from`,`id`), KEY `two` (`from`,`direction`,`to`,`id`), KEY `three` (`read`,`direction`,`to`), KEY `four` (`read`,`direction`,`from`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; I have a strange issue. Please have a look at the following query:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test where ( (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and ( (test.from = 204177 and test.id > 5341203) OR (test.from = 214518 and test.id > 5336549) OR (test.from = 231429 and test.id > 5338284) OR (test.from = 242739 and test.id > 5339541) OR (test.from = 243834 and test.id > 5340438) OR (test.from = 244354 and test.id > 5337489) OR (test.from = 244644 and test.id > 5338572) OR (test.from = 244690 and test.id > 5338467) ) ) or (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and ( (test.to = 204177 and test.id > 5341203) OR (test.to = 214518 and test.id > 5336549) OR (test.to = 231429 and test.id > 5338284) OR (test.to = 242739 and test.id > 5339541) OR (test.to = 243834 and test.id > 5340438) OR (test.to = 244354 and test.id > 5337489) OR (test.to = 244644 and test.id > 5338572) OR (test.to = 244690 and test.id > 5338467) ) ) or (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) ) or (test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) ) ) order by test.id; If I do an explain on this query, it goes through all the rows:
1 SIMPLE test index PRIMARY,one,two,three,four PRIMARY 4 1440596 Using where If I remove both the "not in" statements, then it works fine:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test where ( (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and ( (test.from = 204177 and test.id > 5341203) OR (test.from = 214518 and test.id > 5336549) OR (test.from = 231429 and test.id > 5338284) OR (test.from = 242739 and test.id > 5339541) OR (test.from = 243834 and test.id > 5340438) OR (test.from = 244354 and test.id > 5337489) OR (test.from = 244644 and test.id > 5338572) OR (test.from = 244690 and test.id > 5338467) ) ) or (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and ( (test.to = 204177 and test.id > 5341203) OR (test.to = 214518 and test.id > 5336549) OR (test.to = 231429 and test.id > 5338284) OR (test.to = 242739 and test.id > 5339541) OR (test.to = 243834 and test.id > 5340438) OR (test.to = 244354 and test.id > 5337489) OR (test.to = 244644 and test.id > 5338572) OR (test.to = 244690 and test.id > 5338467) ) ) or (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 ) or (test.read <> 1 and test.direction = 2 and test.from = 244975 ) ) order by test.id; Now the explain query returns:
1 SIMPLE test index_merge PRIMARY,one,two,three,four one,two 5,5 30 Using sort_union(one,two); Using where; Using filesort I am not sure why it does not work right. What am I missing in the indexes?
5 Answers
Answers 1
If your mySQL version less then 5.0.7, the mysql issue can be reason
Have a look this ticket in MySQL bug tracking https://bugs.mysql.com/bug.php?id=10561
Answers 2
Mixing AND and OR would often cause weird query plan on MySQL, in my experience. I don't have enough data on the table to test, but I would try rewriting your query using UNION ALL. After all, OR in WHERE is basically UNION.
The idea is to break it down on smaller conditions so MySQL can use different index optimized for each part as opposed to jamming all together.
SELECT * FROM ( SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction FROM test WHERE test.to = 244975 AND test.direction <> 2 AND test.direction <> 3 AND ( (test.from = 204177 AND test.id > 5341203) OR (test.from = 214518 AND test.id > 5336549) OR (test.from = 231429 AND test.id > 5338284) OR (test.from = 242739 AND test.id > 5339541) OR (test.from = 243834 AND test.id > 5340438) OR (test.from = 244354 AND test.id > 5337489) OR (test.from = 244644 AND test.id > 5338572) OR (test.from = 244690 AND test.id > 5338467) ) UNION ALL SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction FROM test WHERE test.from = 244975 AND test.direction <> 1 AND test.direction <> 3 AND ( (test.to = 204177 and test.id > 5341203) OR (test.to = 214518 and test.id > 5336549) OR (test.to = 231429 and test.id > 5338284) OR (test.to = 242739 and test.id > 5339541) OR (test.to = 243834 and test.id > 5340438) OR (test.to = 244354 and test.id > 5337489) OR (test.to = 244644 and test.id > 5338572) OR (test.to = 244690 and test.id > 5338467) ) UNION ALL SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction FROM test WHERE test.read <> 1 AND test.direction <> 3 AND test.direction <> 2 AND test.to = 244975 AND test.from NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) UNION ALL SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction FROM test WHERE test.read <> 1 AND test.direction = 2 AND test.from = 244975 AND test.to NOT IN (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) ) test ORDER BY test.id Answers 3
It would be good to have a dump of sample data to test with, but I created some of my own anyway. Next I split each of the four outer OR conditions into sub-queries, UNIONed them, and moved the ordering to the final result set.
I've had issues with indexes when using complex WHERE clauses and to me it looks like you have a chat/messaging app and are attempting to get messages to and from a particular user in a single query. Personally, I'd split these into individual queries to simplify the code/queries.
Here is my query:
SELECT test.id, test.from, test.to, test.message, test.sent, test.read, test.direction FROM ( SELECT * FROM test WHERE test.to = 244975 AND test.direction not in (2,3) AND ( (test.from = 204177 AND test.id > 5341203) OR (test.from = 214518 AND test.id > 5336549) OR (test.from = 231429 AND test.id > 5338284) OR (test.from = 242739 AND test.id > 5339541) OR (test.from = 243834 AND test.id > 5340438) OR (test.from = 244354 AND test.id > 5337489) OR (test.from = 244644 AND test.id > 5338572) OR (test.from = 244690 AND test.id > 5338467) ) UNION SELECT * FROM test WHERE test.from = 244975 AND test.direction not in (1,3) AND ( (test.to = 204177 AND test.id > 5341203) OR (test.to = 214518 AND test.id > 5336549) OR (test.to = 231429 AND test.id > 5338284) OR (test.to = 242739 AND test.id > 5339541) OR (test.to = 243834 AND test.id > 5340438) OR (test.to = 244354 AND test.id > 5337489) OR (test.to = 244644 AND test.id > 5338572) OR (test.to = 244690 AND test.id > 5338467) ) UNION SELECT * FROM test WHERE test.read != 1 AND test.direction not in (2,3) AND test.to = 244975 AND test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) UNION SELECT * FROM test WHERE test.read != 1 AND test.direction = 2 AND test.from = 244975 AND test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) ) test ORDER BY test.id; Answers 4
This is probably due to the additional level of nesting / complexity the in condition on an additional column adds to your where clause.
The index merge union sort your 2nd query uses converts the where clause into a set of range conditions combined by OR.
Each value you compare using in counts as another range predicate, so adding the two inconditions with 8 values each to your first query adds 64 more predicates.
As the number of predicates goes up, at some point the optimizer decides it is faster to just scan the whole table.
Answers 5
I am not sure why it does not work right. What am I missing in the indexes?
I'm pretty sure the query planner is working perfectly, you're not missing anything in the indexes that would help in this case. The query planner decided that it would be faster to use a different index because the two queries are very different.
We can make the optimizer use a union of the indexes for us which will make it considerably faster. You can keep the not in and not change any of the or statements. I ran some basic benchmarks of the method I used against the union method. Caveats apply because your DB configuration may be vastly different than mine. Running the query a 1000 times and doing this 3 times I took the best time for each query...
Optimized above
real 0m15.410s user 0m6.681s sys 0m2.641s rewritten as a set of unions
real 0m17.747s user 0m6.798s sys 0m2.812s NOT IN vs IN
not in and in are very different. The difference between these in this case is access pattern, do I need the data temporarily or as part of the result set. When you use not in with a few keys and the index holds millions of keys it might need to read lots of records if the data is part of the result set. Even when using indexes not in can be reading millions of records from disk... in with a few keys and these are the keys you need is asking to find and use a small subset. The two access patterns are very different. The following example might help make this clear...
1. I don't want these 10 items from a 1,000,000 records I need the other 999,990, this reads the whole index. 2. I only want these 10 from a 1,000,000 records. This might only require one disk seek. Number 2 is faster because the of the access pattern ie I found the 10 I need, Nunmber 1. may need to read a million records.
MySQL's query optimizer is seeing this ie the last two OR statements are asking for large subsets of data from either the table or the index ie case 1. above. Seeing that and the fact that it needed to use the primary key anyway the optimizer decided it would be quicker to use the primary key.
When you remove the not in things change ie now the query planner can use the indexes because in the other two or clauses they're in effect get me the few from the many and it performs an index_merge on the two keys that share a to and a from column along with the id.
To see what I mean don't remove the 'not in' part of the query, change it to in to see what happens, on my machine the query plan changed to use a range index.
What follows is an acute example of what I'm describing above and how to take advantage of it ie how do we let the optimizer know we can work with less data.
Think like the optimizer and work with less data
The following SQL is several orders of magnitude faster in tests on a ~4 million row database. The key change is the following line
(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test This one line is vastly reducing the dataset that mysql needs to work on because we're using in instead of not in. This is the new query.
select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test where ( (test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and ( (test.from_ = 204177 and test.id > 5341203) OR (test.from_ = 214518 and test.id > 5336549) OR (test.from_ = 231429 and test.id > 5338284) OR (test.from_ = 242739 and test.id > 5339541) OR (test.from_ = 243834 and test.id > 5340438) OR (test.from_ = 244354 and test.id > 5337489) OR (test.from_ = 244644 and test.id > 5338572) OR (test.from_ = 244690 and test.id > 5338467) ) ) or (test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and ( (test.to_ = 204177 and test.id > 5341203) OR (test.to_ = 214518 and test.id > 5336549) OR (test.to_ = 231429 and test.id > 5338284) OR (test.to_ = 242739 and test.id > 5339541) OR (test.to_ = 243834 and test.id > 5340438) OR (test.to_ = 244354 and test.id > 5337489) OR (test.to_ = 244644 and test.id > 5338572) OR (test.to_ = 244690 and test.id > 5338467) )) or (test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975 and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) or (test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) ) order by test.id; The explain plan for this looks very different...
mysql> \. sql_fixed.sql *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 226 filtered: 100.00 Extra: Using where; Using filesort *************************** 2. row *************************** id: 2 select_type: DERIVED table: test type: index_merge possible_keys: one,two key: two,one key_len: 4,4 ref: NULL rows: 226 filtered: 100.00 Extra: Using sort_union(two,one); Using where 2 rows in set, 1 warning (0.01 sec) Note the rows column is now 266. The optimizer being smart immediately can see that it doesn't need most of the data because up front we've told it to use an IN statement with a few keys. Most query optimizers attach a high cost to disk accesses so anything that reduces this it typically preferred by the optimizer.
0 comments:
Post a Comment