I need to improve the search engine of my project.
The general idea: it's a system where clients can find easy available cleaning companies.
- Table "clients" - general table with all clients
- Table "companies" - general table with all cleaning companies
- Table "calendar" - here all companies can specify one or many different DATETIME ranges when they DO NOT take client requests
Here are the mysql tables
http://sqlfiddle.com/#!9/5c7034/1
The system has almost ~ 10000 clients and ~ 500 companies.
By default, all companies are always available for client requests. That's why we are using this calendar, where the companies can disable ( put date time ranges in future as 'not working' ) and the clients wont be able to make request if a company already specified that their services are closed on the choose date time period by the client.
If a client wants cleaning services on 2017-10-24 12:00:00 ( for 2 hours ) 1. he choose date : 2017-10-24 and start time 12:00:00 2. he specifies duration : 2 hours 3. the system must find all companies which are available between 2017-10-24 12:00:00 and 2017-10-24 14:00:00.
AS you can see the example data in table "calendar" now, the only available result should be Company 2, because they are working on the date choosen by the client ( 2017-10-24 12:00:00 ) .
My problem is that I'm forced to:
To find all companies who cannot take the client request between 2017-10-24 12:00:00 and 2017-10-24 14:00:00
To use the result of the first query with NOT IN where clause and finally to find the available companies.
SELECT * FROM companies WHERE companies.company_id NOT IN ( **SELECT calendar.company_id FROM calendar WHERE calendar.not_available_from = '2017-10-24 12:00:00' OR calendar.not_available_from = '2017-10-24 14:00:00'** );
So, is there a better way to find these available companies ?
All of the companies right now are constantly modifying their closed( not working ) datetime ranges in future and this table calendar is huge, but without this table I cannot exclude the companies who cannot handle client requests.
Any advice about the current logic is very welcome.
Obligatory things to keep: all companies are always available by default for the clients and that's why they need manually to keep up to date when they do not want to work.
Everything else can be modified ...
Thanks!
4 Answers
Answers 1
If I understand correctly, you can do:
SELECT c.* FROM companies c WHERE NOT EXISTS (SELECT 1 FROM calendar ca WHERE ca.company_id = c.company_id AND ca.not_available_to >= '2017-10-24 12:00:00' AND ca.not_available_from <= '2017-10-24 14:00:00' );
This is the correct way to implement an overlap. The above assumes that the end points are inclusive. So if a not-available period ends at 2017-10-24 12:00:00, then then the company would be filtered out. Change <=
to <
and >=
to >
for exclusive ranges.
For performance, you want an index on calendar(company_id, not_available_to, not_available_from)
.
Answers 2
You may use an outer join and then check for the NULL values like this
select cs.company_id, cs.company_name from companies cs left join calendar cr on cr.company_id = cs.company_id AND cr.not_available_to >= '2017-10-24 12:00:00' AND cr.not_available_from <= '2017-10-24 14:00:00' where cr.company_id is null
As mentioned by Gordon the following index is necessary on calendar
table for reasonable performance:
create index ix_calendar_company_notavailable on calendar(company_id, not_available_to, not_available_from)
The InnoDB already has a clustered index on companies.company_id
since it is a primary key, therefore, the query should be quite fast.
Answers 3
--To find all companies who cannot take the client request between --2017-10-24 12:00:00 and 2017-10-24 14:00:00 SELECT * FROM #companies WHERE company_id NOT IN ( --overlap of start time/end time with --availability and 'not working' SELECT company_id FROM #calendar WHERE not_available_from <= '2017-10-24 14:00:00' --@endtime AND not_available_to >= '2017-10-24 12:00:00' --@starttime AND type = 'not working' )
The above selects the company names that do not have an unavailability overlapping the start and endtime where the record type is 'not working'. I put 'not working' because I notice in the sqlfiddle you are putting client requests in the same table and putting a record type 'client request'.
Anyway...
In your question you ask if there's a different way to do it other than NOT IN. You could do NOT EXISTS equivalent, but the Actual Execution Plan is identical hence performance is identical.
Answers 4
Try using between on dates when you can and notice that it gets different results too.
SELECT * FROM companies WHERE companies.company_id NOT IN ( SELECT calendar.company_id FROM calendar WHERE calendar.not_available_from BETWEEN '2017-10-24 12:00:00' AND '2017-10-24 12:00:00' OR calendar.not_available_from BETWEEN '2017-10-24 14:00:00' AND '2017-10-24 14:00:00' );
This thread linked below has a good explanation of why using between is good, and is generally informative. Aside from what they say and they example they give, MySQL and SQL-Server (to my knowledge) will also handle date formats much better if you use between.
0 comments:
Post a Comment