Wednesday, November 1, 2017

Mysql, SELECT only available companies and exclude the rest of the companies by one or more datetime ranges ( when they do not work )

Leave a Comment

I need to improve the search engine of my project.

The general idea: it's a system where clients can find easy available cleaning companies.

  1. Table "clients" - general table with all clients
  2. Table "companies" - general table with all cleaning companies
  3. Table "calendar" - here all companies can specify one or many different DATETIME ranges when they DO NOT take client requests

Here are the mysql tables

http://sqlfiddle.com/#!9/5c7034/1

The system has almost ~ 10000 clients and ~ 500 companies.

By default, all companies are always available for client requests. That's why we are using this calendar, where the companies can disable ( put date time ranges in future as 'not working' ) and the clients wont be able to make request if a company already specified that their services are closed on the choose date time period by the client.

If a client wants cleaning services on 2017-10-24 12:00:00 ( for 2 hours ) 1. he choose date : 2017-10-24 and start time 12:00:00 2. he specifies duration : 2 hours 3. the system must find all companies which are available between 2017-10-24 12:00:00 and 2017-10-24 14:00:00.

AS you can see the example data in table "calendar" now, the only available result should be Company 2, because they are working on the date choosen by the client ( 2017-10-24 12:00:00 ) .

My problem is that I'm forced to:

  1. To find all companies who cannot take the client request between 2017-10-24 12:00:00 and 2017-10-24 14:00:00

  2. To use the result of the first query with NOT IN where clause and finally to find the available companies.

 SELECT     * FROM     companies WHERE     companies.company_id NOT IN (      **SELECT         calendar.company_id     FROM         calendar     WHERE         calendar.not_available_from = '2017-10-24 12:00:00'     OR         calendar.not_available_from = '2017-10-24 14:00:00'** ); 

So, is there a better way to find these available companies ?

All of the companies right now are constantly modifying their closed( not working ) datetime ranges in future and this table calendar is huge, but without this table I cannot exclude the companies who cannot handle client requests.

Any advice about the current logic is very welcome.

Obligatory things to keep: all companies are always available by default for the clients and that's why they need manually to keep up to date when they do not want to work.

Everything else can be modified ...

Thanks!

4 Answers

Answers 1

If I understand correctly, you can do:

SELECT c.* FROM companies c WHERE NOT EXISTS (SELECT 1                   FROM calendar ca                   WHERE ca.company_id = c.company_id AND                         ca.not_available_to >= '2017-10-24 12:00:00' AND                         ca.not_available_from <= '2017-10-24 14:00:00'                  ); 

This is the correct way to implement an overlap. The above assumes that the end points are inclusive. So if a not-available period ends at 2017-10-24 12:00:00, then then the company would be filtered out. Change <= to < and >= to > for exclusive ranges.

For performance, you want an index on calendar(company_id, not_available_to, not_available_from).

Answers 2

You may use an outer join and then check for the NULL values like this

select cs.company_id,         cs.company_name from companies cs left join calendar cr on cr.company_id = cs.company_id AND                          cr.not_available_to >= '2017-10-24 12:00:00' AND                          cr.not_available_from <= '2017-10-24 14:00:00' where cr.company_id is null 

As mentioned by Gordon the following index is necessary on calendar table for reasonable performance:

create index ix_calendar_company_notavailable on        calendar(company_id, not_available_to, not_available_from) 

The InnoDB already has a clustered index on companies.company_id since it is a primary key, therefore, the query should be quite fast.

Answers 3

--To find all companies who cannot take the client request between  --2017-10-24 12:00:00 and 2017-10-24 14:00:00 SELECT *   FROM #companies  WHERE company_id          NOT IN ( --overlap of start time/end time with                   --availability and 'not working'                  SELECT company_id                    FROM #calendar                   WHERE not_available_from <= '2017-10-24 14:00:00' --@endtime                     AND not_available_to >= '2017-10-24 12:00:00'   --@starttime                     AND type = 'not working'                ) 

The above selects the company names that do not have an unavailability overlapping the start and endtime where the record type is 'not working'. I put 'not working' because I notice in the sqlfiddle you are putting client requests in the same table and putting a record type 'client request'.

Anyway...

In your question you ask if there's a different way to do it other than NOT IN. You could do NOT EXISTS equivalent, but the Actual Execution Plan is identical hence performance is identical.

Answers 4

Try using between on dates when you can and notice that it gets different results too.

SELECT     * FROM     companies WHERE     companies.company_id NOT IN (      SELECT         calendar.company_id     FROM         calendar     WHERE         calendar.not_available_from BETWEEN '2017-10-24 12:00:00' AND '2017-10-24 12:00:00'     OR         calendar.not_available_from BETWEEN '2017-10-24 14:00:00' AND '2017-10-24 14:00:00' ); 

This thread linked below has a good explanation of why using between is good, and is generally informative. Aside from what they say and they example they give, MySQL and SQL-Server (to my knowledge) will also handle date formats much better if you use between.

Why use the BETWEEN operator when we can do without it?

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment