SQL: Comparing two tables for missing records and then on the date fields

I have two tables as below

work_assignments

emp_id   | start_date  |   End Date ------------------------------------------   1      | May-10-2017 | May-30-2017   1      | Jun-05-2017 | null   2      | May-08-2017 | null

hourly_pay

emp_id   | start_date  |   End Date    |  Rate -----------------------------------------------   1      | May-20-2017 | Jun-30-2017   |  75   1      | Jul-01-2017 | null          |  80

These 2 tables share the emp_id (employee id) foreign key and joining these two I should be able to:

find employee records missing in the hourly_pay table. Given the data here, the query should return emp_id 2 from work_assignments table
find the records where the hourly_pay start_date that are later than the work assignments start_date. Again, given the data here, the query should return emp_id 1 (because work_assignments.start_date has May-10-2017, while the earliest hourly_pay.start_date is on May-20-2017)

I am able to achieve the first part of result using the join query below

select distinct emp_id from work_contracts left join hourly_pay hr USING(emp_id) where hr.emp_id is null

I am stuck on the second part where probably I need a correlated subquery to tell the hourly pay table records that did not start before the work_assignments start_date? or is there any other way?

12 Answers

Answers 1

Do the date comparison in an inner query then wrap it to filter it to the ones that satisfy the late pay criteria.

select * from (     select distinct c.emp_id,          case when c.start_date < hr.start_date then 1 else 0 end as latePay     from work_contracts c         left join hourly_pay hr USING(emp_id) ) result where latePay = 1

Answers 2

You can achieve second part using query

 select distinct wc.emp_id   from (select emp_id, min(start_date) start_date from work_contracts group by emp_id) wc  join (select emp_id, min(start_date) start_date from hourly_pay group by emp_id) hr      on wc.emp_id = hr.emp_id  where wc.start_date < hr.start_date

Answers 3

This hints at a between condition, with some twists, but I've had extremely bad luck using betweens in joins. They appear to perform some form of cross-join on the back and end then filter out the actual join where-clause style. I know that's not very technical, but I've never done a non-equality condition in a join that's turned out well.

So, this may seem counter-intuitive, but I think exploding all date possibilities might actually be your best bet. Without knowing how big your date ranges actually are it's hard to say.

Also, I think this will actually satisfy both conditions in your question at once -- by telling you all work assignments that do not have corresponding pay rates.

Try this against your actual data and see how it works (and how long it takes).

with pay_dates as (   select     emp_id, rate,     generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as pd   from hourly_pay ), assignment_dates as (   select     emp_id, start_date,     generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as wd   from work_assignments ) select   emp_id, min (wd)::date as from_date,   max (wd)::date as thru_date from   assignment_dates a where   not exists (     select null     from pay_dates p     where p.emp_id = a.emp_id     and a.wd = p.pd   ) group by   emp_id, start_date

The results should be all work assignment ranges with no rates:

emp     from             thru 1    '2017-05-10'    '2017-05-19' 2    '2017-05-08'    '2017-11-14'

The cool thing is it would also remove any overlaps, where a work assignment was partially covered.

-- Edit 3/20/2018 --

Per your request, here is a break-down of what the logic does.

with pay_dates as(   select     emp_id, rate,     generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as pd   from hourly_pay )

This takes the hourly_pay data and breaks it into a record for each employee, for each day:

emp_id    rate    pay date 1         75      5/20/17 1         75      5/21/17 1         75      5/22/17 ... 1         75      6/30/17 1         80      6/01/17 1         80      6/02/17 ... 1         80      today

Next,

[implied "with"] assignment_dates as (   select     emp_id, start_date,     generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as wd   from work_assignments )

Effectively does the same thing for the work assignments table, only preserving the "start date column" in each row.

Then the main query is this:

select   emp_id, min (wd)::date as from_date,   max (wd)::date as thru_date from   assignment_dates a where   not exists (     select null     from pay_dates p     where p.emp_id = a.emp_id     and a.wd = p.pd   ) group by   emp_id, start_date

Which draws from the two queries above. The important part is the anti-join:

not exists (   select null   from pay_dates p   where p.emp_id = a.emp_id   and a.wd = p.pd )

That identifies every work assignment where there is no corresponding record for that employee, for that day.

So in essence, the query takes the data ranges from both tables, comes up with every possible date combination and then does an anti-join to see where they don't match.

While it seems counterintuitive, to take a single record and blow it up into multiple records, two things to consider:

Dates are very bounded creatures -- even in 10 years worth of data that only constitutes 4,000 or so records, which isn't much to a database, even when multiplied by an employee database. Your time frame looks much less than that.
I've had very, VERY bad luck using joins other than =, for example between or >. It seems in the background it does cartesians and then filters the results. By comparison, exploding the ranges at least gives you some control over how much data explosion occurs.

For grins, I did it with your sample data above and came up with this, which actually looks accurate:

1   '2017-05-10'    '2017-05-19' 2   '2017-05-08'    '2018-03-20'

Let me know if any of that is unclear.

Answers 4

I would use not exists/exists:

select wa.empid from work_assignments wa where not exists (select 1 from hourly_pay hp where wa.emp_id = hp.emp_id);

and for the second:

select wa.* from work_assignments wa where not exists (select 1                   from hourly_pay hp                   where wa.emp_id = hp.emp_id and ep.start_date <= wp.start_date                  );

The question is very particular on (2). However, I would expect that you would want hourly pay for the entire period of the assignment, not just the start date. If that is the case, then the OP should ask a new qustion.

Answers 5

You can solve this with using the daterange type (because, what you basically want is the missing ranges in hourly_pay table.).

I used the following operators in it:

+ range union
- range subtraction
&& test for range intersection
@> test for range containment

With these and a simple left join, you can write a query to find out which ranges are missing in the hourly_pay table.

select     wa.emp_id, lower(dr) start_date, upper(dr) - 1 end_date from       work_assignments wa left join  hourly_pay hp on wa.emp_id = hp.emp_id and        daterange(wa.start_date, wa.end_date, '[]') && daterange(hp.start_date, hp.end_date, '[]') cross join lateral (select case                       when hp is null then daterange(wa.start_date, wa.end_date, '[]')                       else daterange(wa.start_date, wa.end_date, '[]')                          + daterange(hp.start_date, hp.end_date, '[]')                          - daterange(hp.start_date, hp.end_date, '[]')                     end dr) dr where      not exists (select 1                        from   hourly_pay p                        where  p.emp_id = wa.emp_id                        and    daterange(p.start_date, p.end_date, '[]') @> dr)  -- emp_id | start_date | end_date ----------+------------+------------- -- 1      | 2017-05-01 | 2017-05-19 -- 2      | 2017-05-08 | (null)

http://sqlfiddle.com/#!17/4bac0/14

Answers 6

Maybe I am a little caught up by the wording, but would this not suffice? This would return any emp_id where there is a record for which the hourly start date is after a work assignment start date

select distinct wc.emp_id from work_contracts wc left join hourly_pay hr USING(emp_id) where hr.start_date > wc.start_date

Answers 7

select distinct p.emp_id <br> from hourly_pay p <br> join work_assignments w on p.emp_id = w.emp_id <br> where p.start_date < w.start_date <br>

Based on the stated requirement in the original question: find the records where the hourly_pay start_date that are later than the work assignments start_date. Again, given the data here, the query should return emp_id 1 (because work_assignments.start_date has May-10-2017, while the earliest hourly_pay.start_date is on May-20-2017)

This means to me that they only want the employee id number.

Answers 8

Second query is very simple,

Try below query

select distinct h.emp_id  from work_assignments w inner join hourly_pay h  on  w.emp_id = h.emp_id and h.start_date > w.start_date;

Answers 9

Looking at your data, I can make following assumptions:

1) There can be max one record for an employee that has end_date as null this condition applied to both tables.

2) Multiple records dates for same employee don't overlap When employee has multiple records (like Emp 1) , he/she can't have dates like [jan 1 - feb 1] and next record as [jan 15-feb 20] or [jan 15 - null] (they must be for non overlapping periods).

With these in mind, below query should work for you.

SELECT hourly_pay.* FROM work_assignments INNER JOIN hourly_pay  USING(emp_id) WHERE hourly_pay.start_date > work_assignments.start_date         AND ( hourly_pay.start_date < work_assignments.end_date              OR (work_assignments.end_date is null                    AND hourly_pay.end_date is null) );

Explanation: The query joins both tables on emp_id then filters records that

1) Have start_date in hourly_pay > start_date in work_assignments

-AND-

2) Have start_date in hourly_pay < end_date in work_assignments (This is needed, so we can avoid comparing un-related time period records from both tables

-OR-

End dates of both table records are null, using assumption 1 (stated above) there can be max one record for an employee that has end_date as null.

Based on your data, this query should return both records of EMP 1 in hourly_pay as start_date there is > start_date in work_assignments.

If you just need list of EMP IDs you can just select that column SELECT DISTINCT hourly_pay.emp_id ...(rest of the query)

Answers 10

http://sqlfiddle.com/#!17/f4595/1

Records missing in hourly_pay table;

Instead of using left join and then filtering null valued records, I suggest you to use not exists, It will work way faster.

    SELECT w.emp_id, 'missing in the hourly_pay table' FROM work_assignments w     WHERE NOT exists (SELECT 1 FROM hourly_pay h WHERE h.emp_id = w.emp_id)

Records hourly_pay start_date is later than the work assignment start_date;

SELECT w.emp_id FROM work_assignments w WHERE NOT exist (     SELECT 1 FROM hourly_pay hp     WHERE         hp.start_date < w.start_date AND w.emp_id = hp.emp_id )

Second query actually includes the results from first query, so you can merge them like below:

SELECT     w.emp_id,     (CASE WHEN ( EXISTS             (SELECT 1 FROM hourly_pay h                 WHERE                     h.emp_id = w.emp_id ) )            THEN             'hourly_pay start_date is later'           ELSE             'missing in the hourly_pay table'           END) FROM     work_assignments w WHERE     NOT EXISTS (         SELECT             1         FROM             hourly_pay hp         WHERE             hp.start_date < w.start_date         AND w.emp_id = hp.emp_id     )

Answers 11

this will do the job nicely.

SELECT DISTINCT emp_id  FROM work_assingment  JOIN hourly_pay hr USING(emp_id) WHERE hr.start_date < work_assingment.start_date;

Answers 12

If i understand correctly below query should work;

first join is getting hourly_pays which has bigger date then work assignments
second join is checking until it founds the earliest hour from hourly_pay table

the first left join can be avoided if you dont want to see employees which has no data in hourly_pay table [emp_id = 2]

select h.emp_id,h.start_date from work_assignments hr  left join hourly_pay h on hr.emp_id=h.emp_id and hr.start_date < h.start_date  left join hourly_pay h2 on h2.emp_id = h.emp_id and h.start_date > h2.start_date  where h2.start_date is null

Coding Question

Friday, March 23, 2018

SQL: Comparing two tables for missing records and then on the date fields

work_assignments

hourly_pay

12 Answers

Answers 1

Answers 2

Answers 3

Answers 4

Answers 5

Answers 6

Answers 7

Answers 8

Answers 9

Answers 10

Answers 11

Answers 12

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment

Search

Popular Posts

Labels

Blog Archive

Find Us On Facebook