I have two tables as below
work_assignments
emp_id | start_date | End Date ------------------------------------------ 1 | May-10-2017 | May-30-2017 1 | Jun-05-2017 | null 2 | May-08-2017 | null
hourly_pay
emp_id | start_date | End Date | Rate ----------------------------------------------- 1 | May-20-2017 | Jun-30-2017 | 75 1 | Jul-01-2017 | null | 80
These 2 tables share the emp_id (employee id) foreign key and joining these two I should be able to:
- find employee records missing in the hourly_pay table. Given the data here, the query should return emp_id 2 from work_assignments table
- find the records where the hourly_pay start_date that are later than the work assignments start_date. Again, given the data here, the query should return emp_id 1 (because work_assignments.start_date has May-10-2017, while the earliest hourly_pay.start_date is on May-20-2017)
I am able to achieve the first part of result using the join query below
select distinct emp_id from work_contracts left join hourly_pay hr USING(emp_id) where hr.emp_id is null
I am stuck on the second part where probably I need a correlated subquery to tell the hourly pay table records that did not start before the work_assignments start_date? or is there any other way?
12 Answers
Answers 1
Do the date comparison in an inner query then wrap it to filter it to the ones that satisfy the late pay criteria.
select * from ( select distinct c.emp_id, case when c.start_date < hr.start_date then 1 else 0 end as latePay from work_contracts c left join hourly_pay hr USING(emp_id) ) result where latePay = 1
Answers 2
You can achieve second part using query
select distinct wc.emp_id from (select emp_id, min(start_date) start_date from work_contracts group by emp_id) wc join (select emp_id, min(start_date) start_date from hourly_pay group by emp_id) hr on wc.emp_id = hr.emp_id where wc.start_date < hr.start_date
Answers 3
This hints at a between
condition, with some twists, but I've had extremely bad luck using betweens in joins. They appear to perform some form of cross-join on the back and end then filter out the actual join where-clause style. I know that's not very technical, but I've never done a non-equality condition in a join that's turned out well.
So, this may seem counter-intuitive, but I think exploding all date possibilities might actually be your best bet. Without knowing how big your date ranges actually are it's hard to say.
Also, I think this will actually satisfy both conditions in your question at once -- by telling you all work assignments that do not have corresponding pay rates.
Try this against your actual data and see how it works (and how long it takes).
with pay_dates as ( select emp_id, rate, generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as pd from hourly_pay ), assignment_dates as ( select emp_id, start_date, generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as wd from work_assignments ) select emp_id, min (wd)::date as from_date, max (wd)::date as thru_date from assignment_dates a where not exists ( select null from pay_dates p where p.emp_id = a.emp_id and a.wd = p.pd ) group by emp_id, start_date
The results should be all work assignment ranges with no rates:
emp from thru 1 '2017-05-10' '2017-05-19' 2 '2017-05-08' '2017-11-14'
The cool thing is it would also remove any overlaps, where a work assignment was partially covered.
-- Edit 3/20/2018 --
Per your request, here is a break-down of what the logic does.
with pay_dates as( select emp_id, rate, generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as pd from hourly_pay )
This takes the hourly_pay data and breaks it into a record for each employee, for each day:
emp_id rate pay date 1 75 5/20/17 1 75 5/21/17 1 75 5/22/17 ... 1 75 6/30/17 1 80 6/01/17 1 80 6/02/17 ... 1 80 today
Next,
[implied "with"] assignment_dates as ( select emp_id, start_date, generate_series (start_date, coalesce (end_date, current_date), interval '1 day') as wd from work_assignments )
Effectively does the same thing for the work assignments table, only preserving the "start date column" in each row.
Then the main query is this:
select emp_id, min (wd)::date as from_date, max (wd)::date as thru_date from assignment_dates a where not exists ( select null from pay_dates p where p.emp_id = a.emp_id and a.wd = p.pd ) group by emp_id, start_date
Which draws from the two queries above. The important part is the anti-join:
not exists ( select null from pay_dates p where p.emp_id = a.emp_id and a.wd = p.pd )
That identifies every work assignment where there is no corresponding record for that employee, for that day.
So in essence, the query takes the data ranges from both tables, comes up with every possible date combination and then does an anti-join to see where they don't match.
While it seems counterintuitive, to take a single record and blow it up into multiple records, two things to consider:
Dates are very bounded creatures -- even in 10 years worth of data that only constitutes 4,000 or so records, which isn't much to a database, even when multiplied by an employee database. Your time frame looks much less than that.
I've had very, VERY bad luck using joins other than =, for example
between
or>
. It seems in the background it does cartesians and then filters the results. By comparison, exploding the ranges at least gives you some control over how much data explosion occurs.
For grins, I did it with your sample data above and came up with this, which actually looks accurate:
1 '2017-05-10' '2017-05-19' 2 '2017-05-08' '2018-03-20'
Let me know if any of that is unclear.
Answers 4
I would use not exists
/exists
:
select wa.empid from work_assignments wa where not exists (select 1 from hourly_pay hp where wa.emp_id = hp.emp_id);
and for the second:
select wa.* from work_assignments wa where not exists (select 1 from hourly_pay hp where wa.emp_id = hp.emp_id and ep.start_date <= wp.start_date );
The question is very particular on (2). However, I would expect that you would want hourly pay for the entire period of the assignment, not just the start date. If that is the case, then the OP should ask a new qustion.
Answers 5
You can solve this with using the daterange
type (because, what you basically want is the missing ranges in hourly_pay
table.).
I used the following operators in it:
+
range union-
range subtraction&&
test for range intersection@>
test for range containment
With these and a simple left join
, you can write a query to find out which ranges are missing in the hourly_pay
table.
select wa.emp_id, lower(dr) start_date, upper(dr) - 1 end_date from work_assignments wa left join hourly_pay hp on wa.emp_id = hp.emp_id and daterange(wa.start_date, wa.end_date, '[]') && daterange(hp.start_date, hp.end_date, '[]') cross join lateral (select case when hp is null then daterange(wa.start_date, wa.end_date, '[]') else daterange(wa.start_date, wa.end_date, '[]') + daterange(hp.start_date, hp.end_date, '[]') - daterange(hp.start_date, hp.end_date, '[]') end dr) dr where not exists (select 1 from hourly_pay p where p.emp_id = wa.emp_id and daterange(p.start_date, p.end_date, '[]') @> dr) -- emp_id | start_date | end_date ----------+------------+------------- -- 1 | 2017-05-01 | 2017-05-19 -- 2 | 2017-05-08 | (null)
Answers 6
Maybe I am a little caught up by the wording, but would this not suffice? This would return any emp_id where there is a record for which the hourly start date is after a work assignment start date
select distinct wc.emp_id from work_contracts wc left join hourly_pay hr USING(emp_id) where hr.start_date > wc.start_date
Answers 7
select distinct p.emp_id <br> from hourly_pay p <br> join work_assignments w on p.emp_id = w.emp_id <br> where p.start_date < w.start_date <br>
Based on the stated requirement in the original question: find the records where the hourly_pay
start_date
that are later than the work assignments
start_date
. Again, given the data here, the query should return emp_id
1 (because work_assignments.start_date
has May-10-2017, while the earliest hourly_pay.start_date
is on May-20-2017)
This means to me that they only want the employee id number.
Answers 8
Second query is very simple,
Try below query
select distinct h.emp_id from work_assignments w inner join hourly_pay h on w.emp_id = h.emp_id and h.start_date > w.start_date;
Answers 9
Looking at your data, I can make following assumptions:
1) There can be max one record for an employee that has end_date as null this condition applied to both tables.
2) Multiple records dates for same employee don't overlap When employee has multiple records (like Emp 1) , he/she can't have dates like [jan 1 - feb 1] and next record as [jan 15-feb 20] or [jan 15 - null] (they must be for non overlapping periods).
With these in mind, below query should work for you.
SELECT hourly_pay.* FROM work_assignments INNER JOIN hourly_pay USING(emp_id) WHERE hourly_pay.start_date > work_assignments.start_date AND ( hourly_pay.start_date < work_assignments.end_date OR (work_assignments.end_date is null AND hourly_pay.end_date is null) );
Explanation: The query joins both tables on emp_id then filters records that
1) Have start_date in hourly_pay > start_date in work_assignments
-AND-
2) Have start_date in hourly_pay < end_date in work_assignments (This is needed, so we can avoid comparing un-related time period records from both tables
-OR-
End dates of both table records are null, using assumption 1 (stated above) there can be max one record for an employee that has end_date as null.
Based on your data, this query should return both records of EMP 1 in hourly_pay as start_date there is > start_date in work_assignments.
If you just need list of EMP IDs you can just select that column SELECT DISTINCT hourly_pay.emp_id ...(rest of the query)
Answers 10
http://sqlfiddle.com/#!17/f4595/1
- Records missing in hourly_pay table;
Instead of using left join and then filtering null valued records, I suggest you to use not exists
, It will work way faster.
SELECT w.emp_id, 'missing in the hourly_pay table' FROM work_assignments w WHERE NOT exists (SELECT 1 FROM hourly_pay h WHERE h.emp_id = w.emp_id)
Records hourly_pay start_date is later than the work assignment start_date;
SELECT w.emp_id FROM work_assignments w WHERE NOT exist ( SELECT 1 FROM hourly_pay hp WHERE hp.start_date < w.start_date AND w.emp_id = hp.emp_id )
Second query actually includes the results from first query, so you can merge them like below:
SELECT w.emp_id, (CASE WHEN ( EXISTS (SELECT 1 FROM hourly_pay h WHERE h.emp_id = w.emp_id ) ) THEN 'hourly_pay start_date is later' ELSE 'missing in the hourly_pay table' END) FROM work_assignments w WHERE NOT EXISTS ( SELECT 1 FROM hourly_pay hp WHERE hp.start_date < w.start_date AND w.emp_id = hp.emp_id )
Answers 11
this will do the job nicely.
SELECT DISTINCT emp_id FROM work_assingment JOIN hourly_pay hr USING(emp_id) WHERE hr.start_date < work_assingment.start_date;
Answers 12
If i understand correctly below query should work;
first join is getting hourly_pays which has bigger date then work assignments
second join is checking until it founds the earliest hour from hourly_pay table
the first left join can be avoided if you dont want to see employees which has no data in hourly_pay table [emp_id = 2]
select h.emp_id,h.start_date from work_assignments hr left join hourly_pay h on hr.emp_id=h.emp_id and hr.start_date < h.start_date left join hourly_pay h2 on h2.emp_id = h.emp_id and h.start_date > h2.start_date where h2.start_date is null
0 comments:
Post a Comment