I have a MySql table which is populating daily with price values. Each day it records an entry even if the price hasn't changed. I want to delete some rows that repeat too much. I want to keep the first price and the last price before there is a price change.
Example 1)
id name price date 1 Product1 $6 13/07/2017 2 Product1 $6 14/07/2017 3 Product1 $6 15/07/2017 4 Product1 $7 16/07/2017 5 Product1 $6 17/07/2017 6 Product1 $6 18/07/2017 7 Product1 $6 19/07/2017
From that list records with id 2 and 6 should be deleted with the following result:
id name price date 1 Product1 $6 13/07/2017 3 Product1 $6 15/07/2017 4 Product1 $7 16/07/2017 5 Product1 $6 17/07/2017 7 Product1 $6 19/07/2017
Example 2)
id name price date 1 Product1 $6 13/07/2017 2 Product1 $6 14/07/2017 3 Product1 $6 15/07/2017 4 Product1 $6 16/07/2017 5 Product1 $6 17/07/2017 6 Product1 $6 18/07/2017 7 Product1 $6 19/07/2017
Here there is no price change so I can delete all records from 2 to 6:
id name price date 1 Product1 $6 13/07/2017 7 Product1 $6 19/07/2017
Id is not supposed to be one incremental and also date is not daily on daily basis.
15 Answers
Answers 1
You can do this with some creative self-join logic.
Think of three hypothetical rows in the table.
- Row a you want to keep.
- Row b has the same product name and price, and a date 1 day after a. You want to delete this.
- Row c has the same product name and price, and a date 1 day after b. You want to keep this.
So if you can do a self-join to match these three rows, then delete row b.
DELETE b FROM MyTable AS a JOIN MyTable AS b ON a.name=b.name AND a.price=b.price AND a.date=b.date + INTERVAL 1 DAY JOIN MyTable AS c ON b.name=c.name AND b.price=c.price AND b.date=c.date + INTERVAL 1 DAY;
This works even if there are multiple rows that fit the conditions for row b. It'll delete the first one, and then continue to delete subsequent rows that also fit the conditions.
This works if you use the DATE
data type and store your dates as 'YYYY-MM-DD', not 'DD-MM-YYYY'. You should do this anyway.
Answers 2
You want to delete the rows where the product name and price are the same as the rows with the date plus/minus one day.
DELETE row_mid FROM record_table AS row_mid JOIN record_table AS row_prev JOIN record_table AS row_next WHERE row_mid.name = row_prev.name AND row_mid.price = row_prev.price AND row_mid.date = DATE_SUB(row_prev.date, INTERVAL 1 DAY) AND row_mid.name = row_next.name AND row_mid.price = row_next.price AND row_mid.date = DATE_ADD(row_next.date, INTERVAL 1 DAY);
Answers 3
Is your MySQL new enough to support CTE? This is a pretty interesting problem I've seen with date scheduling. The code always looks awkward. To check the results without a delete you can switch the comment mark with the select and delete and comment out the t.[Name] is null line.
WITH
cte AS ( SELECT a.ID , a.[Name] , a.[Date] , a.Price , NextDate = max(npc.[Date]) -- Next Price change , PrevDate = max(lpc.[Date]) -- Next Price change FROM yourTable as a -- Base Table LEFT JOIN yourTable as npc -- Looking for Next Price Change ON a.[Name] = npc.[Name] and a.[Date] < npc.[Date] and a.Price <> npc.Price LEFT JOIN yourTable as lpc -- Looking for Last Price Change ON a.[Name] = lpc.[Name] and a.[Date] > lpc.[Date] and a.Price <> lpc.Price GROUP BY a.ID, a.[Name], a.[Date], a.Price ) ----SELECT f.*, [Check] = CASE WHEN t.[Name] is null THEN 'DELETE' ELSE '' END DELETE f FROM yourTable as f LEFT JOIN ( SELECT [Name], [GoodDate] = Max([Date]) FROM cte GROUP BY [Name], PrevDate UNION SELECT [Name], [GoodDate] = Min([Date]) FROM cte GROUP BY [Name], PrevDate UNION SELECT [Name], [GoodDate] = Max([Date]) FROM cte GROUP BY [Name], NextDate UNION SELECT [Name], [GoodDate] = Min([Date]) FROM cte GROUP BY [Name], NextDate ) as t ON t.[Name] = f.[Name] and t.[GoodDate] = f.[Date] WHERE t.[Name] is null --ORDER BY f.[Name], f.[Date]
Answers 4
MySQL doesn't support lead() and lag() but you can simulate these functions with variables and delete all rows where both the previous and next price is the same as the row's price.
To get the previous price:
select *, if(@prevName = name, @prevPrice, null) as prevPrice, @prevName := name, @prevPrice := price from prices order by name, id
Similarly to get the next price:
select *, if(@nextName = name, @nextPrice, null) as nextPrice, @nextName := name, @nextPrice := price from prices order by name, id desc
Now you can only select ids where the current price is the same as both the next and previous price
select t1.id from ( select *, if(@prevName = name, @prevPrice, null) as prevPrice, @prevName := name, @prevPrice := price from prices order by name, id ) t1 join ( select *, if(@nextName = name, @nextPrice, null) as nextPrice, @nextName := name, @nextPrice := price from prices order by name, id desc ) t2 on t1.id = t2.id and t1.prevPrice = t1.price and t2.nextPrice = t2.price
And to delete
delete from prices where id in ( select ids where previous and next price is the same )
Answers 5
You can detect prev Id
and next Id
, then select rows for deletion:
SELECT * FROM (SELECT *, (SELECT next_id.id FROM a next_id WHERE next_id.id > current.id ORDER BY next_id.id ASC LIMIT 1) as next_id, (SELECT prev_id.id FROM a prev_id WHERE prev_id.id < current.id ORDER BY prev_id.id DESC LIMIT 1) as prev_id FROM a current) t WHERE EXISTS (SELECT 1 FROM a next WHERE next.name = t.name AND t.price = next.price AND next.id=t.next_id) AND EXISTS (SELECT 1 FROM a prev WHERE prev.name = t.name AND t.price = prev.price AND prev.id=t.prev_id)
I tested this queries on both of your examples. Demo.
UPDATE. If the Id
column is not unique then the logic must be corrected from prev Id
+next Id
to prev Date
+next Date
. Anyway general concept will stay the same. Query will looks like this:
SELECT * FROM (SELECT *, (SELECT next_date.date FROM a next_date WHERE next_date.date > current.date AND next_date.name = current.name ORDER BY next_date.date ASC LIMIT 1) as next_date, (SELECT prev_date.date FROM a prev_date WHERE prev_date.date < current.date AND prev_date.name = current.name ORDER BY prev_date.date DESC LIMIT 1) as prev_date FROM a current) t WHERE EXISTS (SELECT 1 FROM a next WHERE next.name = t.name AND t.price = next.price AND next.date=t.next_date) AND EXISTS (SELECT 1 FROM a prev WHERE prev.name = t.name AND t.price = prev.price AND prev.date=t.prev_date)
Demo for second query.
Answers 6
all of your data is repeated, witch one do you want to keep? your explanation is confusing.
you can save oldest data with same price and delete other :
with Ranked as ( select name, price, date, dense_rank() over (partition by name, price, date order by date desc) as DupeCount from Your_table P ) delete R from Ranked R where R.DupeCount <> 1
Answers 7
Well i cannot write the exact code for your scenario but you can write a Function\Procedure and follow this pseudo code
r = allrows tobeDeleted = [] unique = [] for (var i=0;i<rows.length; i++){ unique.push(rows[i]->id); dd = true; while (dd){ if ((rows[i]->price == rows[i+1]->price) AND (rows[i]->name == rows[i+1]->price)){ tobeDeleted.push(rows[i]->id); i++; }else{ dd= false; } } } //tobeDeleted contains ids of rows to be deleted //
Answers 8
Try the below query ,hope it helps you.
(I don't have mysql , I've tried to convert the syntax to my sql-- so I'm sorry if any syntax error.)
(I've tested it on sqlserver it works well and get the result you want)
CREATE TEMPORARY Table tempData (id INT,name varchar(50),price DECIMAL(12,2),date DATE,groupDate DATE) /* get the data grouped by name and price with NewField GroupDate to group every continous dates then save it to temporary table*/ insert into tempData(id,name,price,date,groupDate) select id,name,price,date,DATE_SUB(date, INTERVAL rowNumber DAY)groupDate from( select t1.id,t1.name,t1.price,t1.date,count(*) rowNumber from (select id,name,price,date from yourTable) t1 inner join (select id,name,price,date from yourTable) t2 on t1.name=t2.name and t1.price=t2.price and t1.date>=t2.date group by t1.id,t1.name,t1.price,t1.date ) t delete from yourTable where id not in (select id from ( /* query to order every continous data asscending using the date field */ select firstData.id,firstData.name,firstData.price,firstData.date,count(*) rn from tempData firstData left join tempData secondData on firstData.name=secondData.name and firstData.price=secondData.price and firstData.groupDate=secondData.groupDate and firstData.date>=secondData.date group by firstData.id,firstData.name,firstData.price,firstData.date /* query to order every continous data Descending using the date field */ union all select firstData.id,firstData.name,firstData.price,firstData.date,count(*) rn from tempData firstData left join tempData secondData on firstData.name=secondData.name and firstData.price=secondData.price and firstData.groupDate=secondData.groupDate and firstData.date<=secondData.date group by firstData.id,firstData.name,firstData.price,firstData.date )allData where rn=1 )
Answers 9
You can use below piece of code. Let me know if it is working.
DELETE FROM record_table WHERE id NOT IN ( (SELECT MIN(id) FROM record_table GROUP BY name, price), (SELECT MAX(id) FROM record_table GROUP BY name, price) )
Answers 10
You can use EXISTS
DELETE FROM test t1 WHERE EXISTS ( SELECT * FROM test t2 WHERE t1.name = t2.name AND t1.price = t2.price AND t1.day = DATE_SUB(t2.DAY, INTERVAL 1 DAY) ) AND EXISTS( SELECT * FROM test t3 WHERE t1.name = t3.name AND t1.price = t3.price AND t1.day = DATE_ADD(t3.DAY, INTERVAL 1 DAY) )
or IN
construct to solve your problem
DELETE FROM test t1 WHERE t1.day IN ( SELECT DATE_SUB(t2.day, INTERVAL 1 DAY) FROM test t2 WHERE t1.NAME = t2.NAME AND t1.price = t2.price ) AND t1.day IN ( SELECT DATE_ADD(t3.day, INTERVAL 1 DAY) FROM test t3 WHERE t1.NAME = t3.NAME AND t1.price = t3.price )
Answers 11
You can use the logic below:
- rank by price
- group by id, name, price
- get min date
- get max date
Following the query and fiddle example:
SET @prev_value = NULL; SET @rank_count = 0; select distinct `name`, `price`, `date` from ( ( select id, name, price, CASE WHEN @prev_value = price THEN @rank_count WHEN @prev_value := price THEN @rank_count := @rank_count + 1 END AS rank, min(`date`) as `date` from `prices` group by `name`, `price`, `rank` ) union distinct ( select id, name, price, CASE WHEN @prev_value = price THEN @rank_count WHEN @prev_value := price THEN @rank_count := @rank_count + 1 END AS rank, max(`date`) as `date` from `prices` group by `name`, `price`, `rank` ) order by `id`, `date` ) as `result`
Answers 12
We have to ask ourselves, when do we have to delete a record?
Answer: A record can be deleted,
if there exists another record, with the same name, with the same price, and an earlier date, while there is no record with the same name, with another price in between the both dates.
and
if there exists another record, with the same name, with the same price, and a later date, while there is no record with the same name, with another price in between the both dates.
Putting both requirements into SQL results in the following:
DELETE FROM PriceTable t WHERE EXISTS ( SELECT * FROM PriceTable tmp1 WHERE t.name = tmp1.name AND t.price = tmp1.price AND t.date > tmp1.date AND NOT EXISTS (SELECT * FROM PriceTable tmp2 WHERE t.name = tmp2.name AND t.price != tmp2.price AND t.date > tmp2.date AND tmp1.date < tmp2.date ) ) AND EXISTS ( SELECT * FROM PriceTable tmp1 WHERE t.name = tmp1.name AND t.price = tmp1.price AND t.date < tmp1.date AND NOT EXISTS (SELECT * FROM PriceTable tmp2 WHERE t.name = tmp2.name AND t.price != tmp2.price AND t.date < tmp2.date AND tmp1.date > tmp2.date ) );
Answers 13
Edit: after further consideration, it appears that it's not possible to solve this problem with user defined variable trickery (take note of the other solutions using these). While I think the below solution will "most likely work 99% of the time," MySQL does not guarantee the order of variable evaluation: link 1 and link 2.
Original answer:
(I am working under the assumptions that products.name
is defined as NOT NULL
and both products.id
and products.price
are not negative [can provide a simple patch if handling negatives, too]).
The query:
SET @one_prior_id := NULL, @one_prior_price := NULL, @one_prior_name := NULL, @two_prior_id := NULL, @two_prior_price := NULL, @two_prior_name := NULL ; SELECT @two_prior_id AS id_to_delete FROM ( SELECT * FROM products ORDER BY name, date ) AS t WHERE IF( ( (name = @one_prior_name) AND (name = @two_prior_name) AND (price = @one_prior_price) AND (price = @two_prior_price) ), ( GREATEST( 1, IFNULL(@two_prior_id := @one_prior_id, 0), IFNULL(@two_prior_price := @one_prior_price, 0), LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)), IFNULL(@one_prior_id := id, 0), IFNULL(@one_prior_price := price, 0), LENGTH(IFNULL(@one_prior_name := name, 0)) ) ), ( LEAST( 0, IFNULL(@two_prior_id := @one_prior_id, 0), IFNULL(@two_prior_price := @one_prior_price, 0), LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)), IFNULL(@one_prior_id := id, 0), IFNULL(@one_prior_price := price, 0), LENGTH(IFNULL(@one_prior_name := name, 0)) ) ) )
The query's return, based on your "Example 1:"
+--------------+ | id_to_delete | +--------------+ | 2 | | 6 | +--------------+
The query's return, based on your "Example 2:"
+--------------+ | id_to_delete | +--------------+ | 2 | | 3 | | 4 | | 5 | | 6 | +--------------+
How the query works:
Do a simple 'partitioning' of the
products
table via ORDER BYLoop over the ordered result set, tracking 2 sets of variables: the 1st set to hold the price and name of the 'one prior' row (the 'one prior' row being directly above the current row) and the 2nd set of variables to hold the 'two prior' row (the 'two prior' row being directly above the 'one prior' row).
The
GREATEST
andLEAST
are identical except that the former returns a value that will evaluate to true for theIF
and the latter will evaluate to false. The real point of these functions is to update our loop variables.See this for more info regarding variable updating within subqueries.
The actual DELETE:
SET @one_prior_id := NULL, @one_prior_price := NULL, @one_prior_name := NULL, @two_prior_id := NULL, @two_prior_price := NULL, @two_prior_name := NULL ; DELETE FROM products WHERE id IN ( SELECT * FROM ( SELECT @two_prior_id AS id_to_delete FROM ( SELECT * FROM products ORDER BY name, date ) AS t1 WHERE IF( ( (name = @one_prior_name) AND (name = @two_prior_name) AND (price = @one_prior_price) AND (price = @two_prior_price) ), ( GREATEST( 1, IFNULL(@two_prior_id := @one_prior_id, 0), IFNULL(@two_prior_price := @one_prior_price, 0), LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)), IFNULL(@one_prior_id := id, 0), IFNULL(@one_prior_price := price, 0), LENGTH(IFNULL(@one_prior_name := name, 0)) ) ), ( LEAST( 0, IFNULL(@two_prior_id := @one_prior_id, 0), IFNULL(@two_prior_price := @one_prior_price, 0), LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)), IFNULL(@one_prior_id := id, 0), IFNULL(@one_prior_price := price, 0), LENGTH(IFNULL(@one_prior_name := name, 0)) ) ) ) ) AS t2 )
Important Note
See how the above delete query does 2 inner selects? Make sure that you include this, otherwise you will unintentionally delete the last row! Try executing without the SELECT (...) AS t2
to see what I mean.
Answers 14
This is the second answer I've submitted for this question, but I think I've finally got it this time:
DELETE FROM products WHERE id IN ( SELECT id_to_delete FROM ( SELECT t0.id AS id_to_delete, t0.price, ( SELECT t1.price FROM products AS t1 WHERE (t0.date < t1.date) AND (t0.name = t1.name) ORDER BY t1.date ASC LIMIT 1 ) AS next_price, ( SELECT t2.price FROM products AS t2 WHERE (t0.date > t2.date) AND (t0.name = t2.name) ORDER BY t2.date DESC LIMIT 1 ) AS prev_price FROM products AS t0 HAVING (price = next_price) AND (price = prev_price) ) AS t )
This is a modified version of the answer from @vadim_hr.
Answers 15
You can write a query with distinct keyword as following :
SELECT DISTINCT(id, name, price, date) from YOUR_TABLE
0 comments:
Post a Comment