Saturday, August 26, 2017

Deleting repeating values in database

Leave a Comment

I have a MySql table which is populating daily with price values. Each day it records an entry even if the price hasn't changed. I want to delete some rows that repeat too much. I want to keep the first price and the last price before there is a price change.

Example 1)

   id name     price date     1 Product1 $6 13/07/2017     2 Product1 $6 14/07/2017     3 Product1 $6 15/07/2017     4 Product1 $7 16/07/2017     5 Product1 $6 17/07/2017     6 Product1 $6 18/07/2017     7 Product1 $6 19/07/2017 

From that list records with id 2 and 6 should be deleted with the following result:

   id name     price date     1 Product1 $6 13/07/2017     3 Product1 $6 15/07/2017     4 Product1 $7 16/07/2017     5 Product1 $6 17/07/2017     7 Product1 $6 19/07/2017 

Example 2)

   id name     price date     1 Product1 $6 13/07/2017     2 Product1 $6 14/07/2017     3 Product1 $6 15/07/2017     4 Product1 $6 16/07/2017     5 Product1 $6 17/07/2017     6 Product1 $6 18/07/2017     7 Product1 $6 19/07/2017 

Here there is no price change so I can delete all records from 2 to 6:

   id name     price date     1 Product1 $6 13/07/2017     7 Product1 $6 19/07/2017 

Id is not supposed to be one incremental and also date is not daily on daily basis.

15 Answers

Answers 1

You can do this with some creative self-join logic.

Think of three hypothetical rows in the table.

  • Row a you want to keep.
  • Row b has the same product name and price, and a date 1 day after a. You want to delete this.
  • Row c has the same product name and price, and a date 1 day after b. You want to keep this.

So if you can do a self-join to match these three rows, then delete row b.

DELETE b FROM MyTable AS a  JOIN MyTable AS b ON a.name=b.name AND a.price=b.price AND a.date=b.date + INTERVAL 1 DAY  JOIN MyTable AS c ON b.name=c.name AND b.price=c.price AND b.date=c.date + INTERVAL 1 DAY; 

This works even if there are multiple rows that fit the conditions for row b. It'll delete the first one, and then continue to delete subsequent rows that also fit the conditions.

This works if you use the DATE data type and store your dates as 'YYYY-MM-DD', not 'DD-MM-YYYY'. You should do this anyway.

Answers 2

You want to delete the rows where the product name and price are the same as the rows with the date plus/minus one day.

DELETE row_mid FROM    record_table AS row_mid   JOIN record_table AS row_prev   JOIN record_table AS row_next WHERE   row_mid.name = row_prev.name    AND row_mid.price = row_prev.price   AND row_mid.date = DATE_SUB(row_prev.date, INTERVAL 1 DAY)   AND row_mid.name = row_next.name   AND row_mid.price = row_next.price   AND row_mid.date = DATE_ADD(row_next.date, INTERVAL 1 DAY); 

Answers 3

Is your MySQL new enough to support CTE? This is a pretty interesting problem I've seen with date scheduling. The code always looks awkward. To check the results without a delete you can switch the comment mark with the select and delete and comment out the t.[Name] is null line.

WITH

cte AS  (         SELECT a.ID             , a.[Name]             , a.[Date]             , a.Price             , NextDate = max(npc.[Date])    -- Next Price change             , PrevDate = max(lpc.[Date])    -- Next Price change         FROM    yourTable as a  -- Base Table             LEFT JOIN                 yourTable as npc    -- Looking for Next Price Change             ON a.[Name] = npc.[Name]                 and a.[Date] < npc.[Date]                 and a.Price <> npc.Price             LEFT JOIN                 yourTable as lpc    -- Looking for Last Price Change             ON a.[Name] = lpc.[Name]                 and a.[Date] > lpc.[Date]                 and a.Price <> lpc.Price         GROUP BY a.ID, a.[Name], a.[Date], a.Price     )   ----SELECT f.*, [Check] = CASE WHEN t.[Name] is null THEN 'DELETE' ELSE '' END DELETE f FROM          yourTable as f     LEFT JOIN         (             SELECT [Name], [GoodDate] = Max([Date])             FROM cte             GROUP BY [Name], PrevDate             UNION             SELECT [Name], [GoodDate] = Min([Date])             FROM cte             GROUP BY [Name], PrevDate             UNION             SELECT [Name], [GoodDate] = Max([Date])             FROM cte             GROUP BY [Name], NextDate             UNION             SELECT [Name], [GoodDate] = Min([Date])             FROM cte             GROUP BY [Name], NextDate         ) as t     ON t.[Name] = f.[Name] and t.[GoodDate] = f.[Date] WHERE t.[Name] is null --ORDER BY f.[Name], f.[Date] 

Answers 4

MySQL doesn't support lead() and lag() but you can simulate these functions with variables and delete all rows where both the previous and next price is the same as the row's price.

To get the previous price:

select *,     if(@prevName = name, @prevPrice, null) as prevPrice,    @prevName := name,    @prevPrice := price from prices order by name, id 

Similarly to get the next price:

select *,     if(@nextName = name, @nextPrice, null) as nextPrice,    @nextName := name,    @nextPrice := price from prices order by name, id desc 

Now you can only select ids where the current price is the same as both the next and previous price

select t1.id from (     select *,          if(@prevName = name, @prevPrice, null) as prevPrice,     @prevName := name,     @prevPrice := price     from prices     order by name, id ) t1 join (     select *,          if(@nextName = name, @nextPrice, null) as nextPrice,         @nextName := name,         @nextPrice := price     from prices     order by name, id desc ) t2 on t1.id = t2.id  and t1.prevPrice = t1.price  and t2.nextPrice = t2.price 

And to delete

delete from prices  where id in ( select ids where previous and next price is the same ) 

Demo

Answers 5

You can detect prev Id and next Id, then select rows for deletion:

SELECT *  FROM    (SELECT        *,       (SELECT next_id.id         FROM a next_id         WHERE next_id.id > current.id         ORDER BY next_id.id ASC LIMIT 1) as next_id,       (SELECT prev_id.id         FROM a prev_id         WHERE prev_id.id < current.id         ORDER BY prev_id.id DESC LIMIT 1) as prev_id     FROM a current) t WHERE     EXISTS (SELECT 1             FROM a next             WHERE next.name = t.name AND t.price = next.price AND next.id=t.next_id)     AND    EXISTS (SELECT 1             FROM a prev             WHERE prev.name = t.name AND t.price = prev.price AND prev.id=t.prev_id) 

I tested this queries on both of your examples. Demo.

UPDATE. If the Id column is not unique then the logic must be corrected from prev Id+next Id to prev Date+next Date. Anyway general concept will stay the same. Query will looks like this:

SELECT *  FROM    (SELECT        *,       (SELECT next_date.date         FROM a next_date         WHERE next_date.date > current.date AND next_date.name = current.name        ORDER BY next_date.date ASC LIMIT 1) as next_date,       (SELECT prev_date.date        FROM a prev_date         WHERE prev_date.date < current.date AND prev_date.name = current.name        ORDER BY prev_date.date DESC LIMIT 1) as prev_date    FROM a current) t WHERE     EXISTS (SELECT 1             FROM a next             WHERE next.name = t.name AND t.price = next.price AND next.date=t.next_date)     AND    EXISTS (SELECT 1             FROM a prev             WHERE prev.name = t.name AND t.price = prev.price AND prev.date=t.prev_date) 

Demo for second query.

Answers 6

all of your data is repeated, witch one do you want to keep? your explanation is confusing.

you can save oldest data with same price and delete other :

with Ranked as ( select  name, price, date,     dense_rank()      over (partition by name, price, date  order by date desc) as DupeCount from    Your_table P ) delete  R from    Ranked R where   R.DupeCount <> 1 

Answers 7

Well i cannot write the exact code for your scenario but you can write a Function\Procedure and follow this pseudo code

r = allrows tobeDeleted = [] unique = [] for (var i=0;i<rows.length; i++){     unique.push(rows[i]->id);     dd = true;     while (dd){         if ((rows[i]->price == rows[i+1]->price) AND (rows[i]->name == rows[i+1]->price)){             tobeDeleted.push(rows[i]->id);             i++;         }else{             dd= false;         }     } }  //tobeDeleted contains ids of rows to be deleted // 

Answers 8

Try the below query ,hope it helps you.

(I don't have mysql , I've tried to convert the syntax to my sql-- so I'm sorry if any syntax error.)

(I've tested it on sqlserver it works well and get the result you want)

CREATE TEMPORARY Table tempData (id INT,name varchar(50),price DECIMAL(12,2),date DATE,groupDate DATE)  /* get the data grouped by name and price with NewField GroupDate to group every continous dates  then save it to temporary table*/ insert into tempData(id,name,price,date,groupDate) select id,name,price,date,DATE_SUB(date, INTERVAL rowNumber DAY)groupDate from( select t1.id,t1.name,t1.price,t1.date,count(*) rowNumber           from               (select id,name,price,date from yourTable) t1           inner join              (select id,name,price,date from yourTable) t2            on t1.name=t2.name and t1.price=t2.price and t1.date>=t2.date  group by t1.id,t1.name,t1.price,t1.date  ) t    delete from yourTable where id not in (select id from   (  /* query to order every continous data asscending using the date field */ select firstData.id,firstData.name,firstData.price,firstData.date,count(*) rn  from  tempData firstData left join  tempData secondData on firstData.name=secondData.name and firstData.price=secondData.price and firstData.groupDate=secondData.groupDate and firstData.date>=secondData.date group by firstData.id,firstData.name,firstData.price,firstData.date   /* query to order every continous data  Descending using the date field */ union all select firstData.id,firstData.name,firstData.price,firstData.date,count(*) rn  from  tempData firstData left join  tempData secondData on firstData.name=secondData.name and firstData.price=secondData.price and firstData.groupDate=secondData.groupDate and firstData.date<=secondData.date group by firstData.id,firstData.name,firstData.price,firstData.date   )allData where rn=1    )       

Answers 9

You can use below piece of code. Let me know if it is working.

DELETE FROM record_table WHERE id NOT IN (     (SELECT MIN(id) FROM record_table GROUP BY name, price),     (SELECT MAX(id) FROM record_table GROUP BY name, price) ) 

Answers 10

You can use EXISTS

DELETE FROM test t1 WHERE EXISTS (   SELECT *    FROM test t2    WHERE t1.name = t2.name AND t1.price = t2.price AND t1.day = DATE_SUB(t2.DAY, INTERVAL 1 DAY) ) AND EXISTS(   SELECT *    FROM test t3    WHERE t1.name = t3.name AND t1.price = t3.price AND t1.day = DATE_ADD(t3.DAY, INTERVAL 1 DAY) ) 

or IN construct to solve your problem

DELETE FROM test t1 WHERE t1.day IN (   SELECT DATE_SUB(t2.day, INTERVAL 1 DAY)   FROM test t2    WHERE t1.NAME = t2.NAME AND t1.price = t2.price ) AND t1.day IN (   SELECT DATE_ADD(t3.day, INTERVAL 1 DAY)    FROM test t3    WHERE t1.NAME = t3.NAME AND t1.price = t3.price ) 

sqlfiddle demo

Answers 11

You can use the logic below:

  1. rank by price
  2. group by id, name, price
  3. get min date
  4. get max date

Following the query and fiddle example:

SET @prev_value = NULL; SET @rank_count = 0;  select distinct   `name`,   `price`,   `date` from  (   (   select      id,     name,     price,     CASE       WHEN @prev_value = price THEN @rank_count       WHEN @prev_value := price THEN @rank_count := @rank_count + 1     END AS rank,     min(`date`) as `date`   from      `prices`    group by       `name`,       `price`,       `rank`    )    union distinct    (    select      id,     name,     price,     CASE       WHEN @prev_value = price THEN @rank_count       WHEN @prev_value := price THEN @rank_count := @rank_count + 1     END AS rank,     max(`date`) as `date`   from      `prices`    group by       `name`,       `price`,       `rank`   )   order by `id`, `date` ) as `result` 

sqlfiddle

Answers 12

We have to ask ourselves, when do we have to delete a record?

Answer: A record can be deleted,

  • if there exists another record, with the same name, with the same price, and an earlier date, while there is no record with the same name, with another price in between the both dates.

    and

  • if there exists another record, with the same name, with the same price, and a later date, while there is no record with the same name, with another price in between the both dates.

Putting both requirements into SQL results in the following:

DELETE FROM PriceTable t WHERE    EXISTS ( SELECT *            FROM PriceTable tmp1             WHERE t.name  = tmp1.name  AND                   t.price = tmp1.price AND                   t.date  > tmp1.date  AND                  NOT EXISTS (SELECT *                               FROM PriceTable tmp2                              WHERE t.name    = tmp2.name  AND                                     t.price  != tmp2.price AND                                     t.date    > tmp2.date  AND                                     tmp1.date < tmp2.date                              )          )   AND   EXISTS ( SELECT *            FROM PriceTable tmp1             WHERE t.name  = tmp1.name  AND                   t.price = tmp1.price AND                   t.date  < tmp1.date  AND                  NOT EXISTS (SELECT *                               FROM PriceTable tmp2                              WHERE t.name    = tmp2.name  AND                                     t.price  != tmp2.price AND                                     t.date    < tmp2.date  AND                                     tmp1.date > tmp2.date                              )           ); 

Answers 13

Edit: after further consideration, it appears that it's not possible to solve this problem with user defined variable trickery (take note of the other solutions using these). While I think the below solution will "most likely work 99% of the time," MySQL does not guarantee the order of variable evaluation: link 1 and link 2.

Original answer:

(I am working under the assumptions that products.name is defined as NOT NULL and both products.id and products.price are not negative [can provide a simple patch if handling negatives, too]).

The query:

SET     @one_prior_id := NULL,     @one_prior_price := NULL,     @one_prior_name := NULL,     @two_prior_id := NULL,     @two_prior_price := NULL,     @two_prior_name := NULL ;  SELECT @two_prior_id AS id_to_delete FROM (     SELECT *     FROM products     ORDER BY name, date ) AS t WHERE IF(     (         (name  = @one_prior_name)         AND         (name  = @two_prior_name)         AND         (price = @one_prior_price)         AND         (price = @two_prior_price)     ), (         GREATEST(             1,             IFNULL(@two_prior_id := @one_prior_id, 0),             IFNULL(@two_prior_price := @one_prior_price, 0),             LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),             IFNULL(@one_prior_id := id, 0),             IFNULL(@one_prior_price := price, 0),             LENGTH(IFNULL(@one_prior_name := name, 0))         )     ), (         LEAST(             0,             IFNULL(@two_prior_id := @one_prior_id, 0),             IFNULL(@two_prior_price := @one_prior_price, 0),             LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),             IFNULL(@one_prior_id := id, 0),             IFNULL(@one_prior_price := price, 0),             LENGTH(IFNULL(@one_prior_name := name, 0))         )     ) ) 

The query's return, based on your "Example 1:"

+--------------+ | id_to_delete | +--------------+ |            2 | |            6 | +--------------+ 

The query's return, based on your "Example 2:"

+--------------+ | id_to_delete | +--------------+ |            2 | |            3 | |            4 | |            5 | |            6 | +--------------+ 

How the query works:

  • Do a simple 'partitioning' of the products table via ORDER BY

  • Loop over the ordered result set, tracking 2 sets of variables: the 1st set to hold the price and name of the 'one prior' row (the 'one prior' row being directly above the current row) and the 2nd set of variables to hold the 'two prior' row (the 'two prior' row being directly above the 'one prior' row).

  • The GREATEST and LEAST are identical except that the former returns a value that will evaluate to true for the IF and the latter will evaluate to false. The real point of these functions is to update our loop variables.

  • See this for more info regarding variable updating within subqueries.

The actual DELETE:

SET     @one_prior_id := NULL,     @one_prior_price := NULL,     @one_prior_name := NULL,     @two_prior_id := NULL,     @two_prior_price := NULL,     @two_prior_name := NULL ;  DELETE FROM products WHERE id IN (     SELECT * FROM (         SELECT @two_prior_id AS id_to_delete         FROM (             SELECT *             FROM products             ORDER BY name, date         ) AS t1         WHERE IF(             (                 (name  = @one_prior_name)                 AND                 (name  = @two_prior_name)                 AND                 (price = @one_prior_price)                 AND                 (price = @two_prior_price)             ), (                 GREATEST(                     1,                     IFNULL(@two_prior_id := @one_prior_id, 0),                     IFNULL(@two_prior_price := @one_prior_price, 0),                     LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),                     IFNULL(@one_prior_id := id, 0),                     IFNULL(@one_prior_price := price, 0),                     LENGTH(IFNULL(@one_prior_name := name, 0))                 )             ), (                 LEAST(                     0,                     IFNULL(@two_prior_id := @one_prior_id, 0),                     IFNULL(@two_prior_price := @one_prior_price, 0),                     LENGTH(IFNULL(@two_prior_name := @one_prior_name, 0)),                     IFNULL(@one_prior_id := id, 0),                     IFNULL(@one_prior_price := price, 0),                     LENGTH(IFNULL(@one_prior_name := name, 0))                 )             )         )     ) AS t2 ) 

Important Note

See how the above delete query does 2 inner selects? Make sure that you include this, otherwise you will unintentionally delete the last row! Try executing without the SELECT (...) AS t2 to see what I mean.

Answers 14

This is the second answer I've submitted for this question, but I think I've finally got it this time:

DELETE FROM products WHERE id IN (     SELECT id_to_delete     FROM (         SELECT             t0.id AS id_to_delete,             t0.price,             (                 SELECT t1.price                 FROM products AS t1                 WHERE (t0.date < t1.date)                     AND (t0.name = t1.name)                 ORDER BY t1.date ASC                 LIMIT 1             ) AS next_price,             (                 SELECT t2.price                 FROM products AS t2                 WHERE (t0.date > t2.date)                     AND (t0.name = t2.name)                 ORDER BY t2.date DESC                 LIMIT 1             ) AS prev_price         FROM products AS t0         HAVING (price = next_price) AND (price = prev_price)     ) AS t ) 

This is a modified version of the answer from @vadim_hr.

Answers 15

You can write a query with distinct keyword as following :

SELECT DISTINCT(id, name, price, date) from YOUR_TABLE 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment