Thursday, February 23, 2017

Remove duplicate data from query results

Leave a Comment

I have the following query:

 select      C.ClientID,     C.FirstName + ' ' + C.LastName as ClientName,     CAST(V.StartDate as date) as VisitDate,     count(*) as 'Number of Visits'  from     Visit V  Inner Join Client C on     V.ClientID = C.ClientID  group by      C.ClientID,     C.FirstName + ' ' + C.LastName,     CAST(V.StartDate as date)  having     count(*) > 3  order by     C.ClientID,      CAST(V.StartDate as date) 

which gives the following results (names are fake in case anyone is wondering)

 ClientID   ClientName            VisitDate      Number of Visits  75         Kay Taylor            2016-06-07     4  372         Moses Mcgowan       2016-09-03      4  422         Raven Mckay         2016-03-11      4  422         Raven Mckay         2016-06-14      4  679         Ulysses Booker      2016-01-09      4  696         Timon Turner        2016-07-06      4  1063        Quyn Wall           2016-06-25      4  1142        Garth Moran         2016-11-20      4  1142        Garth Moran         2016-11-21      4  1563        Hedley Gutierrez    2016-01-07      4  1563        Hedley Gutierrez    2016-01-17      4  1563        Hedley Gutierrez    2016-01-21      4  1563        Hedley Gutierrez    2016-01-27      4  1563        Hedley Gutierrez    2016-01-28      4  1563        Hedley Gutierrez    2016-01-30      4  1563        Hedley Gutierrez    2016-02-27      4  1563        Hedley Gutierrez    2016-03-26      4  1563        Hedley Gutierrez    2016-04-06      4  1563        Hedley Gutierrez    2016-04-09      4  1563        Hedley Gutierrez    2016-04-22      4  1563        Hedley Gutierrez    2016-05-06      4  1563        Hedley Gutierrez    2016-05-26      4  1563        Hedley Gutierrez    2016-06-02      4  1563        Hedley Gutierrez    2016-07-14      4  1563        Hedley Gutierrez    2016-07-29      4  1563        Hedley Gutierrez    2016-08-09      7  1563        Hedley Gutierrez    2016-09-01      4  1563        Hedley Gutierrez    2016-09-23      4  1563        Hedley Gutierrez    2016-12-07      4  1636        Kiara Lowery        2016-01-12      4  2917        Cynthia Carr        2016-06-21      4  2917        Cynthia Carr        2016-10-21      4  3219        Alan Monroe         2016-01-02      4  3219        Alan Monroe         016-02-27       4  3219        Alan Monroe         2016-09-01      5  4288        Natalie Mitchell    2016-03-19      4 

How can I get the results to show only the ClientID and ClientName once so the results are like this?

 ClientID   ClientName            VisitDate      Number of Visits  75         Kay Taylor            2016-06-07     4  372         Moses Mcgowan       2016-09-03      4  422         Raven Mckay         2016-03-11      4                                  2016-06-14      4  679         Ulysses Booker      2016-01-09      4  696         Timon Turner        2016-07-06      4  1063        Quyn Wall           2016-06-25      4  1142        Garth Moran         2016-11-20      4                                  2016-11-21      4  1563        Hedley Gutierrez    2016-01-07      4                                  2016-01-17      4                                  2016-01-21      4                                  2016-01-27      4                                  2016-01-28      4                                  2016-01-30      4                                  2016-02-27      4                                  2016-03-26      4                                  2016-04-06      4                                  2016-04-09      4                                  2016-04-22      4                                  2016-05-06      4                                  2016-05-26      4                                  2016-06-02      4                                  2016-07-14      4                                  2016-07-29      4                                  2016-08-09      7                                  2016-09-01      4                                  2016-09-23      4                                  2016-12-07      4  1636        Kiara Lowery        2016-01-12      4  2917        Cynthia Carr        2016-06-21      4                                  2016-10-21      4  3219        Alan Monroe         2016-01-02      4  3219                            016-02-27       4                                  2016-09-01      5  4288        Natalie Mitchell    2016-03-19      4 

6 Answers

Answers 1

Actually, what you want is not to remove duplicates, but not display them.

In order to do this you can use a CASE statement with ROW_NUMBER() and show the value on the 1st row and display either NULL or '' on the ELSE branch (the other rows):

select     CASE        WHEN ROW_NUMBER() OVER (PARTITION BY C.ClientID ORDER BY CAST(V.StartDate as date) ASC) = 1             THEN C.ClientID        ELSE NULL    END as ClientID,    CASE         WHEN ROW_NUMBER() OVER (PARTITION BY C.ClientID ORDER BY CAST(V.StartDate as date) ASC) = 1             THEN C.FirstName + ' ' + C.LastName        ELSE NULL     END as ClientName,    CAST(V.StartDate as date) as VisitDate,    count(*) as 'Number of Visits' from    Visit V Inner Join Client C on    V.ClientID = C.ClientID group by     C.ClientID,    C.FirstName + ' ' + C.LastName,    CAST(V.StartDate as date) having    count(*) > 3 order by    C.ClientID,     CAST(V.StartDate as date) 

Answers 2

Try this:

DECLARE @Table TABLE (ClientId NVARCHAR(5), ClientName NVARCHAR(6), VisitDate DATE, NumOfVisits INT)  INSERT INTO @Table VALUES ('75' , 'A_NAME' , '2016-06-07' , '4' ),('372' , 'B_NAME' , '2016-09-03' , '4' ),   ('422' , 'C_NAME' , '2016-03-11' , '4' ),('500' , 'D_NAME' , '2016-03-15' , '4'),   ('500' , 'D_NAME' , '2016-03-19' , '4' ),('500' , 'D_NAME' , '2016-03-20' , '4'),   ('500' , 'D_NAME' , '2016-07-15' , '4' ),('500' , 'D_NAME' , '2016-09-13' , '4'),   ('600' , 'E_NAME' , '2016-03-19' , '4' ),('600' , 'E_NAME' , '2016-03-20' , '4'),   ('600' , 'E_NAME' , '2016-07-15' , '4' ),('600' , 'E_NAME' , '2016-09-13' , '4')  ;WITH A AS ( SELECT ROW_NUMBER() OVER(PARTITION BY ClientID ORDER BY ClientID) row_id,* FROM (                       ----------------------------------------- SELECT * FROM @Table --- replace this line with your query----                      -----------------------------------------   ) Main_Result ) SELECT ISNULL(BB.ClientID,'')ClientID,ISNULL(BB.ClientName,'')ClientName,AA.VisitDate,AA.NumOfVisits FROM A AA LEFT JOIN (SELECT * FROM A BB WHERE BB.row_id=1) BB ON AA.ClientID = BB.ClientID AND AA.row_id =BB.row_id              ORDER BY CONVERT(INT,AA.ClientID) 

Hope this helps. :)

you can execute this directly to get sample result from sample data. :)

Answers 3

have you tried using distinct in your query: https://www.w3schools.com/sql/sql_distinct.asp

Answers 4

You can solved this with GROUP BY, grouping by (ClientID, VisitDate).

See the response 1097 here: Using group by on multiple columns

Note: In your ORDER BY is not necessary to use CAST(V.StartDate as date) you can use VisitDate because it exists in your SELECT: ... CAST(V.StartDate as date) as VisitDate,

EDIT: Try this:

SELECT       C.ClientID,     C.FirstName + ' ' + C.LastName as ClientName,     CAST(V.StartDate as date) as VisitDate,     count(*) as 'Number of Visits'  from     Visit V  Inner Join Client C on     V.ClientID = C.ClientID  group by      (C.ClientID, VisitDate)  having     count(*) > 3  order by     C.ClientID,      VisitDate 

Answers 5

I hope the below query would do that.... :)

WITH CTE AS  (  select top 100 percent     cast(C.ClientID as nvarchar(255)) as ClientID,     C.FirstName + ' ' + C.LastName as ClientName,     CAST(V.StartDate as date) as VisitDate,     count(*) as 'Number of Visits',     row_number() over (partition by C.ClientID,C.FirstName + ' ' + C.LastName  ORDER BY CAST(V.StartDate as date) ) as rw_num  from     Visit V  Inner Join Client C on     V.ClientID = C.ClientID  group by      C.ClientID,     C.FirstName + ' ' + C.LastName,     CAST(V.StartDate as date)  having     count(*) > 3  order by     min(C.ClientID),      min(CAST(V.StartDate as date))  )  select case         when rw_num<>1 then '' else  ClientID end as ClientID,        case        when rw_num<>1 then '' else  ClientName end as ClientName,        VisitDate, [Number of Visits] from CTE 

Result:

enter image description here

My test Data in my test tables:

enter image description here

Answers 6

I would use your initial query as a CTE or as subquery to replace #TMP_DATA. Here is what I would do:

SELECT CASE WHEN CAST( LAG(T.ClientID,1,'') OVER (PARTITION BY T.ClientID ORDER BY T.ClientID,T.VisitDate) AS VARCHAR ) = T.ClientID THEN '' ELSE CAST(T.ClientID AS VARCHAR) END AS ClientID, CASE WHEN LAG(T.ClientName,1,'') OVER (PARTITION BY T.ClientID ORDER BY T.ClientID,T.VisitDate) = T.ClientName THEN '' ELSE T.ClientName END ClientName, T.VisitDate, T.[Number of Visits] FROM #TMP_DATA AS T

The result set is: enter image description here

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment