Sunday, August 20, 2017

How to identify groups/clusters in set of arcs/edges in SQL?

Leave a Comment

I have arcs/edges like these ‘haves’:

Node1   Node2 A       B B       C D       E 

Here A is connected to B and B to C. D is connected to E. In other words there are 2 groups/clusters shown in these ‘wants’:

Node1   Node2   Cluster A       B       1 B       C       1 D       E       2 

Could I use SQL to identify these groups/clusters? I guess this involves self-joins but I cannot see how to write this SQL. Any feedback would be very much appreciated. Thanks!

2 Answers

Answers 1

Assuming from the sample data provided in question, that there is no loops like A->B->C->A in data, Following is the query that will return the desired output.

WITH RECURSIVE NodeCluster (node1,node2,Cluster1) AS   (SELECT node1,           node2,           Rank() Over (                        ORDER BY Char2HexInt(node1))    FROM nodes    WHERE node1 IN        (SELECT DISTINCT(n1.node1) AS nList         FROM nodes n1         WHERE n1.node1 IS NOT IN             (SELECT DISTINCT(n2.node2)              FROM nodes n2) )    UNION ALL SELECT N1.node1,                     N1.node2,                     CASE                         WHEN NodeCluster.Node1=n1.node1 THEN (Cluster1+1)                         ELSE Cluster1                     END    FROM nodes n1,         NodeCluster    WHERE NodeCluster.node2=n1.node1      ) SELECT * FROM NodeCluster ORDER BY Cluster1,          node1,          node2; 

In seed query all the starting nodes are selected and Char2HexInt function is used to sort the seed nodes in asc order.

On the data provided in question, following is the output.

Node1 | Node2  | Cluster1 ------------------------- A       B        1 B       C        1 D       E        2 

For re-assurance, more data has been added to the sample data as below.

Node1 | Node2 ------------- A        B    B        C    D        E    E        F    F        G    H        I    I        J    J        K    L        M   

The query resulted in below output.

Node1 | Node2  | Cluster1 ------------------------- A       B         1 B       C         1 D       E         2 E       F         2 F       G         2 H       I         3 I       J         3 J       K         3 L       M         4 

Hope this will help.

Edit: After dnoeth suggestions, below is the simpler version of initial query.

  WITH RECURSIVE NodeCluster (node1,node2,Cluster1) AS   (SELECT node1,           node2,           Rank() Over (                        ORDER BY node1)    FROM nodes AS n1    WHERE NOT EXISTS        (SELECT *         FROM nodes AS n2         WHERE n1.node1 = n2.node2)    UNION ALL SELECT N1.node1,                     N1.node2,                     NodeCluster.Cluster1    FROM nodes n1,         NodeCluster    WHERE NodeCluster.node2=n1.node1 ) SELECT * FROM NodeCluster ORDER BY Cluster1,          node1,          node2; 

Answers 2

Try this:

    DECLARE @Edges TABLE     (       ID INT ,       Node1 CHAR(1) ,       Node2 CHAR(1)     ); INSERT  INTO @Edges         ( Node1, Node2 ) VALUES  ( 'A', 'B' ),         ( 'B', 'C' ),         ( 'D', 'E' ); WITH    CTE_Nodes           AS ( SELECT   Node1 AS Node                FROM     @Edges                UNION                SELECT   Node2 AS Node                FROM     @Edges              ),         CTE_Pairs           AS ( SELECT   Node1 ,                         Node2                FROM     @Edges                WHERE    Node1 <> Node2                UNION                SELECT   Node2 AS Node1 ,                         Node1 AS Node2                FROM     @Edges                WHERE    Node1 <> Node2              ),         CTE_Recursive           AS ( SELECT   CAST(CTE_Nodes.Node AS VARCHAR(8000)) AS AnchorNode ,                         Node1 ,                         Node2 ,                         CAST(',' + Node1 + ',' + Node2 + ',' AS VARCHAR(8000)) AS NodePath ,                         1 AS Lvl                FROM     CTE_Pairs                         INNER JOIN CTE_Nodes ON CTE_Nodes.Node = CTE_Pairs.Node1                UNION ALL                SELECT   CTE_Recursive.AnchorNode ,                         CTE_Pairs.Node1 ,                         CTE_Pairs.Node2 ,                         CAST(CTE_Recursive.NodePath + CTE_Pairs.Node2 + ',' AS VARCHAR(8000)) AS NodePath ,                         CTE_Recursive.Lvl + 1 AS Lvl                FROM     CTE_Pairs                         INNER JOIN CTE_Recursive ON CTE_Recursive.Node2 = CTE_Pairs.Node1                WHERE    CTE_Recursive.NodePath NOT LIKE CAST('%,'                         + CTE_Pairs.Node2 + ',%' AS VARCHAR(8000))              ),         CTE_RecursionResult           AS ( SELECT   AnchorNode ,                         Node1 ,                         Node2                FROM     CTE_Recursive              ),         CTE_CleanResult           AS ( SELECT   AnchorNode ,                         Node1 AS Node                FROM     CTE_RecursionResult                UNION                SELECT   AnchorNode ,                         Node2 AS Node                FROM     CTE_RecursionResult              )     SELECT  Edges.Node1 ,             Edges.Node2 ,             DENSE_RANK() OVER ( ORDER BY CASE WHEN CA_Data.XML_Value IS NULL                                               THEN Edges.Node1                                               ELSE CA_Data.XML_Value                                          END ) AS Cluster     FROM    @Edges Edges             CROSS APPLY ( SELECT    CTE_CleanResult.Node + ','                           FROM      CTE_CleanResult                           WHERE     CTE_CleanResult.AnchorNode = Edges.Node1                           ORDER BY  CTE_CleanResult.Node                         FOR                           XML PATH('') ,                               TYPE                         ) AS CA_XML ( XML_Value )             CROSS APPLY ( SELECT    CA_XML.XML_Value.value('.',                                                            'NVARCHAR(MAX)')                         ) AS CA_Data ( XML_Value ); 

SQL Fiddle DEMO

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment