Friday, March 24, 2017

SQL query Frequency Distribution matrix for product

Leave a Comment

i want to create a frequency distribution matrix

1.Create a matrix.**Is it possible to get this in separate columns**    customer1       p1         p2      p3   customer 2      p2         p3   customer 3      p2         p3      p1   customer 4      p2         p1  2. Then I have to count the number of products that come together the most     For eg       p2 and p3 comes together 3 times     p1 p3   comes 2 times     p1 p2  comes  2 times  I want to recommend products to customers ,frequency of products that comes together   select customerId,product,count(*) from sales group by customerId,product 

Can anyone please help me for a solution to this

3 Answers

Answers 1

If you want pairs of products that customers purchase, then you can use a self join:

select s1.product, s2.product, count(*) as cnt from sales s1 join      sales s2      on s1.customerId = s2.customerId where s1.product < s2.product group by s1.product, s2.product order by cnt desc; 

You can extend this to more than two products by using more joins.

Answers 2

1.

Array

select      customerId            ,sort_array(collect_set (product))   as products  from        sales   group by    customerId ; 

Multiple columns

select      customerId             ,products[0] as p0            ,products[1] as p1            ,products[2] as p2            ,products[3] as p3            ,products[4] as p4            ,products[5] as p5            ,products[6] as p6            ,products[7] as p7            ,products[8] as p8            ,products[9] as p9  from       (select      customerId                        ,sort_array(collect_set (product))   as products              from        sales               group by    customerId             ) s ; 

2.

This returns the frequency of the whole products' combinations.
In your example (p1,p2,p3) is the most frequent (appears twice).
(p1,p2) appears once and so is (p2,p3).
For frequency of tuples, see @GordonLinoff answer.

select      s.products            ,count(*)    as frequency   from       (select      customerId                        ,sort_array(collect_set (product))   as products              from        sales               group by    customerId             ) s  group by    s.products             order by    frequency desc 

Answers 3

I know cursors are not the flavor of the month anymore - but I still find them very useful from time to time

Below code creates a table variable and then loops through the sales data to find out which combination of products are sold together most frequently

--Results will be loaded into this table DECLARE @OutputCombination TABLE (CombinationName VARCHAR(MAX), CombinationsFound INT)   --Just for demo - create table and load with data declare @demodata table (ClientID int, ProductBought varchar(50)) insert into @demodata (Clientid, ProductBought) values  (1,'Coke'), (1,'Chips'), (1,'Chips'), (2,'Coke'), (2,'Coke'), (2,'Sweets'), (3,'Coke'), (4,'Chips'), (5,'Coke'), (5,'Chips'), (6,'Coke'), (7,'Coke')  DECLARE clients CURSOR READ_ONLY FOR SELECT DISTINCT clientID from @demodata  DECLARE @clientID INT OPEN clients  FETCH NEXT FROM clients INTO @clientID WHILE (@@fetch_status <> -1) BEGIN        DECLARE @ThisClientCombination VARCHAR(MAX) --This is going to be used to find/update combinations of products sold from the data - pipe-delimiting them        SET @ThisClientCombination  = '' --better to actually wipe variables in cursor loops, as cursors sometimes funny about things like that        ;WITH thisClientsDistinctCoverages AS --CTE used because of the SQL-funnies caused when using DISTINCT and ORDER BY        (               SELECT DISTINCT TOP 100 PERCENT CoverageItems = ProductBought + '|' FROM @demodata WHERE ClientID = @clientID ORDER BY ProductBought + '|' --order alphabetically and do DISTINCT so that 2x cokes are changed into just 1 coke - for the sake of combos        )        SELECT @ThisClientCombination = @ThisClientCombination + CoverageItems FROM thisClientsDistinctCoverages        SET @ThisClientCombination = LEFT(@ThisClientCombination,LEN(@ThisClientCombination)-1) --Drop trailing Pipe '|'         IF EXISTS(SELECT * FROM @OutputCombination WHERE CombinationName = @ThisClientCombination)               UPDATE @OutputCombination SET CombinationsFound = CombinationsFound + 1 WHERE CombinationName = @ThisClientCombination --found before, increase count by 1        ELSE               INSERT INTO @OutputCombination (CombinationName, CombinationsFound) VALUES (@ThisClientCombination, 1)--first time found, make entry 1         FETCH NEXT FROM clients INTO @clientID END  CLOSE clients DEALLOCATE clients  --Show the results SELECT * FROM @OutputCombination ORDER BY CombinationsFound DESC, CombinationName 

This will yield the results:

CombinationName---------CombinationsFound
Coke--------------------------3
Chips|Coke------------------2
Chips--------------------------1
Coke|Sweets----------------1

Interpretation of above results:
Most often, 3 times, a customer only bought coke(s)
Next up, twice a customer bought chips and a coke
Once a customer only bought chips
And once a customer bought a coke & sweets

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment