i want to create a frequency distribution matrix
1.Create a matrix.**Is it possible to get this in separate columns** customer1 p1 p2 p3 customer 2 p2 p3 customer 3 p2 p3 p1 customer 4 p2 p1 2. Then I have to count the number of products that come together the most For eg p2 and p3 comes together 3 times p1 p3 comes 2 times p1 p2 comes 2 times I want to recommend products to customers ,frequency of products that comes together select customerId,product,count(*) from sales group by customerId,product
Can anyone please help me for a solution to this
3 Answers
Answers 1
If you want pairs of products that customers purchase, then you can use a self join:
select s1.product, s2.product, count(*) as cnt from sales s1 join sales s2 on s1.customerId = s2.customerId where s1.product < s2.product group by s1.product, s2.product order by cnt desc;
You can extend this to more than two products by using more joins.
Answers 2
1.
Array
select customerId ,sort_array(collect_set (product)) as products from sales group by customerId ;
Multiple columns
select customerId ,products[0] as p0 ,products[1] as p1 ,products[2] as p2 ,products[3] as p3 ,products[4] as p4 ,products[5] as p5 ,products[6] as p6 ,products[7] as p7 ,products[8] as p8 ,products[9] as p9 from (select customerId ,sort_array(collect_set (product)) as products from sales group by customerId ) s ;
2.
This returns the frequency of the whole products' combinations.
In your example (p1,p2,p3)
is the most frequent (appears twice).
(p1,p2)
appears once and so is (p2,p3)
.
For frequency of tuples, see @GordonLinoff answer.
select s.products ,count(*) as frequency from (select customerId ,sort_array(collect_set (product)) as products from sales group by customerId ) s group by s.products order by frequency desc
Answers 3
I know cursors are not the flavor of the month anymore - but I still find them very useful from time to time
Below code creates a table variable and then loops through the sales data to find out which combination of products are sold together most frequently
--Results will be loaded into this table DECLARE @OutputCombination TABLE (CombinationName VARCHAR(MAX), CombinationsFound INT) --Just for demo - create table and load with data declare @demodata table (ClientID int, ProductBought varchar(50)) insert into @demodata (Clientid, ProductBought) values (1,'Coke'), (1,'Chips'), (1,'Chips'), (2,'Coke'), (2,'Coke'), (2,'Sweets'), (3,'Coke'), (4,'Chips'), (5,'Coke'), (5,'Chips'), (6,'Coke'), (7,'Coke') DECLARE clients CURSOR READ_ONLY FOR SELECT DISTINCT clientID from @demodata DECLARE @clientID INT OPEN clients FETCH NEXT FROM clients INTO @clientID WHILE (@@fetch_status <> -1) BEGIN DECLARE @ThisClientCombination VARCHAR(MAX) --This is going to be used to find/update combinations of products sold from the data - pipe-delimiting them SET @ThisClientCombination = '' --better to actually wipe variables in cursor loops, as cursors sometimes funny about things like that ;WITH thisClientsDistinctCoverages AS --CTE used because of the SQL-funnies caused when using DISTINCT and ORDER BY ( SELECT DISTINCT TOP 100 PERCENT CoverageItems = ProductBought + '|' FROM @demodata WHERE ClientID = @clientID ORDER BY ProductBought + '|' --order alphabetically and do DISTINCT so that 2x cokes are changed into just 1 coke - for the sake of combos ) SELECT @ThisClientCombination = @ThisClientCombination + CoverageItems FROM thisClientsDistinctCoverages SET @ThisClientCombination = LEFT(@ThisClientCombination,LEN(@ThisClientCombination)-1) --Drop trailing Pipe '|' IF EXISTS(SELECT * FROM @OutputCombination WHERE CombinationName = @ThisClientCombination) UPDATE @OutputCombination SET CombinationsFound = CombinationsFound + 1 WHERE CombinationName = @ThisClientCombination --found before, increase count by 1 ELSE INSERT INTO @OutputCombination (CombinationName, CombinationsFound) VALUES (@ThisClientCombination, 1)--first time found, make entry 1 FETCH NEXT FROM clients INTO @clientID END CLOSE clients DEALLOCATE clients --Show the results SELECT * FROM @OutputCombination ORDER BY CombinationsFound DESC, CombinationName
This will yield the results:
CombinationName---------CombinationsFound
Coke--------------------------3
Chips|Coke------------------2
Chips--------------------------1
Coke|Sweets----------------1
Interpretation of above results:
Most often, 3 times, a customer only bought coke(s)
Next up, twice a customer bought chips and a coke
Once a customer only bought chips
And once a customer bought a coke & sweets
0 comments:
Post a Comment