i want to create a frequency distribution matrix
1.Create a matrix.**Is it possible to get this in separate columns** customer1 p1 p2 p3 customer 2 p2 p3 customer 3 p2 p3 p1 customer 4 p2 p1 2. Then I have to count the number of products that come together the most For eg p2 and p3 comes together 3 times p1 p3 comes 2 times p1 p2 comes 2 times I want to recommend products to customers ,frequency of products that comes together select customerId,product,count(*) from sales group by customerId,product Can anyone please help me for a solution to this
3 Answers
Answers 1
If you want pairs of products that customers purchase, then you can use a self join:
select s1.product, s2.product, count(*) as cnt from sales s1 join sales s2 on s1.customerId = s2.customerId where s1.product < s2.product group by s1.product, s2.product order by cnt desc; You can extend this to more than two products by using more joins.
Answers 2
1.
Array
select customerId ,sort_array(collect_set (product)) as products from sales group by customerId ; Multiple columns
select customerId ,products[0] as p0 ,products[1] as p1 ,products[2] as p2 ,products[3] as p3 ,products[4] as p4 ,products[5] as p5 ,products[6] as p6 ,products[7] as p7 ,products[8] as p8 ,products[9] as p9 from (select customerId ,sort_array(collect_set (product)) as products from sales group by customerId ) s ; 2.
This returns the frequency of the whole products' combinations.
In your example (p1,p2,p3) is the most frequent (appears twice).
(p1,p2) appears once and so is (p2,p3).
For frequency of tuples, see @GordonLinoff answer.
select s.products ,count(*) as frequency from (select customerId ,sort_array(collect_set (product)) as products from sales group by customerId ) s group by s.products order by frequency desc Answers 3
I know cursors are not the flavor of the month anymore - but I still find them very useful from time to time
Below code creates a table variable and then loops through the sales data to find out which combination of products are sold together most frequently
--Results will be loaded into this table DECLARE @OutputCombination TABLE (CombinationName VARCHAR(MAX), CombinationsFound INT) --Just for demo - create table and load with data declare @demodata table (ClientID int, ProductBought varchar(50)) insert into @demodata (Clientid, ProductBought) values (1,'Coke'), (1,'Chips'), (1,'Chips'), (2,'Coke'), (2,'Coke'), (2,'Sweets'), (3,'Coke'), (4,'Chips'), (5,'Coke'), (5,'Chips'), (6,'Coke'), (7,'Coke') DECLARE clients CURSOR READ_ONLY FOR SELECT DISTINCT clientID from @demodata DECLARE @clientID INT OPEN clients FETCH NEXT FROM clients INTO @clientID WHILE (@@fetch_status <> -1) BEGIN DECLARE @ThisClientCombination VARCHAR(MAX) --This is going to be used to find/update combinations of products sold from the data - pipe-delimiting them SET @ThisClientCombination = '' --better to actually wipe variables in cursor loops, as cursors sometimes funny about things like that ;WITH thisClientsDistinctCoverages AS --CTE used because of the SQL-funnies caused when using DISTINCT and ORDER BY ( SELECT DISTINCT TOP 100 PERCENT CoverageItems = ProductBought + '|' FROM @demodata WHERE ClientID = @clientID ORDER BY ProductBought + '|' --order alphabetically and do DISTINCT so that 2x cokes are changed into just 1 coke - for the sake of combos ) SELECT @ThisClientCombination = @ThisClientCombination + CoverageItems FROM thisClientsDistinctCoverages SET @ThisClientCombination = LEFT(@ThisClientCombination,LEN(@ThisClientCombination)-1) --Drop trailing Pipe '|' IF EXISTS(SELECT * FROM @OutputCombination WHERE CombinationName = @ThisClientCombination) UPDATE @OutputCombination SET CombinationsFound = CombinationsFound + 1 WHERE CombinationName = @ThisClientCombination --found before, increase count by 1 ELSE INSERT INTO @OutputCombination (CombinationName, CombinationsFound) VALUES (@ThisClientCombination, 1)--first time found, make entry 1 FETCH NEXT FROM clients INTO @clientID END CLOSE clients DEALLOCATE clients --Show the results SELECT * FROM @OutputCombination ORDER BY CombinationsFound DESC, CombinationName This will yield the results:
CombinationName---------CombinationsFound
Coke--------------------------3
Chips|Coke------------------2
Chips--------------------------1
Coke|Sweets----------------1
Interpretation of above results:
Most often, 3 times, a customer only bought coke(s)
Next up, twice a customer bought chips and a coke
Once a customer only bought chips
And once a customer bought a coke & sweets
0 comments:
Post a Comment