top of page
Jihwan Kim

Cohort Analysis in Power BI, by telling the story.


In this writing, I want to share how I created my own story and how I used the sample data when processing the Cohort Analysis, in Power BI.

I got the sample data from EnterpriseDNA.


My story might not be correct because I did not have a conversation with the data owner. If I could have a conversation, I might create a better story.

Anyway, let the story begin.

The embedded report is shown below, and you can enlarge the report by clicking the extend-arrow-icon on the bottom-right-side of the embedded report.




Logic and Story in this analysis report.

In the WebsiteData Table (fact-table), each row indicates a sales order number. It means that each row represents sales activity. However, in the same table, if the interaction-column indicates "cancel", it means sales activity did not happen.

Each customer has a "first login date" and customers can be grouped by the month of "first login date".

Each customer log-ins one or more times in a certain month, and at the same time, sometimes cancels one or more times in that particular month.


Step 1. Count customers in each group -> It is shown in the zero-month-number-column in the below table in the first figure.

Step 2. Count customers in each group's 1st month whose log-in counts are more than cancel counts. In this report, customers who have the same log-in counts as cancel counts were not counted in the measure.

Step 3. Count customers in each group's 2nd month ~ 25th month whose log-in counts are more than cancel counts.

Step 4. Describe this calculation from a ratio perspective as well.



The story starts with a question.

Why first login customer count is decreasing as times go by?


Well, before answering this, let's see how the ratio changes as times go by.

Click the button, "Customers Ratio". (You can interact with the embedded report that is shown at the beginning of this post)




The information below describes by matrix chart and line chart about what is the ratio of how many customers are coming back as times go by.

As expected from the previous page, each cohort group shows a relatively similar pattern.

Then, I wanted to know how is the trend in the number of orders as times go by? As seen in the right-top side trend line, the order count did not drop. In this case, I can assume that even though there were no more new customers coming into the business, the business kept running.


Going back to the question to answer, "Why first login customer count is decreasing as times go by", should I come up with an answer "the business currently focusing on developing existing customers to keep running the business, but it has the risk which is ignoring to develop new customers"? Or, should I suggest changing the definition of first login customers and re-analyze? And how to redefine it? If I redefine the first login customer, the data has to be re-analyzed to provide proper insights.

Click the button, "Count Reanalyze", to see the amended calculation. (You can interact with the embedded report that is shown at the beginning of this post)




I want to redefine first login customers as new customers' login. And, I want to define new customers as those who did not login for the last two months. For instance, if a customer did login three months ago, but did not login until yesterday, and if the customer logins today, I want to consider this customer as a new customer. Of course, someone might say these are revisiting customers and the month number in the column on the cohort table already describes the revisiting behavior of the cohort group.

However, in some business cases, for instance in the freight forwarding and logistics service providing business, counting returning customers together with new customers, newly grouping them together, and seeing their monthly ordering behavior as times go by are similarly critical with only grouping purely-new-customers.


Let's start to learn the data by re-analyzing it.

I added one more column into each customer's detail information table, and that describes how many new logins were made by each customer. In other words, how many new login behaviors, that happened more than two months later from the previous login, are shown per customer? Click the button, "Ratio Reanalyze" to see the ratio and trend. (You can interact with the embedded report that is shown at the beginning of this post)




Seeing the re-analyzed ratio information by matrix chart and line chart, I cannot tell the difference between before reanalyzing the data and this one. However, I can say that before re-analyzing the data, later part of cohort groups did not show a sufficient number and it was hard to tell that those numbers could represent each cohort group.

Now, each cohort group's number is sufficient to analyze and tell the story, and I can come up with the conclusion below.

All cohort groups describe that the ratio drops to lower than 20% after the first two months.

After having the new customers, whether or not those are purely-new-customers or after-two-month-returning-customers, the company should have a marketing campaign to influence new customers to login more within the first two-months-time-period. And the target should be set up after researching and benchmarking the competitors.




(You can interact with the embedded report that is shown at the beginning of this post)




I hope this helps to have ideas on how to proceed in creating a story when doing cohort analysis.


And for your information, the two most important calculated measures that I have written in this Power BI report are shown below. You can also check those in the pbix file that is attached to this blog post.



Cohort customers count months later = VAR

firstlogincustomerslist = CALCULATETABLE ( VALUES ( Customers[Customer Index] ), TREATAS ( VALUES ( Dates[Date] ), Customers[First_LoginDate] ) ) VAR

onmonth = [Months Value] VAR

monthslatercustomerslist = SUMMARIZE ( FILTER ( ADDCOLUMNS ( CALCULATETABLE ( VALUES ( WebsiteData[Customer Name Index] ), FILTER ( ALL ( Dates ), Dates[MonthOffset] = MAX ( Dates[MonthOffset] ) + onmonth-1 ) ), "cancelcount", [Cancel count], "logincount", [login count] ), [cancelcount] < [logincount] ), Websitedata[Customer Name Index] ) VAR

customerslistintersect = INTERSECT ( firstlogincustomerslist, monthslatercustomerslist )

RETURN IF( [Months Value] = 0, [customers count by first login date], COUNTROWS ( customerslistintersect ))



Cohort customers count months later amended = VAR

thismonthcustomers = VALUES ( WebsiteData[Customer Name Index] ) VAR

lastmonthcustomers = CALCULATETABLE ( VALUES ( WebsiteData[Customer Name Index] ), DATEADD ( Dates[Date], -1, MONTH ) ) VAR

twomonthsagocustomers = CALCULATETABLE ( VALUES ( WebsiteData[Customer Name Index] ), DATEADD ( Dates[Date], -2, MONTH ) ) VAR

newcustomerslist = EXCEPT ( EXCEPT ( thismonthcustomers, lastmonthcustomers ), twomonthsagocustomers ) VAR

onmonth = [Months Value] VAR

monthslatercustomerslist = SUMMARIZE ( FILTER ( ADDCOLUMNS ( CALCULATETABLE ( VALUES ( WebsiteData[Customer Name Index] ), FILTER ( ALL ( Dates ), Dates[MonthOffset] = MAX ( Dates[MonthOffset] ) + onmonth-1 ) ), "cancelcount", [Cancel count], "logincount", [login count] ), [cancelcount] < [logincount] ), Websitedata[Customer Name Index] ) VAR

customerslistintersect = INTERSECT ( newcustomerslist, monthslatercustomerslist )

RETURN IF( [Months Value] = 0, [new customers count], COUNTROWS ( customerslistintersect ))

254 views0 comments

Comments


bottom of page