top of page

Cohort Analysis in Power BI: Mastering WINDOW and PARTITIONBY

  • Jihwan Kim
  • 11 minutes ago
  • 6 min read


By Jihwan Kim

In this writing, I like to share how I learned to tackle one of the more insightful business analyses – cohort analysis, specifically for tracking repeat customer purchases – using some truly powerful DAX capabilities within Power BI. If you're like me, constantly pushing the boundaries of what Power BI can do, you know that understanding customer behavior over time is gold.

Traditionally, crafting these analyses in DAX could become quite complex, sometimes leading to verbose measures that are hard to maintain and, critically, can impact performance in large-scale semantic models. But as I explored deeper, I found a more elegant, more powerful way to define these dynamic groupings and calculations.

This centers around the WINDOW function, especially when paired with its powerful sub-function PARTITIONBY.


While often discussed for time intelligence patterns like Year-To-Date, their true potential shines when applied to more nuanced analytical scenarios. Today, we're diving deep into how these functions can revolutionize the approach to cohort analysis. I'll build a solution from the ground up, showcasing the clarity and benefits this modern DAX approach offers.



The Scenario: Tracking Repeat Purchase Rate & Running totals by Acquisition Cohort

Imagine a common enterprise scenario: We want to understand how well we retain customers by analyzing their repeat purchase behavior.

Specifically, we want to know: for a cohort of customers acquired in a particular month (e.g., all customers who made their first purchase in January 2024), what percentage of them make another purchase in their first month, second month, third month, and so on, relative to their acquisition date?


This requires us to:

  1. Define each customer's acquisition cohort (e.g., "2024 Jan," "2024 Feb").

  2. For each subsequent purchase, determine which "lifecycle month" it falls into (Month 1, Month 2, etc., post-acquisition).

  3. Calculate the repeat purchase rate for each cohort over these lifecycle months.

  4. Calculate the running totals of sales for each cohort over these lifecycle months.


This is precisely the kind of dynamic, partitioned calculation where WINDOW and PARTITIONBY excel.



Deep Dive: Crafting the Cohort Analysis with Modern DAX

Let's break down how I'd construct the measures to achieve this, highlighting the roles of WINDOW and PARTITIONBY.


Sample Data Model

Before jumping into DAX, let's look at the data model, based on the structure I've used:

  • dimCustomer: Contains customer-specific information.

    • CustomerKey (Unique Key)


  • dimDate: A standard date dimension table.

    • DateKey (Unique Key)

    • Date (The actual date value)

    • CalendarYear

    • MonthName (e.g., "January", "February"). This column is sorted by [MonthNumberOfYear] for correct chronological display.

    • MonthNumberOfYear (1 for January, 2 for February, etc.)

    • YearMonth (e.g., "2024 Jan" or 202401). This column is sorted by [EndOfMonth] to ensure proper chronological order in visuals.

    • EndOfMonth (The last date of the month)


  • factSales: Contains transactional data. It has relationships to dimCustomer (via CustomerKey) and dimDate (via OrderDateKey).

    • SalesOrderNumber (Identifier for the sales order)

    • CustomerKey (Foreign Key to dimCustomer)

    • OrderDateKey (Foreign Key to dimDate, representing the date of the sale)

    • SalesAmount (The value of the sales transaction)




This structure provides the necessary foundation for the cohort analysis. The sorting information for dimDate is crucial for ensuring my reports display months and year-month combinations in the correct sequence.



Step 1: Defining the Customer Cohort

First, I need to assign each customer to a cohort based on their DateFirstPurchase. I can create a calculated column in dimCustomer for this:


CohortMonthYear = 
VAR _firstorderdatekey =
    MINX ( RELATEDTABLE ( factSales ), factSales[OrderDateKey] )
VAR _yearmonth =
    MAXX (
        FILTER ( dimDate, dimDate[DateKey] = _firstorderdatekey ),
        dimDate[YearMonth]
    )
RETURN
    _yearmonth
// Calculated Column in dimCustomer
CohortMonthYearSort = 
VAR _firstorderdatekey =
    MINX ( RELATEDTABLE ( factSales ), factSales[OrderDateKey] )
VAR _EndofMonth =
    MAXX (
        FILTER ( dimDate, dimDate[DateKey] = _firstorderdatekey ),
        dimDate[EndofMonth]
    )
RETURN
    _EndofMonth
// Calculated Column in dimCustomer



This gives us a text representation like "2024 Jan". For sorting in visuals, I would typically rely on a corresponding numeric column or the sort order defined in the dimDate table.



Step 2: The Core Logic – Repeat Purchase Rate Measure

Now, for the main event. I'll build a single measure, Repeat Purchase Rate.

This measure will calculate the "transactional cohort age" (Month 1, Month 2, etc.) and then the rate itself.


▶️Create a helper table, 'Lifecycle Months'



Here’s how I'll approach the components of the rate:

  • Denominator (Original Cohort Size): For any given CohortMonthYear, this is the distinct count of customers who belong to that cohort.

  • Numerator (Repeat Purchasers in Month N): The distinct count of customers from a specific CohortMonthYear who made at least one purchase in their subsequent 'Month N'.


▶️The filter context on the report looks something like,

CohortMonthYear

Month 1

Month 2

Month 3

...

2024 Jan

15.2%

12.5%

10.1%

...

2024 Feb

16.0%

13.1%

11.5%

...

...

...

...

...

...


And, let's build the Repeat Purchase Rate measure:

Repeat Purchase Rate = 
VAR CurrentCohortMonthYear =
    SELECTEDVALUE ( dimCustomer[CohortMonthYear] )
VAR CurrentLifecycleMonthsort =
    SELECTEDVALUE ( 'Lifecycle Months'[Sort] ) 

    // Calculate the Denominator: Size of the original cohort
VAR OriginalCohortSize =
    COUNTROWS ( DISTINCT ( dimCustomer[CustomerKey] ) ) 
    
    // Calculate the Numerator: Distinct count of customers from this cohort who purchased in this specific lifecycle month
VAR _t =
    SUMMARIZE (
        FILTER (
            SUMMARIZE (
                factSales,
                dimCustomer[CustomerKey],
                dimCustomer[CohortMonthYearSort],
                dimDate[EndofMonth]
            ),
            YEAR ( dimDate[EndofMonth] ) * 12
                + MONTH ( dimDate[EndofMonth] )
                - (
                    YEAR ( dimCustomer[CohortMonthYearSort] ) * 12
                        + MONTH ( dimCustomer[CohortMonthYearSort] )
                ) + 1 = CurrentLifecycleMonthsort
        ),
        dimCustomer[CustomerKey]
    )
VAR RepeatPurchasersNumerator =
    COUNTROWS ( _t )
RETURN
    DIVIDE ( RepeatPurchasersNumerator, OriginalCohortSize )

Visualizing the Cohort Matrix

The Repeat Purchase Rate measure can now populate a matrix visual in Power BI:

  • Rows: dimCustomer[CohortMonthYear]

  • Columns: 'Lifecycle Months'[Lifecycle Month Number] (from the helper table I created).

  • Values: [Repeat Purchase Rate] (formatted as a percentage).


Clarifying the WINDOW and PARTITIONBY Role for Advanced Scenarios:

While the DAX above calculates the specific rate by leveraging context manipulation and filtering, WINDOW functions, in conjunction with PARTITIONBY, become exceptionally powerful in cohort analysis when you need to perform calculations within these dynamic groups across their lifecycle.


Calculate running totals within each cohort's lifecycle: For example, the cumulative spend of a cohort by Month N.

Sales = 
VAR CurrentLifecycleMonthsort =
    SELECTEDVALUE ( 'Lifecycle Months'[Sort] )
VAR _t =
    FILTER (
        SUMMARIZE (
            factSales,
            factSales[SalesAmount],
            dimCustomer[CohortMonthYearSort],
            dimDate[EndofMonth]
        ),
        YEAR ( dimDate[EndofMonth] ) * 12
            + MONTH ( dimDate[EndofMonth] )
            - (
                YEAR ( dimCustomer[CohortMonthYearSort] ) * 12
                    + MONTH ( dimCustomer[CohortMonthYearSort] )
            ) + 1 = CurrentLifecycleMonthsort
    )
RETURN
    SUMX ( _t, factSales[SalesAmount] )
Cumulative Sales by Cohort Month N = 
VAR _t =
    ADDCOLUMNS (
        CROSSJOIN (
            ALLSELECTED ( dimCustomer[CohortMonthYear], dimCustomer[CohortMonthYearSort] ),
            ALLSELECTED (
                'Lifecycle Months'[Lifecycle Month Number],
                'Lifecycle Months'[Sort]
            )
        ),
        "@sales", [Sales]
    )
VAR _cumulate =
    WINDOW (
        1,
        ABS,
        0,
        REL,
        _t,
        ORDERBY ( 'Lifecycle Months'[Sort], ASC ),
        ,
        PARTITIONBY ( dimCustomer[CohortMonthYearSort] )
    )
RETURN
    IF ( [Sales], SUMX ( _cumulate, [@sales] ) )

  • Here, PARTITIONBY(dimCustomer[CohortMonthYearSort]]) ensures the WINDOW calculation (like a running sum) restarts for each distinct cohort, progressing through their lifecycle months as defined by ORDERBY.

  • Complex relative period comparisons within cohorts.




For the defined "Repeat Purchase Rate," the crucial parts are the accurate definition of CohortMonthYear, the LifecycleMonthForSale, and then distinct counts.

The PARTITIONBY concept is fundamental to the logic – I almost always process data per cohort.


▶️Summary page of the report



Beyond the Basics

What I've learned is a foundational approach to cohort analysis for repeat purchase rates. The real power of thinking in terms of WINDOW and PARTITIONBY is that it trains my DAX mind to handle context transitions and partitioned calculations with much more intuition and control, especially as scenarios get more complex.

Consider these points:

  • Performance in Large Models: While our specific rate measure relied on more traditional filtering for its final form, if you find yourself needing to pre-aggregate or create helper tables for cohort lifecycle calculations in very large models, WINDOW functions within SUMMARIZECOLUMNS or ADDCOLUMNS during data preparation can be more performant than complex iterative DAX in measures for certain patterns. Always test against your specific model and data.

  • Flexibility: Once you master WINDOW's parameters (from, fromType, to, toType, relation, orderBy, partitionBy), you unlock incredible flexibility. Need to compare to the previous lifecycle month within the cohort? Or a 3-month rolling repeat rate? WINDOW makes these far more approachable.

  • Clarity for Complex Logic: For calculations like "cumulative active users in cohort" or "average sales per active customer within cohort month N," WINDOW often leads to DAX that is easier to read and debug than deeply nested FILTER contexts, once you're familiar with its syntax.



The key is to not see WINDOW as just for cumulative calculation. It's a generic engine for defining dynamic row sets within specified partitions. Cohort analysis, by its very nature, is about partitioning (by cohort) and then observing behavior over a related window (the lifecycle months).



Conclusion: Embrace the Power of Windowing

Mastering functions like WINDOW and understanding how PARTITIONBY gives fine-grained control over calculation contexts are essential skills for any Power BI professional looking to deliver deep, actionable insights from complex datasets. While today I focused on a specific cohort retention metric, the approach can be adapted to other cohort-based analyses or any scenario requiring dynamic, partitioned calculations.

The elegance and potential performance gains make these functions a critical part of my advanced DAX toolkit.


I hope this helps having even more fun in Power BI, especially when tackling these kinds of advanced analytical puzzles!




References

Comments


bottom of page