In this blog post, I want to share what I have incorrectly understood in Power BI. I think there are more than 100 things I still do not 100% understand, but at this moment of time, I found two. 😢
My previous thinking about ADDCOLUMNS+SUMMARIZE and SUMMARIZE.
Option 1.
ADDCOLUMNS ( SUMMARIZE ( Sales, 'Calendar'[Month-Year], 'Calendar'[End of Month], Category[Category] ), "@QTY", CALCULATE ( SUM ( Sales[Quantity] ) ) )
and
Option 2.
SUMMARIZE ( Sales, 'Calendar'[Month-Year], 'Calendar'[End of Month], Category[Category], "@QTY", SUM ( Sales[Quantity] ) )
I always used ADDCOLUMN function together with SUMMARIZE. I cannot remember the reason, but I read from a book that only using SUMMARIZE sometimes creates unexpected result, and recommendation was always try using with ADDCOLUMNS and create context transition.
Until today, when I used ADDCOLUMNS together with SUMMARIZE, I did not have any performance issue. Perhaps it was because I did not use extremely large dataset. My sample dataset that I used today is not that big but it showed the significant size difference that I did not expect.
My previous thinking about Calculated Table VS. Calculated column
I sometimes compare Calculated column VS. Measure, and then I select an efficient way from the performance point of view. This time, I tried to compare the below two when creating a calculated table.
Option 1.
SUMMARIZE ( Sales, 'Calendar'[Month-Year], 'Calendar'[End of Month], Category[Category], "@QTY", SUM ( Sales[Quantity] ) )
Option 2.
SUMMARIZE ( Sales, 'Calendar'[Month-Year], 'Calendar'[End of Month], Category[Category] )
and then, add a calculated column separately.
QTY CC = VAR _currentmonthyear = 'Add onebyone NewTable'[Month-Year] VAR _currentcategory = 'Add onebyone NewTable'[Category] RETURN CALCULATE ( SUM ( Sales[Quantity] ), 'Calendar'[Month-Year] = _currentmonthyear, Category[Category] = _currentcategory ) If I needed to create a calculated table, I always tried to create it in one-go. I thought this was having faster performance than creating new columns one by one separately. In this blog post, I will share what is the result of each table's size by using my simple dataset.
In this writing, I used the below sample, that contains quantity information by category and by date. The data has five categories (A, B, C, D, E), and it has a date range from 1910/01/01 to 2021/12/31.
A fact table (Sales) looks like below. (total 204,540 rows)
Based on the fact table (Sales), I created a simple data model like below.
And then, I created two calculated tables like below.
OneGo SummaryTable = ADDCOLUMNS ( SUMMARIZE ( Sales, 'Calendar'[Month-Year], 'Calendar'[End of Month], Category[Category] ), "@QTY", CALCULATE ( SUM ( Sales[Quantity] ) ) )
OneGo SummaryTable V2 = SUMMARIZE ( Sales, 'Calendar'[Month-Year], 'Calendar'[End of Month], Category[Category], "@QTY", SUM ( Sales[Quantity] ) )
In DAX Studio, the size of each table looks like below.
Size wise, without using ADDCOLUMNS wins.
I also wanted to compare between, creating calculated table by one-go and creating summarized calculated table + adding a new column separately, like below.
QTY CC (Calculated Column) =
VAR _currentmonthyear = 'Add onebyone NewTable'[Month-Year]
VAR _currentcategory = 'Add onebyone NewTable'[Category]
RETURN
CALCULATE (
SUM ( Sales[Quantity] ),
'Calendar'[Month-Year] = _currentmonthyear,
Category[Category] = _currentcategory
)
Adding one by one separately VS. OneGoSummaryTable V2 -> It shows the same size.
To summarize, at least in a dataset that looks like the sample,
firstly, try not to use ADDCOLUMNS with SUMMARIZE. And then, must check the result whether it shows correct answers. Secondly, when creating a calculated table, adding new columns separately does not affect badly to the size of a table.
I hope this short writing helps motivating and having fun in testing whatever PowerBI concepts you already know.
Addings: after a few more testing, I could see the below information from DAX Studio.
When I keep adding a new table with using the exactly same DAX formula, everything becomes to have the same size, except the last one that I create. I am not sure whether the size of the calculated table changes or not, but it seems like ADDCOLUMNS + SUMMARIZE VS. Only SUMMARIZE shows no winner from the perspective of size.
Comments