INFO.DAX + Fabric Notebooks Find Unused Columns

  • Jihwan Kim
  • 2 days ago
  • 4 min read

Measure Bloat You Can’t See—But Still Pay For


Introduction

In this post, I'd like to share how I learned to uncover hidden VertiPaq memory bloat by letting the INFO.DAX family do the detective work inside a Fabric Notebook.


Important caveat: the technique below scans only one published semantic model and its report. If the semantic model feeds other reports with different purposes, or if a self-service user builds a new report tomorrow, those reports might rely on columns I flag as "unused." Always verify cross-report impact before dropping anything.



1 Scenario Framing — “The Column Nobody Asked For”

I create a semantic model and report, publish both, and six months later half of the columns in the fact table are never used in DAX measures or calculated columns and never participate in a relationship, yet they still occupy RAM.

VertiPaq compresses everything, but high-cardinality text columns can add dozens of MB each. Without a dependency-and-relationship scan, you won’t know which ones are safe to delete.



2 Set-up: Sample Model, Report & Workspace

Build and publish the sample semantic model & report

  1. Power BI Desktop

    • Ensure SalesAmount is a decimal number and Date is a date; leave the two text columns in the fact table as strings.

    • Load, then model a simple star schema.

  2. Create one quick report.

  3. Publish to the Fabric workspace I’ll use for the Notebook.

Now I have: a semantic model named Sales Demo, a report named Sales Demo, and a workspace that supports Fabric Notebooks.


Sales =
    SUM ( fact_sales[Sales] )

Sales Previous Month =
CALCULATE (
    [Sales],
    OFFSET (
        -1,
        ALL ( dim_calendar[Year-Month], dim_calendar[Year-Month sort] ),
        ORDERBY ( dim_calendar[Year-Month sort], ASC )
    )
)

MoM % =
VAR _sales = [Sales]
VAR _prev = [Sales Previous Month]
RETURN
    DIVIDE ( _sales - _prev, _prev )




3 Fabric Notebook Walk-through

Below is the Fabric Notebook I used.


▶️ List every column in the semantic model and flag whether it is used.

from sempy import fabric
import pandas as pd

# Execute DAX query on the semantic model
result = fabric.evaluate_dax(
    dataset="Sales Demo",   # semantic-model (dataset) name
    dax_string="""
EVALUATE
VAR _table =
    SELECTCOLUMNS ( INFO.TABLES (), "Table ID", [ID], "Table Name", [Name] )
VAR _column =
    SUMMARIZE (
        ADDCOLUMNS (
            SELECTCOLUMNS (
                INFO.COLUMNS (),
                "Table ID column", [TableID],
                "explicitname", [ExplicitName],
                "inferredname", [InferredName]
            ),
            "Column Name", IF ( NOT ISBLANK ( [explicitname] ), [explicitname], [inferredname] )
        ),
        [Table ID column],
        [Column Name]
    )
VAR _tableandcolumn =
    SUMMARIZE (
        GENERATE ( _table, FILTER ( _column, [Table ID column] = [Table ID] ) ),
        [Table Name],
        [Column Name]
    )
VAR _used =
    SELECTCOLUMNS (
        FILTER (
            INFO.CALCDEPENDENCY (),
            [REFERENCED_OBJECT_TYPE] IN { "COLUMN", "CALC_TABLE", "CALC_COLUMN" }
        ),
        "Type", [OBJECT_TYPE],
        "Table Name used", [REFERENCED_TABLE],
        "Column Name used", [REFERENCED_OBJECT]
    )
RETURN
    ADDCOLUMNS (
        _tableandcolumn,
        "Used",
            CONCATENATEX (
                FILTER (
                    _used,
                    [Table Name used] = [Table Name]
                        && [Column Name used] = [Column Name]
                ),
                [Type],
                ", "
            )
    )

"""
)
# Convert the results to a Pandas DataFrame
result_df = pd.DataFrame(result)

display(result_df)
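As a quick follow-up (not part of the original notebook), the rows whose Used value came back blank can be filtered out directly in pandas. This is a minimal sketch: the small DataFrame below only stands in for result_df, whose real columns carry the bracketed names returned by EVALUATE, and CONCATENATEX returns BLANK (NaN in pandas) when a column has no dependencies.

```python
import pandas as pd

# Illustrative stand-in for result_df returned by fabric.evaluate_dax;
# the real frame has the same bracketed column names.
result_df = pd.DataFrame({
    "[Table Name]":  ["fact_sales", "fact_sales", "dim_calendar"],
    "[Column Name]": ["Sales", "CustomerEmail", "Year-Month"],
    "[Used]":        ["MEASURE", None, "MEASURE, MEASURE"],
})

# A blank/NaN "Used" value means no measure, calc column, or calc table
# references the column -- a candidate for review.
unused = result_df[result_df["[Used]"].fillna("").str.strip() == ""]
print(unused[["[Table Name]", "[Column Name]"]])
```

The `fillna("")` guard matters because the DAX CONCATENATEX over an empty filter yields BLANK, which arrives in pandas as NaN rather than an empty string.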



▶️ Get the size statistics of each column.


DATASET = "Sales Demo"             # ← replace if needed
vpq_stats = fabric.list_columns(dataset=DATASET, extended=True)
vpq_stats_df = pd.DataFrame(vpq_stats)

def clean_data(vpq_stats_df):
    # Keep only the columns needed for the join
    vpq_stats_df = vpq_stats_df.loc[:, ['Table Name', 'Column Name', 'Column Cardinality', 'Total Size', 'Data Size']]
    # Bracket the join keys so they match the column names returned by the DAX query
    vpq_stats_df = vpq_stats_df.rename(columns={'Table Name': '[Table Name]', 'Column Name': '[Column Name]'})
    return vpq_stats_df

vpq_stats_df_clean = clean_data(vpq_stats_df.copy())
display(vpq_stats_df_clean)
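The size figures from `list_columns` come back in bytes, which are hard to eyeball. A small sketch (the sample frame below is illustrative, standing in for vpq_stats_df_clean) converts them to MB and surfaces the heaviest columns first:

```python
import pandas as pd

# Illustrative stand-in for vpq_stats_df_clean; real data comes from fabric.list_columns
vpq = pd.DataFrame({
    "[Table Name]":  ["fact_sales", "fact_sales"],
    "[Column Name]": ["CustomerEmail", "Sales"],
    "Total Size":    [7_004_217, 120_000],
})

# Convert bytes to decimal MB and sort heaviest-first
vpq["Total Size (MB)"] = (vpq["Total Size"] / 1_000_000).round(2)
print(vpq.sort_values("Total Size (MB)", ascending=False))
```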


▶️ Merge the two results into one display.

# ── Left-join on table + column ──────────────────────────────────────
merged_df = (
    result_df.merge(
        vpq_stats_df_clean,
        how="left",
        on=["[Table Name]", "[Column Name]"],   # join keys (include brackets)
        suffixes=("", "_vpq")                  # avoid name clashes
    )
    [["[Table Name]", "[Column Name]", "[Used]",
      "Column Cardinality", "Total Size", "Data Size"]]  # final order
)

# ── Inspect the result ──────────────────────────────────────────────
print("=== Column usage + VertiPaq stats ===")
display(merged_df)
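With the merged frame in hand, the two signals can be combined: unused and large. The snippet below is a sketch, not part of the original notebook; the sample frame stands in for merged_df, and the 1 MB threshold is an arbitrary assumption you should tune for your model.

```python
import pandas as pd

# Illustrative stand-in for merged_df produced by the join above
merged_df = pd.DataFrame({
    "[Table Name]":  ["fact_sales", "fact_sales", "dim_calendar"],
    "[Column Name]": ["CustomerEmail", "Sales", "Year-Month"],
    "[Used]":        [None, "MEASURE", "MEASURE"],
    "Total Size":    [7_004_217, 120_000, 4_096],
})

SIZE_THRESHOLD = 1_000_000  # 1 MB -- arbitrary cut-off for this sketch

# Unreferenced AND heavy: the prime deletion candidates
candidates = merged_df[
    (merged_df["[Used]"].fillna("").str.strip() == "")
    & (merged_df["Total Size"] >= SIZE_THRESHOLD)
].sort_values("Total Size", ascending=False)

print(candidates[["[Table Name]", "[Column Name]", "Total Size"]])
```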



4 Results Snapshot — What This Tells Me

Dimension tables — Used = <NA>

Even if a dimension column looks unused in this result, chances are high that visuals in this report or in other self-service reports rely on it for slicers or tooltip fields. Treat these columns as "might be in use elsewhere" and validate across every workspace before even considering removal.


Fact table — Used = <NA>

In the fact table, the majority of <NA> columns truly never make it into visuals. After confirming no other reports touch them, columns like CustomerEmail and InternalNote become prime candidates for deletion. In my demo model, removing just those two saves 5,120,763 B + 7,004,217 B ≈ 12 MB of in-memory storage, an instant VertiPaq win without sacrificing any analytical capability.
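The savings figure above is straightforward arithmetic on the two per-column Total Size values reported by the notebook:

```python
# Reported per-column sizes from the demo model, in bytes
customer_email = 5_120_763
internal_note  = 7_004_217

saved_bytes = customer_email + internal_note
saved_mb = saved_bytes / 1_000_000  # decimal megabytes
print(f"{saved_bytes} B ≈ {saved_mb:.1f} MB")  # ≈ 12.1 MB
```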


Key takeaway: Drop only when I am 100 % certain the column isn’t delivering value anywhere in my BI estate.




INFO.DAX isn’t just documentation; it’s a performance precision blade.

I hope this helps you have fun using INFO.DAX plus Fabric Notebooks to spot and squash VertiPaq bloat, safely and repeatably, before it bites performance or your capacity bill.





Call to Action

Clone the notebook, point it at your heaviest model, and share how many MB you reclaim.




