In this blog post, I will share my journey of learning how to refresh a Power BI semantic model using a Data pipeline and Notebook in Fabric, with incremental refresh configured.
Sometimes I need to fully reload the data into the semantic model, which is configured with incremental refresh for the most recent 3 days. This need arises when new columns are added to fact tables or when data from a month ago changes in the source. In these situations, a normal refresh doesn't guarantee accurate data in the Power BI report.
Below, I explain how to trigger both a normal refresh and a full reload of the semantic model using a Fabric Data pipeline and a notebook.
Fabric Data Pipeline
I have a semantic model called `incremental_refresh_sales_return`, which is configured as follows:
As illustrated below, incremental refresh is configured.
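As a side note, if you prefer verifying this configuration from code rather than the UI, the semantic link (SemPy) library can, to my knowledge, script out the model definition; the `refreshPolicy` section on the fact table describes the incremental refresh window. A small sketch, assuming the workspace is named `incremental_refresh` (the same names are used throughout this post):

```python
import sempy.fabric as fabric

# Script the semantic model definition (TMSL); look for the
# "refreshPolicy" section on the fact table to see the
# incremental refresh window.
tmsl = fabric.get_tmsl(
    dataset='incremental_refresh_sales_return',
    workspace='incremental_refresh',
)
print(tmsl)
```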
In the Fabric workspace, create a new Data pipeline following the steps outlined below.
If you encounter a blank canvas, click the button indicated in the screenshot below.
Set up the semantic model refresh according to the instructions provided below.
Once the Data pipeline run completes, the semantic model refresh history will display as depicted below, indicating that only the data from the most recent 3 days has been refreshed.
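For comparison, the same policy-respecting refresh can also be triggered from a notebook with the semantic link (SemPy) library, which I use later for the full reload. A minimal sketch: with the default settings, the incremental refresh policy is applied, so only the recent partitions are processed, just like the pipeline activity above.

```python
import sempy.fabric as fabric

# A normal refresh: the defaults honor the incremental refresh policy,
# so only the partitions covering the recent 3 days are processed.
fabric.refresh_dataset(
    workspace='incremental_refresh',
    dataset='incremental_refresh_sales_return',
)
```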
Fabric Notebook
As explained at the beginning, sometimes I need to fully reload the data.
The notebook code below executes a full reload:
```python
import sempy.fabric as fabric

# Trigger a full reload: process every partition and ignore the
# incremental refresh policy, so historical data is reloaded as well.
fabric.refresh_dataset(
    workspace='incremental_refresh',
    dataset='incremental_refresh_sales_return',
    refresh_type='full',
    apply_refresh_policy=False,  # a real boolean, not the string 'false' (any non-empty string is truthy)
)
```
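A note on monitoring: `refresh_dataset` kicks off the refresh asynchronously and, as far as I understand, returns a refresh request ID that can be passed to `fabric.get_refresh_execution_details` to check progress. A minimal sketch, reusing the same workspace and semantic model names:

```python
import sempy.fabric as fabric

# Start the full reload; the call returns the ID of the refresh request.
request_id = fabric.refresh_dataset(
    workspace='incremental_refresh',
    dataset='incremental_refresh_sales_return',
    refresh_type='full',
    apply_refresh_policy=False,
)

# Look up the execution details of that refresh request, including its status.
details = fabric.get_refresh_execution_details(
    dataset='incremental_refresh_sales_return',
    refresh_request_id=request_id,
    workspace='incremental_refresh',
)
print(details)
```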
We can also set up a scheduled run for the notebook, as demonstrated below. Additionally, I forgot to mention earlier that you can configure a scheduled run for the Data pipeline as well. 😁
As shown below, all partitions are refreshed.
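To double-check this without opening the refresh history in the UI, one option is to query the model's partition metadata from the notebook. Here is a hedged sketch using `fabric.evaluate_dax` with the DAX `INFO.PARTITIONS()` function (available in recent engine versions; the exact columns may vary):

```python
import sempy.fabric as fabric

# Query the model's partition metadata; the RefreshedTime column shows
# when each partition was last processed.
partitions = fabric.evaluate_dax(
    dataset='incremental_refresh_sales_return',
    dax_string="EVALUATE INFO.PARTITIONS()",
    workspace='incremental_refresh',
)
display(partitions)  # inspect Name / RefreshedTime for each partition
```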
For your reference, the link below provides explanations of some useful semantic link (SemPy) notebook functions along with their parameters.
In conclusion, this short post has walked through executing semantic model refreshes using a Fabric Data pipeline and notebook.
Exploring the incremental refresh configuration and the challenges of fully reloading data into the semantic model yielded valuable insights, particularly around the full-reload scenario, which arises when changes in the source data require a refresh beyond the scope of incremental updates.
By following the outlined procedures and leveraging the provided semantic link code, I have gained a deeper understanding of managing semantic model refreshes effectively within the Fabric workspace.
I hope this helps you have more fun learning Data pipeline configuration and writing notebook code in the Fabric world.