Jihwan Kim

Implementing a Fabric Pipeline to Ingest Data into the Data Warehouse, with a Lakehouse Notebook Integrated Before Execution

In this post, I want to share how to set up a basic pipeline in a Fabric workspace to bring data into the Warehouse, integrating a Lakehouse notebook before ever hitting its "Run" button.


Why bother bringing data from the Lakehouse to the Warehouse? Well, there are a few reasons. Maybe the Lakehouse has limitations that the Warehouse doesn't, such as its SQL endpoint being read-only for T-SQL. Or maybe you simply prefer using SQL over Python. Sure, you could create Shortcuts as well, but I'll save that for another blog post.


Right now, let's focus on the basics: getting data into the Data Warehouse using a Pipeline.


Imagine this: I've set up a Lakehouse, and I've just crafted a notebook to perform a transformation. Specifically, I'm building a new fact table that contains only product p_008, and I've named it "p_008_only". But since I haven't hit the "Run" button yet, the new delta table doesn't exist yet.
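
To make the transformation concrete, here is a minimal PySpark sketch of what such a notebook cell could look like. The source table name (sales) and the product-ID column (product_id) are assumptions for illustration; only the target table name p_008_only comes from the example above.

```python
# Minimal sketch of the notebook transformation.
# Assumptions: the Lakehouse fact table is named "sales" and has a
# "product_id" column; only the target name "p_008_only" is from this post.

# Read the existing fact table from the attached Lakehouse.
df_sales = spark.read.table("sales")

# Keep only the rows for product p_008.
df_p008 = df_sales.filter(df_sales["product_id"] == "p_008")

# Save the result as a new delta table in the Lakehouse.
df_p008.write.format("delta").mode("overwrite").saveAsTable("p_008_only")
```

In a Fabric notebook, the `spark` session is already available, so no extra imports are needed for this sketch.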


I've created a new data warehouse, and now I'm starting to develop a data pipeline.


As depicted below, I begin with a blank canvas and select the existing notebook that was created above.



The next step is adding the "Copy data" activity.


The source is the new table in the Lakehouse (sales_lakehouse). However, since I haven't clicked the Run button in the notebook yet, the new table doesn't exist in the Lakehouse at this point. Luckily, I know the name of the new table, so I type it in manually, exactly as shown below.
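
If you want to double-check from a notebook cell that the table really doesn't exist yet (and, after the pipeline run, that it does), here is a minimal sketch, assuming the default Spark session of a Fabric notebook:

```python
# Check whether the target delta table is registered in the attached Lakehouse.
# Before the pipeline runs, this should print False; afterwards, True.
exists = spark.catalog.tableExists("p_008_only")
print(f"p_008_only exists: {exists}")

# List all tables in the Lakehouse as an extra sanity check on the name.
for table in spark.catalog.listTables():
    print(table.name)
```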


Input the destination information as depicted below.


In the pipeline, click the Run button and wait patiently until every step shows as succeeded (the green indicator).


Once the process is complete, the newly created table appears in the data warehouse, as illustrated below.


Moreover, the new table is also generated in the Lakehouse, as depicted below. It's worth noting that even though I never clicked the notebook's Run button myself, the pipeline's Notebook activity executed it automatically.
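
To confirm the new delta table from a Lakehouse notebook cell, a quick sketch like the following could be used (the columns shown will of course depend on your source data):

```python
# Read the delta table created by the pipeline-triggered notebook
# and peek at a few rows to confirm the filter on p_008 worked.
df_check = spark.read.table("p_008_only")
print(f"Row count: {df_check.count()}")
df_check.show(5)
```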


Summary:

In this post, I configured a data pipeline within the Fabric workspace to ingest data into the Warehouse while integrating a Lakehouse notebook. I walked through the process step by step, from setting up the pipeline to adding activities such as the "Copy data" task. Even though I never executed the Lakehouse notebook manually, the pipeline activated it seamlessly, leading to the creation of the new table in both the data warehouse and the Lakehouse.


Through this, I have learned about automation and integration in data pipelines. By leveraging pipelines within the workspace, I can streamline processes and ensure a smooth flow of data from source to destination.


I hope this helps you have more fun learning about data pipelines in Microsoft Fabric.






