top of page
Jihwan Kim

Worldwide Trading Export & Import values (USD) & volumes (kg) analysis, in Power BI


In this writing, I want to share how I treated null data when extracting, transforming, and loading data into Power BI.


I got the data from resourcetrade.earth and comtrade.un.org. The story of this data tells what is the size of each trading route, each exporter, and each importer. Furthermore, it also describes how the size changes as the year flow. The size is shown by USD value and weight (kg). The period is from 2000 to 2019.


As you can also find out from the above data source, the raw data was quite clean and I thought there were not many things to do in Power Query Editor. However, it took some time to wait for my computer to process the operations. Because when I combined all the years of data, it showed over 1.3 million rows only in the fact table.

From the fact table, I duplicated it and created some dimension-tables.


Here, I have one question. If I duplicate a 1.3 million-row-fact-table, and delete all columns except the one that I want to make as dim-column, and delete the duplicate values inside the column,

in this case, I can have 250-row-one-column-dimension-table, for instance. But the operating time takes quite a long to create this dimension table. In this case, how should I create a dimension-table? Am I doing correctly? Or, should I not create it?

I highly appreciate it if someone can give me an advice.


Anyway, moving forward, inside the fact table, there were two columns that I needed to focus on. One was the exporter column and the other was the importer column. When I scrolled down a bit, I saw that some showed null value. I was questioning myself that how can a "null-country" export to country A? Then, I thought perhaps the export value might be zero. But it was not. I was also questioning myself that how can country B export to a "null-country"? And, this route also showed a non-zero value in the trading amount column.


In this case, it was clear that I could not just filter out the rows that contained null values in the exporter or importer column. Because if I filter out, then the total amount that describes trade route, export, and import would be incorrect. So, I decided not to filter out but to load it into the Power BI data model.


And the next problem happened when I decided to show the top 5 routes by USD value and KG weight. Rank no.3 showed the route of country A to "null-country". From the perspective of trading-route-ranking, this was incorrect.


Because from country A to "null-country" meant the same as from country A to "all-country" (could not know whether or not it was one route or many routes). And this could not be compared with "from country C to country H" (one route), for instance.

So, I decided to write a measure that excludes values of null routes for the ranking-visualization, but includes values of null routes for the card visualization.




Once the concept is understood and shared with other stakeholders, it is not difficult to write DAX measures to show the information in this way.

It just needed to be double confirmed by myself and by other stakeholders about how to understand the null values in the exporter column and importer column.



I hope this helps to remind me and other Power Bi authors to consider one more time how to deal with a null value. Am I going to delete those? Or do I want to keep those? And I should have a concrete reason for the choice.



I publish to web to show my worldwide trading analysis report below.




I attach the pbix file to this blog post.


42 views0 comments

Comentarios


bottom of page