Power BI Dashboard of Underrepresented Companies
Every US state has a DOT website about underrepresented and minority-owned companies also known as DBE
or Disadvantaged Business Enteprise
. This is the dashboard to visualize these companies by region.
I previously wrote about the ETL process of this dataset in Creating a data pipeline with Azure Data Factory. Here is a summary:
- The US DOT has a website listing all DBE websites by state. But most links are broken and outdated.
- I created a data pipeline using Azure Data Factory, built ETL process using PowerQuery, and joined the output to an Azure SQL Database.
- Below are the details of creating the dashboard to visualize this dataset.
Database schema
First transformations
- Added a leading zero in those states like
MA
where the database did not store the values correctly. - Removed State and Zip values from the City column.
- Removed the Address column. Such level of detail was not required and it contained too many inconsistencies.
Exploratory data analysis
I queried the database to see the number of rows:
SELECT COUNT(CompanyID) FROM [dbo].[Directory];
Result: 25,490
Loaded the dataset into Power BI from the Azure SQL Database. Then created a map visualization using the zip code. However, it displayed the usual incorrect countries.
Star schema
- Created dimension tables for
Geography
,Agency
,Certification Type
,Company
, andCategory
- Joined
Geography
to a dataset of zip codes to find the latitude and longitude.
Other transformations
Other transformations in Power Query took a long time, since the columns were not completely clean, and visualizing the data showed inconsistent results.
Extracted a list of categories for the Category
table using Python inside Power Query. Extracted all the words from each company description and ranked the list by count of wor