Every US state has a DOT website about underrepresented and minority-owned companies also known as
Disadvantaged Business Enteprise. This is the dashboard to visualize these companies by region.
I previously wrote about the ETL process of this dataset in Creating a data pipeline with Azure Data Factory. Here is a summary:
- The US DOT has a website listing all DBE websites by state. But most links are broken and outdated.
- I created a data pipeline using Azure Data Factory, built ETL process using PowerQuery, and joined the output to an Azure SQL Database.
- Below are the details of creating the dashboard to visualize this dataset.
- Added a leading zero in those states like
MAwhere the database did not store the values correctly.
- Removed State and Zip values from the City column.
- Removed the Address column. Such level of detail was not required and it contained too many inconsistencies.
Exploratory data analysis
I queried the database to see the number of rows:
SELECT COUNT(CompanyID) FROM [dbo].[Directory];
Loaded the dataset into Power BI from the Azure SQL Database. Then created a map visualization using the zip code. However, it displayed the usual incorrect countries.
- Created dimension tables for
Geographyto a dataset of zip codes to find the latitude and longitude.
Other transformations in Power Query took a long time, since the columns were not completely clean, and visualizing the data showed inconsistent results.
Extracted a list of categories for the
Category table using Python inside Power Query. Extracted all the words from each company description and ranked the list by count of wor