Every US state has a DOT website about underrepresented and minority-owned companies also known as DBE or Disadvantaged Business Enteprise. This is the dashboard to visualize these companies by region.

Power BI Dashboard Underrepresented Companies

I previously wrote about the ETL process of this dataset in Creating a data pipeline with Azure Data Factory. Here is a summary:

  • The US DOT has a website listing all DBE websites by state. But most links are broken and outdated.
  • I created a data pipeline using Azure Data Factory, built ETL process using PowerQuery, and joined the output to an Azure SQL Database.
  • Below are the details of creating the dashboard to visualize this dataset.

Database schema

Power BI Dashboard Underrepresented Companies DB Schema

First transformations

  • Added a leading zero in those states like MA where the database did not store the values correctly.
  • Removed State and Zip values from the City column.
  • Removed the Address column. Such level of detail was not required and it contained too many inconsistencies.

Power BI Dashboard Underrepresented Companies Address

Exploratory data analysis

I queried the database to see the number of rows:

SELECT COUNT(CompanyID) FROM [dbo].[Directory];

Result: 25,490

Loaded the dataset into Power BI from the Azure SQL Database. Then created a map visualization using the zip code. However, it displayed the usual incorrect countries.

Star schema

  • Created dimension tables for Geography, Agency, Certification Type, Company, and Category
  • Joined Geography to a dataset of zip codes to find the latitude and longitude.

Power BI Dashboard Underrepresented Companies Star Schema

Other transformations

Other transformations in Power Query took a long time, since the columns were not completely clean, and visualizing the data showed inconsistent results.

Extracted a list of categories for the Category table using Python inside Power Query. Extracted all the words from each company description and ranked the list by count of wor