How to copy a table from an on-premise server to Azure Data Factory using a self-hosted integration runtime.

This tutorial simulates an on-prem environment by creating two servers in the same network.

  • One has SQL Server
  • The other server has a self-hosted integration runtime (IR) installed to connect to Azure.

Workflow:

  • The on-prem IR server connects to the on-prem SQL server to query tables
  • It copies the tables to Azure Blob Storage through Azure Data Factory using Parquet files.
  • These files are copied to Azure SQL Database for data warehouse.
  • Then Power BI connects to this database to create reporting.

Database example

As seen in Create a SQL Server Windows VM.

Using AdventureWorksDW2019 from this Microsoft doc.

Explore the table to copy to Azure

For this tutorial I am going to copy the table dbo.DimCustomer which has the following schema:

AdventureWorks dbo.DimCustomer

I am only going to keep these columns:

CustomerKey (PK, int)
GeographyKey (FK, int)
CustomerAlternateKey (nvarchar(15))
FirstName (nvarchar(50))
LastName (nvarchar(50))
MaritalStatus (nchar(1))
EmailAddress (nvarchar(50))
YearlyIncome (money)
TotalChildren (tinyint)
HouseOwnerFlag (nchar(1))
NumberCarsOwned (tinyint)
AddressLine1 (nvarchar(120))
Phone (nvarchar(20))
DateFirstPurchase (date)
CommuteDistance (nvarchar(15))

Setup: Self-Hosted Integration Runtime

Follow Azure Data Factory Self-Hosted Integration Runtime to setup two servers that simulate an on-premise private network. You can follow that guide to setup the integration runtime in your actual on-premise network.

Setup: Azure resources for this tutorial

Create these resources under the same subscription, resource group, and location:

Resources