How to copy a table from an on-premise server to Azure Data Factory using a self-hosted integration runtime.
This tutorial simulates an on-prem environment by creating two servers in the same network.
- One has SQL Server
- The other server has a self-hosted integration runtime (IR) installed to connect to Azure.
- The on-prem IR server connects to the on-prem SQL server to query tables
- It copies the tables to Azure Blob Storage through Azure Data Factory using Parquet files.
- These files are copied to Azure SQL Database for data warehouse.
- Then Power BI connects to this database to create reporting.
As seen in Create a SQL Server Windows VM.
AdventureWorksDW2019 from this Microsoft doc.
Explore the table to copy to Azure
For this tutorial I am going to copy the table
dbo.DimCustomer which has the following schema:
I am only going to keep these columns:
CustomerKey (PK, int) GeographyKey (FK, int) CustomerAlternateKey (nvarchar(15)) FirstName (nvarchar(50)) LastName (nvarchar(50)) MaritalStatus (nchar(1)) EmailAddress (nvarchar(50)) YearlyIncome (money) TotalChildren (tinyint) HouseOwnerFlag (nchar(1)) NumberCarsOwned (tinyint) AddressLine1 (nvarchar(120)) Phone (nvarchar(20)) DateFirstPurchase (date) CommuteDistance (nvarchar(15))
Setup: Self-Hosted Integration Runtime
Follow Azure Data Factory Self-Hosted Integration Runtime to setup two servers that simulate an on-premise private network. You can follow that guide to setup the integration runtime in your actual on-premise network.
- Create a resource group, network security group, and virtual network
- Create a Windows Server SQL Server VM connected to this Vnet
- Create a Windows Server VM connected to the same Vnet
- Install the Self-Hosted Integration Runtime to the Windows Server VM
- Create a Data Factory and create a Self-Hosted Integration Runtime
- Connect the Windows Server VM to this Integration Runtime
Setup: Azure resources for this tutorial
Create these resources under the same subscription, resource group, and location: