How to copy a table from an on-premise server to Azure Data Factory using a self-hosted integration runtime.

This tutorial simulates an on-prem environment by creating two servers in the same network.

  • One has SQL Server
  • The other server has a self-hosted integration runtime (IR) installed to connect to Azure.


  • The on-prem IR server connects to the on-prem SQL server to query tables
  • It copies the tables to Azure Blob Storage through Azure Data Factory using Parquet files.
  • These files are copied to Azure SQL Database for data warehouse.
  • Then Power BI connects to this database to create reporting.

Database example

As seen in Create a SQL Server Windows VM.

Using AdventureWorksDW2019 from this Microsoft doc.

Explore the table to copy to Azure

For this tutorial I am going to copy the table dbo.DimCustomer which has the following schema:

AdventureWorks dbo.DimCustomer

I am only going to keep these columns:

CustomerKey (PK, int)
GeographyKey (FK, int)
CustomerAlternateKey (nvarchar(15))
FirstName (nvarchar(50))
LastName (nvarchar(50))
MaritalStatus (nchar(1))
EmailAddress (nvarchar(50))
YearlyIncome (money)
TotalChildren (tinyint)
HouseOwnerFlag (nchar(1))
NumberCarsOwned (tinyint)
AddressLine1 (nvarchar(120))
Phone (nvarchar(20))
DateFirstPurchase (date)
CommuteDistance (nvarchar(15))

Setup: Self-Hosted Integration Runtime

Follow Azure Data Factory Self-Hosted Integration Runtime to setup two servers that simulate an on-premise private network. You can follow that guide to setup the integration runtime in your actual on-premise network.

Setup: Azure resources for this tutorial

Create these resources under the same subscription, resource group, and location:


Linked Services

Azure Data Factory Linked Services for:

  • On-premise SQL Server
  • Blob Storage
  • Azure SQL Database


Azure Data Factory Datasets for the corresponding Linked Services.

  • On-premise SQL Server
    • In Table name selected DimCustomer
  • Blob Storage (Parquet)
  • Azure SQL Database

Create the table in Azure SQL Database

Create the table in Azure SQL Database to map the table from SQL Server. (Alternatively, when using a Copy data from Blob to Azure SQL, in the Sink set to Auto create table for the first run, however all columns are created with nvarchar)

Get the query used to create the table dbo.DimCustomer in SQL Server

  • Connect to the Server with the Self-Hosted Integration Runtime.
  • Open SSMS
  • Connect to the DB AdventureWorksDW2019
  • Right click the table dbo.DimCustomer
  • Select Script table as / Create to / Clipboard
  • Use Ctrl V to paste the query to Notepad++

Open Azure Data Studio and connect to the Azure SQL Database

Run the query:

    CustomerKey int PRIMARY KEY,
    GeographyKey int NULL,
    CustomerAlternateKey nvarchar(15) NOT NULL,
    FirstName nvarchar(50) NULL,
    LastName nvarchar(50) NULL,
    MaritalStatus nchar(1) NULL,
    EmailAddress nvarchar(50) NULL,
    YearlyIncome money NULL,
    TotalChildren tinyint NULL,
    HouseOwnerFlag nchar(1) NULL,
    NumberCarsOwned tinyint NULL,
    AddressLine1 nvarchar(120) NULL,
    Phone nvarchar(20) NULL,
    DateFirstPurchase date NULL,
    CommuteDistance nvarchar(15) NULL,

Pipeline for the Data Factory

  • In the Data Factory, Author hub
  • On Pipelines click on New pipeline
  • Name it OnpremSQL to AzureSQL

Copy Activity to connect from SQL Server to Blob

  • Under Activities in Move & Transform
  • Drag/drop Copy data to the empty canvas
  • Name it OnpremSQL to Blob

Set the Source

  • Source dataset: The dataset for the SQL Server
  • Use query set to Table
    • Other options: Query, Stored procedure
  • Query timeout (minutes) 120 (default)
    • Timeout for query command execution, default is 120 minutes. If parameter is set for this property, allowed values are timespan, such as "02:00:00" (120 minutes).
  • Isolation level None (default)
    • Specifies the transaction locking behavior for the SQL source. The allowed values are: ReadCommitted, ReadUncommitted, RepeatableRead, Serializable, Snapshot
    • See docs here
  • Partition option None
    • Other options
      • Physical partitions of table
        • When using physical partition, ADF will auto determine the partition column and mechanism based on your physical table definition.
      • Dynamic range
        • When using query with parallel enabled, range partition parameter(?AdfDynamicRangePartitionCondition) is needed. Sample query: SELECT * FROM <TableName> WHERE ?AdfDynamicRangePartitionCondition
  • Click on Preview Data to validate settings before triggering a run or publishing the pipeline
    • It shows an overlay with dbo.DimCustomer

Set the Sink

  • Sink dataset: The dataset for the Blob Storage
  • Copy behavior None
    • Other options: Add dynamic content, Flatten hierarchy, Merge files, Preserve hierarchy
    • Specify behavior when copying files between file systems.
  • Max concurrent connections no value
    • The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections.
  • Block size (MB) no value
    • Specify the block size in MB when writing data to Azure Blob Storage. Allowed