Azure Data Factory Self-Hosted Integration Runtime
Connect an on-prem SQL Server to Azure Data Factory using a self-hosted integration runtime (IR).
Summary
This simulates an on-prem environment by creating two servers in the same network. One has SQL Server and the other has a self-hosted integration runtime installed to connect to Azure.
- Create two Windows VMs in Azure with the same virtual network/subnet
- In VM1: Install SQL Server with sample database.
- In VM2: Install Azure Data Studio and the self-hosted IR.
- Test connection VM2 -> VM1.
Considerations for using a self-hosted integration runtime
More details in this Microsoft doc.
- Don’t install the IR on the same server that has the data source or that has a PBI gateway.
- Ideally install the IR on a machine that is used for this purpose only
- Ideally the IR machine is close to the data source machine to reduce connection time.
- The IR can be shared with multiple data factories but can’t be shared across Synapse workspaces
- You can use a single IR for many on-premise data sources.
- You can only install one IR instance on a single machine.
Create a resource group and network security group
See Azure Resource Group and Network Security Group
Create a virtual network
Create an “on-prem” SQL Server Windows VM
I will refer to it as the SQL Server VM
. This will simulate the on-premise SQL Server.
Follow my tutorial Create a SQL Server Windows VM
- Select the resource group and security group created above.
- In Networking: Select the Vnet created above
onprem-vnet
- In SQL Connectivity: Select
Private (within Virtual Network)
- RDP to the server, open SSMS and load a sample database.
Create an “on-prem” Windows Server VM
I will refer to it as the IR Server VM
. This will simulate another on-premise server. It is used for:
- Connecting to the SQL Server Windows VM in the virtual network.
- Installing the Self-Hosted Integration Runtime (IR) to connect to Azure Data Factory.
Follow my tutorial Create an Azure Windows Server Datacenter VM
- Select the resource group and security group created above.
- In Networking: Select the Vnet created above
onprem-vnet
Test Connection from IR Server VM to the SQL Server VM
Go to Azure to find the local IP address of the SQL Server VM sql-server-IP
.
RDP to the IR Server VM.
Open PowerShell and test the connection from IR Server VM to the SQL Server VM.
Test-NetConnection sql-server-IP -port 1433
Output should be similar to this:
ComputerName : sql-server-IP
RemoteAddress : sql-server-IP
RemotePort : 1433
InterfaceAlias : Ethernet
SourceAddress : ir-server-IP
TcpTestSucceeded : True
Install Azure Data Studio and SSMS in IR Server VM
RDP to the IR Server VM.
Follow my tutorial Use Chocolatey in Windows Powershell to Install Programs.
Then install these programs:
choco install azure-data-studio --yes
choco install sql-server-management-studio --yes
Connect to SQL Server VM from the IR Server VM
Open SSMS in the IR Server and connect to the SQL Server VM
- In the connection
- Enter the SQL Server local IP
- Select Authentication Type
SQL Login
- Use the VM login credentials
- When connecting accept the certificate
- Open
Databases
andAdventureWorksDW2019
should be listed there.
Verify connection to the database by creating a query:
SELECT TOP 20 * FROM DimCustomer;
Disable the SQL Server VM RDP and Public IP
Since this tutorial simulates that the SQL Server VM is an on-prem server with no public connection. Then RDP should be disabled.
- Go to Azure
- Find the SQL Server VM
- Go to Networking
- Select the RDP inbound port rule
- Click on the
...
and delete
Create an Azure Data Factory
See:
Setup:
- Name the GitHub repo
onprem-adf
- Name the data factory
onprem-adf
Create a self-hosted integration runtime (IR)
Once the ADF is deployed, open the resource. Then click on Launch studio
Create an integration runtime
- Inside the
Manage
hub - Go to
Connections/Integration runtimes
and click onNew
- Select
Azure, Self-Hosted
- Select
Self-Hosted
- Enter a name like
onprem
- Enter a description like
Self-hosted IR from onprem Windows Server that connects to a private network onprem SQL Server
- Click
Create
- Under
Option 2: Manual setup
, open the link toDownload and install integration runtime
- Copy/paste this link in the Windows Server to install the IR
Setup the Integration Runtime in the IR Server VM
For this tutorial, I started the two VMs:
onprem-ir
(IR Server VM)onprem-sqlserver
(SQL Server VM).
Go to the IR Server and install the IR:
- Open the link from the ADF integration runtime setup
- Download the latest configuration (it was about 924MB)
- Run the installer and keep all defaults
- Then a window says
Register Integration Runtime
- Go to the ADF setup for the IR and copy one of the Keys
- Then click
Register
andFinish
- A message should say
Integration runtime node has been registered successfully
In ADF confirm that the self-hosted is running
- Go to Azure Data Factory
- Manage, Integration runtimes
- The created IR, in this tutorial I named it
onprem
should be listed and status isRunning
Connect onprem SQL Server to Azure Data Factory
Setup a Linked Service for onprem SQL Server. See Azure Data Factory Linked Service
Copying to/from Parquet files
You will get this error when copying SQL Server to Parquet files:
Java Runtime Environment cannot be found on the Self-hosted Integration Runtime machine. It is required for parsing or writing to Parquet/ORC files. Make sure Java Runtime Environment has been installed on the Self-hosted Integration Runtime machine.
Follow this MS doc. On the IR Server:
- Install 64-bit JRE 8 from here or
- Install OpenJDK