Create an Azure Data Factory
How to create an Azure Data Factory.
Workflow
- Create a GitHub repo to save changes
- Create a Data Factory
- Use Save all to commit Data Factory changes to the repo
- Use Publish when developing and you know the pipeline runs without issues.
Create a GitHub repo
- Go to GitHub and create a repo.
- Set it to private and initialize it with a README.
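The same repo can also be created through the GitHub REST API instead of the web UI. The following is a minimal sketch, not a definitive implementation: the function name, the repo name `adf-pipelines`, and the token placeholder are hypothetical, and it assumes a personal access token that is allowed to create repositories. It only builds the request, so nothing is created until you send it.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_repo_request(name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a private repo with a README."""
    payload = {
        "name": name,
        "private": True,    # matches "Set to private"
        "auto_init": True,  # matches "Initialize with a Readme"
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values: replace with your own repo name and token.
    request = build_create_repo_request("adf-pipelines", "<personal-access-token>")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually create the repo, send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes it easy to inspect the payload before anything is created on GitHub.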
Create an Azure Data Factory
- In the Azure portal
- Create a resource
- Search for data factory
- Select a subscription and create a resource group
- Enter a name, a region and keep the version as V2
- In Git configuration, keep the default Configure Git later
- Go to Review and Create, then select Create
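For a repeatable setup, the portal steps above can be approximated with the Azure CLI. The sketch below only assembles the two commands as argument lists and prints them; it does not run anything. It assumes the `az` CLI with its `datafactory` extension is available, and the resource group, factory name, and region shown are hypothetical placeholders. Git configuration is left out, which corresponds to choosing Configure Git later.

```python
from __future__ import annotations

import shlex


def build_create_commands(resource_group: str, factory_name: str, region: str) -> list[list[str]]:
    """Return the az commands for the resource group and the factory, without running them."""
    return [
        # Create the resource group that will hold the factory.
        ["az", "group", "create", "--name", resource_group, "--location", region],
        # Create the data factory itself (requires the datafactory CLI extension).
        ["az", "datafactory", "create",
         "--resource-group", resource_group,
         "--factory-name", factory_name,
         "--location", region],
    ]


if __name__ == "__main__":
    # Hypothetical names: data factory names must be globally unique.
    for command in build_create_commands("adf-demo-rg", "adf-demo-factory", "eastus"):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the values are correct for your subscription.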
Self-Hosted Integration Runtime configuration
See: Azure Data Factory Self-Hosted Integration Runtime
- Networking
  - Self-hosted integration runtime inbound connectivity, set to Private endpoint
- Private endpoint connections
  - Click Create a private endpoint
    - Select subscription and resource group
    - Enter a name like onprem-ir-endpoint
    - On Networking
      - Select the virtual network onprem-vnet (onprem-azure-dw)
      - Select subnet default
      - A message says: "If you have a NSG enabled for this subnet, it will be disabled for private endpoints on this subnet only. Other resources on the subnet will still have NSG (network security group) enforcement."
    - On Private DNS integration
      - It says: "To connect privately with your private endpoint, you need a DNS record. We recommend a private DNS zone. You can also use your own DNS servers or create DNS records using the host files on your VMs"
      - Set to Yes
      - Private DNS Zone: Leave default (New) privatelink.datafactory.azure.net
      - Click OK, then select it with a checkbox
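Once the private endpoint and private DNS zone exist, one way to sanity-check the setup is to confirm that the factory's hostname resolves to a private IP address from a machine inside the virtual network. This is a rough sketch under stated assumptions: the hostname pattern in the example is an assumption and is not taken from this guide, so substitute the FQDN shown on your private endpoint's DNS configuration.

```python
import ipaddress
import socket


def is_private_address(address: str) -> bool:
    """True if the IP address is in a private (non-public) range."""
    return ipaddress.ip_address(address).is_private


def resolves_privately(hostname: str) -> bool:
    """True if every address the hostname resolves to is private."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_private_address(a) for a in addresses)


if __name__ == "__main__":
    # Hypothetical hostname: replace with the FQDN listed for your private endpoint.
    host = "my-factory.eastus.datafactory.azure.net"
    try:
        print(host, "resolves privately:", resolves_privately(host))
    except socket.gaierror as error:
        print("DNS lookup failed:", error)
```

Run it from the machine that hosts the integration runtime. A public address in the result suggests the private DNS zone is not linked to the virtual network or is not being used by that machine.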
Add the GitHub repo to Data Factory
In Data Factory, add the repo:
- In the Manage interface, under Source Control, select Git configuration
Select a GitHub repo:
- Enter the GitHub repository owner (your GitHub use