How to create an Azure Data Factory

Workflow

  • Create a GitHub repo to save changes
  • Create a Data Factory
  • Use Save all to commit Data Factory changes to the repo
  • Use Publish to push changes to the live Data Factory once you know the pipeline runs without issues

Create a GitHub repo

  • Go to GitHub and create a repo.
  • Set it to Private and initialize it with a README
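If you prefer to script the repo creation, the same settings map onto GitHub's REST endpoint `POST /user/repos`, where `private` and `auto_init` correspond to the Private and Initialize with a README options. The sketch below only builds the request with Python's standard library and does not send it. The repo name `adf-demo` and the token are placeholders, and the token needs permission to create repositories.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_repo_request(name: str, token: str) -> urllib.request.Request:
    """Build a POST /user/repos request for a private repo initialized with a README."""
    payload = {
        "name": name,
        "private": True,    # same as choosing Private in the web UI
        "auto_init": True,  # creates an initial commit containing a README
    }
    return urllib.request.Request(
        f"{GITHUB_API}/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder token; nothing is sent to GitHub here.
    req = build_create_repo_request("adf-demo", "<your-token>")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually create the repo, pass the request to `urllib.request.urlopen(req)`; the JSON response includes the new repo's `full_name`.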

Create an Azure Data Factory

  • In the Azure portal, select Create a resource
  • Search for Data Factory and select Create
  • Select a subscription and create a resource group
  • Enter a name and a region, and keep the version as V2
  • On the Git configuration tab, keep the default, Configure Git later
  • Go to Review + create, then select Create
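The portal steps above end in a single Azure Resource Manager call. As a minimal sketch, assuming the public `Microsoft.DataFactory/factories` REST endpoint with API version `2018-06-01`, the request can be assembled as shown. It also checks the name against the Data Factory naming rules (3 to 63 letters, digits, or hyphens, starting and ending with a letter or digit, no consecutive hyphens). The subscription ID, resource group, factory name, and region in the example are hypothetical, and the name must also be globally unique, which only Azure can confirm.

```python
import json
import re

# Letters/digits, with single hyphens only between letters/digits.
_NAME_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

API_VERSION = "2018-06-01"  # REST API version for Data Factory V2


def is_valid_factory_name(name: str) -> bool:
    """Check length (3-63) and allowed characters for a Data Factory name."""
    return 3 <= len(name) <= 63 and _NAME_RE.fullmatch(name) is not None


def build_factory_request(subscription_id: str, resource_group: str,
                          name: str, region: str) -> tuple[str, str, str]:
    """Return (method, url, body) for a V2 factory with Git configured later."""
    if not is_valid_factory_name(name):
        raise ValueError(f"invalid Data Factory name: {name!r}")
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{name}"
        f"?api-version={API_VERSION}"
    )
    # No repoConfiguration in the body: equivalent to "Configure Git later".
    body = {"location": region}
    return "PUT", url, json.dumps(body)


if __name__ == "__main__":
    method, url, body = build_factory_request(
        "00000000-0000-0000-0000-000000000000", "adf-demo-rg", "adf-demo-01", "eastus"
    )
    print(method, url)
    print(body)
```

Sending the request requires an Azure AD bearer token for `https://management.azure.com`; the `azure-mgmt-datafactory` SDK wraps the same call if you would rather not handle tokens yourself.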

Self-Hosted Integration Runtime configuration

See: Azure Data Factory Self-Hosted Integration Runtime

  • Networking
    • Set Self-hosted integration runtime inbound connectivity to Private endpoint
  • Private endpoint connections
    • Click Create a private endpoint
    • Select subscription and resource group
    • Enter a name, such as onprem-ir-endpoint
    • On Networking
      • Select the virtual network onprem-vnet (onprem-azure-dw)
      • Select subnet default
        • A message notes: "If you have a NSG enabled for this subnet, it will be disabled for private endpoints on this subnet only. Other resources on the subnet will still have NSG (network security group) enforcement."
    • On Private DNS integration
      • The portal notes: "To connect privately with your private endpoint, you need a DNS record. We recommend a private DNS zone. You can also use your own DNS servers or create DNS records using the host files on your VMs."
      • Set Integrate with private DNS zone to Yes
      • Private DNS Zone: leave the default, (New) privatelink.datafactory.azure.net
      • Click OK, then select it with a checkbox
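Once the private DNS zone is linked, a host on the virtual network should resolve the factory's endpoint to a private address. A small check, assuming you run it from a VM on onprem-vnet, might look like this. The hostname to test is an assumption, typically of the form `<factory-name>.<region>.datafactory.azure.net`, so confirm the exact FQDN on the private endpoint's DNS configuration page.

```python
import ipaddress
import socket


def resolved_addresses(host: str) -> list[str]:
    """Return the unique IP addresses that `host` resolves to, sorted."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def resolves_privately(host: str) -> bool:
    """True if `host` resolves and every address is in a private range (e.g. 10.0.0.0/8)."""
    addrs = resolved_addresses(host)
    return bool(addrs) and all(ipaddress.ip_address(a).is_private for a in addrs)
```

For example, `resolves_privately("<factory-name>.<region>.datafactory.azure.net")` returning `False` from inside the VNet suggests the VM is not using the private DNS zone and is still getting the public address.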

Add the GitHub repo to Data Factory

In Data Factory, add the repo:

  • In Data Factory Studio, open the Manage hub and select Source control, then Git configuration
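Behind the Git configuration pane, the factory stores a `repoConfiguration` object. A sketch of that object for GitHub, assuming the ARM `FactoryGitHubConfiguration` schema, is below. The owner, repo name, `main` branch, and `/` root folder are hypothetical values to replace with your own.

```python
import json


def github_repo_configuration(owner: str, repo: str,
                              branch: str = "main", root_folder: str = "/") -> dict:
    """Build the repoConfiguration object for a GitHub-backed factory."""
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": owner,            # GitHub user or organization that owns the repo
        "repositoryName": repo,
        "collaborationBranch": branch,   # Publish is only allowed from this branch
        "rootFolder": root_folder,       # folder in the repo that holds the factory JSON
    }


def factory_body_with_git(region: str, owner: str, repo: str, branch: str = "main") -> dict:
    """Factory request body with Git configured up front instead of later."""
    return {
        "location": region,
        "properties": {"repoConfiguration": github_repo_configuration(owner, repo, branch)},
    }


if __name__ == "__main__":
    print(json.dumps(factory_body_with_git("eastus", "my-github-user", "adf-demo"), indent=2))
```

Keep `collaborationBranch` pointing at the branch you intend to publish from, since Data Factory only allows publishing from the collaboration branch.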

Azure Data Factory Git Repository

Select a GitHub repo:

  • Enter the GitHub repository owner (your GitHub use