Create pipelines and activities

In this tutorial, we will learn how to create pipelines and activities. We will create a pipeline with a copy activity that uses the input and output datasets. The copy activity copies data from the file specified in the input dataset settings to the file specified in the output dataset settings.

  • Firstly, create a JSON file named Adfv2QuickStartPipeline.json in the C:\ADFv2QuickStartPSH folder with the following content:
    [Image: contents of Adfv2QuickStartPipeline.json. Image source: Microsoft]
  • Secondly, run the Set-AzDataFactoryV2Pipeline cmdlet to create the pipeline Adfv2QuickStartPipeline.

PowerShell
$DFPipeLine = Set-AzDataFactoryV2Pipeline `
    -DataFactoryName $DataFactory.DataFactoryName `
    -ResourceGroupName $ResGrp.ResourceGroupName `
    -Name "Adfv2QuickStartPipeline" `
    -DefinitionFile ".\Adfv2QuickStartPipeline.json"
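
The file contents referenced in the first step appear only as an image in the source. As a rough sketch, a pipeline definition with a single copy activity might look like the following. The dataset names InputDataset and OutputDataset, the activity name CopyFromBlobToBlob, and the Binary source and sink types on Azure Blob Storage are assumptions for illustration; they need to match the datasets you created earlier.

```json
{
    "name": "Adfv2QuickStartPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    }
                },
                "inputs": [
                    {
                        "referenceName": "InputDataset",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "OutputDataset",
                        "type": "DatasetReference"
                    }
                ]
            }
        ]
    }
}
```

The inputs and outputs arrays are where the copy activity refers to its input and output datasets by name.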

Pipelines and activities in Azure Data Factory

A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. The pipeline lets you manage the activities as a set instead of individually: you deploy and schedule the pipeline rather than each activity independently.

Data Factory has three groupings of activities: data movement activities, data transformation activities, and control activities. An activity can take zero or more input datasets and produce one or more output datasets. The following diagram shows the relationship between pipeline, activity, and dataset in Data Factory:

[Image: relationship between dataset, activity, and pipeline]

Here, an input dataset represents the input for an activity in the pipeline, and an output dataset represents the output for the activity. Datasets identify data within different data stores, such as tables, files, folders, and documents. After you create a dataset, you can use it with activities in a pipeline.

Data transformation activities

Azure Data Factory supports the following transformation activities that can be added to pipelines either individually or chained with another activity.

[Image: list of supported transformation activities. Image source: Microsoft]
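
As one illustration of a transformation activity, a Stored Procedure activity definition might be sketched as follows. The activity name, the linked service name AzureSqlLinkedService, the procedure name usp_sample, and its parameter are hypothetical placeholders, and the exact properties should be checked against the Microsoft documentation for the activity you actually use.

```json
{
    "name": "RunSampleProcedure",
    "description": "Calls a stored procedure after data has been loaded",
    "type": "SqlServerStoredProcedure",
    "linkedServiceName": {
        "referenceName": "AzureSqlLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "storedProcedureName": "usp_sample",
        "storedProcedureParameters": {
            "identifier": {
                "value": "1",
                "type": "Int"
            }
        }
    }
}
```

An activity like this goes inside the activities array of a pipeline, either on its own or chained after another activity.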

Activity policy

Policies affect the run-time behavior of an activity and provide configuration options such as timeouts and retries. Activity policies are available only for execution activities.

Activity policy JSON definition

[Image: activity policy JSON definition. Image source: Microsoft]
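
The policy definition is shown only as an image in the source. A minimal sketch of an activity with a policy block is given below. The pipeline and activity names are hypothetical, and typeProperties is left empty for brevity, so the fragment is not deployable as written.

```json
{
    "name": "MyPipelineName",
    "properties": {
        "activities": [
            {
                "name": "MyCopyBlobtoSqlActivity",
                "type": "Copy",
                "typeProperties": {},
                "policy": {
                    "timeout": "00:10:00",
                    "retry": 1,
                    "retryIntervalInSeconds": 60,
                    "secureOutput": true
                }
            }
        ]
    }
}
```

Here timeout is a time span (hh:mm:ss, optionally prefixed with days), retry is the maximum number of retry attempts, retryIntervalInSeconds is the delay between attempts, and secureOutput set to true keeps the activity output out of the monitoring logs.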

Control activity

Control activities have the following top-level structure:

[Image: control activity top-level structure. Image source: Microsoft]
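
The structure is shown only as an image in the source. As a sketch, a control activity generally carries a name, a description, a type, typeProperties, and dependsOn at the top level. The example below uses a Wait activity; the activity name and the 30-second duration are assumptions for illustration.

```json
{
    "name": "WaitBeforeNextStep",
    "description": "Pauses the pipeline for 30 seconds",
    "type": "Wait",
    "typeProperties": {
        "waitTimeInSeconds": 30
    },
    "dependsOn": []
}
```

Unlike execution activities, a control activity definition has no policy section.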

Multiple activities in a pipeline

A pipeline can contain more than one activity. If subsequent activities do not depend on previous activities, the activities may run in parallel. You can also chain two activities by using an activity dependency, which defines how subsequent activities depend on previous activities.
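
An activity dependency can be sketched with the dependsOn property. In the hypothetical pipeline below, MySecondActivity runs only after MyFirstActivity has succeeded; the pipeline name, activity names, and the use of Wait activities are assumptions chosen to keep the example short.

```json
{
    "name": "ChainedActivitiesPipeline",
    "properties": {
        "activities": [
            {
                "name": "MyFirstActivity",
                "type": "Wait",
                "typeProperties": {
                    "waitTimeInSeconds": 5
                }
            },
            {
                "name": "MySecondActivity",
                "type": "Wait",
                "typeProperties": {
                    "waitTimeInSeconds": 5
                },
                "dependsOn": [
                    {
                        "activity": "MyFirstActivity",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ]
            }
        ]
    }
}
```

If the dependsOn block were removed, nothing would order the two activities and they could run in parallel. Other dependency conditions are Failed, Skipped, and Completed.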

Scheduling pipelines

Pipelines are scheduled by triggers. There are different types of triggers: the Scheduler trigger, which allows pipelines to be triggered on a wall-clock schedule, and the manual trigger, which triggers pipelines on demand.

To have your trigger kick off a pipeline run, you must include a pipeline reference to that pipeline in the trigger definition. Pipelines and triggers have an n-m (many-to-many) relationship: multiple triggers can kick off a single pipeline, and the same trigger can kick off multiple pipelines. Once the trigger is defined, you must start it before it begins triggering the pipeline.

For example, say you have a Scheduler trigger, “Trigger A,” that you want to kick off your pipeline, “MyCopyPipeline.” You define the trigger as shown in the following example:

Trigger A definition

[Image: Trigger A JSON definition. Image source: Microsoft]
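
The trigger definition is shown only as an image in the source. A sketch of what a definition for Trigger A could look like is given below, assuming an hourly recurrence. The start and end times and the copySourceName parameter are placeholder values, and the exact schema may differ between service versions.

```json
{
    "properties": {
        "name": "TriggerA",
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": "2017-12-01T00:00:00",
                "endTime": "2017-12-02T00:00:00"
            }
        },
        "pipeline": {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "MyCopyPipeline"
            },
            "parameters": {
                "copySourceName": "FileSource"
            }
        }
    }
}
```

The pipelineReference block is what ties the trigger to MyCopyPipeline. Defining the trigger does not start it; it still has to be started before any runs are kicked off.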

Reference: Microsoft Documentation, Documentation 2
