Pipeline Lifecycle, Creation and Transforms: Google Professional Data Engineer (GCP)

Pipeline Creation

To construct a pipeline using the Beam SDKs:

  • Create a Pipeline object.
  • Use a Read or Create transform to create one or more PCollections for pipeline data.
  • Apply transforms to each PCollection.
  • Write or otherwise output the final, transformed PCollections.
  • Run the pipeline.
  • Pipeline execution is separate from the execution of the Apache Beam program that defines it; the pipeline itself is executed by a pipeline runner.
  • You can specify the pipeline runner and other execution options.
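
The steps above can be sketched with a toy model in plain Python. This is not the Beam API: ToyPipeline and run_locally are hypothetical stand-ins, chosen so the snippet runs with only the standard library. What it tries to show is that constructing a pipeline only records work, while a separate runner performs it.

```python
from typing import Any, Callable, Iterable, List


class ToyPipeline:
    """Hypothetical stand-in for a pipeline object: it only records steps."""

    def __init__(self) -> None:
        self.source: List[Any] = []
        self.transforms: List[Callable[[Any], Any]] = []
        self.sink: List[Any] = []

    def create(self, elements: Iterable[Any]) -> "ToyPipeline":
        # Step 2: define the initial collection (like a Read or Create transform).
        self.source = list(elements)
        return self

    def apply(self, fn: Callable[[Any], Any]) -> "ToyPipeline":
        # Step 3: record a transform; nothing is computed yet.
        self.transforms.append(fn)
        return self


def run_locally(pipeline: ToyPipeline) -> List[Any]:
    """Hypothetical runner: executes the recorded transforms and writes output."""
    data = pipeline.source
    for fn in pipeline.transforms:
        data = [fn(element) for element in data]
    pipeline.sink.extend(data)  # Step 4: write the final collection.
    return pipeline.sink


calls: List[str] = []  # records when transform code actually runs


def shout(word: str) -> str:
    calls.append(word)
    return word.upper()


# Steps 1-3: construct the pipeline. No transform has executed at this point.
pipeline = ToyPipeline().create(["alpha", "beta"]).apply(shout)
executed_before_run = list(calls)

# Step 5: hand the pipeline to a runner, which performs the actual work.
output = run_locally(pipeline)
```

Here executed_before_run stays empty because shout is only invoked inside run_locally. In real Beam code the runner is typically selected through pipeline options rather than by calling a runner function directly.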

Transforms

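  • Element-wise transforms operate on individual elements within a PCollection.
  • This is similar to the Map phase of MapReduce.
  • Element-wise transformations are executed by invoking a ParDo operation, which applies a user-defined function to each element.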
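
As a rough illustration of element-wise processing, the plain-Python sketch below applies a function to each element on its own, much like the Map phase of MapReduce. The names par_do and split_words are hypothetical and this is not Beam's ParDo API; in Beam, the per-element logic would live in a user-defined function passed to ParDo.

```python
from typing import Callable, Iterable, Iterator, List


def par_do(elements: Iterable[str],
           do_fn: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Hypothetical helper: apply do_fn to each element independently."""
    for element in elements:
        # Each input may produce zero, one, or many outputs.
        yield from do_fn(element)


def split_words(line: str) -> List[str]:
    """Per-element function: one line in, its words out."""
    return line.split()


lines = ["to be or", "", "not to be"]
words = list(par_do(lines, split_words))
```

The empty line contributes no output and the other lines contribute several, which is why an element-wise transform can change the number of elements in a collection. Because no element depends on any other, the work can be split across many workers.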

Canceling

  • Canceling a job causes a near-immediate halt of execution.
  • Canceling is a good fit for idempotent pipelines, which can safely be rerun.
  • If the pipeline consumes data destructively, canceling may result in lost data.
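
The trade-off can be pictured with a small plain-Python simulation. It is only a toy model under stated assumptions, not how any runner actually implements cancellation: run_until_cancelled is a hypothetical function, the deque stands in for a source that forgets elements once they are read, and the dict stands in for a sink with keyed (overwriting) writes.

```python
from collections import deque
from typing import Deque, Dict


def run_until_cancelled(source: Deque[str], sink: Dict[str, str],
                        cancel_after: int) -> None:
    """Consume source destructively; halt abruptly after cancel_after writes."""
    written = 0
    while source:
        element = source.popleft()  # destructive read: gone from the source
        if written == cancel_after:
            return  # cancel: the in-flight element is never written
        sink[element] = element.upper()  # keyed write: a repeat overwrites
        written += 1


original = ["a", "b", "c", "d"]

# Destructive consumption: the element in flight at cancel time is lost.
queue = deque(original)
sink: Dict[str, str] = {}
run_until_cancelled(queue, sink, cancel_after=2)
lost = [e for e in original if e not in sink and e not in queue]

# Idempotent rerun: replay the full input from a durable copy into the same sink.
replay = deque(original)
run_until_cancelled(replay, sink, cancel_after=len(replay))
```

After the cancelled run, "c" is in neither the queue nor the sink, which is the lost-data case. The rerun only works because the input can be replayed in full and writing the same key twice leaves the sink unchanged, which is what makes the toy pipeline idempotent.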