AI Platform Pipelines
- AI Platform Pipelines makes it easier to get started with MLOps by letting you easily set up Kubeflow Pipelines with TensorFlow Extended (TFX).
- Kubeflow Pipelines is an open source platform for running, monitoring, auditing, and managing ML pipelines on Kubernetes.
- TFX is an open source project for building ML pipelines that orchestrate end-to-end ML workflows.
- ML pipelines are portable, scalable ML workflows.
- Use ML pipelines to:
 - Apply MLOps strategies to automate repeatable processes.
 - Experiment by running an ML workflow with different sets of hyperparameters (see the sketch after this list).
 - Reuse a pipeline's workflow to train a new model.
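For instance, here is a minimal sketch of the hyperparameter-experimentation idea, assuming an already-compiled pipeline package named training_pipeline.yaml, a learning_rate pipeline parameter, and a reachable Kubeflow Pipelines endpoint (all illustrative names, not from the official docs):

```python
import kfp

# Hypothetical endpoint URL of your AI Platform Pipelines (Kubeflow Pipelines) cluster.
client = kfp.Client(host="https://<your-pipelines-endpoint>")

# Re-run the same compiled pipeline with different hyperparameter sets;
# "training_pipeline.yaml" and "learning_rate" are illustrative names.
for lr in (0.1, 0.01, 0.001):
    client.create_run_from_pipeline_package(
        pipeline_file="training_pipeline.yaml",
        arguments={"learning_rate": lr},
        run_name=f"experiment-lr-{lr}",
    )
```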
 
Pipeline Components
- Pipeline components are self-contained sets of code that perform one step in a pipeline's workflow, such as:
 - data preprocessing
 - data transformation
 - model training
- Each component is composed of:
 - a set of input parameters
 - a set of outputs
 - the location of a container image
- A component's container image includes:
 - the component's executable code
 - a definition of the environment that the code runs in
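As an illustration, the following is a minimal, hypothetical component written in the Kubeflow Pipelines (KFP) v2 SDK style; the function name, base image, and output are assumptions for the sketch, not part of the exam material:

```python
from kfp import dsl

# Hypothetical preprocessing component (KFP v2 style). The decorator packages the
# function as a containerized pipeline step: base_image defines the environment,
# the parameters are the component's inputs, and OutputPath declares an output file.
@dsl.component(base_image="python:3.10")
def preprocess(input_uri: str, clean_data: dsl.OutputPath(str)):
    """Reads raw data from input_uri and writes cleaned training data to clean_data."""
    # Placeholder logic; a real component would read from input_uri and clean the data.
    with open(clean_data, "w") as f:
        f.write("feature,label\n1.0,0\n2.0,1\n")
```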
 
 

Understanding pipeline workflow
- Each task in a pipeline performs a step in the pipeline's workflow.
- Tasks are instances of pipeline components.
- Like components, tasks have input parameters, outputs, and a container image.
- Task input parameters can be set from the pipeline's input parameters or from the outputs of other tasks in the pipeline.
 
For example, consider a pipeline with the following tasks:
- Preprocess: prepares the training data.
- Train: uses the preprocessed training data to train the model.
- Predict: deploys the trained model as an ML service and gets predictions for the testing dataset.
- Confusion matrix: uses the output of the prediction task to build a confusion matrix.
- ROC: uses the output of the prediction task to perform receiver operating characteristic (ROC) curve analysis.
 
The Kubeflow Pipelines SDK analyzes the task dependencies to determine the run order:
- The preprocessing task does not depend on any other task.
- The training task relies on data produced by the preprocessing task, so training must occur after preprocessing.
- The prediction task relies on the trained model produced by the training task, so prediction must occur after training.
- Building the confusion matrix and performing ROC analysis both rely on the output of the prediction task, so they must occur after prediction is complete.
- Hence, the system runs the preprocessing, training, and prediction tasks sequentially, and then runs the confusion matrix and ROC tasks concurrently (a sketch of this wiring follows the list).
- With AI Platform Pipelines, you can orchestrate such machine learning (ML) workflows as reusable and reproducible pipelines.
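A minimal sketch of how this example's dependency graph could be expressed with the KFP v2 SDK; the stub components and their names are illustrative assumptions, not the official implementation:

```python
from kfp import dsl

# Stub components standing in for the real steps; names and return values are illustrative.
@dsl.component(base_image="python:3.10")
def preprocess(raw_data_uri: str) -> str:
    return "gs://bucket/clean-data"          # placeholder path to cleaned data

@dsl.component(base_image="python:3.10")
def train(training_data: str) -> str:
    return "gs://bucket/model"               # placeholder path to the trained model

@dsl.component(base_image="python:3.10")
def predict(model: str) -> str:
    return "gs://bucket/predictions"         # placeholder path to predictions

@dsl.component(base_image="python:3.10")
def confusion_matrix(predictions: str):
    pass

@dsl.component(base_image="python:3.10")
def roc_analysis(predictions: str):
    pass

@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(raw_data_uri: str):
    # Passing one task's output to the next task's input creates the dependency edge,
    # so the SDK infers the run order described above.
    preprocess_task = preprocess(raw_data_uri=raw_data_uri)
    train_task = train(training_data=preprocess_task.output)
    predict_task = predict(model=train_task.output)

    # Both of these depend only on predict_task, so they can run concurrently.
    confusion_matrix(predictions=predict_task.output)
    roc_analysis(predictions=predict_task.output)
```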
 
Building pipelines using the TFX SDK
- TFX is an open source project that you use to define your ML workflow as a pipeline.
- TFX components can only be used to train TensorFlow-based models.
- TFX provides components to:
 - ingest and transform data
 - train and evaluate a model
 - deploy a trained model for inference, etc.
- By using the TFX SDK, you can compose a pipeline for your ML process from TFX components (see the sketch below).
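A minimal sketch of a TFX pipeline definition, assuming the TFX 1.x API; the data paths, module file, step counts, and directory names are illustrative assumptions:

```python
from tfx import v1 as tfx

def create_pipeline(data_root: str, module_file: str,
                    serving_dir: str, pipeline_root: str) -> tfx.dsl.Pipeline:
    # Ingest CSV data, train a TensorFlow model, and push it for serving.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    trainer = tfx.components.Trainer(
        module_file=module_file,                        # user-supplied TensorFlow training code
        examples=example_gen.outputs["examples"],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100),
    )

    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(base_directory=serving_dir)
        ),
    )

    return tfx.dsl.Pipeline(
        pipeline_name="tfx-example-pipeline",
        pipeline_root=pipeline_root,
        components=[example_gen, trainer, pusher],
    )
```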
 
Building pipelines using the Kubeflow Pipelines SDK
You build components and pipelines by:
- Developing the code for each step in your workflow using your preferred language and tools
- Creating a Docker container image for each step's code
- Using Python to define your pipeline with the Kubeflow Pipelines SDK (a minimal sketch follows)
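A minimal sketch of those three steps with the KFP v2 SDK, assuming a Docker image for the step has already been built and pushed; the image URI, script name, and parameter are hypothetical:

```python
from kfp import compiler, dsl

# Container-based component: points at the Docker image built for this step.
# The image URI, command, and flag names are illustrative.
@dsl.container_component
def train_step(learning_rate: float):
    return dsl.ContainerSpec(
        image="gcr.io/my-project/train:latest",
        command=["python", "train.py"],
        args=["--learning-rate", learning_rate],
    )

@dsl.pipeline(name="container-based-pipeline")
def container_pipeline(learning_rate: float = 0.01):
    train_step(learning_rate=learning_rate)

# Compile to a package that can be uploaded to AI Platform Pipelines / Kubeflow Pipelines.
compiler.Compiler().compile(container_pipeline, "container_pipeline.yaml")
```

The compiled package can then be uploaded through the Kubeflow Pipelines UI or submitted programmatically with kfp.Client.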
 