Cloud Composer Overview: Google Professional Data Engineer GCP

In this article, we will get an overview of Cloud Composer.

Cloud Composer Overview:

  • Is a managed workflow orchestration service built on Apache Airflow.
  • Deploys multiple components to run Airflow.
  • Relies on certain configurations to execute workflows successfully.
  • Altering these configurations can have unintended consequences or break the Airflow deployment.

Environments

  • Airflow is a framework built on a microservice architecture.
  • To deploy Airflow, you provision many Google Cloud components; together, these are called a Cloud Composer environment.
  • You can create one or more Cloud Composer environments inside a project.
  • Environments are self-contained Airflow deployments based on Google Kubernetes Engine (GKE).
  • Environments work with other Google Cloud services.
  • You create Cloud Composer environments in supported regions.
  • Environments run within a Compute Engine zone.
  • Airflow communicates with other Google Cloud products through the products’ public APIs.

Architecture

Cloud Composer distributes the environment’s resources between a Google-managed tenant project and a customer project, as follows:

Tenant project resources

  • For unified Cloud IAM access control and data security, Cloud Composer deploys Cloud SQL and App Engine in the tenant project.

Cloud SQL

  • Cloud SQL stores the Airflow metadata.
  • Cloud Composer limits database access to the default or specified custom service account used to create the environment; only that service account can access data in the Cloud SQL database.
  • Cloud Composer backs up the Airflow metadata daily to minimize potential data loss.

App Engine

  • The App Engine flexible environment hosts the Airflow web server.
  • By default, the Airflow web server is integrated with Identity-Aware Proxy.
  • This integration also lets you use the Cloud Composer IAM policy to manage web server access.
  • Cloud Composer also supports deploying a self-managed Airflow web server in the customer project.

Customer project resources

Cloud Composer deploys the following in the customer project:

  • Cloud Storage: provides the storage bucket for staging DAGs, plugins, data dependencies, and logs.
  • Google Kubernetes Engine: by default, Cloud Composer deploys core components, such as the Airflow scheduler, worker nodes, and CeleryExecutor, in a GKE cluster. Cloud Composer also supports VPC-native clusters that use alias IPs.
  • Redis: the message broker for the CeleryExecutor runs as a StatefulSet application so that messages persist across container restarts.
  • Cloud Logging and Cloud Monitoring: Cloud Composer integrates with Cloud Logging and Cloud Monitoring so that you can view all Airflow service and workflow logs.
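
The tenant/customer split above can be summarized as a small lookup table. The following is a minimal Python sketch for study purposes only; the `Project` enum, the `COMPONENTS` dictionary, and the `components_in` helper are hypothetical names, not part of any Cloud Composer API.

```python
# Hypothetical study aid: model which project hosts each Cloud Composer
# component, following the tenant/customer split described above.
from enum import Enum


class Project(Enum):
    TENANT = "Google-managed tenant project"
    CUSTOMER = "customer project"


# Component name -> (hosting project, role in the environment)
COMPONENTS = {
    "Cloud SQL": (Project.TENANT, "stores Airflow metadata"),
    "App Engine flexible environment": (Project.TENANT, "hosts the Airflow web server"),
    "Cloud Storage bucket": (Project.CUSTOMER, "stages DAGs, plugins, data dependencies, and logs"),
    "GKE cluster": (Project.CUSTOMER, "runs the scheduler, workers, and CeleryExecutor"),
    "Redis": (Project.CUSTOMER, "message broker for the CeleryExecutor"),
    "Cloud Logging and Cloud Monitoring": (Project.CUSTOMER, "collect Airflow service and workflow logs"),
}


def components_in(project: Project) -> list[str]:
    """Return the names of components hosted in `project`, sorted alphabetically."""
    return sorted(name for name, (where, _) in COMPONENTS.items() if where is project)


if __name__ == "__main__":
    for project in Project:
        print(f"{project.value}: {', '.join(components_in(project))}")
```

Redis is listed under the customer project because it runs as a StatefulSet inside the GKE cluster there.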

Cloud Composer Environment Components

Each environment has the following components:

  • Web server: runs the Apache Airflow web interface, which Identity-Aware Proxy protects.
  • Database: holds the Apache Airflow metadata.
  • Cloud Storage bucket: stores the DAGs, logs, custom plugins, and data for the environment.

Airflow management:

You can use the following tools to manage an environment:

  • Web interface: access the Airflow web interface from the Google Cloud Console or by direct URL.
  • Command-line tools: run gcloud composer commands to issue Airflow command-line commands.
  • APIs: use the Cloud Composer REST and RPC APIs.
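
As a sketch of the command-line route, the Python helper below only assembles the argument list for `gcloud composer environments run`, which forwards an Airflow CLI subcommand to the environment. The helper name and the environment, location, and DAG ID values are hypothetical placeholders, and the subcommands you can pass depend on the environment's Airflow version.

```python
# Sketch: build (but do not run) a `gcloud composer environments run` call
# that forwards an Airflow CLI subcommand to a Cloud Composer environment.
# "example-environment", "us-central1", and "example_dag_id" are hypothetical.
import shlex


def composer_run_command(environment: str, location: str,
                         subcommand: str, *args: str) -> list[str]:
    """Return the argv for running an Airflow CLI subcommand through gcloud."""
    cmd = ["gcloud", "composer", "environments", "run", environment,
           "--location", location, *subcommand.split()]
    if args:
        # Arguments meant for the Airflow subcommand follow a literal "--".
        cmd += ["--", *args]
    return cmd


if __name__ == "__main__":
    argv = composer_run_command("example-environment", "us-central1",
                                "dags trigger", "example_dag_id")
    # Print the shell-quoted command; pass `argv` to subprocess.run() to execute it.
    print(shlex.join(argv))
```

Returning a list instead of a single string avoids shell-quoting problems if the command is later passed to `subprocess.run`.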

Airflow configuration:

  • The configurations Cloud Composer provides for Apache Airflow are the same as those for a locally hosted Airflow deployment.
  • Some Airflow configurations are preconfigured, and you cannot change those configuration properties.
  • You specify other configurations when you create or update the environment.
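
Overrides of the configurable properties are passed as `SECTION-KEY=VALUE` pairs, for example through the `--update-airflow-configs` flag of `gcloud composer environments update`. The sketch below formats such pairs and refuses properties that are preconfigured. `BLOCKED_EXAMPLES` is an illustrative subset chosen for this example, not the authoritative list that Cloud Composer enforces.

```python
# Sketch: format Airflow configuration overrides as the value of the
# `--update-airflow-configs` flag (comma-separated SECTION-KEY=VALUE pairs).
# BLOCKED_EXAMPLES is illustrative only, not Cloud Composer's real blocklist.
BLOCKED_EXAMPLES = {("core", "dags_folder"), ("core", "executor")}


def airflow_config_flag(overrides: dict[tuple[str, str], str]) -> str:
    """Format {(section, key): value} overrides, rejecting preconfigured ones."""
    pairs = []
    for (section, key), value in sorted(overrides.items()):
        if (section, key) in BLOCKED_EXAMPLES:
            raise ValueError(f"{section}.{key} is preconfigured and cannot be changed")
        pairs.append(f"{section}-{key}={value}")
    return "--update-airflow-configs=" + ",".join(pairs)


if __name__ == "__main__":
    print(airflow_config_flag({("core", "dags_are_paused_at_creation"): "True"}))
```

The sketch does no escaping, so values that themselves contain commas would need separate handling.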

Airflow DAGs (workflows):

  • An Apache Airflow DAG is a workflow: a collection of tasks with additional task dependencies.
  • DAGs are stored in Cloud Storage.
  • To add or remove DAGs, add or remove them in the environment's Cloud Storage bucket.
  • You can schedule DAGs.
  • You can trigger DAGs manually or in response to events.
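
To illustrate what "a collection of tasks with dependencies" means, the stdlib-only Python sketch below models a DAG as a mapping from each task to its upstream tasks and derives a valid run order with `graphlib`. It does not use Airflow itself (in an Airflow DAG file you would declare the same edges with expressions such as `extract >> transform`), and the task names are hypothetical.

```python
# Sketch: a DAG as tasks plus dependencies, using only the standard library.
# A valid execution order exists only if the graph has no cycles.
from graphlib import CycleError, TopologicalSorter

# Each (hypothetical) task maps to the set of upstream tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform", "validate"},
}


def run_order(tasks: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order; raise CycleError if the graph is cyclic."""
    return list(TopologicalSorter(tasks).static_order())


if __name__ == "__main__":
    print(run_order(dag))  # → ['extract', 'transform', 'validate', 'load']
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```

The "acyclic" part of the name matters: a cycle leaves no task able to start first, which is why the second call fails.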

Plugins

  • You can install custom plugins into a Cloud Composer environment.

Python dependencies

  • You can install Python dependencies from the Python Package Index (PyPI).
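
One way to install several PyPI packages at once is to list them in a requirements file and pass it to `gcloud composer environments update` with the `--update-pypi-packages-from-file` flag. This sketch writes the file and builds the command without running it; the helper names, environment name, location, and package pins are hypothetical.

```python
# Sketch: write pinned PyPI dependencies to requirements.txt and build the
# `gcloud composer environments update` command that installs them.
# Environment name, location, and package pins are hypothetical.
import shlex
import tempfile
from pathlib import Path


def write_requirements(packages: dict[str, str], directory: Path) -> Path:
    """Write one 'name+specifier' line per package (e.g. 'scipy>=1.1.0')."""
    path = directory / "requirements.txt"
    path.write_text("".join(f"{name}{spec}\n" for name, spec in sorted(packages.items())))
    return path


def pypi_update_command(environment: str, location: str, requirements: Path) -> list[str]:
    """Return the argv for installing the packages listed in `requirements`."""
    return ["gcloud", "composer", "environments", "update", environment,
            "--location", location,
            f"--update-pypi-packages-from-file={requirements}"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        req = write_requirements({"scipy": ">=1.1.0", "requests": "==2.31.0"}, Path(tmp))
        print(req.read_text(), end="")
        print(shlex.join(pypi_update_command("example-environment", "us-central1", req)))
```

Keeping the pins in a file makes the dependency set reviewable and repeatable across environments.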

Access control

  • Security is managed at the Google Cloud project level.
  • You assign Cloud IAM roles to control access.
  • Without an appropriate Cloud Composer IAM role, a user has no access to any of the project's environments.
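
To make the role check concrete, the sketch below reads a project IAM policy in the JSON shape printed by `gcloud projects get-iam-policy PROJECT_ID --format=json` and lists the `roles/composer.*` roles bound to a member. The policy and members are hypothetical, and the check is a simplification: it ignores basic roles such as Owner or Editor, custom roles, and group membership, all of which can also grant access.

```python
# Sketch: list the Cloud Composer roles a member holds in a project IAM policy.
# The policy dict mimics the JSON bindings format; its contents are hypothetical.
def composer_roles(policy: dict, member: str) -> list[str]:
    """Return the roles/composer.* roles bound directly to `member`, sorted."""
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if binding["role"].startswith("roles/composer.")
        and member in binding.get("members", [])
    )


policy = {
    "bindings": [
        {"role": "roles/composer.user", "members": ["user:alice@example.com"]},
        {"role": "roles/storage.objectViewer", "members": ["user:bob@example.com"]},
    ]
}

if __name__ == "__main__":
    print(composer_roles(policy, "user:alice@example.com"))  # ['roles/composer.user']
    print(composer_roles(policy, "user:bob@example.com"))    # [] (no Composer role)
```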

Logging and monitoring:

  • You can view the Airflow logs associated with single DAG tasks in the Airflow web interface and in the logs folder of the environment's Cloud Storage bucket.
  • Streaming logs are available for Cloud Composer; you can access them in the Logs Viewer in the Google Cloud Console.
  • Cloud Composer also provides audit logs, such as Admin Activity audit logs, for Google Cloud projects.
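
Streaming logs can also be queried outside the console. The sketch below builds a Cloud Logging filter and a `gcloud logging read` command around it. It assumes the monitored resource type `cloud_composer_environment` with `environment_name` and `location` labels and the log name `airflow-worker`; verify these names in the Logs Explorer for your environment. The project, environment, and location values are hypothetical.

```python
# Sketch: build a Cloud Logging filter for an environment's Airflow worker logs.
# ASSUMPTIONS: resource type "cloud_composer_environment", its
# environment_name/location labels, and log name "airflow-worker".
import shlex


def composer_log_filter(project: str, environment: str, location: str,
                        log_name: str = "airflow-worker") -> str:
    """Return a Logging query that ANDs together the resource and log clauses."""
    clauses = [
        'resource.type="cloud_composer_environment"',
        f'resource.labels.environment_name="{environment}"',
        f'resource.labels.location="{location}"',
        f'logName="projects/{project}/logs/{log_name}"',
    ]
    return " AND ".join(clauses)


if __name__ == "__main__":
    flt = composer_log_filter("example-project", "example-environment", "us-central1")
    print(shlex.join(["gcloud", "logging", "read", flt, "--limit", "10"]))
```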

Networking and security:

During environment creation, the following configuration options are available:

  • Cloud Composer environment with a route-based GKE cluster (default)
  • Private IP Cloud Composer environment
  • Cloud Composer environment with a VPC-native GKE cluster using alias IP addresses
  • Shared VPC

Create a Project

To create a project and enable the Cloud Composer API:

  • In the Cloud Console, select or create a project.
  • Make sure that billing is enabled for your project.
  • To activate the Cloud Composer API in a new or existing project, go to the API Overview page for Cloud Composer.
  • Click Enable.
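
The same API can be enabled from the command line with `gcloud services enable composer.googleapis.com`. This minimal sketch only builds and prints that command; `example-project` is a hypothetical project ID.

```python
# Sketch: command-line equivalent of clicking Enable for the Cloud Composer API.
# "example-project" is a hypothetical project ID.
import shlex


def enable_composer_api_command(project: str) -> list[str]:
    """Return the argv that enables the Cloud Composer API in `project`."""
    return ["gcloud", "services", "enable", "composer.googleapis.com",
            "--project", project]


if __name__ == "__main__":
    # Pass the list to subprocess.run() to execute it with an authenticated gcloud CLI.
    print(shlex.join(enable_composer_api_command("example-project")))
```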

Cloud Composer Versioning

  • Airflow follows the semantic versioning scheme.
  • Cloud Composer supports the two latest stable minor Airflow releases and the two latest patch versions of those minor releases.
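
A minimal Python sketch of the support rule as stated above: keep the two latest minor releases and, within each, the two latest patch versions. The release list is hypothetical and not Cloud Composer's actual version matrix, and the sketch assumes every listed release is stable.

```python
# Sketch of the stated rule: two latest minor releases, two latest patches each.
# The release list is hypothetical; all entries are assumed to be stable.
from collections import defaultdict


def parse(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a comparable integer tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def supported_versions(releases: list[str], minors: int = 2, patches: int = 2) -> list[str]:
    """Return the supported versions in ascending order."""
    by_minor = defaultdict(list)  # (major, minor) -> list of full version tuples
    for release in releases:
        v = parse(release)
        by_minor[v[:2]].append(v)
    keep = []
    for minor in sorted(by_minor, reverse=True)[:minors]:
        keep += sorted(by_minor[minor], reverse=True)[:patches]
    return [".".join(map(str, v)) for v in sorted(keep)]


releases = ["2.4.3", "2.5.0", "2.5.1", "2.5.3", "2.6.0", "2.6.1", "2.6.2"]

if __name__ == "__main__":
    print(supported_versions(releases))  # → ['2.5.1', '2.5.3', '2.6.1', '2.6.2']
```

Parsing versions into integer tuples matters here: comparing them as strings would rank "2.10.0" below "2.9.0".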
