Migration to Google Cloud Google Professional Data Engineer GCP

Two migration models for transferring HDFS data

Push

  • The simplest model.
  • The source cluster runs the distcp jobs on its data nodes and pushes files directly to Cloud Storage.
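
As a rough sketch of the push model, the helper below builds the `hadoop distcp` command that would be run on the source cluster. The NameNode address, paths, and bucket name are hypothetical placeholders, and the sketch assumes the Cloud Storage connector is installed on the source cluster so that `gs://` paths resolve.

```python
import shlex
from typing import List


def build_push_distcp(source_hdfs_path: str, target_gcs_path: str,
                      max_maps: int = 20) -> List[str]:
    """Build a distcp command that pushes HDFS data to Cloud Storage.

    Intended to run on the source cluster (push model). All argument
    values are supplied by the caller; nothing here is executed.
    """
    if not target_gcs_path.startswith("gs://"):
        raise ValueError("push target must be a gs:// path")
    # -m caps the number of simultaneous map tasks used for the copy.
    return ["hadoop", "distcp", "-m", str(max_maps),
            source_hdfs_path, target_gcs_path]


if __name__ == "__main__":
    # Hypothetical NameNode and bucket, for illustration only.
    cmd = build_push_distcp("hdfs://namenode:8020/data/logs",
                            "gs://example-bucket/logs", max_maps=10)
    print(shlex.join(cmd))
```

The function only assembles the argument list, so it can be inspected or logged before being handed to `subprocess.run` on a node of the source cluster.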

Pull

  • More complex than the push model.
  • An ephemeral Dataproc cluster runs the distcp jobs on its data nodes, pulls files from the source cluster, and copies them to Cloud Storage.
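
As a rough sketch of the pull model, the helper below builds a `gcloud dataproc jobs submit hadoop` command that runs DistCp on an ephemeral Dataproc cluster. The cluster name, region, NameNode address, and bucket are hypothetical placeholders, and the sketch assumes the Dataproc cluster already exists and has network access to the source cluster's HDFS.

```python
import shlex
from typing import List

# Main class of the Hadoop DistCp tool.
DISTCP_CLASS = "org.apache.hadoop.tools.DistCp"


def build_pull_distcp(cluster: str, region: str,
                      source_hdfs_uri: str, target_gcs_path: str) -> List[str]:
    """Build a gcloud command that submits a DistCp job to Dataproc.

    The job runs on the Dataproc cluster (pull model), reads from the
    remote HDFS URI, and writes to Cloud Storage. Nothing is executed.
    """
    if not source_hdfs_uri.startswith("hdfs://"):
        raise ValueError("pull source must be a fully qualified hdfs:// URI")
    if not target_gcs_path.startswith("gs://"):
        raise ValueError("pull target must be a gs:// path")
    return [
        "gcloud", "dataproc", "jobs", "submit", "hadoop",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={DISTCP_CLASS}",
        "--",  # everything after this is passed to DistCp itself
        source_hdfs_uri, target_gcs_path,
    ]


if __name__ == "__main__":
    # Hypothetical cluster, region, NameNode, and bucket names.
    cmd = build_pull_distcp("ephemeral-migration", "us-central1",
                            "hdfs://source-namenode:8020/data/logs",
                            "gs://example-bucket/logs")
    print(shlex.join(cmd))
```

The source must be given as a fully qualified `hdfs://` URI because the job runs on the Dataproc cluster, whose default filesystem is not the source cluster's HDFS.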
