The Migration Process (Google Professional Data Engineer, GCP)
A migration is a journey: it involves several phases, with multiple options for reaching the destination.

There are four phases of migration:
- Assess
  - Perform assessment and discovery of the existing environment
  - Understand the app and environment inventory
  - Identify app dependencies and requirements
  - Perform total cost of ownership (TCO) and app performance benchmarks
- Plan
  - Create the basic cloud infrastructure for your workloads
  - Plan how to move apps
  - Planning covers identity management, organization and project structure, networking, sorting apps, and a prioritized migration strategy
- Deploy
  - Design, implement, and execute the migration
  - Refine cloud resources as needed
- Optimize
  - Analyze and optimize cloud resource utilization
  - Reduce costs
  - Implement automation, ML, and AI services
 
 
Assess Phase
- Build an inventory of apps – work with the teams responsible for each workload in the current environment.
- The inventory should include:
  - Apps
  - Dependencies of each app
  - Services supporting the app infrastructure
  - Server configurations
  - Network devices, firewalls, and other dedicated hardware
- For each item, gather:
  - Source code location
  - Deployment method
  - Network restrictions or security requirements
  - Licensing requirements
- Categorize apps:
  - Categorize to prioritize which apps to migrate first
  - Also understand the complexity and risk involved
  - A catalog matrix is used for this purpose (a minimal sketch follows this list)
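To make the categorization concrete, a catalog matrix can be modeled as a scored inventory. A minimal sketch only: the fields, scoring, and app names are illustrative assumptions, not a Google-prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class AppEntry:
    """One row of a hypothetical catalog matrix."""
    name: str
    dependencies: list = field(default_factory=list)
    source_code_location: str = ""
    deployment_method: str = ""   # e.g. "manual", "CI/CD"
    licensing: str = ""
    complexity: str = "low"       # low / medium / high
    risk: str = "low"             # low / medium / high

def migration_priority(app: AppEntry) -> int:
    """Lower score = migrate earlier: few dependencies, low complexity and risk."""
    scale = {"low": 0, "medium": 1, "high": 2}
    return len(app.dependencies) + scale[app.complexity] + scale[app.risk]

inventory = [
    AppEntry("reporting-ui", dependencies=["auth-svc"], complexity="low", risk="low"),
    AppEntry("billing", dependencies=["db", "auth-svc"], complexity="high", risk="high"),
]
for app in sorted(inventory, key=migration_priority):
    print(app.name, migration_priority(app))
```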
 
 
Transferring large datasets
Transferring a large dataset involves several steps:
- Building the right team
- Planning early
- Testing the transfer plan before implementing it
 
Data transfer
- The process of moving data without transforming it
- It involves:
  - Making a transfer plan to decide on a transfer option and get approvals
  - Coordinating the team that executes the transfer
  - Choosing the right transfer tool based on resources, cost, and time
  - Overcoming data transfer challenges, such as insufficient bandwidth, moving actively used datasets, protecting and monitoring the data during transfer, and ensuring a successful transfer
 
- Other types of data transfer projects:
  - For ETL transformations, use Dataflow (a minimal pipeline sketch follows this list).
  - To migrate a database and related apps, use Cloud Spanner.
  - For virtual machine (VM) instance migration, use Migrate for Compute Engine.
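Dataflow runs Apache Beam pipelines, so an ETL-style transformation can be expressed as a small Beam job. A minimal sketch, assuming hypothetical bucket paths and a made-up `ssn` field to redact; add DataflowRunner pipeline options to run it on GCP instead of locally.

```python
import json
import apache_beam as beam

def redact_ssn(record: str) -> str:
    """Drop a hypothetical sensitive field from a JSON-lines record."""
    row = json.loads(record)
    row.pop("ssn", None)
    return json.dumps(row)

# Runs with the local DirectRunner by default; pass Dataflow options to scale out.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://source-bucket/records.jsonl")
        | "Redact" >> beam.Map(redact_ssn)
        | "Write" >> beam.io.WriteToText("gs://dest-bucket/records-clean")
    )
```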
 
 
Step 1: Assembling the team
Planning a transfer typically requires personnel with the following roles and responsibilities:
- Storage, IT, and network admins to execute the transfer
- Data owners or governors, and legal personnel, to approve the transfer
 
Step 2: Collecting requirements and available resources
- Identify the datasets to move.
  - Use Data Catalog to organize data into logical groupings.
  - Work with teams to update these groupings.
- Identify the datasets you can move.
  - Check whether any regulatory, security, or other factors prohibit transfer.
  - Remove sensitive data or reorganize data as needed, using Dataflow, Cloud Data Fusion, or Cloud Composer.
- For movable datasets, decide where to transfer each dataset:
  - Select the storage option for the data.
  - Understand the data access policies to maintain after migration.
  - Note any region- or geography-specific requirements.
  - Decide the data structure in the destination.
  - Decide whether the transfer is ongoing or a one-off.
- For movable datasets, also list the following:
  - Time: when to transfer
  - Cost: the budget available
  - People: who will execute the transfer
  - Bandwidth (for online transfers)
 
 
Step 3: Evaluating transfer options
Data transfer options are selected based on the following factors:
- Cost
- Time
- Offline versus online transfer options
- Transfer tools and technologies
- Security
 
Cost:
Costs include:
- Networking costs
  - Egress charges, if any
  - Bandwidth charges for the transfer
- Storage and operation costs for Cloud Storage during and after the data transfer
- Personnel costs for support
 
Time:
- The time the transfer takes
- When to undertake the transfer
- Connection options for data transfer between a private data center and GCP:
  - A public internet connection, using a public API
  - Direct Peering, using a public API
  - Cloud Interconnect, using a private API
 
 
Connecting with a public internet connection –
- Less predictable throughput
- Dependent on ISP capacity
- Low cost
- Google offers peering arrangements, where applicable
 
Connecting with Direct Peering –
- Access the GCP network with fewer network hops
- Direct Peering connects your ISP's network to Google's edge points of presence (PoPs)
- Requires a registered Autonomous System (AS) number and around-the-clock contact with a network operations center
 
Connecting with Cloud Interconnect –
- Cloud Interconnect is a direct connection to GCP through Cloud Interconnect service providers.
- No need to send data over the public internet
- More consistent throughput for large data transfers
- SLAs for network availability and performance
 
Online versus offline transfer –
- Transfer data over a network (online), or ship it on storage hardware (offline). A quick estimate of online transfer time helps decide; see the sketch below.
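A back-of-the-envelope sketch for the online-versus-offline decision: how long would an online transfer take at a given sustained bandwidth? The 480 TB volume matches the Transfer Appliance figure later in these notes; the bandwidth values and 70% utilization factor are illustrative assumptions.

```python
def transfer_days(data_tb: float, bandwidth_gbps: float, utilization: float = 0.7) -> float:
    """Days to move data_tb terabytes at a given link speed and utilization."""
    bits = data_tb * 1e12 * 8                        # decimal TB -> bits
    seconds = bits / (bandwidth_gbps * 1e9 * utilization)
    return seconds / 86400

for gbps in (0.1, 1, 10):
    print(f"480 TB at {gbps} Gbps: {transfer_days(480, gbps):.1f} days")

# At ~1 Gbps (70% utilized), 480 TB takes roughly 63 days, so the appliance's
# ~50-day turnaround can win when bandwidth is limited.
```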
 
Deciding among Google’s transfer options
Factors for choosing a transfer option:

| Where you're moving data from | Scenario | Suggested products |
| --- | --- | --- |
| Another cloud provider (for example, Amazon Web Services or Microsoft Azure) to Google Cloud | — | Storage Transfer Service |
| Cloud Storage to Cloud Storage (two different buckets) | — | Storage Transfer Service |
| Private data center to Google Cloud | Enough bandwidth to meet the project deadline, for less than a few TB of data | gsutil |
| Private data center to Google Cloud | Enough bandwidth to meet the project deadline, for more than a few TB of data | Storage Transfer Service for on-premises data |
| Private data center to Google Cloud | Not enough bandwidth to meet the project deadline | Transfer Appliance |
gsutil
- Suitable for smaller transfers of on-premises data (less than a few TB)
- Included in the default path when using Cloud Shell
- Provided with the Cloud SDK by default
- Manages Cloud Storage instances
- Functions provided:
  - Copying data to and from the local file system and Cloud Storage
  - Moving and renaming objects
  - Performing real-time incremental syncs
- Usage scenarios:
  - Transfers on an as-needed basis, or in command-line sessions by users
  - Transferring a few files, very large files, or both
  - Consuming the output of a program, such as streaming output to Cloud Storage
  - Watching and syncing a directory with a small number of files
- To use gsutil, create a Cloud Storage bucket and copy data to it.
- For security, use HTTPS.
- For large dataset transfers:
  - Use gsutil -m for multi-threaded transfers.
  - Use composite transfers for a single large file; this breaks the file into smaller chunks to increase transfer speed.
- A Python-client equivalent of a parallel copy is sketched below.
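For scripted transfers, the google-cloud-storage Python client offers a rough equivalent of `gsutil -m cp -r`. A minimal sketch, assuming a recent library version that ships the transfer_manager helper; the bucket and directory names are placeholders.

```python
import pathlib
from google.cloud import storage
from google.cloud.storage import transfer_manager

client = storage.Client()                 # uses Application Default Credentials
bucket = client.bucket("my-bucket")       # placeholder bucket name

# Collect relative paths of every file under ./data.
source = pathlib.Path("./data")
filenames = [str(p.relative_to(source)) for p in source.rglob("*") if p.is_file()]

# Upload in parallel workers, mirroring gsutil's -m flag.
results = transfer_manager.upload_many_from_filenames(
    bucket, filenames, source_directory=str(source), max_workers=8
)
for name, result in zip(filenames, results):
    status = f"failed: {result}" if isinstance(result, Exception) else "ok"
    print(name, status)
```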
 
 
Storage Transfer Service
- For large transfers of on-premises data
- Designed for large-scale transfers (up to petabytes of data or billions of files)
- Supports full copies or incremental copies
- Offers a graphical user interface
- Usage scenarios:
  - When sufficient bandwidth is available to move the data volumes
  - For large internal user bases that cannot use gsutil
  - When error reporting and a record of the data moved are needed
  - To limit the impact of transfers on other workloads
  - To run recurring transfers on a schedule
- Install agents on-premises to use Storage Transfer Service.
- Agents run in Docker containers; run or orchestrate them with Kubernetes.
- After setup, start transfers by providing:
  - A source directory
  - A destination bucket
  - A time or schedule
- Storage Transfer Service recursively crawls subdirectories and files in the source directory and creates objects with corresponding names in Cloud Storage.
- It automatically retries the transfer on transient errors.
- You can monitor the files moved and the overall transfer speed.
- After the transfer, a tab-delimited (TSV) file lists all files transferred and any error messages.
- Best practices (a job-creation sketch follows this list):
  - Use an identical agent setup on every machine.
  - More agents means more speed, so deploy many agents.
  - Bandwidth caps can protect other workloads.
  - Plan time for reviewing errors.
  - Set up Cloud Monitoring for long-running transfers.
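On-premises (POSIX filesystem) transfer jobs can also be created programmatically. A minimal sketch, assuming the google-cloud-storage-transfer client library and an already-registered agent pool; the field names follow that library's published samples, but verify them against current docs, and all identifiers below are placeholders.

```python
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# Create a job that reads from an on-premises directory via installed agents
# and writes into a Cloud Storage bucket.
job = client.create_transfer_job({
    "transfer_job": {
        "project_id": "my-project",
        "status": storage_transfer.TransferJob.Status.ENABLED,
        "transfer_spec": {
            "source_agent_pool_name": "projects/my-project/agentPools/my-pool",
            "posix_data_source": {"root_directory": "/mnt/export/data"},
            "gcs_data_sink": {"bucket_name": "my-destination-bucket"},
        },
    }
})

# One-off run; add a schedule to the job instead for recurring transfers.
client.run_transfer_job({"job_name": job.name, "project_id": "my-project"})
```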
 
 
Transfer Appliance –
- Used for larger transfers where network bandwidth is limited or costly
- Usage scenarios:
  - Data at a remote location with limited or no bandwidth
  - When the required bandwidth is not available
- Involves receiving the hardware and shipping it back
- The hardware is Google-owned.
- Available only in specific countries
- The factors for choosing it are cost and speed.
- Request an appliance in the Cloud Console, detailing the data to transfer.
- The approximate turnaround time for an appliance to be shipped, loaded with data, shipped back, and rehydrated on Google Cloud is 50 days.
- The cost of the 480 TB device process is less than $3,000.
 
Storage Transfer Service for cloud-to-cloud transfers –
- Storage Transfer Service is a fully managed, highly scalable data transfer service.
- It automates transfers from other public clouds into Cloud Storage.
- It supports transfers into Cloud Storage from Amazon S3 and HTTP.
- For Amazon S3 (a job-creation sketch follows this list):
  - An access key and the S3 bucket details are needed.
  - Daily copies of any modified objects are also supported.
  - Transfers to Amazon S3 are not supported.
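A minimal sketch of a daily S3-to-Cloud-Storage job, assuming the google-cloud-storage-transfer client library (the notes don't name a library, so this is one choice among several); bucket names, project ID, and credentials are placeholders.

```python
from datetime import date
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()
start = date.today()

job = client.create_transfer_job({
    "transfer_job": {
        "project_id": "my-project",
        "status": storage_transfer.TransferJob.Status.ENABLED,
        # A start date with no end date makes the job recur daily.
        "schedule": {
            "schedule_start_date": {
                "year": start.year, "month": start.month, "day": start.day,
            },
        },
        "transfer_spec": {
            "aws_s3_data_source": {
                "bucket_name": "my-s3-bucket",
                "aws_access_key": {
                    "access_key_id": "AWS_ACCESS_KEY_ID",          # placeholder
                    "secret_access_key": "AWS_SECRET_ACCESS_KEY",  # placeholder
                },
            },
            "gcs_data_sink": {"bucket_name": "my-gcs-bucket"},
        },
    }
})
print("Created:", job.name)
```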
 
- For HTTP, a list of public URLs in a specified format is needed.
  - The list must include the size of each file in bytes and a Base64-encoded MD5 hash of the file contents, typically produced by a script (sketch below).
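A sketch of such a script: it computes each file's size and Base64-encoded MD5 and emits one tab-separated row per URL. The `TsvHttpData-1.0` header is the format Storage Transfer Service expects for URL lists; the file paths and URLs are placeholders.

```python
import base64
import hashlib
import os

def tsv_row(local_path: str, public_url: str) -> str:
    """Return one URL-list row: URL, size in bytes, Base64-encoded MD5."""
    size = os.path.getsize(local_path)
    md5 = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            md5.update(chunk)
    digest = base64.b64encode(md5.digest()).decode("ascii")
    return f"{public_url}\t{size}\t{digest}"

print("TsvHttpData-1.0")  # required header line
print(tsv_row("data/file1.bin", "https://example.com/data/file1.bin"))
```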
 
Security
- A primary focus during transfer
- GCP offers different levels of security.
- Consider protection of:
  - Data at rest (authorization and access to the source and destination storage systems)
  - Data in transit
  - Access to the transfer product
 
Security offered by each product:

| Product | Data at rest | Data in transit | Access to transfer product |
| --- | --- | --- | --- |
| Transfer Appliance | All data is encrypted. | Protected with keys managed by the customer. | Anyone can order an appliance, but to use it they need access to the data source. |
| gsutil | Access keys required to access Cloud Storage, which is encrypted at rest. | Data is sent over HTTPS and encrypted in transit. | Anyone can download and run gsutil. They must have permissions to buckets and local files in order to move data. |
| Storage Transfer Service for on-premises data | Access keys required to access Cloud Storage, which is encrypted at rest. The agent process can access local files as OS permissions allow. | Data is sent over HTTPS and encrypted in transit. | You must have object editor permissions to access Cloud Storage buckets. |
| Storage Transfer Service | Access keys required for non-Google Cloud resources (for example, Amazon S3). Access keys are required to access Cloud Storage, which is encrypted at rest. | Data is sent over HTTPS and encrypted in transit. | You must have Cloud IAM permissions for the service account to access the source and object editor permissions for any Cloud Storage buckets. |
Step 4: Preparing for the transfer
The steps involved are:
- Pricing and ROI estimation
- Functional testing, to confirm the product is set up and that network connectivity works:
  - Confirm the installation and operation of the transfer.
  - List issues that block data movement.
  - List the operations needed, such as training.
- Performance testing: run a transfer on a large sample of data to confirm speed and fix bottlenecks (a timing sketch follows).
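A simple performance-test sketch: time the upload of a sample file with the google-cloud-storage Python client and report sustained throughput. The bucket and file names are placeholders.

```python
import os
import time
from google.cloud import storage

def measure_upload(bucket_name: str, local_path: str) -> float:
    """Upload one file and return throughput in MB/s."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(os.path.basename(local_path))
    start = time.monotonic()
    blob.upload_from_filename(local_path)
    elapsed = time.monotonic() - start
    mb = os.path.getsize(local_path) / 1e6
    return mb / elapsed

print(f"{measure_upload('my-bucket', 'sample.bin'):.1f} MB/s")
```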
 
Step 5: Ensuring the integrity of the transfer
- Enable versioning and backup on the destination to guard against accidental deletes.
- Validate the data before removing the source data; a checksum-comparison sketch follows.
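One way to validate is to compare a local file's MD5 with the checksum Cloud Storage reports for the uploaded object. A minimal sketch with placeholder names; note that `md5_hash` is Base64-encoded and is not populated for composite objects (which expose only CRC32C).

```python
import base64
import hashlib
from google.cloud import storage

def matches(bucket_name: str, object_name: str, local_path: str) -> bool:
    """True if the object's reported MD5 equals the local file's MD5."""
    blob = storage.Client().bucket(bucket_name).get_blob(object_name)
    md5 = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    local_hash = base64.b64encode(md5.digest()).decode("ascii")
    return blob is not None and blob.md5_hash == local_hash

assert matches("my-bucket", "data/file1.bin", "file1.bin")
```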
 
