Storage types Google Professional Data Engineer GCP

  • Bigtable offers two storage types: solid-state drives (SSD) or hard disk drives (HDD).
  • SSD storage is the most efficient and cost-effective choice for most use cases.
  • HDD storage is sometimes appropriate for very large data sets (>10 TB) that are not latency-sensitive or are infrequently accessed.
  • HDD use cases
    • You expect to store at least 10 TB of data.
    • The data is not used by a user-facing or latency-sensitive application.
    • The workload is batch processing or data archival.
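
The rules of thumb above can be condensed into a small decision helper. This is a minimal sketch, not an official sizing rule: the function name `suggest_storage_type` and its parameters are hypothetical, and the 10 TB threshold is taken directly from the notes above.

```python
def suggest_storage_type(data_tb: float, latency_sensitive: bool,
                         batch_or_archival: bool) -> str:
    """Return "SSD" or "HDD" following the rules of thumb in these notes.

    Hypothetical helper for illustration; not part of any Google Cloud API.
    """
    # HDD is only considered for large data sets (at least 10 TB) ...
    large_enough = data_tb >= 10
    # ... that are not latency-sensitive and serve batch or archival workloads.
    if large_enough and not latency_sensitive and batch_or_archival:
        return "HDD"
    # SSD is the default choice for most use cases.
    return "SSD"


print(suggest_storage_type(2, latency_sensitive=True, batch_or_archival=False))
print(suggest_storage_type(50, latency_sensitive=False, batch_or_archival=True))
```

The helper defaults to SSD and returns HDD only when every HDD condition holds, which mirrors the way the notes treat HDD as the exception.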

Application profiles

  • For instances that use replication, application profiles (app profiles) control how applications connect to the instance’s clusters.
  • Without replication, app profiles still provide a separate identifier for each application.

Clusters

  • A cluster represents the Bigtable service in a specific location.
  • Each cluster belongs to a single instance.
  • An instance can have up to 4 clusters.
  • Application requests are handled by one of the clusters in the instance.
  • Each cluster is located in a single zone.
  • An instance’s clusters must each be in unique zones.
  • Additional clusters can be created in any zone where Bigtable is available.
  • Instances with only 1 cluster do not use replication.
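
The cluster rules above can be expressed as a toy model. This is only a sketch for study purposes: the `Instance` class, its method names, and the example zone names are hypothetical and are not the Bigtable client API, and the limit of 4 clusters is the one stated in these notes.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_CLUSTERS = 4  # limit stated in these notes


@dataclass
class Instance:
    """Toy model of an instance: maps cluster ID -> zone."""
    clusters: Dict[str, str] = field(default_factory=dict)

    def add_cluster(self, cluster_id: str, zone: str) -> None:
        # An instance can have up to 4 clusters.
        if len(self.clusters) >= MAX_CLUSTERS:
            raise ValueError("an instance can have at most 4 clusters")
        # An instance's clusters must each be in unique zones.
        if zone in self.clusters.values():
            raise ValueError(f"zone {zone} already hosts a cluster")
        self.clusters[cluster_id] = zone

    @property
    def uses_replication(self) -> bool:
        # Instances with only 1 cluster do not use replication.
        return len(self.clusters) > 1


inst = Instance()
inst.add_cluster("c1", "us-central1-a")
print(inst.uses_replication)  # False
inst.add_cluster("c2", "us-east1-b")
print(inst.uses_replication)  # True
```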

Nodes

  • Each cluster in an instance has 1 or more nodes.
  • Nodes are the compute resources that Bigtable uses to manage data.
  • Bigtable splits all data from tables into smaller tablets.
  • Tablets are stored on disk, separate from the nodes but in the same zone as the nodes.
  • A tablet is associated with a single node.
  • Each node
    • Keeps track of specific tablets on disk.
    • Handles incoming reads and writes for its tablets.
    • Performs maintenance tasks on its tablets.
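
The relationship between tablets and nodes can be pictured with a toy assignment table. This is a sketch under a loud assumption: tablets are handed out round-robin here purely for illustration, and the notes do not say how Bigtable actually decides which node gets which tablet. The function names and tablet/node labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def assign_tablets(tablets: List[str], nodes: List[str]) -> Dict[str, str]:
    """Associate each tablet with exactly one node (round-robin, assumed)."""
    if not nodes:
        raise ValueError("a cluster has 1 or more nodes")
    return {tablet: nodes[i % len(nodes)] for i, tablet in enumerate(tablets)}


def tablets_per_node(assignment: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the mapping: the tablets each node keeps track of."""
    per_node = defaultdict(list)
    for tablet, node in assignment.items():
        per_node[node].append(tablet)
    return dict(per_node)


assignment = assign_tablets(["t1", "t2", "t3", "t4", "t5"], ["node-a", "node-b"])
print(tablets_per_node(assignment))
```

Only the mapping lives with the node in this model; the tablets themselves stay on disk, as the notes describe.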

Instance Creation Steps

  • Select or create a GCP project.
    • A project name must be between 4 and 30 characters.
    • A project ID is suggested automatically and can be edited. It must be 6 to 30 characters long, start with a lowercase letter, and not end with a hyphen.
  • Make sure billing is enabled for the Google Cloud project.
  • Enable the Cloud Bigtable and Cloud Bigtable Admin APIs.
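
The naming rules in the first step can be checked before creating a project. This is a minimal sketch: the function names are hypothetical, and the allowed character set (lowercase letters, digits, and hyphens) is an assumption added here, since the notes above only give the length, first-character, and last-character rules.

```python
import re

# 6 to 30 characters in total: a lowercase letter first, then lowercase
# letters, digits, or hyphens (assumed character set), and a final
# character that is not a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Check a project ID against the rules listed in these notes."""
    return PROJECT_ID_RE.fullmatch(project_id) is not None


def is_valid_project_name_length(name: str) -> bool:
    """A project name must be between 4 and 30 characters."""
    return 4 <= len(name) <= 30


print(is_valid_project_id("my-bigtable-project"))  # True
print(is_valid_project_id("my-project-"))          # False: ends with a hyphen
```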

Labels

  • A label is a key-value pair.
  • Labels help you organize GCP resources.
  • You can attach a label to each resource.
  • You can then filter resources based on their labels.
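
Filtering by label amounts to matching key-value pairs. The sketch below models resources as plain dictionaries; the `filter_by_labels` helper, the resource names, and the label keys are all hypothetical and only illustrate the idea, not any Google Cloud API.

```python
from typing import Dict, List


def filter_by_labels(resources: List[Dict], **wanted: str) -> List[Dict]:
    """Return resources whose labels contain every wanted key-value pair."""
    return [
        r for r in resources
        if all(r.get("labels", {}).get(k) == v for k, v in wanted.items())
    ]


resources = [
    {"name": "bt-prod", "labels": {"env": "prod", "team": "data"}},
    {"name": "bt-dev", "labels": {"env": "dev", "team": "data"}},
]
print([r["name"] for r in filter_by_labels(resources, env="prod")])  # ['bt-prod']
```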

Modifying Instance

  • By default, you can provision a maximum of 30 Cloud Bigtable nodes per zone in each Google Cloud project.
  • To provision more, use the node request form.
  • After creating a Bigtable instance, you can update the following settings:
    • number of nodes in each cluster
    • number of clusters in the instance
    • application profiles for the instance
    • labels for the instance
    • display name for the instance
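
Before raising the node count of a cluster, it can help to total the nodes already provisioned in each zone and compare against the default limit. This is a rough sketch: the helper name and input shape are hypothetical, and the figure of 30 nodes per zone per project is the default quoted in these notes.

```python
from typing import Dict, Iterable, Tuple

DEFAULT_NODES_PER_ZONE = 30  # default limit quoted in these notes


def zones_over_quota(clusters: Iterable[Tuple[str, int]],
                     quota: int = DEFAULT_NODES_PER_ZONE) -> Dict[str, int]:
    """Sum node counts per zone and return the zones exceeding the quota.

    `clusters` is an iterable of (zone, node_count) pairs covering every
    cluster in the project.
    """
    totals: Dict[str, int] = {}
    for zone, nodes in clusters:
        totals[zone] = totals.get(zone, 0) + nodes
    return {zone: total for zone, total in totals.items() if total > quota}


planned = [("us-central1-a", 20), ("us-central1-a", 15), ("us-east1-b", 10)]
print(zones_over_quota(planned))  # {'us-central1-a': 35}
```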

Adding and deleting clusters

  • You can add clusters to an existing instance.
  • An instance can have a maximum of 4 clusters.
  • Clusters can be in any region where Bigtable is available.

Deleting a cluster

  • You can delete all but 1 of an instance’s clusters; at least 1 cluster is needed.
  • Deleting all but 1 cluster automatically disables replication.

Monitoring

  • You can monitor a Bigtable instance using Cloud Console and Cloud Monitoring.
  • The monitoring views give a high-level overview.
  • The Key Visualizer tool lets you drill down further.