Cloud Spanner Instances Google Professional Data Engineer GCP

  • To use Cloud Spanner, first create a Cloud Spanner instance within a Google Cloud project.
  • An instance allocates the resources used by Cloud Spanner.
  • Creating an instance requires choosing an instance configuration and a node count.
  • An instance configuration defines the geographic placement and replication of the databases in that instance.
  • The node count is the number of nodes to allocate to that instance.
  • Each node provides up to 2 TB of storage.
  • After creating an instance, you can add or remove nodes later.
  • You cannot remove nodes if
    • the instance would then store more than 2 TB of data per node.
    • Spanner has created a large number of splits for the instance's data.
  • To change the node count, use
    • the Cloud Console
    • the gcloud command-line tool
    • the client libraries
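
The storage rule above can be sketched as a small check before scaling down. This is only an illustration: the function names and the instance ID are hypothetical, the 2 TB figure is the per-node limit stated in these notes, and the split-count condition is not modeled because it is not visible from outside Spanner. The last helper only builds the text of a `gcloud spanner instances update` command; it does not run it.

```python
import math
import shlex

TB_PER_NODE = 2  # per-node storage limit as stated in the notes above


def min_nodes_for_storage(data_tb: float) -> int:
    """Smallest node count that keeps stored data at or under 2 TB per node."""
    return max(1, math.ceil(data_tb / TB_PER_NODE))


def can_scale_to(target_nodes: int, data_tb: float) -> bool:
    """True if the storage rule alone allows resizing to target_nodes."""
    return target_nodes >= min_nodes_for_storage(data_tb)


def resize_command(instance_id: str, nodes: int) -> str:
    """Build (but do not execute) the gcloud command that sets the node count."""
    return shlex.join(
        ["gcloud", "spanner", "instances", "update", instance_id, f"--nodes={nodes}"]
    )


# Hypothetical instance holding 5.5 TB of data.
print(min_nodes_for_storage(5.5))
print(can_scale_to(2, 5.5))
print(resize_command("test-instance", 3))
```

With 5.5 TB stored, the check rejects a resize to 2 nodes, because that would leave more than 2 TB on each node.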

Nodes versus replicas

  • To scale up the serving and storage resources in an instance, add more nodes to that instance.
  • Adding a node does not increase the number of replicas; it increases the resources each replica has.
  • The total number of servers in a Cloud Spanner instance is the number of nodes the instance has multiplied by the number of replicas in the instance.
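
The server-count rule above is plain multiplication. A minimal sketch, with a hypothetical function name and an assumed configuration of 3 replicas chosen only for the example:

```python
def total_servers(node_count: int, replica_count: int) -> int:
    """Servers in an instance = nodes multiplied by replicas."""
    if node_count < 1 or replica_count < 1:
        raise ValueError("node_count and replica_count must be at least 1")
    return node_count * replica_count


REPLICAS = 3  # assumed replica count, fixed by the instance configuration

before = total_servers(4, REPLICAS)  # 4 nodes
after = total_servers(5, REPLICAS)   # one node added, replica count unchanged
print(before, after)
```

Adding the fifth node raises the server count from 12 to 15 while the number of replicas stays at 3.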

Data Management

  • Create, alter, and delete tables and indexes by using the default database editor.
  • Use the Cloud Console to insert, edit, and delete data.
  • Run DML statements using the client libraries, the Google Cloud Console, and the gcloud command-line tool.
  • Execute DML statements inside read-write transactions.
  • When data is read, shared read locks are acquired on limited portions of the row ranges being read.
  • When data is written using DML statements, exclusive locks are acquired.

 

  • Cloud Spanner executes all the SQL statements (SELECT, INSERT, UPDATE, and DELETE) within a transaction sequentially, not concurrently. The only exception is that multiple SELECT statements can run concurrently.
  • A transaction with DML statements has the same limits as any other transaction.
  • Use Partitioned DML for large-scale changes.
  • If a transaction results in more than 20,000 mutations, a BadUsage error is returned.
  • If a transaction is larger than 100 MB, a BadUsage error is returned.
  • Partitioned DML is designed for bulk updates and deletes, particularly periodic cleanup and backfilling.
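
When Partitioned DML does not fit (for example, bulk inserts), one common workaround is to split the work across several transactions so each stays under the mutation limit. The sketch below assumes a simplified counting rule of one mutation per column per inserted row and ignores secondary indexes, which also add mutations; the helper names are hypothetical and the limit is the 20,000 figure from the notes above.

```python
from typing import Iterator, List, Sequence

MUTATION_LIMIT = 20_000  # per-transaction limit as stated in the notes above


def rows_per_commit(columns_per_row: int, limit: int = MUTATION_LIMIT) -> int:
    """Rows that fit in one transaction, assuming one mutation per column."""
    if columns_per_row < 1:
        raise ValueError("columns_per_row must be at least 1")
    if columns_per_row > limit:
        raise ValueError("a single row would exceed the mutation limit")
    return limit // columns_per_row


def chunk_rows(
    rows: Sequence[tuple], columns_per_row: int, limit: int = MUTATION_LIMIT
) -> Iterator[List[tuple]]:
    """Yield batches of rows, each small enough for one transaction."""
    size = rows_per_commit(columns_per_row, limit)
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


# 50,000 hypothetical rows with 5 columns each.
rows = [(i, "a", "b", "c", "d") for i in range(50_000)]
batches = list(chunk_rows(rows, columns_per_row=5))
print(len(batches), len(batches[0]), len(batches[-1]))
```

Each batch would then be written in its own read-write transaction. The batches are not atomic as a group, so this only suits changes that can tolerate partial completion.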

Query

  • DQL statements are used to query data.
  • A query execution plan is the set of steps for how the results are obtained.
  • You can retrieve a query plan using the Cloud Console, the client libraries, and the gcloud command-line tool.
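
As one illustration of the gcloud route, the helper below builds the text of a `gcloud spanner databases execute-sql` command with `--query-mode=PLAN`, which asks for the plan instead of running the query for results. The helper name, instance ID, database ID, and table are hypothetical, and the command is only assembled here, not executed.

```python
import shlex


def query_plan_command(instance_id: str, database_id: str, sql: str) -> str:
    """Build (but do not execute) a gcloud command that returns a query plan."""
    return shlex.join(
        [
            "gcloud", "spanner", "databases", "execute-sql", database_id,
            f"--instance={instance_id}",
            f"--sql={sql}",
            "--query-mode=PLAN",  # return the execution plan, not the rows
        ]
    )


cmd = query_plan_command(
    "test-instance", "example-db", "SELECT SingerId FROM Singers"
)
print(cmd)
```

`shlex.join` quotes the SQL argument so that the printed command can be pasted into a shell as is.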

 

Query Best Practices

  • Use query parameters to speed up frequently executed queries
  • Use secondary indexes to speed up common queries
  • Avoid large reads inside read-write transactions
  • Use ORDER BY to ensure the ordering of SQL results
  • Use STARTS_WITH instead of LIKE to speed up parameterized SQL queries
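
Several of these practices can be combined in one query. The sketch below keeps the SQL text constant and passes the prefix as a query parameter, uses STARTS_WITH rather than LIKE, and adds ORDER BY so the result order is guaranteed. The `Singers` table, its columns, and the function name are hypothetical.

```python
from typing import Dict, Tuple


def singers_by_prefix_query(prefix: str) -> Tuple[str, Dict[str, str]]:
    """Return constant SQL text plus the parameter values for one execution."""
    sql = (
        "SELECT SingerId, LastName "
        "FROM Singers "
        "WHERE STARTS_WITH(LastName, @prefix) "  # instead of LastName LIKE @pattern
        "ORDER BY LastName"                      # explicit ordering of results
    )
    params = {"prefix": prefix}
    return sql, params


sql_a, params_a = singers_by_prefix_query("Ri")
sql_b, params_b = singers_by_prefix_query("Sm")
# The SQL text is identical across calls; only the parameter value changes.
print(sql_a == sql_b, params_a, params_b)
```

Because the statement text never changes, repeated executions can reuse the same query plan. With the Python client library, the pair would typically be passed to `execute_sql(sql, params=params, param_types=...)`, with the parameter type declared as a string.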