Schema and data model Google Professional Data Engineer GCP
- Can contain one or more tables
 - Tables look like relational database tables
 - Table structured with rows, columns, and values
 - table have primary keys.
 - Data is strongly typed
 - must define a schema for each database and specify the data types of each column.
 - Allowable data types include scalar and array types
 - Avoid hot spotting like bigtable
 - Use nested tables with primary keys called interleaving for faster access.
 - Do not use sequential numbers, timestamps.
 - If needed, store timestamps in descending order.
 
Parent-child table relationships
- Two ways to define parent-child relationships – table interleaving and foreign keys.
 - Table interleaving is good if child table’s primary key includes the parent table’s primary key columns.
 - Foreign keys
- are not limited to primary key columns
 - tables can have multiple foreign key relationships
 
 
Primary keys
- Every table must have a primary key
 - primary key can be composed of zero or more columns of that table.
 - For child, the primary key column(s) of the parent table must be the prefix of the primary key of the child table.
 - Spanner stores rows in sorted order by primary key values
 - child rows also inserted between parent rows that share the same primary key prefix.
 - Spanner can physically co-locate rows of related tables.
 
.
Choosing a primary key
- primary key uniquely identifies each row in a table.
 - Options for primary key
- Hash the key and store it in a column.
 - Use a Universally Unique Identifier (UUID).
 - Bit-reverse sequential values.
 
 
Database splits
- can define hierarchies of parent-child relationships between tables up to seven layers deep
 - Spanner divides data into chunks called “splits”
 - individual splits can move independently from each other and assigned to different servers
 - A split is a range of rows in a top-level (in other words, non-interleaved) table, where the rows are ordered by primary key.
 - The start and end keys of this range are called “split boundaries”.
 - Spanner automatically adds and removes split boundaries
 
Key columns
- The keys of a table can’t change
 - can’t add a key column to an existing table
 - cannot remove a key column from an existing table.
 
Storing NULLs
- Primary key columns can be defined to store NULLs.
 - to store NULLs in a primary key column, omit the NOT NULL clause
 
Disallowed types
These cannot be of type ARRAY:
- A table’s key columns.
 - An index’s key columns.
 
Replication
- Spanner automatically gets replication at the byte level
 - Spanner writes database mutations to files in this filesystem
 - Spanner creates replicas of each database split
 - 3 types of replicas: read-write replicas, read-only replicas, and witness replicas.
 - Single-region instances use only read-write replicas,
 - multi-region instance configurations use a combination of all three types
 
For multi-region
| Replica type | Can vote | Can become leader | Can serve reads | 
| Read-write | yes | yes | yes | 
| Read-only | no | no | yes | 
| Witness | yes | no | no | 
Schema Design Best Practices
- Design a schema that prevents hotspots and other performance issues.
 - Place compute resources for write-heavy workloads within or close to the default leader region for optimal write latency
 - Use staleness of at least 15 seconds for optimal read performance outside of the default leader region
 - place critical compute resources in at least two regions to avoid single-region dependency
 - keep high priority total CPU utilization under 45% in each region.
 
Google Professional Data Engineer (GCP) Free Practice TestTake a Quiz
		