Google Products and Storage Options (Google Professional Data Engineer, GCP)
This article discusses the various storage systems in Google Cloud and their uses.
Cloud SQL
- A fully managed relational database service
- Easily set up and manage relational databases (PostgreSQL, MySQL, and SQL Server) in GCP
- Suitable for WordPress sites, application backends, CRM tools, and existing MySQL, PostgreSQL, and Microsoft SQL Server workloads
 
Cloud Spanner
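- A horizontally scalable, globally distributed relational database service
- Full ACID transaction support
- Provides strong consistency and high availability
- Useful for mission-critical applications
- Provides "scale insurance": it scales horizontally as the workload grows, without manual sharding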
 
Cloud Bigtable
- A fully managed, wide-column NoSQL database service from GCP
- Provides low-latency reads
- Supports high-throughput writes
- Offers scalability and reliability
- Suitable for large analytical workloads and low-latency applications
- Stores large amounts of structured objects
- Does not support SQL queries or multi-row transactions
- Capacity scales to petabytes, with a maximum unit size of 10 MB per cell and 100 MB per row
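
A minimal sketch of how the per-cell and per-row limits above might be checked before writing a row. The helper `check_bigtable_row` is hypothetical, a row is modelled as a plain dict of column names to byte values, and 1 MB is assumed to be 1,000,000 bytes. It does not use the Bigtable client library, and only value bytes are counted (row keys and column names are ignored).

```python
# Hypothetical pre-write check against the Bigtable limits quoted above.
# Assumptions: 1 MB = 1_000_000 bytes; only cell values count toward the row size.
MB = 1_000_000
MAX_CELL_BYTES = 10 * MB   # 10 MB per cell
MAX_ROW_BYTES = 100 * MB   # 100 MB per row


def check_bigtable_row(row: dict[str, bytes]) -> list[str]:
    """Return the limit violations found in one row; an empty list means it fits."""
    problems = []
    total = 0
    for column, value in row.items():
        total += len(value)
        if len(value) > MAX_CELL_BYTES:
            problems.append(f"cell {column!r}: {len(value)} bytes exceeds {MAX_CELL_BYTES}")
    if total > MAX_ROW_BYTES:
        problems.append(f"row: {total} bytes exceeds {MAX_ROW_BYTES}")
    return problems


if __name__ == "__main__":
    print(check_bigtable_row({"stats:clicks": b"42", "stats:views": b"1000"}))  # []
    oversized_cell = b"x" * (11 * MB)
    print(check_bigtable_row({"media:blob": oversized_cell}))  # one cell violation
```

Checking on the client side like this only catches oversized writes early; the service itself remains the authority on what it accepts.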
 
Cloud Memorystore
- A managed in-memory data store service for Redis
- Useful for sub-millisecond data access using Redis
- Can be used to build application caches
- Runs on scalable, secure, and highly available GCP infrastructure
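
Application caches of this kind commonly follow the cache-aside pattern: read from the cache first, and on a miss load from the database and store the result with a time-to-live (TTL). The sketch below shows that pattern with a small in-memory `TTLCache` class standing in for Redis so it runs without a server; `TTLCache`, `get_user`, and `load_user_from_db` are hypothetical names. Against a real Memorystore instance, the stand-in class would be replaced by a Redis client connected to the instance.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical stand-in for a Redis instance: key -> (value, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_user(cache: TTLCache, user_id: str, load_from_db: Callable[[str], str],
             ttl_seconds: float = 60.0) -> str:
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)       # slow path (hypothetical database call)
    cache.set(key, value, ttl_seconds)  # populate so the next read is a hit
    return value


if __name__ == "__main__":
    db_calls = []

    def load_user_from_db(user_id: str) -> str:
        db_calls.append(user_id)
        return f"profile-of-{user_id}"

    cache = TTLCache()
    print(get_user(cache, "42", load_user_from_db))  # miss: loads from the database
    print(get_user(cache, "42", load_user_from_db))  # hit: served from the cache
    print(len(db_calls))  # 1
```

The TTL bounds how stale a cached value can get; choosing it is a trade-off between freshness and the load taken off the database.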
 
Cloud Firestore
- Managed, serverless, cloud-native NoSQL document database
- Useful for client-side mobile and web applications and for gaming leaderboards
 
Firebase Realtime Database
- A NoSQL database from Google for storing and syncing data between users in real time
- Useful for:
  - creating onboarding flows
  - rolling out new features
  - building serverless apps
 
 
BigQuery
- Serverless, highly scalable, and cost-effective data warehouse service
- Lowers data warehouse costs because Google manages all of the underlying infrastructure
- Useful for:
  - real-time analytics
  - advanced and predictive analytics
  - analysis of large-scale events
 
 
Cloud Datastore
- NoSQL document database service
- Fully managed by GCP
- Scales easily without configuration or downtime
- Useful for:
  - user profiles
  - product catalogues
  - mobile games
- Suitable for web and mobile applications that may need to scale massively in the future
- Supports storage of semi-structured objects, transactions, and SQL-like queries
- Provides terabytes of capacity
- Maximum unit size of 1 MB per entity
 
Cloud Storage
- Stores immutable blobs larger than 10 MB, such as images or movies
- Provides huge capacity, with a maximum unit size of 5 TB per object
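
The unit sizes given for Cloud Datastore and Cloud Storage can serve as a rough first-pass check. The hypothetical `suggest_store` function below encodes only those figures: record-like data that fits in 1 MB can be a single Datastore entity, and blobs go to Cloud Storage, where one object can hold up to 5 TB. It assumes 1 MB = 1,000,000 bytes (a slightly conservative reading) and uses 5 TiB for the object cap, which is how Google's Cloud Storage documentation states the limit. It is a sketch, not a complete selection rule.

```python
# Hypothetical first-pass routing based only on the unit sizes quoted above.
MB = 1_000_000                 # assumption: decimal megabytes
DATASTORE_MAX_ENTITY = 1 * MB  # 1 MB per Datastore entity
GCS_MAX_OBJECT = 5 * 1024**4   # 5 TiB per Cloud Storage object


def suggest_store(size_bytes: int, is_blob: bool) -> str:
    """Suggest a store for one item of the given size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if is_blob:
        # Images, movies, backups: object storage, split if over the per-object cap.
        if size_bytes > GCS_MAX_OBJECT:
            return "Cloud Storage (split into several objects)"
        return "Cloud Storage"
    # Record-like data: one Datastore entity if it fits.
    if size_bytes <= DATASTORE_MAX_ENTITY:
        return "Cloud Datastore"
    return "too large for one Datastore entity"


if __name__ == "__main__":
    print(suggest_store(4_000, is_blob=False))    # Cloud Datastore
    print(suggest_store(2 * MB, is_blob=False))   # too large for one Datastore entity
    print(suggest_store(700 * MB, is_blob=True))  # Cloud Storage
```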
 
Select the right GCP database service
Existing database – matching GCP database service:
- Redis – Cloud Memorystore for Redis
- Memcached – Cloud Memorystore for Memcached
- MySQL – Cloud SQL for MySQL
- PostgreSQL – Cloud SQL for PostgreSQL
- SQL Server – Cloud SQL for SQL Server
- HBase – Cloud Bigtable
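
The mapping above is effectively a lookup table. A minimal sketch of it as one, with a hypothetical `migration_target` helper that normalises the engine name before looking it up; only the Redis, MySQL, PostgreSQL, SQL Server, and HBase rows are included.

```python
# Lookup built from the list above (existing engine -> suggested GCP service).
# Only some rows are included; the helper name is hypothetical.
MIGRATION_TARGETS = {
    "redis": "Cloud Memorystore for Redis",
    "mysql": "Cloud SQL for MySQL",
    "postgresql": "Cloud SQL for PostgreSQL",
    "sql server": "Cloud SQL for SQL Server",
    "hbase": "Cloud Bigtable",
}


def migration_target(engine: str) -> str:
    """Return the suggested GCP service for an existing database engine."""
    key = " ".join(engine.lower().split())  # normalise case and whitespace
    try:
        return MIGRATION_TARGETS[key]
    except KeyError:
        raise ValueError(f"no mapping listed for {engine!r}") from None


if __name__ == "__main__":
    print(migration_target("PostgreSQL"))      # Cloud SQL for PostgreSQL
    print(migration_target("  SQL   Server"))  # Cloud SQL for SQL Server
```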
 
Use Case
- If you need full SQL support for OLTP, use Cloud SQL or Cloud Spanner. Cloud SQL provides terabytes of capacity; Cloud Spanner provides petabytes.
- For big data analysis and interactive queries, use BigQuery.
- For semi-structured application data, use Cloud Datastore.
- For analytical data with heavy read/write workloads, such as ad tech, financial, or IoT data, use Cloud Bigtable.
- To store structured and unstructured binary data, such as images, multimedia files, and backups, use Cloud Storage.
- For applications built on popular web frameworks, use Cloud SQL.
- For very large database applications (more than 2 TB), such as financial trading and e-commerce, use Cloud Spanner.
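
The rules above can be condensed into a small decision function. The workload labels (`"oltp"`, `"analytics"`, and so on) and the `Workload` type are hypothetical names chosen for this sketch. The 2 TB cut-over between Cloud SQL and Cloud Spanner comes from the last rule, and 1 TB is taken as 10^12 bytes. Real selections weigh many more factors, as the rest of this article describes.

```python
from dataclasses import dataclass

TB = 10**12  # assumption: decimal terabytes

# Non-OLTP workload labels (hypothetical) mapped to the services named in the rules above.
SIMPLE_RULES = {
    "analytics": "BigQuery",               # big data analysis, interactive queries
    "semi_structured": "Cloud Datastore",  # semi-structured application data
    "heavy_read_write": "Cloud Bigtable",  # ad tech, financial, IoT data
    "blob": "Cloud Storage",               # images, multimedia, backups
}


@dataclass
class Workload:
    kind: str            # "oltp" or one of the SIMPLE_RULES keys
    size_bytes: int = 0  # expected database size


def choose_service(w: Workload) -> str:
    if w.kind == "oltp":
        # Full SQL with OLTP: Cloud SQL, or Cloud Spanner for more than 2 TB.
        return "Cloud Spanner" if w.size_bytes > 2 * TB else "Cloud SQL"
    if w.kind in SIMPLE_RULES:
        return SIMPLE_RULES[w.kind]
    raise ValueError(f"unknown workload kind: {w.kind!r}")


if __name__ == "__main__":
    print(choose_service(Workload("oltp", size_bytes=500 * 10**9)))  # Cloud SQL
    print(choose_service(Workload("oltp", size_bytes=5 * TB)))       # Cloud Spanner
    print(choose_service(Workload("heavy_read_write")))              # Cloud Bigtable
```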
 
Feature Comparison Table
| Category | Relational | Relational | NoSQL / Nonrelational | NoSQL / Nonrelational | NoSQL / Nonrelational | NoSQL / Nonrelational |
| Feature | Cloud Spanner | Cloud SQL | Cloud Bigtable | Cloud Firestore | Firebase Realtime Database | Cloud Memorystore |
| Scale insurance | Yes | Yes | Yes | |||
| Data distribution | Regional or global | Zonal | Regional or global | Regional or global | Zonal | Zonal |
| OSS compatibility | | Yes | Yes | | | Yes |
| Replica consistency | Strong | Strong | Eventual | Strong | N/A | Eventual |
| Multi-primary | Yes | Yes | Yes | |||
| Transactions (strong consistency) | Yes | Yes | | Yes | | |
| Joins and complex queries | Yes | Yes | | | | |
| Ultra low latency (microsecond or single-digit ms) | Yes | Yes | Yes | |||
| Serverless | | | | Yes | Yes | |
| Realtime sync to clients | | | | Yes | Yes | |
| In-memory | | | | | | Yes |
| Direct client access | | | | Yes | Yes | |
| Game state | Yes | Yes | Yes | |||
| Gaming leaderboard/player profile | Yes | Yes | Yes | |||
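
The "Data distribution" and "Replica consistency" rows of the table give a value for every service, so they are the easiest to query programmatically. The sketch below copies those two rows into dictionaries and filters on both; `matching_services` is a hypothetical helper, and comparisons ignore case.

```python
# The two fully populated rows of the comparison table above.
DISTRIBUTION = {
    "Cloud Spanner": "Regional or global",
    "Cloud SQL": "Zonal",
    "Cloud Bigtable": "Regional or global",
    "Cloud Firestore": "Regional or global",
    "Firebase Realtime Database": "Zonal",
    "Cloud Memorystore": "Zonal",
}
REPLICA_CONSISTENCY = {
    "Cloud Spanner": "Strong",
    "Cloud SQL": "Strong",
    "Cloud Bigtable": "Eventual",
    "Cloud Firestore": "Strong",
    "Firebase Realtime Database": "N/A",
    "Cloud Memorystore": "Eventual",
}


def matching_services(distribution: str, consistency: str) -> list[str]:
    """Services whose table entries match both values (case-insensitive)."""
    want_dist = distribution.casefold()
    want_cons = consistency.casefold()
    return [
        name
        for name, dist in DISTRIBUTION.items()
        if dist.casefold() == want_dist
        and REPLICA_CONSISTENCY[name].casefold() == want_cons
    ]


if __name__ == "__main__":
    print(matching_services("Regional or global", "Strong"))
    # ['Cloud Spanner', 'Cloud Firestore']
```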
Evaluating Cloud Storage Options
The key considerations when evaluating GCP data storage options are:
- Scalability – able to scale as requirements grow
- Durability – critical data must be stored durably, with high availability and consistency
- Data variety – able to store structured, semi-structured, and unstructured data
- Schema flexibility – not tied to a fixed schema
- Separation of components – able to decouple storage, compute, and other components so that each can scale independently
- Cost effectiveness – cost-effective, with a pay-as-you-go pricing model
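
One common way to apply these considerations is a weighted scoring matrix: give each criterion a weight, score each candidate against it, and compare the weighted totals. In the sketch below the criterion keys mirror the list, but the weights, the 1 to 5 scores, and the candidate names `option_a` and `option_b` are placeholders, not assessments of any GCP service.

```python
# Placeholder weights (summing to 1.0) for the considerations listed above.
WEIGHTS = {
    "scalability": 0.25,
    "durability": 0.25,
    "data_variety": 0.15,        # structured, semi-structured, unstructured data
    "schema_flexibility": 0.10,  # freedom from a fixed schema
    "decoupling": 0.10,          # separation of storage, compute, other components
    "cost": 0.15,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Sum of weight * score over all criteria; every criterion must be scored."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


if __name__ == "__main__":
    candidates = {  # hypothetical 1-5 scores
        "option_a": {"scalability": 5, "durability": 4, "data_variety": 3,
                     "schema_flexibility": 2, "decoupling": 4, "cost": 3},
        "option_b": {"scalability": 3, "durability": 5, "data_variety": 4,
                     "schema_flexibility": 4, "decoupling": 3, "cost": 4},
    }
    ranked = sorted(candidates, key=lambda n: weighted_score(candidates[n]), reverse=True)
    for name in ranked:
        print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

The weights encode priorities, so they should be agreed before scoring; otherwise the matrix tends to confirm whichever option was already favoured.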
 
Functional requirements
- Data format – the type of data to store, such as transactional data, JSON objects, telemetry, search indexes, or flat files
- Scale and structure – whether the data must be partitioned, and the total storage capacity needed
- Data size – the size of the entities to store, and whether each entity must be kept as a single document or can be split across multiple documents or tables
- Data relationships – whether one-to-one, one-to-many, or many-to-many relationships must be supported
- Consistency model – whether strong (ACID) consistency is required or eventual consistency is acceptable
- Concurrency – the concurrency control needed when updating and synchronizing data: pessimistic or optimistic
- Schema flexibility – whether a fixed schema or a schemaless design is needed
- Data lifecycle – how the data is used over time, for example whether it is write-once/read-many or can be moved to archival storage
- Data movement – how much data needs to move, for example through ETL into other data stores or data warehouses
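
The difference between pessimistic and optimistic concurrency is easiest to see in code. With optimistic concurrency, each record carries a version number; a writer remembers the version it read, and its update succeeds only if that version is still current, otherwise it must re-read and retry. The in-memory `VersionedStore` below is a hypothetical illustration of that compare-and-set check, not the API of any GCP database.

```python
import threading
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when the record changed after the writer read it."""


@dataclass
class Record:
    value: str
    version: int


class VersionedStore:
    """Hypothetical in-memory store with optimistic (version-checked) updates."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}
        # The lock only makes each compare-and-set step atomic; it is never held
        # between a client's read and its later write (unlike pessimistic locking).
        self._lock = threading.Lock()

    def insert(self, key: str, value: str) -> None:
        with self._lock:
            self._records[key] = Record(value, version=1)

    def read(self, key: str) -> Record:
        with self._lock:
            current = self._records[key]
            return Record(current.value, current.version)  # a copy, not a live reference

    def update(self, key: str, new_value: str, expected_version: int) -> int:
        """Write only if the version is unchanged since the read; return the new version."""
        with self._lock:
            current = self._records[key]
            if current.version != expected_version:
                raise ConflictError(
                    f"{key}: expected version {expected_version}, found {current.version}"
                )
            current.value = new_value
            current.version += 1
            return current.version


if __name__ == "__main__":
    store = VersionedStore()
    store.insert("cart:1", "empty")
    a = store.read("cart:1")  # writer A reads version 1
    b = store.read("cart:1")  # writer B also reads version 1
    print(store.update("cart:1", "1 item", a.version))  # 2: A's write succeeds
    try:
        store.update("cart:1", "2 items", b.version)  # B's read is now stale
    except ConflictError as err:
        print("conflict, re-read and retry:", err)
```

Optimistic control suits workloads where conflicts are rare; when many writers contend for the same records, the retries add up and pessimistic locking may be the better fit.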
 
Non-functional requirements
- Performance and scalability – required data ingestion and processing rates, and acceptable query response times
- Reliability – the level of fault tolerance needed, and backup and restore capabilities
- Replication – whether data must be distributed across multiple replicas or regions, and which replication capabilities are needed
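
An "acceptable query response time" is easier to check when it is stated as a percentile target, such as the 95th-percentile latency staying under some limit. The sketch below computes a nearest-rank percentile over measured latencies and compares it with a target; the sample values and the 200 ms target are placeholders.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms: list[float], target_ms: float, p: float = 95) -> bool:
    """True if the p-th percentile latency is within the target."""
    return percentile(samples_ms, p) <= target_ms


if __name__ == "__main__":
    # Placeholder measurements in milliseconds: mostly fast, two slow outliers.
    latencies_ms = [12, 15, 11, 14, 180, 13, 16, 12, 15, 14,
                    13, 12, 17, 15, 14, 13, 12, 16, 15, 950]
    print(percentile(latencies_ms, 95))             # 180
    print(meets_latency_target(latencies_ms, 200))  # True
```

A percentile is used rather than the mean because a few slow queries can hide behind a good average.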
 
Management and cost
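- Managed service – whether a managed service is available to simplify administration
- Region availability – whether the service is available in all regions or only in selected ones
- Portability – whether the data will need to be migrated
- Licensing – proprietary versus open-source (OSS) software, and the license terms
- Overall cost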
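
Under a pay-as-you-go model, a first estimate of overall cost is usage multiplied by unit price, summed over the billed dimensions. The sketch below does that arithmetic; the line items and unit prices are made-up placeholders, not actual Google Cloud prices, which vary by service, region, and tier.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    quantity: float    # billed units, e.g. GB-months stored or TB processed
    unit_price: float  # placeholder price per unit, NOT a real Google Cloud rate

    @property
    def cost(self) -> float:
        return self.quantity * self.unit_price


def monthly_estimate(items: list[LineItem]) -> float:
    """Pay-as-you-go estimate: the sum of usage * unit price over all line items."""
    return sum(item.cost for item in items)


if __name__ == "__main__":
    items = [
        LineItem("storage (GB-month)", quantity=500, unit_price=0.10),
        LineItem("queries (TB processed)", quantity=3, unit_price=2.00),
    ]
    for item in items:
        print(f"{item.name}: {item.cost:.2f}")
    print(f"total: {monthly_estimate(items):.2f}")  # total: 56.00
```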
 
Security
- Encryption and authentication – which encryption is used, and which authentication mechanism is needed
- Auditing – the level of detail in audit logs, and what can be audited
- Networking requirements – any restrictions on access to the data
 
DevOps
- Skill set – specific programming languages, operating systems, or other technologies the team needs
- Clients – whether client libraries support the team's development languages
 