Google Products and Storage Options Google Professional Data Engineer GCP


This section surveys the storage systems available in Google Cloud and the use cases each one suits.

Cloud SQL

  • A fully managed relational database service
  • Easily set up and manage RDBMS – PostgreSQL, MySQL, and SQL Server in GCP
  • Suited to WordPress sites, application backends, CRM tools, and existing MySQL, PostgreSQL, or Microsoft SQL Server workloads

 

Cloud Spanner

  • A scalable relational database service
  • Full transactions support
  • Provides strong consistency and high availability
  • Useful for mission-critical applications
  • Provides scale insurance

 

Cloud Bigtable

  • NoSQL database service from GCP
  • Provides low latency reads
  • Supports high throughput writes
  • Enables scalability and reliability
  • Suitable for large analytical workloads and low-latency applications
  • Stores large amounts of structured objects
  • Does not support SQL queries or multi-row transactions
  • Scales to petabytes of capacity, with a maximum unit size of 10 megabytes per cell and 100 megabytes per row

 

Cloud Memorystore

  • A managed in-memory data store service for Redis
  • Useful for sub-millisecond data access using Redis
  • Can build application caches
  • Provides scalable, secure and highly available GCP infrastructure.
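
The application-cache use case above usually follows a cache-aside pattern: look in the cache first, and fall back to the slower source of truth on a miss. The following is a minimal sketch under stated assumptions: an in-process dictionary stands in for a Redis instance, and `CacheAside` and `load_from_database` are hypothetical names for illustration, not part of any Memorystore or Redis API.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside helper. A plain dict stands in for Redis (assumption)."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0):
        self._loader = loader            # slow lookup used on a cache miss
        self._ttl = ttl_seconds          # how long an entry stays valid
        self._store: Dict[str, Tuple[float, str]] = {}
        self.misses = 0                  # counts calls that reached the loader

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: entry has not expired
        self.misses += 1
        value = self._loader(key)        # cache miss: go to the source of truth
        self._store[key] = (now + self._ttl, value)
        return value


def load_from_database(key: str) -> str:
    # Hypothetical slow lookup, e.g. a relational query.
    return f"profile-for-{key}"


cache = CacheAside(load_from_database, ttl_seconds=30.0)
first = cache.get("user:42")    # miss: calls the loader
second = cache.get("user:42")   # hit: served from the cache
```

With a real Redis instance, the dictionary reads and writes would be replaced by the client's get and set-with-expiry calls; the control flow stays the same.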

 

 

Cloud Firestore

  • Managed, serverless, cloud-native NoSQL document database.
  • Useful for client-side mobile and web applications and for gaming leaderboards

 

Firebase Realtime Database

  • A NoSQL database from GCP to store and sync data between users in real time.
  • Useful for
    • creating onboarding flows
    • rolling out new features
    • building serverless apps

BigQuery

  • Serverless, highly scalable, and cost-effective data warehouse service
  • Lowers data warehouse costs because all infrastructure is managed in GCP
  • Useful for
    • real-time analytics
    • advanced and predictive analytics
    • large-scale events

Cloud Datastore

  • NoSQL document database service
  • Fully-managed service by GCP
  • Easy scalability without configuration or downtime.
  • Useful for
    • user profiles
    • product catalogues
    • mobile games.
  • Useful for web and mobile applications that may need massive scale in the future
  • Supports storage of unstructured objects, transactions, and SQL-like queries
  • Provides terabytes of capacity, with a maximum unit size of one megabyte per entity

 

Cloud Storage

  • Stores immutable blobs larger than 10 megabytes, such as images or movies
  • Provides huge capacity with a maximum unit size of five terabytes per object.
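
The maximum unit sizes quoted in this guide for Cloud Datastore, Cloud Bigtable, and Cloud Storage can be turned into a quick fit check. This is only a sketch: the limits are copied from the text above rather than from a live quota API, the binary interpretation of "megabyte" and "terabyte" is an assumption, and `services_that_fit` is a hypothetical helper name.

```python
from typing import Dict, List

MB = 1024 ** 2  # assumption: binary megabytes
TB = 1024 ** 4  # assumption: binary terabytes

# Maximum unit sizes as quoted in this guide, in bytes.
MAX_UNIT_SIZE: Dict[str, int] = {
    "Cloud Datastore (entity)": 1 * MB,
    "Cloud Bigtable (cell)": 10 * MB,
    "Cloud Bigtable (row)": 100 * MB,
    "Cloud Storage (object)": 5 * TB,
}


def services_that_fit(size_bytes: int) -> List[str]:
    """Return the storage units whose quoted limit can hold size_bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return [name for name, limit in MAX_UNIT_SIZE.items() if size_bytes <= limit]


small_image = services_that_fit(5 * MB)
video_file = services_that_fit(200 * MB)
```

Here a 5 MB image exceeds the Datastore entity limit but fits in a Bigtable cell, while a 200 MB video only fits in a Cloud Storage object.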

 

 

Select the right GCP database service

  • Existing database – GCP database service
  • Redis – Cloud Memorystore for Redis
  • Memcached – App Engine for Memcached
  • MySQL – Cloud SQL for MySQL
  • PostgreSQL – Cloud SQL for PostgreSQL
  • SQL Server – Cloud SQL for SQL Server
  • HBase – Cloud Bigtable
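
The mapping above is effectively a lookup table. A minimal sketch of it as a dictionary follows; the service names are copied from the list, while `MIGRATION_TARGETS` and `gcp_target` are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional

# Existing database -> GCP database service, as listed in this guide.
MIGRATION_TARGETS: Dict[str, str] = {
    "redis": "Cloud Memorystore for Redis",
    "memcached": "App Engine for Memcached",
    "mysql": "Cloud SQL for MySQL",
    "postgresql": "Cloud SQL for PostgreSQL",
    "sql server": "Cloud SQL for SQL Server",
    "hbase": "Cloud Bigtable",
}


def gcp_target(existing_database: str) -> Optional[str]:
    """Look up the GCP service for an existing database, ignoring case."""
    return MIGRATION_TARGETS.get(existing_database.strip().lower())


hbase_target = gcp_target("HBase")
unknown_target = gcp_target("Oracle")  # not in the list, so None
```

Returning `None` for an unlisted database keeps the sketch honest: the guide names no target for it, so the code does not guess one.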

 

Use Case

  • For full SQL support with OLTP, use Cloud SQL or Cloud Spanner. Cloud SQL provides terabytes of capacity; Cloud Spanner provides petabytes.
  • For big data analysis and interactive querying, use BigQuery
  • For semi-structured application data, use Cloud Datastore
  • For analytical data with heavy read/write events, such as advertising technology, financial, or IoT data, use Cloud Bigtable
  • To store structured and unstructured binary data, such as images, multimedia files, and backups, use Cloud Storage
  • For popular web frameworks, use Cloud SQL
  • For very large databases (more than 2 terabytes), such as financial trading and e-commerce systems, use Cloud Spanner
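
The use-case rules above can be sketched as a small decision function. This is an illustration of the guidance in this list only, not an official selection tool: the `Workload` categories and the `recommend` function are hypothetical names, and the 2 terabyte cut-over between Cloud SQL and Cloud Spanner is taken directly from the last bullet.

```python
from enum import Enum, auto


class Workload(Enum):
    OLTP_SQL = auto()                # full SQL support with OLTP
    ANALYTICS_QUERY = auto()         # big data analysis, interactive queries
    SEMI_STRUCTURED_APP = auto()     # semi-structured application data
    HIGH_THROUGHPUT_EVENTS = auto()  # ad tech, financial, or IoT event data
    BINARY_OBJECTS = auto()          # images, multimedia files, backups


SPANNER_THRESHOLD_TB = 2.0  # from the guide: above 2 TB, prefer Cloud Spanner

_DIRECT_CHOICE = {
    Workload.ANALYTICS_QUERY: "BigQuery",
    Workload.SEMI_STRUCTURED_APP: "Cloud Datastore",
    Workload.HIGH_THROUGHPUT_EVENTS: "Cloud Bigtable",
    Workload.BINARY_OBJECTS: "Cloud Storage",
}


def recommend(workload: Workload, size_tb: float = 0.0) -> str:
    """Map a workload type (and size, for OLTP) to a storage service."""
    if workload is Workload.OLTP_SQL:
        if size_tb > SPANNER_THRESHOLD_TB:
            return "Cloud Spanner"
        return "Cloud SQL"
    return _DIRECT_CHOICE[workload]


small_oltp = recommend(Workload.OLTP_SQL, size_tb=0.5)
large_oltp = recommend(Workload.OLTP_SQL, size_tb=5.0)
iot_events = recommend(Workload.HIGH_THROUGHPUT_EVENTS)
```

Real selection also depends on the functional and non-functional requirements discussed later in this guide, so the function is a starting point rather than a verdict.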

Feature Comparison Table

Relational: Cloud Spanner, Cloud SQL
NoSQL / Nonrelational: Cloud Bigtable, Cloud Firestore, Firebase Realtime Database, Cloud Memorystore
Scale insurance Yes Yes Yes
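Data distribution: Cloud Spanner regional or global; Cloud SQL zonal; Cloud Bigtable regional or global; Cloud Firestore regional or global; Firebase Realtime Database zonal; Cloud Memorystore zonal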
OSS compatibility Yes Yes Yes
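Replica consistency: Cloud Spanner strong; Cloud SQL strong; Cloud Bigtable eventual; Cloud Firestore strong; Firebase Realtime Database n/a; Cloud Memorystore eventual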
Multi-primary Yes Yes Yes
Transactions (strong consistency) Yes Yes Yes
Joins and complex queries Yes Yes
Ultra low latency (microsecond or single-digit ms) Yes Yes Yes
Serverless Yes Yes
Realtime sync to clients Yes Yes
In-memory Yes
Direct client access Yes Yes
Game state Yes Yes Yes
Gaming leaderboard/player profile Yes Yes Yes
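
Two rows of the comparison above, data distribution and replica consistency, give one value per service, so they can be encoded and filtered directly. The sketch below covers only those two rows; the `Profile` type and the `matching` function are hypothetical names for illustration.

```python
from typing import Dict, List, NamedTuple


class Profile(NamedTuple):
    data_distribution: str
    replica_consistency: str


# Values copied from the data distribution and replica consistency rows.
PROFILES: Dict[str, Profile] = {
    "Cloud Spanner": Profile("regional or global", "strong"),
    "Cloud SQL": Profile("zonal", "strong"),
    "Cloud Bigtable": Profile("regional or global", "eventual"),
    "Cloud Firestore": Profile("regional or global", "strong"),
    "Firebase Realtime Database": Profile("zonal", "n/a"),
    "Cloud Memorystore": Profile("zonal", "eventual"),
}


def matching(distribution: str, consistency: str) -> List[str]:
    """Return services whose two attributes match the requested values."""
    wanted = Profile(distribution, consistency)
    return [name for name, profile in PROFILES.items() if profile == wanted]


global_strong = matching("regional or global", "strong")
zonal_eventual = matching("zonal", "eventual")
```

For example, asking for regional or global distribution with strong replica consistency narrows the six services to Cloud Spanner and Cloud Firestore.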

 

Evaluating Cloud Storage Options

The key considerations when evaluating GCP data storage options are:

  • Scalability – Able to scale as requirements grow
  • Durability – High availability and consistency for storing critical data
  • Able to store unstructured, semi-structured, and structured data
  • Free from a fixed schema
  • Separation of components – Able to decouple storage, compute, and other components so that each can scale independently
  • Cost effective – Should be cost-effective and offer a pay-as-you-use model

Functional requirements

  • Data format – The type of data to store, such as transactional data, JSON objects, telemetry, search indexes, or flat files
  • Scale and structure – The need for data partitioning and the total storage capacity required
  • Data size – The size of the entities to store, and whether each must be stored as a single document or can be split across multiple documents
  • Data relationships – Whether one-to-one, one-to-many, or many-to-many relationships must be supported
  • Consistency model – The level of consistency needed: full ACID guarantees, or whether eventual consistency is acceptable
  • Concurrency – The concurrency control needed during data updates and synchronization: pessimistic or optimistic
  • Schema flexibility – Whether a fixed schema or a schemaless model is needed
  • Data lifecycle
  • Data movement – The level of data movement needed, such as ETL jobs that move data into data stores or data warehouses
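
The choice between pessimistic and optimistic concurrency mentioned above is easier to weigh with a concrete picture of the optimistic variant: writers do not take locks, and a write is rejected if the record changed since it was read. The sketch below assumes a simple in-memory store with a per-record version counter; `VersionedStore`, `Record`, and `VersionConflict` are hypothetical names, not the API of any GCP service.

```python
from dataclasses import dataclass
from typing import Dict


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the record."""


@dataclass(frozen=True)
class Record:
    value: str
    version: int


class VersionedStore:
    """In-memory store that applies optimistic concurrency control."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def read(self, key: str) -> Record:
        # A missing key reads as an empty record at version 0.
        return self._rows.get(key, Record(value="", version=0))

    def write(self, key: str, value: str, expected_version: int) -> Record:
        current = self.read(key)
        if current.version != expected_version:
            # Someone else wrote in between: reject instead of overwriting.
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current.version}"
            )
        updated = Record(value=value, version=current.version + 1)
        self._rows[key] = updated
        return updated


store = VersionedStore()
seen_by_a = store.read("cart")                    # client A reads version 0
seen_by_b = store.read("cart")                    # client B reads version 0
store.write("cart", "1 item", seen_by_a.version)  # A wins, record is now v1
try:
    store.write("cart", "2 items", seen_by_b.version)  # B's version is stale
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
```

A pessimistic design would instead lock the record when client A reads it, making client B wait; the optimistic version trades that waiting for an occasional retry.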

 

Non-functional requirements

  • Performance and scalability – Required data ingestion and processing rates, and acceptable query response times
  • The level of fault tolerance needed, including backup and restore capabilities
  • Replication capabilities needed, such as distributing data across multiple replicas or regions

 

Management and cost

  • Managed service – Whether a managed service is available for easier management
  • Region availability – Whether the service is available in all regions or only selected ones
  • Whether existing data needs to be migrated
  • Proprietary versus OSS usage and licensing
  • Overall cost

 

Security

  • The encryption used and the authentication mechanism needed
  • Audit logging: the level of detail recorded and what can be audited
  • Networking requirements – Any restrictions on access to the data

 

DevOps

  • Skill set – Specific programming languages, operating systems, or other technology needed
  • Clients – Client support for the development languages in use