Optimize for Performance and Cost Google Professional Data Engineer GCP

  1. Home
  2. Optimize for Performance and Cost Google Professional Data Engineer GCP

Partitioned table

  • a special table that is divided into segments, called partitions,
  • make it easier to manage and query data.
  • By dividing a large table improve query performance, and control costs
  • partition tables by:
    • Ingestion time
    • Date/timestamp

Clustered tables

  • table data is automatically organized based on the contents of one or more columns in the table’s schema.
  • The columns specified are used to colocate related data.
  • order of columns is important for the sort order of the data.
  • Clustering improve the performance of queries
  • During write data is sorted using the values in clustering columns.
  • clustering is allowed over a partitioned table.
  • Use clustering if
  • data is already partitioned on a date, timestamp, or integer column.
  • using filters or aggregation against particular columns in queries.
  • clustering is based on ingestion time
    • date/timestamp
    • integer range

 

Views

  • a virtual table defined by a SQL query.
  • A view is queried it in the same way you query a table.
  • query results of view contain data only from the tables and fields specified in the query that defines the view.
  • can query views in BigQuery by
    • Query editor box in the Cloud Console
    • Compose Query option in the classic BigQuery web UI
    • BigQuery command-line tool’s bq query command
    • BigQuery REST API to programmatically call the jobs.query or query-type jobs.insert methods
    • BigQuery client libraries
  • Also use view as a data source for a visualization
  • views limitations
    • The dataset that contains view and the dataset that contains the tables referenced by the view must be in the same location.
    • cannot run a BigQuery job that exports data from a view.
    • cannot use the TableDataList JSON API method to retrieve data from a view.
    • cannot reference query parameters in views.
    • If schema change after view is created, the reported schema will be inaccurate
    • cannot include a user-defined function in the SQL query that defines a view.
    • cannot reference a view in a wildcard table query.

 

Reservations

  • two pricing models
  • On-demand pricing: You pay only for the data scanned by queries.
  • Flat-rate pricing: Offers predictable and consistent month-to-month costs.
  • By default, billed as per on-demand pricing model.
  • With BigQuery Reservations, can switch to the flat-rate pricing model by purchasing commitments
  • commitments are dedicated portions of query processing capacity measured in slots with cost of all bytes processed is included in the flat-rate price.

 

INFORMATION_SCHEMA

  • For on-demand pricing, queries against INFORMATION_SCHEMA views 10 MB is the minimum billing amount
  • For flat-rate pricing, queries against INFORMATION_SCHEMA views and tables consume purchased BigQuery slots.
  • INFORMATION_SCHEMA queries are not cached
  • charged each time you run an INFORMATION_SCHEMA query,
  • not charged storage fees for the INFORMATION_SCHEMA views.
Menu