Optimize for Performance and Cost Google Professional Data Engineer GCP
Partitioned table
- a special table that is divided into segments, called partitions,
 - make it easier to manage and query data.
 - By dividing a large table improve query performance, and control costs
 - partition tables by:
- Ingestion time
 - Date/timestamp
 
 
Clustered tables
- table data is automatically organized based on the contents of one or more columns in the table’s schema.
 - The columns specified are used to colocate related data.
 - order of columns is important for the sort order of the data.
 - Clustering improve the performance of queries
 - During write data is sorted using the values in clustering columns.
 - clustering is allowed over a partitioned table.
 - Use clustering if
 - data is already partitioned on a date, timestamp, or integer column.
 - using filters or aggregation against particular columns in queries.
 - clustering is based on ingestion time
- date/timestamp
 - integer range
 
 
Views
- a virtual table defined by a SQL query.
 - A view is queried it in the same way you query a table.
 - query results of view contain data only from the tables and fields specified in the query that defines the view.
 - can query views in BigQuery by
- Query editor box in the Cloud Console
 - Compose Query option in the classic BigQuery web UI
 - BigQuery command-line tool’s bq query command
 - BigQuery REST API to programmatically call the jobs.query or query-type jobs.insert methods
 - BigQuery client libraries
 
 - Also use view as a data source for a visualization
 - views limitations
- The dataset that contains view and the dataset that contains the tables referenced by the view must be in the same location.
 - cannot run a BigQuery job that exports data from a view.
 - cannot use the TableDataList JSON API method to retrieve data from a view.
 - cannot reference query parameters in views.
 - If schema change after view is created, the reported schema will be inaccurate
 - cannot include a user-defined function in the SQL query that defines a view.
 - cannot reference a view in a wildcard table query.
 
 
Reservations
- two pricing models
 - On-demand pricing: You pay only for the data scanned by queries.
 - Flat-rate pricing: Offers predictable and consistent month-to-month costs.
 - By default, billed as per on-demand pricing model.
 - With BigQuery Reservations, can switch to the flat-rate pricing model by purchasing commitments
 - commitments are dedicated portions of query processing capacity measured in slots with cost of all bytes processed is included in the flat-rate price.
 
INFORMATION_SCHEMA
- For on-demand pricing, queries against INFORMATION_SCHEMA views 10 MB is the minimum billing amount
 - For flat-rate pricing, queries against INFORMATION_SCHEMA views and tables consume purchased BigQuery slots.
 - INFORMATION_SCHEMA queries are not cached
 - charged each time you run an INFORMATION_SCHEMA query,
 - not charged storage fees for the INFORMATION_SCHEMA views.
 
Google Professional Data Engineer (GCP) Free Practice TestTake a Quiz
		