Loading Data Google Professional Data Engineer GCP

  • You can load data:
    • From Cloud Storage
    • From other Google services
    • From a readable data source (such as your local machine)
    • By inserting individual records using streaming inserts
    • Using DML statements to perform bulk inserts
    • Using a BigQuery I/O transform in a Dataflow pipeline to write data to BigQuery
  • You can load data into a new table or partition.
  • You can also append data to an existing table or partition.
  • You can overwrite a table or partition.
  • Methods for ingesting data into BigQuery:
    • the BigQuery Jobs API
    • streaming writes
    • writing query results into a table
    • loading CSV files from Cloud Storage
    • using BigQuery as a Cloud Dataflow sink
  • The default source format for loading data is CSV.
  • BigQuery also supports streaming inserts through the BigQuery API; records are buffered briefly before insertion (see the sketch below).
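As a rough illustration of the two most common paths, the sketch below uses the google-cloud-bigquery Python client to run a load job from Cloud Storage with an explicit write disposition, and then to stream individual rows; all project, dataset, table, and bucket names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical destination table

# Load job: append CSV files from Cloud Storage to an existing table.
# Swap in WRITE_TRUNCATE to overwrite the table (or target partition) instead.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data/*.csv", table_id, job_config=job_config
)
load_job.result()  # block until the load job finishes

# Streaming insert: rows land in a streaming buffer before being
# committed to columnar storage.
errors = client.insert_rows_json(table_id, [{"name": "alice", "score": 42}])
if errors:
    print("Streaming insert errors:", errors)
```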

 

You can load data from Cloud Storage and from readable data sources in the following formats:

Cloud Storage:

  • Avro
  • CSV
  • JSON (newline delimited only)
  • ORC
  • Parquet
  • Datastore exports
  • Firestore exports

Readable data sources, such as your local machine (see the example after this list):

  • Avro
  • CSV
  • JSON (newline delimited only)
  • ORC
  • Parquet
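Loading from a readable data source looks much the same, except the bytes are uploaded from the local machine rather than referenced by URI. A minimal sketch, assuming a hypothetical newline-delimited JSON file and an existing destination table with a matching schema:

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)

# Upload a local file (a "readable data source") into BigQuery.
with open("events.json", "rb") as source_file:  # hypothetical local file
    job = client.load_table_from_file(
        source_file, "my-project.my_dataset.events", job_config=job_config
    )
job.result()  # raises if the load failed
```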

 

Choosing a data ingestion format

  • You can load data in a variety of formats.
  • Loaded data is converted into columnar format for Capacitor (BigQuery’s storage format).
  • When loading, choose a data ingestion format based on (see the sketch after this list):
    • your data’s schema
    • embedded newlines
    • external limitations
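The choice surfaces in the load job configuration as the source format. Self-describing formats such as Parquet carry their own schema, so none needs to be supplied; a minimal sketch with hypothetical names:

```python
from google.cloud import bigquery

client = bigquery.Client()
# Parquet embeds its schema, so no explicit schema or auto-detection is needed.
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
client.load_table_from_uri(
    "gs://my-bucket/exports/data.parquet",
    "my-project.my_dataset.parquet_table",
    job_config=job_config,
).result()
```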

Loading encoded data

  • BigQuery supports UTF-8 encoding for both nested or repeated and flat data.
  • ISO-8859-1 encoding is supported for flat data only, and only for CSV files (see the sketch below).
  • By default, BigQuery expects all source data to be UTF-8 encoded.
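If a flat CSV file is not UTF-8, the encoding has to be declared on the load job so BigQuery can transcode it. A minimal sketch with hypothetical names:

```python
from google.cloud import bigquery

client = bigquery.Client()
# Declare the ISO-8859-1 (Latin-1) encoding; without this, BigQuery
# assumes UTF-8 and may mangle accented characters.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    encoding="ISO-8859-1",
)
client.load_table_from_uri(
    "gs://my-bucket/legacy/latin1.csv",
    "my-project.my_dataset.legacy_table",
    job_config=job_config,
).result()
```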

Loading compressed and uncompressed data

  • The Avro binary format is the preferred format for loading both compressed and uncompressed data.

Loading denormalized, nested, and repeated data

  • BigQuery performs best when data is denormalized, using nested and repeated fields rather than a fully relational schema (see the sketch below).
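For instance, instead of keeping orders and line items in separate tables and joining them, a denormalized table can embed the line items as a repeated nested record. A sketch with hypothetical names:

```python
from google.cloud import bigquery

client = bigquery.Client()
# Each order row carries its line items in a repeated RECORD column,
# avoiding a join against a separate items table at query time.
schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField(
        "items",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
        ],
    ),
]
client.create_table(bigquery.Table("my-project.my_dataset.orders", schema=schema))
```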

 

Schema auto-detection

  • Available when you load data into BigQuery.
  • Also available when you query an external data source.
  • Steps:
    • BigQuery starts the inference process by selecting a random file in the data source and scanning up to 100 rows of data to use as a sample.
    • It then examines each field and attempts to assign a data type to that field based on the values in the sample.
  • Use schema auto-detection for JSON or CSV files (see the sketch below).
  • Not available for Avro files, ORC files, Parquet files, Datastore exports, or Firestore exports, since those formats are self-describing.
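In the Python client, auto-detection is a single flag on the job configuration; a minimal sketch with hypothetical names:

```python
from google.cloud import bigquery

client = bigquery.Client()
# autodetect=True asks BigQuery to sample the CSV and infer column types,
# so no schema is supplied.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    skip_leading_rows=1,
)
client.load_table_from_uri(
    "gs://my-bucket/new_data.csv",
    "my-project.my_dataset.autodetected_table",
    job_config=job_config,
).result()
```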

 

BigQuery Data Transfer Service

It automates loading data into BigQuery from these services (a transfer-config sketch follows the lists):

Google Software as a Service (SaaS) apps

  • Campaign Manager
  • Cloud Storage
  • Google Ad Manager
  • Google Ads
  • Google Merchant Center (beta)
  • Google Play
  • Search Ads 360 (beta)
  • YouTube Channel reports
  • YouTube Content Owner reports

External cloud storage providers

  • Amazon S3

Data warehouses

  • Teradata
  • Amazon Redshift
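Transfers are configured once and then run on a schedule. Below is a sketch using the google-cloud-bigquery-datatransfer Python client for a recurring Cloud Storage transfer; the names and the params keys shown are illustrative assumptions, and each data source documents its own parameter set:

```python
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# Recurring transfer from Cloud Storage into an existing table.
# params keys vary by data source (e.g. Amazon S3 needs credentials and
# a data path); the keys below are assumed for the Cloud Storage source.
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="Daily GCS load",  # hypothetical name
    data_source_id="google_cloud_storage",
    params={
        "data_path_template": "gs://my-bucket/exports/*.csv",
        "destination_table_name_template": "events",
        "file_format": "CSV",
    },
    schedule="every 24 hours",
)
transfer_config = client.create_transfer_config(
    parent=client.common_project_path("my-project"),
    transfer_config=transfer_config,
)
print("Created transfer config:", transfer_config.name)
```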