Big Data Architectures Google Professional Data Engineer GCP

A big data architecture is designed to handle the

  • Ingestion
  • Processing
  • Analysis

of data that is too large or complex for a traditional RDBMS to manage.

 

The workloads involved are usually:

  • Batch processing for big data sources at rest.
  • Real-time processing for big data in motion.
  • Interactive exploration of big data in motion or at rest.
  • Predictive analytics and machine learning of big data.
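The difference between the batch and real-time workloads above can be illustrated with a minimal sketch. It assumes a hypothetical list of sensor readings and computes the same per-sensor average twice: once over the complete data set at rest, and once incrementally as each reading arrives. The names `READINGS`, `batch_average`, and `StreamingAverage` are illustrative only and do not come from any particular framework.

```python
from collections import defaultdict

# Hypothetical sensor readings: (sensor_id, temperature)
READINGS = [("a", 20.0), ("b", 30.0), ("a", 22.0), ("b", 34.0), ("a", 24.0)]


def batch_average(readings):
    """Batch: the whole data set is at rest, so read it all, then aggregate."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sensor_id, value in readings:
        totals[sensor_id] += value
        counts[sensor_id] += 1
    return {s: totals[s] / counts[s] for s in totals}


class StreamingAverage:
    """Real-time: keep running state and update it as each event arrives."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def update(self, sensor_id, value):
        self._totals[sensor_id] += value
        self._counts[sensor_id] += 1
        # The current average is available immediately after every event.
        return self._totals[sensor_id] / self._counts[sensor_id]

    def snapshot(self):
        return {s: self._totals[s] / self._counts[s] for s in self._totals}


batch_result = batch_average(READINGS)

stream = StreamingAverage()
for sensor_id, value in READINGS:
    stream.update(sensor_id, value)
stream_result = stream.snapshot()
```

Both approaches reach the same final answer here. The practical difference is that the batch version needs the complete input before it produces anything, while the streaming version has an up-to-date result after every event.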

 

A big data architecture is applied when

  • A traditional database is unable to store and process large volumes of data.
  • Unstructured data must be processed and transformed for analysis and reporting.
  • Unbounded streams of data must be captured, processed, and analyzed in real time or with low latency.

Components of a big data architecture

  • Data sources. These are the essential component and may include
    • Application data stores, such as an RDBMS
    • Static files produced by applications, such as logs
    • Real-time data sources, such as IoT devices
  • Data storage. A distributed file store is used to hold huge volumes of data in various formats.
  • Batch processing. Because the data volumes are huge, the data is processed by batch jobs that filter, aggregate, or otherwise prepare it for analysis. The steps involved are reading the data, processing it, and writing the output to new files.
  • Real-time message ingestion. For real-time data sources, the architecture captures and stores messages for real-time stream processing.
    • A message ingestion store acts as a buffer; this is called stream buffering.
  • Stream processing. After capture, messages are processed in real time, for example by filtering, aggregating, or otherwise preparing the data for analysis.
    • The output of processing goes to an output sink.
  • Analytical data store. Big data must be prepared for analysis and served in the format the analysis tools require. The analytical data store holds the prepared data and answers queries against it.
  • Analysis and reporting. Provides insights into the data through analysis and reporting.
  • Orchestration. Repeated data processing tasks are orchestrated as workflows that automate the steps from data capture through transformation to analysis.
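The way the components above fit together can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: an in-memory `deque` stands in for the message ingestion store, a plain dictionary stands in for the analytical data store, and the event data, the `-100.0` fault threshold, and all function names are hypothetical.

```python
from collections import deque

# Hypothetical raw events from a real-time source: (device_id, temperature)
RAW_EVENTS = [("dev1", 21.5), ("dev2", -999.0), ("dev1", 22.5), ("dev2", 19.0)]


def ingest(events):
    """Message ingestion: buffer incoming messages (stream buffering)."""
    buffer = deque()
    for event in events:
        buffer.append(event)
    return buffer


def process(buffer):
    """Stream processing: filter faulty readings, then aggregate per device."""
    aggregates = {}
    while buffer:
        device_id, temperature = buffer.popleft()
        if temperature < -100.0:  # assumed sentinel for a faulty reading
            continue
        count, total = aggregates.get(device_id, (0, 0.0))
        aggregates[device_id] = (count + 1, total + temperature)
    return aggregates


def load(aggregates, store):
    """Output sink: write prepared results into the analytical data store."""
    for device_id, (count, total) in aggregates.items():
        store[device_id] = {"count": count, "avg_temp": total / count}
    return store


def report(store):
    """Analysis and reporting: answer a query against the store."""
    return sorted((device_id, row["avg_temp"]) for device_id, row in store.items())


def run_workflow(events):
    """Orchestration: run the steps in order as one repeatable workflow."""
    store = {}
    load(process(ingest(events)), store)
    return report(store)


result = run_workflow(RAW_EVENTS)
```

In a real deployment each function would typically be a separate managed service, and the workflow would be scheduled and monitored by an orchestration tool instead of a single function call.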