Cloud Pub/Sub Overview Google Professional Data Engineer GCP
- Pub/Sub can be used as message-oriented middleware.
 - It is used for event ingestion and delivery in streaming analytics pipelines.
 - It offers durable message storage and real-time message delivery.
 - It provides high availability and consistent performance at scale.
 - It is a publish/subscribe (Pub/Sub) service.
 - Senders of messages are decoupled from the receivers of messages.
 - Main terms
 - Message: the data that moves through the service.
 - Topic: a named entity that represents a feed of messages.
 - Subscription: a named entity that represents an interest in receiving messages on a particular topic.
 - Publisher (also called a producer): creates messages and sends (publishes) them to the messaging service on a specified topic.
 - Subscriber (also called a consumer): receives messages on a specified subscription.
 - A publisher application creates and sends messages to a topic.
 - Subscriber applications create a subscription to a topic to receive messages from it.
 - Communication can be:
- one-to-many (fan-out)
 - many-to-one (fan-in)
 - many-to-many
 
 
The flow of messages through Pub/Sub is illustrated by a figure with two publishers, one topic, two subscriptions, and three subscribers.

In that figure:
- Two publishers publish messages on a single topic.
 - There are two subscriptions to the topic.
 - The first subscription has two subscribers, so messages are load-balanced across them, with each subscriber receiving a subset of the messages.
 - The second subscription has one subscriber, which receives all of the messages.
 - The bold letters are messages.
 - Message A comes from Publisher 1 and is sent to Subscriber 2 via Subscription 1, and to Subscriber 3 via Subscription 2.
 - Message B comes from Publisher 2 and is sent to Subscriber 1 via Subscription 1, and to Subscriber 3 via Subscription 2.
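
The fan-out and load-balancing behavior described for the figure can be imitated with a small in-memory model. This is only an illustrative sketch: the `Topic` and `Subscription` classes are hypothetical and are not part of any Pub/Sub client library, and round-robin is an assumed stand-in for however the real service spreads messages across the subscribers of one subscription.

```python
from collections import defaultdict
from itertools import cycle


class Subscription:
    """Hypothetical in-memory subscription; not a Pub/Sub client class."""

    def __init__(self, name, subscribers):
        self.name = name
        # Assumption: round-robin stands in for the service's load balancing.
        self._next_subscriber = cycle(subscribers)

    def deliver(self, message, inboxes):
        # Each message goes to exactly one subscriber of this subscription.
        inboxes[next(self._next_subscriber)].append((self.name, message))


class Topic:
    """Hypothetical in-memory topic that fans out to every subscription."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []

    def publish(self, message, inboxes):
        # Every subscription attached to the topic gets its own copy.
        for subscription in self.subscriptions:
            subscription.deliver(message, inboxes)


def simulate():
    """Replay the figure: two subscriptions, three subscribers, messages A and B."""
    inboxes = defaultdict(list)
    topic = Topic("topic")
    topic.subscriptions.append(
        Subscription("subscription-1", ["subscriber-1", "subscriber-2"])
    )
    topic.subscriptions.append(Subscription("subscription-2", ["subscriber-3"]))
    for message in ["A", "B"]:  # published by two different publishers
        topic.publish(message, inboxes)
    return dict(inboxes)
```

In this model `subscriber-3` receives both A and B, while `subscriber-1` and `subscriber-2` each receive one of them. Which of the two gets which message is an artifact of the round-robin order here, not something the service guarantees.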
 
Publisher and subscriber endpoints
- A publisher can be any application that can make HTTPS requests to pubsub.googleapis.com.
 - It can be:
- an App Engine app
 - a web service hosted on Google Compute Engine
 - a service hosted on any other third-party network, an app installed on a desktop or mobile device, or even a browser
 
 

- Pull subscribers make HTTPS requests to pubsub.googleapis.com.
 - Push subscribers must be Webhook endpoints that can accept POST requests over HTTPS.
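
To make the push side more concrete, the sketch below decodes the JSON body that a push delivery POSTs to a webhook. It covers only the parsing step: the function name `parse_push_request` and the sample values are made up for illustration, and HTTPS termination, the web framework, and returning a success status code to acknowledge the message are all left out.

```python
import base64
import json


def parse_push_request(body: bytes) -> dict:
    """Decode the JSON body of a push delivery into a plain dict."""
    envelope = json.loads(body)
    message = envelope["message"]
    return {
        "subscription": envelope.get("subscription"),
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        # The payload arrives base64-encoded in the "data" field.
        "data": base64.b64decode(message.get("data", "")),
    }


# Hypothetical request body, built here only for illustration.
example_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"hello").decode("ascii"),
        "attributes": {"origin": "sensor-1"},
        "messageId": "1234567890",
    },
    "subscription": "projects/myproject/subscriptions/mysubscription",
}).encode("utf-8")

parsed = parse_push_request(example_body)
```

A real endpoint would call something like this from its POST handler and respond with a success status only after the message has been handled.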
 
Common use cases
- Balancing workloads in network clusters.
 - Implementing asynchronous workflows.
 - Distributing event notifications.
 - Refreshing distributed caches.
 - Logging to multiple systems.
 - Data streaming from various processes or devices.
 
Architecture
- Publisher and subscriber clients are not aware of the location of the servers to which they connect or how those services route the data.
 - Load balancing directs publisher traffic to the nearest GCP data center.
 - An individual message is stored in a single region.
 - A topic may have messages stored in many regions.
 - When a subscriber client requests messages published to a topic, it connects to the nearest server.
 - Pub/Sub is divided into two primary parts:
- the data plane, which handles moving messages between publishers and subscribers
 - the control plane, which handles the assignment of publishers and subscribers to servers on the data plane
 
 - Data plane servers are called forwarders.
 - Control plane servers are called routers.
 - Publishers and subscribers stay connected to their assigned forwarders, so the control plane of Pub/Sub can be upgraded without affecting any clients.
 - Every message consists of a base64-encoded message body and an arbitrary set of key-value pairs called attributes.
 - Pub/Sub imposes no structure on the message body, so any format such as JSON or XML must be agreed on and enforced by the publisher and subscriber applications. Each message also has a message ID, unique within its topic, which a subscriber can use to check whether the message has already been processed.
 - Messages may be up to 10 MB in total size
 - There are two methods for message delivery: push subscriptions and pull subscriptions.
- In a push subscription, the server sends a request to the subscriber application at a preconfigured URL endpoint.
 - In a pull subscription, the subscriber requests messages from the server and acknowledges receipt.
 
 - Push subscriptions have limits of 10,000 messages per second and 10,000 concurrent message deliveries.
 - By default, subscriptions are created with an ack deadline of 10 seconds; the deadline can be increased to up to 600 seconds.
 - By default, subscriptions expire after 31 days of inactivity.
 - The inactivity duration can be configured with a subscription expiration policy.
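
The message structure described in this section (a base64-encoded body, free-form attributes, and a message ID used to spot repeats) can be sketched as follows. Everything here is illustrative: `build_message` and `DeduplicatingHandler` are hypothetical helpers rather than client-library APIs, JSON is just one possible body format, and the in-memory set of seen IDs is an assumption that would not survive a restart.

```python
import base64
import json


def build_message(payload: dict, **attributes: str) -> dict:
    """Serialize a payload as JSON and wrap it in a message-shaped dict."""
    body = json.dumps(payload).encode("utf-8")
    return {
        "data": base64.b64encode(body).decode("ascii"),
        "attributes": dict(attributes),
    }


class DeduplicatingHandler:
    """Hypothetical subscriber-side helper that skips repeated message IDs."""

    def __init__(self):
        # Assumption: IDs fit in memory; a real system needs durable storage.
        self._seen_ids = set()
        self.processed = []

    def handle(self, message_id: str, message: dict) -> bool:
        """Process a message once; return False if its ID was seen before."""
        if message_id in self._seen_ids:
            return False
        payload = json.loads(base64.b64decode(message["data"]))
        self.processed.append((payload, message["attributes"]))
        self._seen_ids.add(message_id)
        return True


message = build_message({"temperature": 21.5}, origin="sensor-1")
handler = DeduplicatingHandler()
first = handler.handle("1", message)
second = handler.handle("1", message)  # redelivery of the same ID is skipped
```

Because the service does not interpret the body, both sides have to agree on the JSON encoding used here; the attributes travel alongside it unencoded.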
 
Topic and Message Management
- Topics can be created, deleted, and viewed using:
- the API
 - the Google Cloud Console
 - the gcloud command-line tool
 
 - A subscription to a topic must be created before subscribers can receive messages published to the topic.
 - Subscriptions can be created with:
- the API
 - the Google Cloud Console
 - the gcloud command-line tool
 
 
Resource Name
- A resource name uniquely identifies a Pub/Sub resource.
 - The resource can be a subscription or a topic.
 - The name must fit the following format: projects/project-identifier/collection/relative-name
 - The project-identifier must be the project ID, available from the Google Cloud Console. For example, projects/myproject/topics/mytopic.
 - The collection must be one of subscriptions or topics.
 - The relative-name must:
- Not begin with the string goog.
 - Start with a letter
 - Contain between 3 and 255 characters
 - Contain only the following characters:
- Letters: [A-Za-z]
 - Numbers: [0-9]
 - Dashes: -
 - Underscores: _
 - Periods: .
 - Tildes: ~
 - Plus signs: +
 - Percent signs: %
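
The naming rules above translate fairly directly into a validator. The following is a minimal sketch, not an official check: the function name `is_valid_resource_name` is made up, the dash is taken to be the ASCII hyphen, and the project identifier is only required to be non-empty because its own rules are not listed here.

```python
import re

# First character is a letter; total length is 3 to 255 characters;
# the remaining characters come from the allowed set listed above.
_RELATIVE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_.~+%]{2,254}")
_COLLECTIONS = {"subscriptions", "topics"}


def is_valid_resource_name(name: str) -> bool:
    """Check projects/project-identifier/collection/relative-name."""
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "projects" or not parts[1]:
        return False
    collection, relative_name = parts[2], parts[3]
    if collection not in _COLLECTIONS:
        return False
    if relative_name.startswith("goog"):
        return False
    return _RELATIVE_NAME.fullmatch(relative_name) is not None
```

For example, `is_valid_resource_name("projects/myproject/topics/mytopic")` passes, while a relative name such as `ab` (too short) or `googtopic` (reserved prefix) is rejected.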
 
 
 
Message Storage Security
- If messages are published to the global endpoint, they are automatically stored in the nearest Google Cloud region.
 - A topic’s message storage policy ensures that all data published to the topic is persisted in a specific region or set of regions, regardless of the publish request’s origin.
 - When multiple regions are allowed by the policy, Pub/Sub chooses the nearest allowed region.
- To configure all of the topics in an organization-wide scope, use the Resource Location Restriction organization policy.
 - For fine-grained control, configure a topic’s message storage policy at topic creation, or via the UpdateTopic operation.
 
 - You can configure the policy using the:
- Topic details view
 - gcloud command-line tool
 - Service API (using client libraries)
 
 
Authentication and Access
- The following authentication methods are allowed:
- Service accounts
 - User accounts: you can authenticate users directly to your application when the application needs to access resources on behalf of an end user.
 
 - Pub/Sub uses Cloud IAM for access control.
 - Access control can be configured at the project level and at the individual resource level.
 