Cloud Pub/Sub Overview Google Professional Data Engineer GCP

  • Use Pub/Sub as messaging-oriented middleware.
  • Use it for event ingestion and delivery in streaming analytics pipelines.
  • It offers durable message storage and real-time message delivery.
  • It provides high availability and consistent performance at scale.
  • It is a publish/subscribe (Pub/Sub) service.
  • Senders of messages are decoupled from the receivers of messages.
  • Main terms
  • Message: the data that moves through the service.
  • Topic: a named entity that represents a feed of messages.
  • Subscription: a named entity that represents an interest in receiving messages on a particular topic.
  • Publisher (also called a producer): creates messages and sends (publishes) them to the messaging service on a specified topic.
  • Subscriber (also called a consumer): receives messages on a specified subscription.
  • A publisher application creates and sends messages to a topic.
  • Subscriber applications create a subscription to a topic to receive messages from it.
  • Communication can be:
    • one-to-many (fan-out)
    • many-to-one (fan-in)
    • many-to-many

The flow of messages through Pub/Sub is as follows.

[Figure: message flow through Pub/Sub]

In the figure above:

  • There are two publishers publishing messages on a single topic.
  • There are two subscriptions to the topic.
  • The first subscription has two subscribers, so messages are load-balanced across them, with each subscriber receiving a subset of the messages.
  • The second subscription has one subscriber that will receive all of the messages.
  • The bold letters are messages.
  • Message A comes from Publisher 1 and is sent to Subscriber 2 via Subscription 1, and to Subscriber 3 via Subscription 2.
  • Message B comes from Publisher 2 and is sent to Subscriber 1 via Subscription 1 and to Subscriber 3 via Subscription 2.
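
The flow described above can be sketched with a small in-memory model. This is only an illustration, not the Pub/Sub client library: the Topic and Subscription classes are hypothetical, and the round-robin assignment is an assumption chosen to make the result deterministic (the real service does not promise any particular distribution).

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy in-memory topic (hypothetical; not the Pub/Sub client library)."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []

    def publish(self, message):
        # Every subscription attached to the topic gets its own copy (fan-out).
        for subscription in self.subscriptions:
            subscription.deliver(message)


class Subscription:
    """Toy subscription that spreads messages across its subscribers."""

    def __init__(self, topic, subscribers):
        self.received = defaultdict(list)  # subscriber name -> messages
        # Assumption: round-robin, used here only for a deterministic example.
        self._next_subscriber = cycle(subscribers)
        topic.subscriptions.append(self)

    def deliver(self, message):
        # Within one subscription, each message goes to exactly one subscriber.
        self.received[next(self._next_subscriber)].append(message)


topic = Topic("mytopic")
subscription_1 = Subscription(topic, ["Subscriber 2", "Subscriber 1"])
subscription_2 = Subscription(topic, ["Subscriber 3"])

topic.publish("A")  # published by Publisher 1
topic.publish("B")  # published by Publisher 2

print(dict(subscription_1.received))
print(dict(subscription_2.received))
```

Subscriber 3 sees both messages because it is the only subscriber on its subscription, while Subscribers 1 and 2 split the messages between them.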

Publisher and subscriber endpoints

  • A publisher can be any application that can make HTTPS requests to pubsub.googleapis.com, for example:
    • an App Engine app
    • a web service hosted on Google Compute Engine or on any other third-party network
    • an app installed on a desktop or mobile device, or even a browser.
  • Pull subscribers make HTTPS requests to pubsub.googleapis.com.
  • Push subscribers must be Webhook endpoints that can accept POST requests over HTTPS.
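
As a rough sketch of what a push endpoint has to do, the function below decodes the JSON body of one push delivery and picks an HTTP status. The handle_push name and the sample values are hypothetical. The body layout (a "message" object with base64 "data", "attributes" and "messageId", plus a "subscription" field) is the wrapped push format as understood here, so check it against the current documentation before relying on it.

```python
import base64
import json


def handle_push(body: bytes):
    """Decode one push delivery and return (http_status, decoded_message)."""
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        data = base64.b64decode(message.get("data", "")).decode("utf-8")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed request: a non-success status tells the sender it failed.
        return 400, None
    decoded = {
        "id": message.get("messageId"),
        "data": data,
        "attributes": message.get("attributes", {}),
        "subscription": envelope.get("subscription"),
    }
    # A success status acts as the acknowledgement for a push delivery.
    return 204, decoded


# Hypothetical request body, shaped like a push delivery.
example_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"hello").decode("ascii"),
        "attributes": {"origin": "sensor-1"},
        "messageId": "1234567890",
    },
    "subscription": "projects/myproject/subscriptions/mysubscription",
}).encode("utf-8")

status, decoded = handle_push(example_body)
print(status, decoded)
```

In a real deployment this function would sit behind an HTTPS server that accepts POST requests; the web framework is left out to keep the sketch self-contained.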

Common use cases

  • Balancing workloads in network clusters.
  • Implementing asynchronous workflows.
  • Distributing event notifications.
  • Refreshing distributed caches.
  • Logging to multiple systems.
  • Data streaming from various processes or devices.

Architecture

  • Publisher and subscriber clients are not aware of the location of the servers to which they connect or how those services route the data.
  • Load balancing directs publisher traffic to the nearest GCP data center.
  • An individual message is stored in a single region.
  • A topic may have messages stored in many regions.
  • When a subscriber client requests messages published to this topic, it connects to the nearest server
  • Pub/Sub is divided into two primary parts:
    • the data plane, which handles moving messages between publishers and subscribers,
    • the control plane, which handles the assignment of publishers and subscribers to servers on the data plane.
  • Data plane servers are called forwarders.
  • Control plane servers are called routers.
  • Publishers and subscribers stay connected to their assigned forwarders, so the control plane of Pub/Sub can be upgraded without affecting any clients.
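  • Every message consists of a message body (base64-encoded in REST API requests) and an arbitrary set of key-value pairs called attributes.
  • Pub/Sub imposes no structure on the message body, so any format such as JSON or XML must be enforced by the publisher and subscriber applications. Each message has a unique message ID, assigned by the service and guaranteed unique within its topic, which subscribers can use to detect whether the message has already been processed.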
  • Messages may be up to 10 MB in total size
  • Two methods for message delivery: push subscriptions and pull subscriptions
    • In a push subscription, server sends a request to the subscriber app at a preconfigured URL endpoint.
    • In the pull model, the subscriber requests messages from the server and acknowledges receipt.
  • Push subscriptions have limits of 10,000 messages per second and 10,000 concurrent message deliveries.
  • By default, subscriptions are created with an ack deadline of 10 seconds; the deadline can be increased up to 600 seconds.
  • By default, subscriptions expire after 31 days of inactivity
  • Subscription expiration policies let you configure the inactivity duration.
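
To make the message format above concrete, here is a minimal sketch of wrapping a JSON payload as base64 data plus attributes, and of using the message ID to skip duplicates. The encode_message and IdempotentHandler names are hypothetical, and treating "10 MB" as 10 * 1024 * 1024 bytes of payload is an assumption; the service may account for message size differently.

```python
import base64
import json

# Assumption: "10 MB" from the text, applied to the payload bytes only.
MAX_MESSAGE_BYTES = 10 * 1024 * 1024


def encode_message(payload: dict, attributes: dict) -> dict:
    """Serialize a payload as JSON and wrap it as base64 data plus attributes."""
    raw = json.dumps(payload).encode("utf-8")
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("payload exceeds the message size limit")
    return {"data": base64.b64encode(raw).decode("ascii"), "attributes": attributes}


class IdempotentHandler:
    """Processes each message ID at most once (in-memory, single process)."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, message_id: str, message: dict) -> bool:
        if message_id in self.seen_ids:
            return False  # redelivery of a message that was already processed
        # The subscriber, not the service, decides that the body is JSON.
        payload = json.loads(base64.b64decode(message["data"]))
        self.processed.append(payload)
        self.seen_ids.add(message_id)
        return True


message = encode_message({"temperature": 21.5}, {"content_type": "application/json"})
handler = IdempotentHandler()
first = handler.handle("id-1", message)   # new message ID
second = handler.handle("id-1", message)  # same message ID delivered again
print(first, second, handler.processed)
```

A production subscriber would keep the set of seen IDs in shared, durable storage rather than in process memory.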

Topic and Message Management

  • You can create, delete, and view topics using
    • the API
    • the Google Cloud Console
    • the gcloud command-line tool
  • You must create a subscription to a topic before subscribers can receive messages published to the topic.
  • You can create subscriptions with
    • the API
    • the Google Cloud Console
    • the gcloud command-line tool
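
For the API route, the sketch below only builds the REST requests for creating a topic and a subscription; it does not send them and leaves out authentication. The helper names and the project, topic and subscription IDs are hypothetical. The PUT paths and the "topic" and "ackDeadlineSeconds" fields follow the v1 REST API as understood here and should be checked against the API reference.

```python
import json

API_ROOT = "https://pubsub.googleapis.com/v1"


def create_topic_request(project_id: str, topic: str) -> dict:
    """Describe the request that creates a topic (nothing is sent)."""
    return {
        "method": "PUT",
        "url": f"{API_ROOT}/projects/{project_id}/topics/{topic}",
        "body": {},
    }


def create_subscription_request(project_id: str, subscription: str, topic: str,
                                ack_deadline_seconds: int = 10) -> dict:
    """Describe the request that attaches a subscription to a topic."""
    # Range taken from the ack deadline figures quoted in the text.
    if not 10 <= ack_deadline_seconds <= 600:
        raise ValueError("ack deadline must be between 10 and 600 seconds")
    return {
        "method": "PUT",
        "url": f"{API_ROOT}/projects/{project_id}/subscriptions/{subscription}",
        "body": {
            "topic": f"projects/{project_id}/topics/{topic}",
            "ackDeadlineSeconds": ack_deadline_seconds,
        },
    }


topic_request = create_topic_request("myproject", "mytopic")
subscription_request = create_subscription_request(
    "myproject", "mysubscription", "mytopic", ack_deadline_seconds=60
)
print(json.dumps(subscription_request, indent=2))
```

The topic request comes first because a subscription must name an existing topic.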

Resource Name

  • A resource name uniquely identifies a Pub/Sub resource.
  • The resource can be a subscription or a topic.
  • The name must fit the following format: projects/project-identifier/collection/relative-name
  • The project-identifier must be the project ID, available from the Google Cloud Console. For example, projects/myproject/topics/mytopic.
  • The collection must be one of subscriptions or topics.
  • The relative-name must:
    • Not begin with the string goog.
    • Start with a letter
    • Contain between 3 and 255 characters
    • Contain only the following characters:
      • Letters: [A-Za-z]
      • Numbers: [0-9]
      • Dashes: -
      • Underscores: _
      • Periods: .
      • Tildes: ~
      • Plus signs: +
      • Percent signs: %
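
The naming rules above translate fairly directly into a regular expression. The following is a minimal sketch with hypothetical function names; it checks only the rules listed here and does not validate the project ID itself.

```python
import re

# One letter, then 2 to 254 more allowed characters: 3 to 255 in total.
_RELATIVE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._~+%-]{2,254}")
_COLLECTIONS = ("topics", "subscriptions")


def is_valid_relative_name(name: str) -> bool:
    """Check the relative-name rules: prefix, first character, length, charset."""
    if name.startswith("goog"):
        return False
    return _RELATIVE_NAME.fullmatch(name) is not None


def is_valid_resource_name(resource_name: str) -> bool:
    """Check the projects/project-identifier/collection/relative-name format."""
    parts = resource_name.split("/")
    if len(parts) != 4 or parts[0] != "projects" or not parts[1]:
        return False
    return parts[2] in _COLLECTIONS and is_valid_relative_name(parts[3])


print(is_valid_resource_name("projects/myproject/topics/mytopic"))
print(is_valid_resource_name("projects/myproject/topics/googtopic"))
```

The first call accepts the example name from the text, and the second rejects a name that begins with goog.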

Message Storage Security

  • If you publish messages to the global endpoint, Pub/Sub automatically stores them in the nearest Google Cloud region.
  • A topic's message storage policy ensures that all data published to the topic is persisted in a specific region or set of regions, regardless of where the publish request originates.
  • When multiple regions are allowed by the policy, Pub/Sub chooses the nearest allowed region.
    • To configure all of the topics in an organization-wide scope, use the Resource Location Restriction organization policy.
    • For fine-grained control, configure a topic’s message storage policy at topic creation, or via the UpdateTopic operation.
  • You can configure the policy using:
    • Topic details view
    • gcloud command-line tool
    • Service API (using client libraries)
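
The effect of a message storage policy on region choice can be pictured with a toy selection function. This is only a model of the behaviour described above, not how the service is implemented: "nearest" is approximated by lowest latency, the choose_storage_region function is hypothetical, and the latency figures are made up.

```python
def choose_storage_region(latency_ms_by_region, allowed_regions=None):
    """Pick the lowest-latency region, restricted to the allowed set if given."""
    candidates = {
        region: latency
        for region, latency in latency_ms_by_region.items()
        if allowed_regions is None or region in allowed_regions
    }
    if not candidates:
        raise ValueError("no allowed region is reachable")
    return min(candidates, key=candidates.get)


# Hypothetical latencies from one publisher to each region, in milliseconds.
latencies = {"europe-west1": 12, "us-central1": 95, "asia-east1": 210}

unrestricted = choose_storage_region(latencies)
restricted = choose_storage_region(latencies, {"us-central1", "asia-east1"})
print(unrestricted, restricted)
```

Without a policy the closest region wins; with a policy, the closest region among the allowed ones wins even if another region is nearer to the publisher.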

Authentication and Access

  • The following authentication methods are allowed:
    • Service accounts
    • User accounts – You can authenticate users directly to your application when the application needs to access resources on behalf of an end user.
  • Pub/Sub uses Cloud IAM for access control.
  • Access control can be configured at the project level and at the individual resource level.
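
The two levels of access control can be illustrated with a simplified check over IAM-style policy bindings. The helper names and service account addresses are hypothetical, and the model assumes that a grant at either level is sufficient; it ignores conditions, groups, and the rest of the resource hierarchy.

```python
def has_role(policy: dict, member: str, role: str) -> bool:
    """Return True if the policy binds the member to the role."""
    return any(
        binding["role"] == role and member in binding["members"]
        for binding in policy.get("bindings", [])
    )


def is_allowed(member: str, role: str, project_policy: dict, resource_policy: dict) -> bool:
    # Assumption: grants are additive, so either level is enough.
    return has_role(project_policy, member, role) or has_role(resource_policy, member, role)


reader = "serviceAccount:reader@myproject.iam.gserviceaccount.com"
writer = "serviceAccount:writer@myproject.iam.gserviceaccount.com"

# Project-level policy: applies to every topic and subscription in the project.
project_policy = {"bindings": [{"role": "roles/pubsub.subscriber", "members": [reader]}]}
# Resource-level policy: applies to a single topic only.
topic_policy = {"bindings": [{"role": "roles/pubsub.publisher", "members": [writer]}]}

print(is_allowed(writer, "roles/pubsub.publisher", project_policy, topic_policy))
print(is_allowed(reader, "roles/pubsub.publisher", project_policy, topic_policy))
```

Here the writer account may publish to this one topic only, while the reader account holds the subscriber role across the whole project but cannot publish.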