Azure Architecture and Service Guarantees

In this tutorial, we will learn about Azure architecture and service guarantees. Microsoft Azure offers a reliable, redundant, energy-efficient infrastructure that spans more than 100 highly secure facilities worldwide, linked by one of the largest networks on earth. Azure lets you gain global reach with a local presence, keep your data secure and compliant with local laws, and reduce your carbon footprint with Microsoft’s environment-friendly datacenters.

Datacenters and Regions in Azure

Microsoft Azure is made up of datacenters located all over the world. When you use a service or create a resource such as a SQL database or virtual machine, you are using physical equipment in one or more of these locations. End customers do not have direct access to specific datacenters; instead, Azure groups them into regions.

What is a region?

A region is a geographical area on the planet containing at least one, but potentially multiple datacenters that are nearby and networked together with a low-latency network.
Azure intelligently assigns and controls the resources within each region to ensure workloads are appropriately balanced. When you deploy a resource in Azure, you will often need to choose the region where you want it deployed.
Azure has more global regions than any other cloud provider. This gives you the flexibility to bring applications closer to your users no matter where they are. It also provides better scalability and redundancy, and preserves data residency for your services.

Special Azure Regions

Azure has specialized regions that you might want to use when building out applications for compliance or legal purposes. These include –

US DoD Central, US Gov Virginia, US Gov Iowa and more: These are physically and logically network-isolated instances of Azure for US government agencies and partners.
China East, China North and more: These regions are available through a unique partnership between Microsoft and 21Vianet, whereby Microsoft does not directly maintain the datacenters.

Regions are what you use to identify the location for your resources, but there are two other terms you should also be aware of: geographies and availability zones.

Geographies

Azure divides the world into geographies that are defined by geopolitical boundaries or country borders. An Azure geography is a discrete market typically containing two or more regions that preserve data residency and compliance boundaries. This division has several benefits.
Geographies allow customers with specific data residency and compliance needs to keep their data and applications close. Geographies ensure that data residency, sovereignty, compliance, and resiliency requirements are honored within geographical boundaries.
Geographies are also fault-tolerant: they can withstand complete region failure through their connection to dedicated high-capacity networking infrastructure.
Data residency refers to the physical or geographic location of an organization’s data or information. It defines the legal or regulatory requirements imposed on data based on the country or region in which it resides and is an important consideration when planning out your application data storage.

Geographies are broken up into the following areas:

Americas
Europe
Asia Pacific
Middle East and Africa

Each region belongs to a single geography and has specific service availability, compliance, and data residency/sovereignty rules applied to it.

Availability Zones

You want to make sure your data and services are redundant so you can protect your data in the event of a failure. When you host your own infrastructure, setting up redundancy requires creating duplicate hardware environments. Availability Zones in Azure can help you make your application highly available.

What is an Availability Zone?

Availability Zones are physically separate datacenters within an Azure region. Each Availability Zone is made up of one or more datacenters that are powered, cooled, and networked independently. Each zone is designed to act as an isolation boundary: if one zone fails, the others continue to function. High-speed, private fiber-optic networks connect the Availability Zones.


Supported Regions

Not every region has support for Availability Zones. The following regions have a minimum of three separate zones to ensure resiliency.

Central US
East US 2
West US 2
West Europe
France Central
North Europe
Southeast Asia

Using Availability Zones in your apps

By co-locating your compute, storage, networking, and data resources within a zone and replicating them in other zones, you can use Availability Zones to run mission-critical applications and build high availability into your application design. Keep in mind that duplicating your services and transferring data between zones may incur costs.

Availability Zones are primarily for virtual machines, managed disks, load balancers, and SQL databases. Azure services that support Availability Zones fall into two categories:

Zonal services – you pin the resource to a specific zone (for example, virtual machines, managed disks, IP addresses).
Zone-redundant services – the platform replicates automatically across zones (for example, zone-redundant storage, SQL Database).

Region Pairs

Availability Zones are built from one or more datacenters, with a minimum of three zones within a single region. However, a large enough disaster could cause an outage that affects even two datacenters. For this reason, Azure also creates region pairs.

What is a region pair?

Each Azure region is always paired with another region at least 300 miles away within the same geography (such as the United States, Europe, or Asia). This approach allows resources (such as virtual machine storage) to be replicated across a geography, reducing the likelihood that natural disasters, civil unrest, power outages, or physical network failures affect both regions at the same time and interrupt service.

Additional advantages of region pairs include –

If there’s an extensive Azure outage, one region out of every pair is prioritized to help reduce the time it takes to restore applications.
Planned Azure updates are rolled out to paired regions one region at a time to minimize downtime and risk of application outage.
Data continues to reside within the same geography as its pair (except for Brazil South) for tax and law enforcement jurisdiction purposes.

Service Level Agreements for Azure

Microsoft adheres to a comprehensive set of operational policies, standards, and practices in order to maintain its commitment to providing high-quality products and services to its customers. Service-Level Agreements (SLAs) are formal documents that capture the specific terms defining the performance standards that apply to Azure.

SLAs describe Microsoft’s commitment to providing Azure customers with specific performance standards.
There are SLAs for individual Azure products and services.
SLAs also specify what happens if a service or product fails to perform to a governing SLA’s specification.

SLAs for Azure products and services

There are three key characteristics of SLAs for Azure products and services:

Performance Targets
Uptime and Connectivity Guarantees
Service credits

Performance Targets

A service level agreement (SLA) specifies the performance targets for an Azure product or service, and these targets are unique to each product and service. Some Azure services, for example, express their performance targets as uptime guarantees or connectivity rates.

Uptime and Connectivity Guarantees

A typical SLA specifies performance-target commitments that range from 99.9 percent (“three nines”) to 99.999 percent (“five nines”), for each corresponding Azure product or service. These targets can apply to such performance criteria as uptime or response times for services.

The following table lists the potential cumulative downtime for various SLA levels over different durations:
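
These downtime figures follow directly from the SLA percentage and the length of the period. A minimal Python sketch of the arithmetic is shown here; the function name, the chosen SLA levels, and the 30-day month and 365-day year are assumptions made for illustration rather than definitions taken from an Azure SLA.

```python
from datetime import timedelta

# Window lengths are assumptions: a 30-day month and a 365-day year.
WINDOWS = {
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def max_downtime(sla_percent: float, window: timedelta) -> timedelta:
    """Return the longest cumulative downtime that still meets the SLA."""
    return window * (1 - sla_percent / 100)


# Print the downtime allowance for a few illustrative SLA levels.
for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
    cells = ", ".join(
        f"{name}: {max_downtime(sla, window)}"
        for name, window in WINDOWS.items()
    )
    print(f"{sla}% -> {cells}")
```

For instance, 99.9 percent over a 30-day month allows about 43.2 minutes of downtime, while 99.999 percent allows only about 26 seconds.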

For example, the SLA for the Azure Cosmos DB (Database) service offers 99.999 percent uptime, which includes low-latency commitments of less than 10 milliseconds on DB read operations as well as on DB write operations.

Service Credits

SLAs also specify how Microsoft will respond if an Azure product or service fails to meet the requirements of its governing SLA. Customers may, for example, receive a discount on their Azure bill as compensation for an under-performing Azure product or service. The table below explains this example in more detail.

The first column of the table below shows the monthly uptime percentage SLA targets for a single-instance Azure Virtual Machine. The second column shows the corresponding service credit you receive if the actual uptime for that month is less than the stated SLA target.

Composing SLAs across services

When combining SLAs across different service offerings, the resultant SLA is called a Composite SLA. The resulting composite SLA can provide higher or lower uptime values, depending on your application architecture.

Calculating downtime

Consider an App Service web app that writes to Azure SQL Database. These Azure services currently have the following SLAs:

App Service web apps: 99.95 percent
SQL Database: 99.99 percent

In this example, if either service fails, the whole application fails. Because the probability of each service failing is independent, the composite SLA value for this application is:

99.95 percent × 99.99 percent = 99.94 percent

This means the combined probability of failure is higher than the individual SLA values. This isn’t surprising, because an application that relies on multiple services has more potential failure points.
Conversely, you can improve the composite SLA by creating independent fallback paths. For example, if SQL Database is unavailable, you can put transactions into a queue for processing at a later time.

With this design, the application is still available even if it can’t connect to the database. However, it fails if both the database and the queue fail simultaneously.

The database is unavailable 0.0001 of the time (1 − 0.9999), and the 0.001 factor corresponds to a queue that is unavailable 0.1 percent of the time (a 99.9 percent SLA). The expected fraction of time for a simultaneous failure is therefore 0.0001 × 0.001, so the composite SLA for this combined path of a database or queue would be:

1.0 − (0.0001 × 0.001) = 99.99999 percent

Therefore, if we add the queue to our web app, the total composite SLA is:

99.95 percent × 99.99999 percent = ~99.95 percent

Notice we’ve improved our SLA behavior. However, there are trade-offs to using this approach: the application logic is more complicated, you are paying more to add the queue support, and there may be data-consistency issues you’ll have to deal with due to retry behavior.
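
The two calculations above can be reproduced with a short Python sketch. The helper names `serial_sla` and `fallback_sla` are hypothetical, and the queue availability of 0.999 is inferred from the 0.001 failure probability used in the text; the sketch also assumes, as the text does, that failures are independent.

```python
import math


def serial_sla(*slas: float) -> float:
    """Composite availability when every service must be up (values in 0..1)."""
    return math.prod(slas)


def fallback_sla(*slas: float) -> float:
    """Composite availability when any one of the independent paths suffices."""
    return 1 - math.prod(1 - s for s in slas)


APP_SERVICE = 0.9995   # from the example above
SQL_DATABASE = 0.9999  # from the example above
QUEUE = 0.999          # assumption: implied by the 0.001 failure probability

# Web app depends directly on the database.
without_queue = serial_sla(APP_SERVICE, SQL_DATABASE)

# Web app depends on "database or queue", which fails only if both fail.
with_queue = serial_sla(APP_SERVICE, fallback_sla(SQL_DATABASE, QUEUE))

print(f"Web app + database:          {without_queue:.4%}")
print(f"Web app + database or queue: {with_queue:.4%}")
```

Keeping the serial and fallback cases as separate functions mirrors the reasoning in the text: dependencies in a chain multiply availabilities, while independent fallback paths multiply failure probabilities.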

Improve your app reliability

You can use SLAs to evaluate how your Azure solutions meet business requirements and the needs of your clients and users. By creating your own SLAs, you can set performance targets to suit your specific Azure application. This approach is known as an Application SLA.

Understand your app requirements

Understanding your workload requirements is necessary for building an effective and dependable Azure solution. You can then choose Azure products and services and provision resources based on those requirements. It is also critical to understand the Azure SLAs, which specify the performance targets for the Azure products and services in your solution. This knowledge will help you define realistic Application SLAs.

Failures will occur in a distributed system. Hardware can break down. Transient failures can occur in the network. Even though it’s unusual for a whole service or region to be disrupted, it must be planned for.

Resiliency

The capacity of a system to recover from failures and continue to function is known as resiliency. It’s not so much about preventing failures as it is about responding to them in a way that minimizes downtime and data loss. The purpose of resiliency is to return an application to full functionality after a failure. High availability and disaster recovery are two crucial components of resiliency.

You should consider resiliency while designing your architecture, and you should perform a Failure Mode Analysis (FMA). The purpose of an FMA is to identify potential points of failure and define how the application will respond to them.

Cost and complexity vs. high availability

Availability refers to the time that a system is functional and working. Maximizing availability requires implementing measures to prevent possible service failures. However, devising preventative measures can be difficult and expensive, and often results in complex solutions.

As your solution grows in complexity, you will have more services depending on each other. Therefore, you might overlook possible failure points in your solution if you have several interdependent services.

For example: A workload that requires 99.99 percent uptime shouldn’t depend upon a service with a 99.9 percent SLA.

Most providers prefer to maximize the availability of their Azure solutions by minimizing downtime. However, as you increase availability, you also increase the cost and complexity of your solution.

Considerations for defining application SLAs

If your application SLA defines four 9’s (99.99%) performance targets, recovering from failures by manual intervention may not be enough to fulfill your SLA. Your Azure solution must be self-diagnosing and self-healing instead.
It is difficult to respond to failures quickly enough to meet SLA performance targets above four 9’s.
Carefully consider the time window against which your application SLA performance targets are measured. The smaller the time window, the tighter the tolerances. If you define your application SLA as hourly or daily uptime, you need to understand these tighter tolerances might not allow for achievable performance targets.
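
To make the points about manual intervention and measurement windows concrete, here is a minimal Python sketch. The function names and the five-minute manual recovery time are hypothetical values chosen for illustration; they are not taken from any Azure SLA.

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, window: timedelta) -> timedelta:
    """Total downtime allowed within one measurement window."""
    return window * (1 - sla_percent / 100)


def recovery_fits(sla_percent: float, window: timedelta, recovery: timedelta) -> bool:
    """True if a single outage of length `recovery` stays within the budget."""
    return recovery <= downtime_budget(sla_percent, window)


# Hypothetical figure: time for an operator to notice and fix an outage by hand.
MANUAL_RECOVERY = timedelta(minutes=5)

for name, window in [
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
    ("30-day month", timedelta(days=30)),
]:
    budget = downtime_budget(99.99, window)
    ok = recovery_fits(99.99, window, MANUAL_RECOVERY)
    print(f"{name}: budget {budget.total_seconds():.2f}s, manual recovery fits: {ok}")
```

Under these assumptions, a single five-minute manual recovery exceeds the 99.99 percent budget in every window shown, including the 30-day month (about 4.3 minutes), which is why self-healing designs matter at this level.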

With over 54 regions distributed worldwide, Microsoft provides more global presence than any other cloud provider. This infrastructure gives you the scale needed to bring your applications closer to users around the world.
Azure also has dedicated regions to support government use and applications that need to be deployed in China, so you can ensure data security and residency and meet compliance and resilience requirements for your customers, whatever your business requirements.
