Solution for Autoscaling in Azure


The AZ-304 exam has been retired; its replacement, AZ-305, is available.

In this tutorial, we will learn about the process of autoscaling in Azure, along with its other important functional areas.

Autoscaling refers to the process of dynamically allocating resources to match performance requirements. As work volume increases, an application may require additional resources to maintain the desired performance levels and satisfy service-level agreements (SLAs). Autoscaling takes advantage of the elasticity of cloud-hosted environments while reducing management overhead. It also reduces the need for an operator to continually monitor the performance of a system and make decisions about adding or removing resources.

There are two ways to scale an application:

  • Firstly, Vertical scaling. This is scaling up and down, which means changing the capacity of a resource. However, this typically requires making the system temporarily unavailable while it is redeployed. 
  • Then, Horizontal scaling. This refers to scaling out and in, which means adding or removing instances of a resource. With this approach, the application continues running without interruption while new resources are provisioned. Once provisioning is complete, the solution is deployed on these additional resources. 
An autoscaling strategy typically involves the following pieces:
  • Firstly, instrumentation and monitoring systems at the application, service, and infrastructure levels.
  • Secondly, decision-making logic for evaluating these metrics against predefined thresholds or schedules and deciding whether to scale.
  • Thirdly, components that scale the system.
  • Lastly, testing, monitoring, and tuning of the autoscaling strategy for ensuring that it functions as expected.
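The decision-making piece above can be sketched in a few lines. This is a minimal, hypothetical illustration; the metric name and threshold values are assumptions, not Azure defaults:

```python
# Minimal sketch of autoscale decision logic: compare an aggregated
# metric against predefined thresholds and decide whether to scale.
# The thresholds and metric here are illustrative assumptions.

def decide_scale_action(avg_cpu_percent: float,
                        scale_out_threshold: float = 75.0,
                        scale_in_threshold: float = 25.0) -> str:
    """Return 'scale-out', 'scale-in', or 'no-change' for one evaluation."""
    if avg_cpu_percent > scale_out_threshold:
        return "scale-out"
    if avg_cpu_percent < scale_in_threshold:
        return "scale-in"
    return "no-change"

print(decide_scale_action(90.0))  # scale-out
print(decide_scale_action(50.0))  # no-change
print(decide_scale_action(10.0))  # scale-in
```

In a real autoscaling system, this evaluation would run on a timer against metrics collected by the monitoring layer, and the resulting action would be handed to the component that actually adds or removes instances.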

Configuring Autoscaling for an Azure solution

Azure offers built-in autoscaling for most compute options. These include:

  • Firstly, Azure Virtual Machines. These autoscale via virtual machine scale sets, which manage a set of Azure virtual machines as a group.
  • Secondly, Service Fabric. This also supports autoscaling through virtual machine scale sets. Every node type in a Service Fabric cluster is set up as a separate virtual machine scale set. 
  • Then, Azure App Service. This has built-in autoscaling, in which the autoscale settings apply to all of the apps within an App Service. 
  • Lastly, Azure Cloud Services. This has built-in autoscaling at the role level. 

All of these compute options use Azure Monitor autoscale to provide a common set of autoscaling functionality. You should know that Azure Functions differs from the previous compute options, because you don't need to configure any autoscale rules. Instead, Azure Functions automatically allocates compute power when your code is running, scaling out as necessary to handle load.


Using Azure Monitor autoscaling

Azure Monitor autoscale provides a common set of autoscaling functionality for virtual machine scale sets, Azure App Service, and Azure Cloud Services. Scaling can be performed on a schedule, or based on a runtime metric, such as CPU or memory usage.

You can configure autoscaling by using PowerShell, the Azure CLI, an Azure Resource Manager template, or the Azure portal. The Azure Monitoring Service Management Library and the Microsoft Insights Library are SDKs that provide access to metrics from different resources; autoscaling is then performed by using the REST APIs. 
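As an illustration, a trimmed-down autoscale setting in an Azure Resource Manager template might look like the following. The scale set name, capacity limits, and thresholds are placeholders, and the API version may differ in your environment:

```json
{
  "type": "Microsoft.Insights/autoscaleSettings",
  "apiVersion": "2015-04-01",
  "name": "cpu-autoscale",
  "location": "[resourceGroup().location]",
  "properties": {
    "enabled": true,
    "targetResourceUri": "[resourceId('Microsoft.Compute/virtualMachineScaleSets', 'myScaleSet')]",
    "profiles": [
      {
        "name": "default",
        "capacity": { "minimum": "2", "maximum": "10", "default": "2" },
        "rules": [
          {
            "metricTrigger": {
              "metricName": "Percentage CPU",
              "metricResourceUri": "[resourceId('Microsoft.Compute/virtualMachineScaleSets', 'myScaleSet')]",
              "timeGrain": "PT1M",
              "statistic": "Average",
              "timeWindow": "PT10M",
              "timeAggregation": "Average",
              "operator": "GreaterThan",
              "threshold": 75
            },
            "scaleAction": {
              "direction": "Increase",
              "type": "ChangeCount",
              "value": "1",
              "cooldown": "PT5M"
            }
          }
        ]
      }
    ]
  }
}
```

Note how the rule triggers on the average CPU over a ten-minute window rather than a single reading, and includes a cooldown so a new instance has time to take effect before the next scale action.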

Points to consider when using Azure autoscale:
  • Firstly, consider whether you can predict the load on the application accurately enough to use scheduled autoscaling, adding and removing instances to meet anticipated peaks in demand. If not, use reactive autoscaling based on runtime metrics to handle unpredictable changes in demand. 
  • Secondly, it's often difficult to understand the relationship between metrics and capacity requirements, especially when an application is initially deployed. Provision a little extra capacity at the beginning, and then monitor and tune the autoscaling rules to bring the capacity closer to the actual load.
  • Thirdly, configure the autoscaling rules, and then monitor the performance of your application over time. Then, use the results of this monitoring for adjusting the way in which the system scales if necessary. 
  • Fourthly, use autoscaling rules that measure the trigger attribute as an aggregated value over time, rather than as an instantaneous value, to trigger an autoscaling action. This also gives new instances time to settle into running mode.
  • After that, avoid flapping, where scale-in and scale-out actions continually alternate. Flapping can be controlled by choosing an adequate margin between the scale-out and scale-in thresholds.
  • Then, note that manual scaling is reset by the maximum and minimum number of instances used for autoscaling. If you manually update the instance count to a value higher than the maximum or lower than the minimum, the autoscale engine automatically scales back to the maximum (if higher) or the minimum (if lower). 
  • Lastly, you should know that the autoscale engine processes only one profile at a time. If a profile's conditions are not met, it checks the next profile. Therefore, keep key metrics out of the default profile, because that profile is checked last. Within a profile, you can have multiple rules. 
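Two of the points above — triggering on an aggregated value rather than instantaneous readings, and keeping a margin between the scale-out and scale-in thresholds to avoid flapping — can be illustrated with a short sketch. The window size and thresholds here are illustrative assumptions:

```python
from collections import deque

# Sketch of aggregated-metric triggering with a margin between the
# scale-out and scale-in thresholds. All numbers are illustrative.

class AutoscaleEvaluator:
    def __init__(self, window: int = 5,
                 scale_out_threshold: float = 70.0,
                 scale_in_threshold: float = 30.0):
        # The gap between the two thresholds is the anti-flapping margin.
        assert scale_out_threshold > scale_in_threshold
        self.samples = deque(maxlen=window)
        self.out_t = scale_out_threshold
        self.in_t = scale_in_threshold

    def observe(self, cpu_percent: float) -> str:
        """Record one sample and decide based on the windowed average."""
        self.samples.append(cpu_percent)
        avg = sum(self.samples) / len(self.samples)
        if avg > self.out_t:
            return "scale-out"
        if avg < self.in_t:
            return "scale-in"
        return "no-change"

ev = AutoscaleEvaluator()
# A single instantaneous spike (95%) does not trigger scaling, because
# the windowed average stays between the two thresholds.
actions = [ev.observe(v) for v in [40, 45, 95, 42, 44]]
print(actions)  # ['no-change', 'no-change', 'no-change', 'no-change', 'no-change']
```

Had the evaluator reacted to the raw 95% reading, it would have scaled out and then likely scaled back in shortly after — exactly the flapping behavior the aggregation window and threshold margin are meant to prevent.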

Process of Azure Monitor scaling

  • If you configure autoscaling using the SDK rather than the portal, then you can specify a more detailed schedule during which the rules are active. Moreover, you can also develop your own metrics and use them with or without any of the existing ones in your autoscaling rules.
  • Secondly, when autoscaling Service Fabric, the node types in your cluster are backed by virtual machine scale sets, so you need to set up autoscale rules for each node type. Take into account the number of nodes that you must have before you set up autoscaling. The minimum number of nodes for the primary node type is driven by the chosen reliability level.
  • Thirdly, you can use the portal to link resources like SQL Database instances and queues to a Cloud Service instance. This allows you to more easily access the separate manual and automatic scaling configuration options for each of the linked resources. 
  • Fourthly, when you configure multiple policies and rules, they could conflict with each other. Autoscale resolves such conflicts to ensure that there is always a sufficient number of instances running: scale-out operations take precedence over scale-in operations, and when scale-out rules conflict, the rule that calls for the largest increase in instance count wins.
  • Lastly, in an App Service Environment, autoscale rules can be defined using any worker pool or front-end metric. 
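The conflict-resolution behavior can be modeled in a simplified way: scale-out requests take precedence over scale-in requests, the largest requested increase wins, and among scale-in requests the smallest decrease wins. The following is a sketch of that precedence, not Azure's actual engine:

```python
# Simplified model of resolving conflicting autoscale rule outcomes so
# that enough instances stay running. Illustrative only.

def resolve(current: int, requested_counts: list[int]) -> int:
    """Given instance counts requested by each triggered rule, pick one."""
    if not requested_counts:
        return current
    scale_outs = [c for c in requested_counts if c > current]
    if scale_outs:
        return max(scale_outs)    # scale-out wins; largest increase wins
    return max(requested_counts)  # all scale-ins: smallest decrease wins

# Two rules want to scale out (to 5 and 8), one wants to scale in (to 2):
print(resolve(4, [5, 8, 2]))  # 8
# Only scale-in rules triggered (to 2 and 3): the smaller decrease wins.
print(resolve(4, [2, 3]))     # 3
```

Favoring the larger instance count in each case biases the system toward availability: a conflicting scale-in request can never undercut a rule that detected the need for more capacity.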

Related patterns and guidance

The patterns and guidance below may be relevant for your scenario when implementing autoscaling:

  • Firstly, Throttling pattern. This explains how an application can continue to function and meet SLAs when an increase in demand places an extreme load on resources. It can be used with autoscaling to prevent a system from being overwhelmed while it scales out.
  • Secondly, Competing Consumers pattern. This explains how to implement a pool of service instances that can handle messages from any application instance. This approach enables a system to process multiple messages concurrently, optimizing throughput, improving scalability and availability, and balancing the workload.
  • Lastly, Monitoring and diagnostics. Instrumentation and telemetry are important for collecting the information that drives the autoscaling process.

Reference: Microsoft Documentation

