Partitioning and horizontal scaling in Azure Cosmos DB

In this tutorial, we will learn about partitioning and horizontal scaling in Azure Cosmos DB. We will discuss the relationship between logical and physical partitions, look at best practices for choosing a partition key, and examine in depth how horizontal scaling works in Azure Cosmos DB.

Logical partitions

A logical partition consists of a set of items that share the same partition key value. For example, in a container that holds food nutrition data, all items contain a foodGroup property, so you can use foodGroup as the partition key for the container. Groups of items with specific values for foodGroup, such as Beef Products, Baked Products, and Sausages and Luncheon Meats, then form distinct logical partitions.

A logical partition also defines the scope of database transactions. That is to say, you can update items within a logical partition by using a transaction with snapshot isolation. When new items are added to a container, the system transparently creates any new logical partitions they require.

There is no limit to the number of logical partitions in your container. Each logical partition, however, can store up to 20 GB of data.
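
To make the idea of a logical partition concrete, here is a minimal Python sketch that groups items by their partition key value. It does not call Azure Cosmos DB. The sample items and the helper name group_by_partition_key are hypothetical; only the foodGroup property and its values come from the example above.

```python
from collections import defaultdict

# Hypothetical sample items; only the foodGroup property comes from the tutorial.
items = [
    {"id": "1", "description": "Beef, ground", "foodGroup": "Beef Products"},
    {"id": "2", "description": "Bread, white", "foodGroup": "Baked Products"},
    {"id": "3", "description": "Beef, ribeye", "foodGroup": "Beef Products"},
    {"id": "4", "description": "Salami", "foodGroup": "Sausages and Luncheon Meats"},
]


def group_by_partition_key(items, key="foodGroup"):
    """Group item ids by partition key value, one group per logical partition."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key]].append(item["id"])
    return dict(partitions)


logical_partitions = group_by_partition_key(items)
for value, ids in logical_partitions.items():
    print(value, ids)
```

Every distinct foodGroup value yields its own group, which mirrors how items that share a partition key value belong to the same logical partition.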

Physical partitions

An Azure Cosmos container is scaled by distributing data and throughput across physical partitions. Internally, one or more logical partitions are mapped to a single physical partition. Most small Cosmos containers have many logical partitions but require only a single physical partition. Unlike logical partitions, physical partitions are an internal implementation detail of the system.

The number of physical partitions in your Cosmos container depends on the following:

  • The amount of provisioned throughput: each individual physical partition can provide up to 10,000 request units per second.
  • The total data storage: each individual physical partition can store up to 50 GB.
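
The two limits above imply a lower bound on the number of physical partitions. The following Python sketch computes that bound. It is an estimate under the stated limits only, not the service's actual allocation logic, which may create more partitions than this. The function name min_physical_partitions is hypothetical.

```python
import math

MAX_RU_PER_PHYSICAL_PARTITION = 10_000  # request units per second, from the list above
MAX_GB_PER_PHYSICAL_PARTITION = 50      # GB of storage, from the list above


def min_physical_partitions(provisioned_ru, storage_gb):
    """Return a lower bound on physical partitions for the given throughput and storage."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    # A container always has at least one physical partition.
    return max(1, by_throughput, by_storage)


print(min_physical_partitions(400, 1))      # small container
print(min_physical_partitions(25_000, 10))  # bound set by throughput
print(min_physical_partitions(10_000, 120)) # bound set by storage
```

Whichever of the two limits demands more partitions sets the bound, which is why a container can need more partitions because of its data size even at modest throughput.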

There is no limit to the total number of physical partitions in your container. As provisioned throughput or data size grows, Azure Cosmos DB automatically creates new physical partitions by splitting existing ones. Physical partition splits do not impact your application’s availability. After a split, all data within a single logical partition is still stored on the same physical partition; the split simply creates a new mapping of logical partitions to physical partitions.
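
The following Python sketch illustrates why a split can keep every logical partition on a single physical partition. It assumes, purely for illustration, that each physical partition owns a contiguous range of hash values and that a partition key value hashes to one point in that space. The MD5-based hash, the 32-bit hash space, and the helper names key_hash, owner, and split are all assumptions, not the service's real implementation.

```python
import hashlib

HASH_SPACE = 2**32  # assumed size of the hash space, for illustration only


def key_hash(partition_key):
    """Map a partition key value to a point in the hash space (assumed scheme)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(ranges, partition_key):
    """Return the index of the physical partition whose range contains the key."""
    h = key_hash(partition_key)
    for index, (low, high) in enumerate(ranges):
        if low <= h < high:
            return index
    raise ValueError("hash value falls outside all ranges")


def split(ranges, index):
    """Split one physical partition's range in half, yielding two partitions."""
    low, high = ranges[index]
    mid = (low + high) // 2
    return ranges[:index] + [(low, mid), (mid, high)] + ranges[index + 1:]


keys = ["Beef Products", "Baked Products", "Sausages and Luncheon Meats"]
before = [(0, HASH_SPACE)]  # a single physical partition owns everything
after = split(before, 0)    # two physical partitions after the split

mapping_before = {k: owner(before, k) for k in keys}
mapping_after = {k: owner(after, k) for k in keys}
print(mapping_before)
print(mapping_after)
```

Because a partition key value always hashes to the same point, all items of a logical partition land in exactly one range both before and after the split. Only the mapping from logical to physical partitions changes.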

Lastly, the throughput provisioned for a container is divided evenly among its physical partitions. A partition key design that doesn’t distribute requests evenly might therefore create “hot” partitions. Hot partitions can lead to rate-limiting, inefficient use of the provisioned throughput, and higher costs.
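
A small worked example shows the effect of a hot partition. The Python sketch below divides the provisioned throughput evenly among physical partitions and reports how much demand per partition exceeds its share. The demand figures and the helper names per_partition_budget and rate_limited_ru are hypothetical, and real rate-limiting behavior is more nuanced than this arithmetic.

```python
def per_partition_budget(provisioned_ru, physical_partitions):
    """Throughput share of each physical partition under an even split."""
    return provisioned_ru / physical_partitions


def rate_limited_ru(demand_by_partition, provisioned_ru):
    """Return, per partition, the demand (RU/s) that exceeds its even share."""
    budget = per_partition_budget(provisioned_ru, len(demand_by_partition))
    return [max(0.0, demand - budget) for demand in demand_by_partition]


# Same total demand (18,000 RU/s) and same provisioned throughput in both cases.
even = rate_limited_ru([6_000, 6_000, 6_000], 18_000)
skewed = rate_limited_ru([14_000, 2_000, 2_000], 18_000)

print(even)    # no partition exceeds its 6,000 RU/s share
print(skewed)  # the hot partition exceeds its share while the others sit idle
```

In the skewed case the container is provisioned for the full 18,000 RU/s, yet the hot partition can use only its own share, so requests are throttled while capacity on the other partitions goes unused.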

You can view your container’s physical partitions in the Storage section of the Metrics blade of the Azure portal:
container's physical partitioning in the Storage section
Image Source: Microsoft

In the example container above, we have chosen /foodGroup as the partition key. Each of the three rectangles represents a physical partition. In the image, a partition key range is the same as a physical partition. The selected physical partition contains three logical partitions: Beef Products, Vegetable and Vegetable Products, and Soups, Sauces, and Gravies.

It’s important to choose a partition key that distributes throughput consumption evenly across logical partitions. Doing so ensures that throughput consumption across physical partitions is balanced as well.

Replica sets

Each physical partition consists of a set of replicas, known as a replica set. Each replica hosts an instance of the Azure Cosmos database engine. A replica set makes the data stored within the physical partition durable, highly available, and consistent. Each replica that makes up the physical partition inherits the partition’s storage quota, and all replicas of a physical partition collectively support the throughput allocated to it. Azure Cosmos DB automatically manages replica sets.

Most small Cosmos containers only need a single physical partition but will still have at least 4 replicas.

The following image shows how logical partitions map to physical partitions that are distributed globally:

physical partitioning
Image Source: Microsoft

Reference: Microsoft Documentation
