How to Ensure Continuous Availability with Multiple AWS Accounts

For a modern, high-performance application to keep running smoothly when a data center experiences an outage, you need to distribute individual application instances across multiple data centers. This is a widely recognized industry best practice, and it is an essential characteristic to build into your application architecture to increase resilience against data center failures.

The same principle applies when you build an application in the cloud. However, with cloud-based applications, you usually cannot determine the specific data center in which a particular server or cloud resource is located. This is part of the abstraction that underpins the value of cloud computing. But this lack of visibility into the data centers that host your application makes it challenging to build multi-data center resiliency into it: if you don’t know which data center your application is running in, it is difficult to guarantee that it is running in more than one.

From the beginning, cloud providers have solved this problem with an abstraction of the data center that lets you build in this level of resiliency without being exposed to the details of data center location. That abstraction is the availability zone. However, how availability zones map to data centers is not widely understood.

AWS availability zones

An AWS availability zone is an isolated set of cloud resources that lets you build a known level of isolation into your applications. Resources within a single availability zone may be physically or virtually close to each other, to the extent that they can depend on each other and share subcomponents. For example, two EC2 servers in the same availability zone may be in the same data center, in the same rack, or even on the same physical server.

However, cloud resources in different availability zones are guaranteed to be in distinct data centers. They cannot share a data center, a rack, or a physical server; they are distinct and independent of each other.

Within a single region, however, the availability zones are connected to each other by very high-speed, low-latency network connections so that resources in multiple availability zones may work together in a coordinated fashion as needed.

This is the solution to the resiliency problem. To give an AWS cloud-based application the same level of resiliency as an application running in multiple redundant physical data centers, build your application to run in multiple availability zones. If you distribute instances of your application across multiple availability zones, you isolate it from hardware failures such as server failures, rack failures, and even entire data center failures.
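As a rough illustration of spreading instances within a single account, the sketch below assigns instances to availability zones round-robin. The function name `spread_across_zones` and the zone list are hypothetical, and a real deployment would normally delegate this to an Auto Scaling group or similar service; the point is only that every zone gets a near-equal share, so losing one zone removes at most about a third of the instances in this three-zone example.

```python
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instance indices to zones round-robin (hypothetical helper).

    Each zone ends up with at most one more instance than any other zone.
    Zone names are only meaningful within a single AWS account.
    """
    # Drop duplicate zone names while keeping their original order.
    zones = list(dict.fromkeys(zones))
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: [] for zone in zones}
    # zip() stops when range() is exhausted, so cycle() never runs forever.
    for index, zone in zip(range(instance_count), cycle(zones)):
        placement[zone].append(index)
    return placement


# Hypothetical: five instances spread over three zones in one account.
print(spread_across_zones(5, ["us-west-2a", "us-west-2b", "us-west-2c"]))
```

With five instances and three zones, two zones receive two instances and one zone receives one.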

Availability zones as data centers

Loosely, availability zones can be thought of as data centers, and as a first approximation this is a reasonable assumption. But the assumption carries a danger. First, the mapping of availability zone names to data centers is not the same in every account. When you create your AWS account, your availability zone names are mapped to individual data centers independently for that account. This means one AWS account may have an availability zone named us-east-1a mapped to data center #4, while another account has the same availability zone name mapped to data center #2.

Worse yet, a given data center may map to different availability zones in different accounts. For example, data center #4 in account #1 may be used for availability zone us-east-1a, but the same data center in account #2 may be used for availability zone us-east-1b.

You can find out how your availability zones are mapped to specific data centers in a given account by looking in the AWS console in the Resource Access Manager (RAM). In the console, select “Resource Access Manager” under the Services menu. On the lower right-hand side, you’ll see a display that looks like this:

Your AZ ID

AZ name       AZ ID
us-west-2a    usw2-az2
us-west-2b    usw2-az1
us-west-2c    usw2-az3
us-west-2d    usw2-az4

This shows the mapping of availability zone names to AZ IDs for your current account. An AZ ID identifies the same physical location in every account, so it can effectively be used as a data center identifier. The mapping is shown for your currently selected region, but you can switch regions to see the mapping for any region in your account.

In the above example, the availability zone us-west-2b in this account maps to the data center with AZ ID usw2-az1. Another account may have a different mapping.
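The same mapping is also available programmatically. The EC2 DescribeAvailabilityZones API returns a list of zones, each with a `ZoneName` and a `ZoneId` field. The sketch below assumes a response of that shape and turns it into a name-to-ID dictionary; the hard-coded response simply mirrors the example table for a hypothetical account, and the boto3 call is left as a comment so the snippet runs without AWS credentials.

```python
def zone_name_to_id(response):
    """Build a {zone name: AZ ID} dict from a DescribeAvailabilityZones-style response."""
    return {az["ZoneName"]: az["ZoneId"] for az in response["AvailabilityZones"]}


# In a real account, the response could come from boto3 (assumes boto3 is
# installed and AWS credentials are configured):
#   import boto3
#   response = boto3.client("ec2", region_name="us-west-2").describe_availability_zones()
# The hard-coded response below mirrors the example mapping for one account.
sample_response = {
    "AvailabilityZones": [
        {"ZoneName": "us-west-2a", "ZoneId": "usw2-az2"},
        {"ZoneName": "us-west-2b", "ZoneId": "usw2-az1"},
        {"ZoneName": "us-west-2c", "ZoneId": "usw2-az3"},
        {"ZoneName": "us-west-2d", "ZoneId": "usw2-az4"},
    ]
}

mapping = zone_name_to_id(sample_response)
print(mapping["us-west-2b"])  # prints usw2-az1
```

Running the same lookup in each of your accounts gives you one such dictionary per account, which is the raw material for the cross-account checks discussed below.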

Mysterious AWS status messages

Ever wonder why, when AWS announces a problem on its status page, it often says the problem “impacts one or more availability zones” in a given region, without saying which ones? The reason is this mapping. A problem occurs in one or more physical data centers, and the availability zone names associated with those data centers differ from account to account. So in a status message shared with everyone, AWS cannot name the affected availability zone in a way that is correct for every user. This is the reason for the somewhat cryptic message.

Importance of AZ ID mapping

This mapping is normally hidden from view and handled transparently by AWS, and for the most part that is reasonable. However, you can run into a problem when your application uses multiple AWS accounts. Because availability zone names are mapped to data centers independently for each account, the same availability zone name in different accounts may map to different data centers.

That may not seem too bad. But it also means that two different availability zone names in two different accounts could both map to the same data center, which is a problem for availability.

So if you are using multiple accounts, you can no longer assume that two differently named availability zones are in different data centers. This makes it hard to follow the multiple data center best practice discussed earlier.
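To make the pitfall concrete, here is a small sketch with two hypothetical accounts whose name-to-AZ-ID mappings are made up for illustration. One instance is placed in us-west-2a of the first account and one in us-west-2b of the second. A check that compares zone names passes, but translating each name through its own account's mapping shows that both instances land in the same AZ ID.

```python
# Hypothetical per-account mappings of zone name -> AZ ID (made-up values).
mappings = {
    "account-a": {"us-west-2a": "usw2-az1", "us-west-2b": "usw2-az2", "us-west-2c": "usw2-az3"},
    "account-b": {"us-west-2a": "usw2-az2", "us-west-2b": "usw2-az1", "us-west-2c": "usw2-az3"},
}

# One application instance in each account, placed by zone name.
placements = [("account-a", "us-west-2a"), ("account-b", "us-west-2b")]


def distinct_by_name(placements):
    """Naive check: are all zone names different?"""
    names = [zone for _, zone in placements]
    return len(set(names)) == len(names)


def distinct_by_az_id(placements, mappings):
    """Safer check: translate each name through its own account's mapping first."""
    az_ids = [mappings[account][zone] for account, zone in placements]
    return len(set(az_ids)) == len(az_ids)


print(distinct_by_name(placements))             # True: the names differ
print(distinct_by_az_id(placements, mappings))  # False: both are usw2-az1
```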

A solution for multiple AWS accounts

If you are using multiple accounts, and you want to guarantee data center uniqueness across accounts, you cannot use the availability zone name. How, then, do you guarantee your application resides in independent data centers for resiliency purposes?

The answer is to stop using the availability zone name to enforce data center independence and to use the AZ ID instead. If two availability zones in different accounts have different AZ IDs, you can be sure that they are in distinct data centers. Using the AZ ID, rather than the availability zone name, is a safe way to ensure your applications live in distinct data centers across multiple accounts.
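One way to apply this, sketched below under assumptions, is to choose the target data centers once by AZ ID and then translate them into each account's own zone names. The helper names and the per-account mappings are hypothetical; in practice each mapping would come from that account's DescribeAvailabilityZones results.

```python
def zone_name_for_az_id(mapping, az_id):
    """Return the zone name that maps to az_id in one account's {name: AZ ID} mapping."""
    for name, zone_id in mapping.items():
        if zone_id == az_id:
            return name
    raise KeyError(f"AZ ID {az_id} is not available in this account")


def names_for_az_ids(mappings, az_ids):
    """For each account, translate the chosen AZ IDs into that account's zone names."""
    return {
        account: [zone_name_for_az_id(mapping, az_id) for az_id in az_ids]
        for account, mapping in mappings.items()
    }


# Hypothetical per-account mappings (made-up values for illustration).
mappings = {
    "account-a": {"us-west-2a": "usw2-az2", "us-west-2b": "usw2-az1", "us-west-2c": "usw2-az3"},
    "account-b": {"us-west-2a": "usw2-az1", "us-west-2b": "usw2-az3", "us-west-2c": "usw2-az2"},
}

# Pick the data centers once, by AZ ID, then derive each account's zone names.
print(names_for_az_ids(mappings, ["usw2-az1", "usw2-az2"]))
```

Because the AZ IDs are chosen first, the first slot in every account lands in usw2-az1 and the second in usw2-az2, even though account-a calls them us-west-2b and us-west-2a while account-b calls them us-west-2a and us-west-2c.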

It’s important for availability purposes to ensure that your application makes effective use of multiple data centers for redundancy. To ensure data center independence for a large application that spans multiple AWS accounts, you cannot use the availability zone name as your verification check for independence. Instead, use the AZ ID. Failure to do so can result in architecting an application that has unexpected and undesired internal infrastructure dependencies that could negatively impact its availability.

More articles by Lee Atchison:

Image by Pete Linforth from Pixabay.