What does it mean to “architect for scale,” and why do you need to do it? Architecting for scale is about building and updating critical applications so they deliver what your increasingly demanding digital customers expect. Remember, your application’s performance will increasingly be compared with the likes of Amazon, Instagram, and Facebook. Architecting for scale is a way of thinking, designing, planning, and executing so your applications meet the needs and demands of a growing customer base, whatever its size or expectations.

Architecting for scale is about ensuring that your applications, and your business, keep up with modern customer expectations.

Architecting for scale is critical for all modern applications and all modern digitally engaged organizations. It’s so important that I felt the need to outline eight things you probably don’t know about architecting for scale.

1. Availability and scalability go hand in hand

You can’t scale your application without addressing availability. And you can’t solve availability issues without dealing with scalability.

These two concepts—availability and scalability—go hand in hand. Therefore, you must address them both to adequately address either one.

When customer load increases and the application cannot respond to the increased traffic, the expected result is slowdowns, brownouts, and blackouts. This upsets your customers and ultimately turns them away from you. When your customers can’t use your application the way they need to use it—if they find it too slow, unresponsive, or unavailable—they will go to your competitors instead. You don’t need to worry about scaling your application if you don’t have any customers.

Scalability and availability are two parts of the same problem.

2. Scaling is not just about increasing traffic to your application

Sometimes when we think about scaling, we think about growing the size of our customer base. This might mean increasing the number of people who can log in to our application simultaneously. It might mean increasing the size of our data store to handle the storage needs of more and larger customers. Or it might mean increasing the number of locations our application can operate from, which increases our customer footprint.

But scaling is more than that. As the number of customers grows, your business expands, and the needs of your application increase. To keep customers coming to your business, you expand and add more capabilities.

Not only is your application handling more customers with more data, but your application needs to have more features and capabilities, which means more developers and other people working on improving your application.

Increasing the number of people working on your application can lead to other types of scaling issues. Testing and rolling out new features becomes harder as the number of people working on the application increases. Just as your application demands increase, your ability to expand your application—and test it to make sure it keeps working as expected—decreases. This is another type of scaling problem.

This is why moving away from monoliths and moving toward service-based architectures is so popular. It improves your ability to scale your application development, which improves your speed of innovation—without sacrificing product quality.

3. Architecting for scale is as much about team culture and organization as it is about technology

Modern application development requires modern teams with a modern team culture to meet your growing customer needs.

This means adopting DevOps culture and organizational strategies. This means adopting DevOps processes and systems. This means using STOSA (single team oriented service architecture) principles for building and operating your services. This means continuous delivery of new features, rather than deploying large product releases.

It means changing your mindset to adapt to the modern needs of your customers, your company, and your application.

4. SLAs are not just for customers

We all know about external SLAs, or service level agreements. These are commitments that major companies give to their customers on how they will perform their services. For a modern application, companies often give SLAs about availability (we will be down less than one hour per month), or performance (on average, a request will be handled within 100 ms), or business performance (we will deliver the package within two business days).
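To make the availability example concrete, here is a minimal sketch of the arithmetic behind a downtime commitment. The function names and the 30-day month are assumptions chosen for illustration; real SLAs define their own measurement windows.

```python
def availability_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Availability over a period, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


def downtime_budget_minutes(target_percent: float, period_minutes: float) -> float:
    """Minutes of downtime allowed in a period for a given availability target."""
    return period_minutes * (1.0 - target_percent / 100.0)


# Assumption: a 30-day month, which is 43,200 minutes.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60

# "Down less than one hour per month" expressed as a percentage.
print(round(availability_percent(60, MINUTES_PER_30_DAY_MONTH), 3))  # 99.861

# A 99.9% target allows roughly 43 minutes of downtime in the same month.
print(round(downtime_budget_minutes(99.9, MINUTES_PER_30_DAY_MONTH), 1))  # 43.2
```

In other words, one hour of downtime per month is a slightly looser promise than "three nines."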

But when it comes to building a high-scalability, high-availability application, you need to specify internal SLAs—performance promises between services. In a true service-oriented architecture, in order for a service to perform as needed for its customers, it must be able to depend on internal services to perform at their promised level of performance.

In a STOSA organization, each service is owned by an individual team, and each service needs to make performance guarantees to all the other services that depend on it. Every service must make promises to each dependent service in order for the application as a whole to meet external commitments. Monitoring these internal SLAs can help determine the source of a problem during an outage.

If, for example, an application API call is running slower than normal, you can check the SLA commitments and actual performance of all the called services to narrow down which service might be causing the slowdown.
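As a rough illustration of that kind of check, the sketch below compares each called service's measured latency against its promised latency and lists the violators, worst first. The service names, the numbers, and the `ServiceLatency` and `sla_violations` names are all hypothetical; in practice the observed values would come from your monitoring system.

```python
from dataclasses import dataclass


@dataclass
class ServiceLatency:
    name: str
    sla_ms: float       # latency the owning team has promised (hypothetical)
    observed_ms: float  # latency measured during the incident window


def sla_violations(services):
    """Return services that exceed their promised latency, worst overrun first."""
    over = [s for s in services if s.observed_ms > s.sla_ms]
    return sorted(over, key=lambda s: s.observed_ms / s.sla_ms, reverse=True)


# Hypothetical services called by one slow API request.
calls = [
    ServiceLatency("auth", sla_ms=20, observed_ms=18),
    ServiceLatency("catalog", sla_ms=50, observed_ms=140),
    ServiceLatency("pricing", sla_ms=30, observed_ms=36),
]

for s in sla_violations(calls):
    print(f"{s.name}: {s.observed_ms:.0f} ms observed vs {s.sla_ms:.0f} ms promised")
```

Ranking by the ratio of observed to promised latency, rather than by raw milliseconds, is one possible design choice: it points at the service that is furthest from its own commitment.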

Internal SLAs are a critical component in operating a large, growing modern application.

5. Improving availability requires managing risk

Many people do not realize how much risk is inherent in their applications today. Much of this risk takes the form of what’s called “technical debt” in the code, but some of it comes from deliberate decisions about how the system should operate that were made without knowing what their outcomes would be.

Donald Rumsfeld, the former U.S. Secretary of Defense, famously said that the problems to be concerned about are the “unknown unknowns”: those problems we don’t even know that we don’t know about.

Risk management is about turning the unknowns into knowns. In the case of modern applications, risk management is about identifying and managing areas of concern, then addressing the risks that have the highest impact to our business.

A risk matrix is a common tool to help manage application risk. Risk matrices give visibility and prioritization to technical debt and pending problems. They are a great communications tool between development teams and management.
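A risk matrix can be as simple as a list of known risks, each rated for how likely it is and how bad it would be. The sketch below assumes a 1-to-3 rating for both likelihood and severity and ranks risks by their product; that scoring scheme and the example entries are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # 1 = low, 2 = medium, 3 = high (assumed scale)
    severity: int    # 1 = low, 2 = medium, 3 = high (assumed scale)

    @property
    def score(self) -> int:
        # Assumption: priority is likelihood multiplied by severity.
        return self.likelihood * self.severity


def prioritize(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries for illustration only.
matrix = [
    Risk("Single database instance with no replica", likelihood=2, severity=3),
    Risk("Login page logo loads slowly", likelihood=3, severity=1),
    Risk("No rate limiting on public API", likelihood=3, severity=3),
]

for r in prioritize(matrix):
    print(f"{r.score}: {r.description}")
```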

Effective use of risk matrices will help reduce availability issues in your application.

6. The best way to make your application work is to break it

The best way to keep your application operating—and keep it meeting customer needs—is to break the application occasionally.

This process, called Game Day testing, involves deliberately breaking some portion of a running application, then observing how the application behaves and how it, and the teams supporting it, respond to resolve the forced error.

The idea is that the best way for a team to learn how to resolve certain application failures is to see how they fail in real life. But rather than waiting for them to fail at some random point, you instead force a failure at a more convenient time to evaluate how your systems and teams resolve the problems. Random failures tend to occur at inconvenient times, such as the middle of the night or during a critical operation. Forcing them at a more convenient time lets you choose a low usage period, or a daytime hour when everyone is in the office and can work together easily on the problem.

Game Day tests can be planned events, such as testing a data center failure by disconnecting an important data center for a few hours. Or they can be randomly generated failures, using tools such as Netflix’s Chaos Monkey. Either way, they let you see whether your application can self-recover from the problem, or how quickly a support team can resolve it.
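The core idea behind randomly generated failures can be sketched in a few lines: pick one running instance at random and take it down, then watch what happens. This is not Chaos Monkey's actual interface; `inject_failure`, the instance names, and the `terminate` callback are hypothetical stand-ins for whatever your infrastructure provides.

```python
import random
from typing import Callable, Sequence


def inject_failure(
    instances: Sequence[str],
    terminate: Callable[[str], None],
    rng: random.Random,
) -> str:
    """Pick one instance at random, terminate it, and return its name."""
    if not instances:
        raise ValueError("no instances to choose from")
    victim = rng.choice(list(instances))
    terminate(victim)
    return victim


# Stand-in for a real termination call: just record which instance was chosen.
terminated = []
victim = inject_failure(
    ["web-1", "web-2", "web-3"],  # hypothetical instance names
    terminated.append,
    random.Random(),
)
print("terminated", victim)
```

Passing the random number generator in as a parameter is a deliberate choice here: a seeded generator makes a Game Day exercise repeatable when you want to replay it.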

Constantly testing various failure scenarios is a great way to keep your application operating and to improve your application’s and support team’s ability to quickly resolve problems in the future. Nothing improves your application availability better than breaking it regularly.

7. The cloud is essential to scaling

Initially, cloud computing was an interesting novelty. Then it grew to become a useful tool that application developers and operations teams used to manage their applications. Today, however, it is an essential and critical element in nearly all highly scaled, high-availability applications across almost all industries and businesses.

The cloud’s ability to quickly add new resources, such as servers, storage, and network capacity, to an application is invaluable in building a cost-effective application. These resources allow the application to scale up and down based on application usage. They can even add significant resources quickly (burst) to support sudden and unexpected usage spikes. In the modern world, where a celebrity’s or influencer’s mention of a business on social media can cause that company’s traffic to increase by many orders of magnitude almost instantly, the ability to handle usage spikes is critical to application scalability.
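Scaling up and down based on usage usually comes down to a small decision rule. The sketch below assumes a simple proportional rule: size the fleet so that average utilization returns to a target, within a minimum and maximum. The function name, the percentages, and the limits are assumptions for illustration; each cloud provider's autoscaling service has its own policies and parameters.

```python
def desired_instances(
    current: int,
    observed_pct: int,
    target_pct: int,
    min_instances: int = 1,
    max_instances: int = 100,
) -> int:
    """Instance count that would bring average utilization back to the target."""
    if current < 1 or target_pct <= 0:
        raise ValueError("current must be >= 1 and target_pct must be > 0")
    # Integer ceiling of current * observed / target, avoiding float rounding.
    raw = (current * observed_pct + target_pct - 1) // target_pct
    return max(min_instances, min(max_instances, raw))


print(desired_instances(4, observed_pct=90, target_pct=60))   # 6: scale up
print(desired_instances(4, observed_pct=30, target_pct=60))   # 2: scale down
# A sudden spike is capped by the configured maximum.
print(desired_instances(4, observed_pct=900, target_pct=60, max_instances=20))  # 20
```

The maximum matters as much as the rule itself: it is what keeps an unexpected spike from turning into an unexpected bill.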

But it’s more than that—the ability to create an entirely new data center by replicating an existing data center quickly allows you to create disaster recovery plans that do not require investing in standby hardware. Instead, if a disaster occurs that causes an application’s data center to go offline, for example, a new data center can quickly be brought up to run the operation with relatively little effort.

Such abilities are critical to modern digital applications, and they can be practically implemented only by using the public cloud.

8. You should ignore the serverless hype

Serverless computing is a relatively new capability of cloud computing that allows you to use computational resources, database resources, data storage resources, and other application resources without having to allocate and manage individual servers. Serverless resources are already in use in almost all cloud-based applications. For example, any application making use of Amazon S3 uses serverless data storage. But serverless computation, such as AWS Lambda, is a newer and useful model that provides semi-scalable computation resources without the need to allocate the underlying computer resources.

But while serverless computing has great value in many situations, the serverless hype prevalent in the industry would suggest that everything should be serverless.

Nothing could be further from the truth. In many situations, using serverless computation may actually cause problems for your application, or at least make it much more expensive to run than it needs to be.

The pros and cons discussion of when and how to use serverless computing is beyond the scope of this article, but the point is that serverless is not appropriate for all situations.

Make sure to use serverless capabilities only where they will benefit your application, not everywhere the hype says you should use them.

Architecting for Scale—the book

If you want to read more about building modern, scalable applications, take a look at my book published by O’Reilly Media titled Architecting for Scale. Now in its second edition, the book gives you a high-level framework for how to design and manage your organization, so it can build and operate highly scalable, highly available web applications. The newly updated edition provides an expanded view on modern architecture paradigms, such as microservices and cloud computing, that you can use to build highly scalable modern applications.

Interested in learning even more? See my full list of books and reports, and check out my online courses as well as my column in InfoWorld.

Photo by Alex Wong on Unsplash.