The holiday shopping season, which traditionally kicks off with Black Friday the day after Thanksgiving, is just around the corner. What started out as a uniquely American shopping holiday has become a global phenomenon, with retailers everywhere steeling themselves for the annual onslaught of shoppers, both in store and online.
If you’re running an e-commerce business, it’s crucial that your website be able to scale to meet the huge surges of traffic that you will (hopefully) see this season. If your website is slow or unresponsive, frustrated shoppers will abandon their shopping carts and make a digital beeline for your competitors. If your website should crash completely, you’ll be losing bucketfuls of revenue by the second.
Because the holiday rush happens every year, your e-commerce business has no excuse not to be ready for it. If you’re worried you’re not sufficiently prepared, it’s not too late to prepare your e-commerce business for the holidays by ensuring the most important elements are in place. Here are six things all e-commerce businesses should do right now to make sure they’re ready for the shoppiest time of the year.
Prepare your e-commerce business for the holidays in 6 steps
1. Build a risk matrix
You can’t plan for risk if you don’t understand what the potential risks are. Risk management is the most important thing you can do to prepare your company for the unknowns ahead. For an e-commerce company, no single part of your business is more important than your digital storefront, so it’s crucial to understand the risks to your web applications before the holiday shopping season begins.
Risk matrices are a great way to understand these risks, and to prepare your applications for the coming onslaught. Don’t know how to build and use a risk matrix? There are lots of informative resources out there on this topic, including my book Architecting for Scale, which covers risk management in depth.
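As a rough illustration, a risk matrix can be as simple as a list of risks scored by likelihood and severity, then sorted so the worst ones get attention first. The sketch below assumes a 1-to-3 scale for both dimensions and a score equal to their product; the scale, the scoring rule, and the sample risks are all hypothetical, so substitute whatever your own matrix uses.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 3 (likely)
    severity: int    # assumed scale: 1 (minor) to 3 (critical)

    @property
    def score(self) -> int:
        # One common convention: score = likelihood x severity.
        return self.likelihood * self.severity


def rank_risks(risks):
    """Return the risks ordered from highest score to lowest."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries for a digital storefront.
risks = [
    Risk("Checkout service overloaded", likelihood=3, severity=3),
    Risk("Product images load slowly", likelihood=2, severity=1),
    Risk("Payment provider outage", likelihood=1, severity=3),
]

for risk in rank_risks(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

Reviewing the top of this list before the season starts is a quick way to decide where mitigation work should go first.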
2. Prepare scaling plans
Do you know when the heaviest traffic days and times will be for your application? You might have some idea already. You may know which specific days of the week will be heavier than others, based on historical holiday trends. You may also know what time of the day traffic peaks based on your customers’ browsing habits. And you certainly should know when your marketing department is planning on specific sales and promotions, allowing you to prepare for the traffic you’ll have from those activities.
But are you sure? Nothing describes this year’s retail experiences better than the word “unpredictable.” Unpredictable means you really can’t be sure what will happen with your web traffic. Will your traffic be 10% higher than last year? 80%? 500%? Will the high-traffic times match your past usage?
And what happens if a YouTube influencer or celebrity latches on to your product and tells everyone about it?
These are all good things and can spell huge success for your company, but they can just as easily spell failure.
In my experience, the most common time for an application to fail is when it is the most stressed—when it is the busiest. This means that just after that celebrity endorsement comes through and everyone is going to your storefront to check out what the big deal is all about, your site begins to slow down, becomes sluggish, and then eventually fails outright.
What a waste. Avoid this by being prepared to meet your scaling requirements, even the unpredictable and completely unforeseen scaling needs. Nothing makes this easier than operating your application in the public cloud, where adding additional resources to your application is relatively quick and easy.
But additional resources can only help an application that is capable of handling those additional resources. Your application must be scale-ready, and able to take additional resources on demand and use them to their fullest. This requires advanced planning and work. But even at this late date, getting yourself as ready as possible to scale is a critical step in holiday preparedness.
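To make the scaling question concrete, here is a minimal back-of-the-envelope sketch of how many application instances each growth scenario would require. Every number in it is an assumption for illustration: last year's peak of 2,000 requests per second, 250 requests per second per instance, and 25% headroom. It also assumes the application scales linearly with added instances, which is exactly the property the paragraph above says you have to verify.

```python
def instances_needed(baseline_peak_rps, growth_pct, rps_per_instance, headroom_pct=25):
    """Estimate the instance count for a given traffic growth scenario.

    Uses integer arithmetic throughout so the result is exact.
    """
    # Expected peak, inflated by the safety headroom (both as percentages).
    scaled = baseline_peak_rps * (100 + growth_pct) * (100 + headroom_pct)
    capacity = 100 * 100 * rps_per_instance
    # Ceiling division: round up, because a partial instance is not an option.
    return -(-scaled // capacity)


# Hypothetical inputs: 2,000 rps peak last year, 250 rps per instance.
for growth in (10, 80, 500):
    print(f"+{growth}% traffic -> {instances_needed(2000, growth, 250)} instances")
```

Running the numbers for several scenarios ahead of time tells you whether your cloud account limits, database connections, and budget can actually absorb the worst case.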
3. Establish internal, measurable SLAs
Do you know which of your systems and services is your bottleneck? You should. If you want to apply resources to improve the scalability of your application, you had better know where and how to apply them. You need advance warning of pending performance problems in order to apply resources and avoid a failure.
This requires you to understand how your services perform normally, and what to look for in terms of early indicators of problem performance. This means measuring the performance of your systems in steady state, and creating internal SLAs for how you expect your applications to perform in the future.
SLAs (service level agreements) are not just for your customers. SLAs are often used internally within organizations to measure the expected performance of subsystems and services that are critical to the operation of an application. Measuring internal performance against SLAs is a great way to spot upcoming problems, identify who needs to deal with them, and determine what they need to do to fix the issue before you start losing customers.
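One way an internal SLA check might look in practice is sketched below. It assumes the SLA is expressed as a 95th-percentile latency target in milliseconds, with an early-warning threshold at 80% of that target; the metric, the thresholds, and the sample data are hypothetical and not taken from any particular monitoring tool.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def sla_status(samples_ms, target_ms, warn_pct=80):
    """Classify latency samples against an internal p95 SLA target."""
    p95 = percentile(samples_ms, 95)
    if p95 > target_ms:
        return "breach"
    if p95 * 100 > target_ms * warn_pct:
        return "warning"  # still within the SLA, but trending toward it
    return "ok"


# Hypothetical latency samples (ms) against a 200 ms target.
steady = [120] * 18 + [150, 400]
busy = [120] * 15 + [170] * 3 + [180, 400]
overloaded = [120] * 10 + [250] * 10

for label, samples in (("steady", steady), ("busy", busy), ("overloaded", overloaded)):
    print(label, sla_status(samples, target_ms=200))
```

The "warning" state is the useful part: it is the advance notice that lets you add resources before customers notice anything.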
4. Establish clear escalation processes
Even with all this preparation, however, incidents will happen. When your application begins to fail—or, better yet, right before it fails—you want to have your support teams properly engaged to address the problem quickly. This can limit the negative impact of large problems.
There are two critical measures of success in an application’s readiness response: Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR). MTTD measures how long it takes you to notice a problem has started to occur, while MTTR measures how long it takes you to resolve a problem once you’ve noticed it. Together, these two numbers determine whether an incident is a minor annoyance, or a major customer-losing problem.
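If you keep timestamps for when each incident started, was detected, and was resolved, both numbers fall out of simple averaging. The sketch below follows the definitions above (MTTD is start to detection, MTTR is detection to resolution); the `Incident` record and the two sample incidents are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the problem actually began
    detected: datetime  # when someone (or something) noticed it
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents):
    return mean_delta([i.resolved - i.detected for i in incidents])


# Two hypothetical incidents from an incident log.
incidents = [
    Incident(datetime(2021, 11, 1, 10, 0), datetime(2021, 11, 1, 10, 4), datetime(2021, 11, 1, 10, 34)),
    Incident(datetime(2021, 11, 8, 14, 0), datetime(2021, 11, 8, 14, 10), datetime(2021, 11, 8, 14, 30)),
]

print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Tracking these two values over time shows whether your detection and response are actually improving.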
Internal SLAs and measurements are important for improving MTTD, but so is having a solid and repeatable process in place for how you respond when a problem is detected. To prepare your e-commerce business for the holidays, ask yourself the following questions:
- Do you have a clear set of first responders identified who are called in immediately via automated systems whenever a problem is detected?
- What is the training level of these individuals?
- Do they understand how the application functions?
- Are they empowered to perform the needed actions to resolve a problem?
- Do they have the tools they need to accomplish this task?
- Do they have the proper expertise?
- Is there a clear escalation path for who is brought in next if the problem is beyond what the first responder is capable of fixing?
- Do your second-level responders understand their responsibilities and are they available to be brought in during an emergency?
- When does (or should) management get involved?
These questions must all be addressed in your escalation process in advance. Your escalation process must have clearly and unambiguously defined paths that describe who is to be brought in, what they are capable of doing, and what they are allowed to do before bringing in the next higher level of responsibility. Your escalation process must spell out all steps of the process for all types of incidents. Your systems and processes must be well documented so that the process is continuous, repeatable, and sustainable.
And it is absolutely critical that everyone involved in the process understands their role and responsibilities, and has the tools and knowledge to carry them out.
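Writing the escalation path down as data is one way to keep it unambiguous. The following is a minimal sketch, assuming a three-level policy in which each level is paged after an incident has gone unresolved for a set number of minutes; the roles, timings, and allowed actions are hypothetical placeholders for your own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    role: str
    page_after_min: int     # minutes unresolved before this level is paged
    allowed_actions: tuple  # what this level may do without escalating


# Hypothetical policy; replace with your own roles, timings, and actions.
POLICY = [
    EscalationLevel("first responder", 0, ("restart service", "add capacity")),
    EscalationLevel("second-level engineer", 15, ("roll back release", "fail over database")),
    EscalationLevel("engineering manager", 45, ("declare major incident",)),
]


def roles_to_page(minutes_unresolved, policy=POLICY):
    """Return every role that should have been paged by this point."""
    return [lvl.role for lvl in policy if minutes_unresolved >= lvl.page_after_min]


for minutes in (0, 20, 60):
    print(minutes, roles_to_page(minutes))
```

A structure like this answers the who, when, and what-are-they-allowed-to-do questions in one place, and it can feed directly into whatever paging system you use.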
5. Test failure scenarios
You have scaling plans, defined internal performance triggers, and an established, clear escalation path. This means you are ready for the holidays, right?
Well, almost. Are you sure all those processes work?
Nothing shows preparedness more than actually testing failures to ensure everything works as designed.
I’m a firm believer in testing in production. You can’t be certain your production systems will handle a failure unless you test them under failure. And simulating this in a test or staging environment is insufficient.
This type of production testing is often called Game Day testing. It involves intentionally failing a production system and making sure that your systems, processes, and people all respond as they are supposed to in order to resolve the incident in a timely manner.
Often these sorts of Game Days are planned in advance and occur during low-traffic times to your site. However, this only gives you a partial assurance. To properly validate your systems, you need to test failure at busy times, too.
Some companies do this using tools that randomly insert failures into their applications, in order to test the response processes and systems. One such tool is Chaos Monkey, built and used by Netflix. Whether you go to this extreme or perform simpler and safer testing is up to you. But test to make sure your systems and processes work before you need them to. This is an important step to prepare your e-commerce business for the holidays and other periods of high traffic.
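For a sense of what the simpler end of that spectrum can look like, here is a small sketch of application-level fault injection: a wrapper that makes a call fail some fraction of the time so you can confirm the fallback path works. This is not how Chaos Monkey itself operates, and every name in it (`with_fault_injection`, `get_price`, `InjectedFailure`) is made up for illustration.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real call to simulate a dependency failing."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so a fraction of calls (failure_rate, 0.0 to 1.0) fail."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def get_price(sku):
    # Hypothetical stand-in for a call to a pricing service.
    return {"sku": sku, "price_cents": 1999}


def get_price_with_fallback(fetch, sku):
    """The behavior under test: degrade gracefully instead of crashing."""
    try:
        return fetch(sku)
    except InjectedFailure:
        return {"sku": sku, "price_cents": None}  # e.g. show "price unavailable"


rng = random.Random(42)  # seeded so a test run is repeatable
flaky_get_price = with_fault_injection(get_price, failure_rate=0.3, rng=rng)

results = [get_price_with_fallback(flaky_get_price, "SKU-1") for _ in range(10)]
print(sum(r["price_cents"] is None for r in results), "of 10 calls used the fallback")
```

In a real Game Day the failure would be injected into live infrastructure and the thing being verified would be your alerts and people, not just a code path, but the principle is the same: cause the failure on purpose and watch what happens.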
6. Study past incidents
Finally, you’ve had incidents in the past with your application. Have you performed post-mortem reviews of what caused those incidents to make sure they don’t recur?
Every time you have an atypical problem with your application, make sure to analyze it after the event to see what you can do to improve your systems and processes to reduce the likelihood of a recurrence of that same event and to improve your response to the problem.
Companies that have high availability perform a formal post-mortem analysis after each and every incident, even minor incidents. Great lessons can be learned by simply looking at how your systems have performed in the past.
It’s never too late to prepare your e-commerce business for the holidays
Hopefully your e-commerce website is ready for the holiday season. The truth is, some of the things on this list aren’t easy to implement, and if you haven’t done them already, you might not be able to get them in place in time for peak holiday traffic. But the good news is that some of the things on this list can be implemented late in the process. Every little bit helps, and the one improvement you do make could be the one critical piece that keeps this holiday season running smoothly.
And, of course, get started early next year so your e-commerce business is fully prepared for the 2022 holiday season. Want to learn more about planning for high availability and application scaling? Check out my book, Architecting for Scale, published by O’Reilly Media, now in its second edition.
More articles from Lee Atchison: