In this episode: the hardest part of your cloud migration is moving your data to the cloud. Doing so without suffering planned or unplanned downtime can be a challenge.
I’m going to give you three strategies that will help you avoid downtime during the migration of your critical application data to the cloud.
And in Tech Tapas, we are going to take a look at what it means to fly two mistakes high, and how that relates to application availability.
Links and More Information
The following are links mentioned in this episode, and links to related information:
- 3 Strategies to Avoid Downtime When Migrating Data to the Cloud (https://blog.newrelic.com/engineering/migrating-data-to-cloud-avoid-downtime-strategies/)
- Modern Digital Applications Website (https://mdacast.com)
- Lee Atchison Articles and Presentations (https://leeatchison.com)
- Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
Main Story – 3 Strategies to Avoid Downtime when Migrating Data to the Cloud
Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. During the data transfer, keeping the data intact, in sync, and self-consistent requires either tight correlation or—worse—application downtime.
Moving your data at the same time as the applications that use it is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Companies often rely on the expertise of a migration architect, a role that can greatly contribute to the success of any cloud migration.
Whether you have an on-staff cloud architect or not, there are three primary strategies for migrating application data to the cloud:
- Offline copy migration
- Master/read replica switch migration
- Master/master migration
Whether you’re migrating a SQL database, a NoSQL database, or simply raw data files, each migration method requires a different amount of effort, has a different impact on your application’s availability, and presents a different risk profile for your business.
Strategy 1: Copy Data While Application is Offline
An offline copy migration is the most straightforward method. Bring down your on-premise application, copy the data from your on-premise database to the new cloud database, then bring your application back online in the cloud.
An offline copy migration is simple, easy, and safe, but you’ll have to take your application offline to execute it. If your dataset is extremely large, your application may be offline for a significant period of time, which will undoubtedly impact your customers and business.
For most applications, the amount of downtime required for an offline copy migration is generally unacceptable. But if your business can tolerate some downtime, and your dataset is small enough, you should consider this method. It’s the easiest, least expensive, and least risky method of migrating your data to the cloud.
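The three steps of an offline copy can be sketched in a few lines. This is only an illustration, not a production procedure: it assumes two in-memory SQLite databases as stand-ins for the on-premise and cloud databases, and the `app_state` dictionary and the `offline_copy_migration` function are hypothetical names invented for the example.

```python
import sqlite3


def offline_copy_migration(source, target, app_state):
    """Copy every table from `source` to `target` while the app is offline."""
    # Step 1: take the application offline so no writes arrive mid-copy.
    app_state["online"] = False

    # Step 2: copy the whole database. Connection.backup() is SQLite-specific;
    # other databases would use their own dump/restore tooling.
    source.backup(target)

    # Sanity check before going live: row counts must match in every table.
    # On a mismatch the application deliberately stays offline.
    tables = [row[0] for row in source.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        count_sql = f'SELECT COUNT(*) FROM "{table}"'
        if (source.execute(count_sql).fetchone()
                != target.execute(count_sql).fetchone()):
            raise RuntimeError(f"row count mismatch in table {table}")

    # Step 3: point the application at the new database and bring it back up.
    app_state["database"] = target
    app_state["online"] = True


# Hypothetical stand-ins for the on-premise and cloud databases.
on_premise = sqlite3.connect(":memory:")
on_premise.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
on_premise.executemany("INSERT INTO orders (item) VALUES (?)",
                       [("widget",), ("gadget",)])
on_premise.commit()

cloud = sqlite3.connect(":memory:")
app = {"online": True, "database": on_premise}
offline_copy_migration(on_premise, cloud, app)
```

The application is unavailable for the whole time between step 1 and step 3, which is why the size of the dataset determines how long the outage lasts.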
Strategy 2: Read Replica Switch
The goal of a read replica switch migration is to reduce application downtime without significantly complicating the data migration itself.
For this type of migration, you start with the master version of your database running in your on-premise data center. You then set up a read replica copy of your database in the cloud with one-way synchronization of data from your on-premise master to your read replica. At this point, you still make all data updates and changes to the on-premise master, and the master synchronizes those changes with the cloud-based read replica. The master-replica model is common in most database systems.
You’ll continue to perform data writes to the on-premise master, even after you’ve gotten your application migrated and operational in the cloud. At some predetermined point in time, you’ll switch over and swap the master/read replica roles. The cloud database becomes the master and the on-premise database becomes the read replica. At the same time, you move all write access from your on-premise database to your cloud database.
You’ll need a short period of downtime during the switchover, but the downtime is significantly less than what’s required using the offline copy method.
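The switchover sequence can be sketched as a toy model. Everything here is an assumption made for illustration: the `Database` class and the `switchover` function are hypothetical, replication is simulated as an immediate in-process copy, and a real database would use its own replication and promotion tooling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Database:
    """Toy model of a database that is either a master or a read replica."""
    name: str
    role: str  # "master" or "replica"
    rows: dict = field(default_factory=dict)
    replica: Optional["Database"] = None

    def write(self, key, value):
        if self.role != "master":
            raise PermissionError(f"{self.name} is a read replica")
        self.rows[key] = value
        # One-way synchronization: the master pushes changes to its replica.
        if self.replica is not None:
            self.replica.rows[key] = value


def switchover(old_master, new_master):
    """Swap the master and read replica roles."""
    # The short downtime starts here: no database accepts writes.
    old_master.role = "replica"
    # Only promote the replica once it holds everything the master holds.
    if new_master.rows != old_master.rows:
        old_master.role = "master"  # roll back and keep the old master
        raise RuntimeError("replica has not caught up; switchover aborted")
    # Reverse the direction of synchronization and promote the replica.
    old_master.replica = None
    new_master.replica = old_master
    new_master.role = "master"  # downtime ends here


on_premise = Database("on-premise", role="master")
cloud = Database("cloud", role="replica")
on_premise.replica = cloud

on_premise.write("order-1", "widget")   # before the switch: write on-premise
switchover(on_premise, cloud)
cloud.write("order-2", "gadget")        # after the switch: write in the cloud
```

In this sketch the downtime is only the window inside `switchover` during which neither database accepts writes, which is why it is so much shorter than a full offline copy.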
Strategy 3: Master/Master Migration
This is the most complicated of the three data migration strategies and has the greatest potential for risk. However, if you implement it correctly, you can accomplish a data migration without any application downtime whatsoever.
To begin, create a duplicate of your on-premise database master in the cloud and set up bi-directional synchronization between the two masters, synchronizing all data from on-premise to the cloud, and from the cloud to on-prem. Basically, you’re left with a typical multi-master database configuration.
After you set up both databases, you can read and write data from either the on-premise database or the cloud database, and both will remain in sync. This will allow you to move your applications and services independently, on your own schedule, without needing to worry about your data.
At the completion of your migration, simply turn off your on-premise master and use your cloud master as your database.
It’s important to note, however, that this method is not without complexity. Setting up a multi-master database is quite complicated and comes with the risk of skewed data and other unwanted results. For example, what happens if you try to update the same data simultaneously in both masters? Or what if you try to read data from one master before an update to the other master has synchronized the data?
As such, this model only works if your application’s data access patterns and data management strategies can support it. You’ll also need application-specific synchronization and sync resolution routines to handle sync-related issues as they arise.
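To make the idea of a sync resolution routine concrete, here is a minimal sketch of one common policy, last-writer-wins. It is an assumption chosen for illustration, not a recommendation for any particular application: the `Versioned` record, the `resolve` and `synchronize` functions, and the use of a timestamp per record are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A stored value plus the metadata needed to resolve conflicts."""
    value: str
    updated_at: float  # time of the write, as recorded by the writing master
    origin: str        # name of the master that accepted the write


def resolve(a, b):
    """Last-writer-wins: keep the newer write.

    Ties are broken by origin name so both masters pick the same winner.
    """
    return max(a, b, key=lambda record: (record.updated_at, record.origin))


def synchronize(on_premise, cloud):
    """Bring two masters (modeled as dicts) to the same state."""
    for key in on_premise.keys() | cloud.keys():
        candidates = [db[key] for db in (on_premise, cloud) if key in db]
        winner = candidates[0] if len(candidates) == 1 else resolve(*candidates)
        on_premise[key] = winner
        cloud[key] = winner


# The same key was updated in both masters before they synchronized.
on_premise_db = {"customer-7": Versioned("old address", 100.0, "on-premise")}
cloud_db = {
    "customer-7": Versioned("new address", 105.0, "cloud"),
    "customer-8": Versioned("first address", 101.0, "cloud"),
}
synchronize(on_premise_db, cloud_db)
```

Note that last-writer-wins silently discards the losing update. Whether that is acceptable depends entirely on your application’s data, which is why these routines need to be application-specific.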
If your application, data, and business can handle this migration method, consider yourself fortunate and use it. Once it is set up, it’s the cleanest of the three strategies, because your applications can move on their own schedule with no downtime.
Mitigate migration risks
Any data migration comes with some risk, especially the risk of data corruption. Your data is most at risk while the migration is in progress; swift and determined execution of the migration is critical. Once you start, either complete the migration or roll it back completely. Never stop halfway through: half-migrated data isn’t useful to anyone.
Risk of data corruption is especially high when migrating extremely large datasets. Offline data copy and transfer tools such as AWS Snowball can help manage the migration of large quantities of data, but they do nothing to help with your application’s data usage patterns during a migration. Even if you use a transfer device such as Snowball, you’ll still need to use one of the migration strategies described above.
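One simple way to check for corruption after a copy is to compare a checksum of the source data with a checksum of the migrated data. The sketch below assumes the dataset fits in memory as a dictionary of records with sortable keys; the `dataset_digest` and `verify_migration` helpers are hypothetical names, and a large dataset would need to be checked in chunks or per table.

```python
import hashlib


def dataset_digest(records):
    """Return a SHA-256 digest over all records, independent of their order."""
    digest = hashlib.sha256()
    for key, value in sorted(records.items()):
        digest.update(repr((key, value)).encode("utf-8"))
    return digest.hexdigest()


def verify_migration(source, target):
    """True when source and target contain exactly the same records."""
    return dataset_digest(source) == dataset_digest(target)


source_data = {"order-1": "widget", "order-2": "gadget"}
intact_copy = {"order-2": "gadget", "order-1": "widget"}
corrupt_copy = {"order-1": "widget", "order-2": "gadgte"}

copy_is_intact = verify_migration(source_data, intact_copy)
copy_is_corrupt = not verify_migration(source_data, corrupt_copy)
```

A check like this only tells you that the copy matches the source at one moment in time. It does not replace the migration strategies described above.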
Tech Tapas — Two Mistakes High
I fly model R/C airplanes. There is an expression we use in that hobby when you are learning to fly: “keep your plane two mistakes high.”
What does it mean to “keep two mistakes high”?
Well, when you are flying and performing some maneuver, such as a stunt of some sort, a mistake will typically cause your plane to lose some altitude.
Now, while you are recovering from that mistake — and you are now lower in altitude — what happens if you make another mistake? If your plane isn’t high enough to recover from the second mistake…well…it’s bad news for you and your plane…
You always want to be high enough so you can recover from a mistake, even while you are recovering from a mistake… You always want to stay high enough so you don’t crash, no matter what goes wrong.
This is where the expression “keep two mistakes high” comes from. You always want to perform stunts high enough so that if you make a mistake, you have room to recover even if you make a mistake during the recovery.
This is a good analogy for maintaining availability in our most critical modern web applications. Keeping two mistakes high means that even when something is going wrong with your application, you keep it running reliably enough that you can afford for something else to go wrong while you are still recovering from the first problem.
This applies to many different aspects of modern application availability planning, from dealing with hardware failures, to data redundancy, to capacity planning, to performing retries in your inter-service API calls, to risk management and disaster planning.
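Capacity planning is one place where the idea turns into simple arithmetic. The sketch below is one possible reading of “two mistakes high”, not a rule from this episode: it assumes identical servers, and the `servers_to_provision` function and its parameter names are hypothetical.

```python
import math


def servers_to_provision(peak_requests_per_sec,
                         requests_per_sec_per_server,
                         failures_to_survive=2):
    """Servers needed to carry peak load even after losing some of them."""
    # Servers required just to handle the peak load with nothing going wrong.
    needed_for_load = math.ceil(
        peak_requests_per_sec / requests_per_sec_per_server)
    # Extra servers so that one failure, and then a second failure during
    # recovery from the first, still leaves enough capacity.
    return needed_for_load + failures_to_survive


fleet_size = servers_to_provision(1000, 300)
```

With a peak of 1000 requests per second and 300 requests per second per server, four servers carry the load, so this sketch provisions six.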
Keeping “two mistakes high” is a lesson about redundancy. It’s a lesson about availability, and it’s a lesson about resiliency. It applies to model aviation, and it applies just as effectively to modern application development and operation.
In future episodes of Modern Digital Applications, I will be using this principle in giving you guidance and suggestions for keeping your critical modern applications operating.