DNS is a highly available, highly redundant, highly reliable service that is absolutely essential to your company’s applications and business operations. A failure in your DNS system can bring your company’s business to a halt, jeopardizing your company’s future.

DNS is essential to the operation of all aspects of the internet and modern digital businesses. The problem with DNS is that a very tiny mistake in a configuration file can cause ripples throughout the entire DNS system and impact all aspects of your company’s operations, its customers’ ability to use the company’s products, and the company’s ability to make money. All of it can be brought to its knees by a very tiny mistake in a single configuration entry. Without solid DNS configuration management in place, you make yourself vulnerable to simple but costly mistakes.

But how do you implement a high quality DNS hygiene solution? In this episode, I’ll give you eight steps to higher quality DNS systems.

Today on Modern Digital Business.

Transcript
Lee:

DNS is a highly available, highly redundant, highly reliable service that

Lee:

is absolutely essential to your company's applications and business operations.

Lee:

Yet, DNS configurations are highly sensitive and simple mistakes

Lee:

can cause catastrophic problems.

Lee:

But how do you implement a high quality DNS hygiene solution?

Lee:

In this episode, I'll give you eight steps to higher quality DNS systems.

Lee:

Are you ready?

Lee:

Let's go.

Lee:

DNS is a highly available, highly redundant, highly reliable service that

Lee:

is absolutely essential to your company's application and business operations.

Lee:

A failure in your DNS system can bring your company's business to a halt

Lee:

jeopardizing your company's future.

Lee:

DNS is essential to the operation of all aspects of the internet

Lee:

and modern digital businesses.

Lee:

The problem with DNS is that a very tiny mistake in a configuration file can

Lee:

cause ripples throughout the entire DNS system and impact all aspects of your

Lee:

company's operations, its customers' ability to use the company's products,

Lee:

and the company's ability to make money.

Lee:

All of it can be brought to its knees by a very tiny mistake in

Lee:

a single configuration entry.

Lee:

Without solid DNS configuration management in place, you make yourself

Lee:

vulnerable to simple but costly mistakes.

Lee:

That's where problems often occur.

Lee:

Why are DNS configurations so sensitive to mistakes?

Lee:

The root cause of this sensitivity is that DNS changes are so common

Lee:

and so simple that they are rarely considered risky business operations.

Lee:

For smaller organizations, the development team probably manages

Lee:

their own DNS servers or has some other way to make DNS changes on the fly.

Lee:

As organizations get larger and more complex, the number of DNS servers

Lee:

and the number of people who can make changes to them tends to multiply.

Lee:

With so many people making so many changes, it's not surprising that

Lee:

something goes wrong occasionally.

Lee:

In fact, it would be much more surprising if things didn't go wrong.

Lee:

DNS outages can be caused by a variety of factors, including human error,

Lee:

software issues, and hardware failures, but the most common cause of DNS

Lee:

outages is incorrect configuration files being deployed to DNS servers.

Lee:

What steps can smaller companies without quality DNS hygiene take

Lee:

in order to put a high quality DNS management process in place?

Lee:

Here are eight things any company can do to improve their

Lee:

overall DNS quality to keep your applications operational and healthy.

Lee:

Number one.

Lee:

Manage DNS configuration using revision control.

Lee:

This is the simplest and most basic thing you can do to improve the

Lee:

quality of your DNS infrastructure.

Lee:

At the core, DNS configurations are simply flat text files.

Lee:

Many DNS providers do give you a front end control panel to these configuration

Lee:

files in order to let you make changes more easily, and with less actual knowledge of

Lee:

the impact of the changes you are making.

Lee:

Don't use these control panels.

Lee:

Instead, manage your configuration files using the standard flat text file format.

Lee:

Once you have moved to this flat file format, you can easily manage these

Lee:

configuration files using the same revision control program you use for

Lee:

managing your application source code.

Lee:

For most companies, this is some variation of Git.

Lee:

You undoubtedly have processes in place today in your company

Lee:

for managing your source code.

Lee:

Use the same or similar process for managing your DNS

Lee:

configuration files as well.

Lee:

This simple change will allow many other process improvements to come naturally,

Lee:

such as configuration reviews, approval workflows, and the ability to track

Lee:

when specific changes were made that may have impacted your application.

Lee:

This is an essential base necessary to keep your DNS

Lee:

service operating and error free.

Lee:

Number two.

Lee:

Review all needed DNS changes.

Lee:

This falls right behind the first recommendation.

Lee:

Once you're managing your changes using a revision control program,

Lee:

make sure that all changes you make are reviewed and approved.

Lee:

This can be accomplished just like your application source code using

Lee:

branches, pull requests and merges.

Lee:

Establish a process for approvals for all DNS changes.

Lee:

Make sure at least one or more people review all changes before

Lee:

they are incorporated into your production configuration.

Lee:

This review process should include checks for things like syntax

Lee:

errors, incorrect DNS settings, and other potential problems.

Lee:

Problems with DNS configurations can be subtle.

Lee:

So the review should be thorough and methodical by a knowledgeable reviewer.
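
Some of those review checks can be automated so the human reviewer can focus on the subtle problems. Here is a minimal sketch of what such a check might look like. The `lint_records` function and the `(name, type, value)` tuple layout are assumptions made for illustration, not part of any particular DNS tool, and the rules shown are only a small sample of what a real review would cover.

```python
import ipaddress


def lint_records(records):
    """Return a list of problems found in a list of DNS records.

    Each record is a (name, type, value) tuple. This layout is an
    assumption for the sketch, not a standard format.
    """
    problems = []
    # Names that have a CNAME must not carry any other record type.
    cname_names = {name for name, rtype, _ in records if rtype == "CNAME"}
    seen = set()
    for name, rtype, value in records:
        if (name, rtype, value) in seen:
            problems.append(f"{name}: duplicate {rtype} record")
        seen.add((name, rtype, value))
        if rtype in ("A", "AAAA"):
            try:
                addr = ipaddress.ip_address(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a valid IP address")
            else:
                expected = 4 if rtype == "A" else 6
                if addr.version != expected:
                    problems.append(
                        f"{name}: {rtype} record holds an IPv{addr.version} address"
                    )
        if rtype != "CNAME" and name in cname_names:
            problems.append(
                f"{name}: {rtype} record conflicts with a CNAME at the same name"
            )
    return problems
```

A check like this could run on every pull request, so obvious mistakes such as a mistyped IP address are caught before a reviewer ever looks at the change.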

Lee:

Number three.

Lee:

Document the intent of all changes.

Lee:

Every change you make should be documented.

Lee:

If you are following the above steps, then this can naturally be

Lee:

accomplished using the code check-in, commit, and pull request process.

Lee:

This documentation will help you later if a problem exists or an

Lee:

incompatible change is proposed.

Lee:

Understanding why a previous change was made will help repair problems

Lee:

and help you avoid future problems.
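
One way to encourage this is to reject commits that have a summary line but no explanation. The sketch below is only an illustration: the `missing_intent` name and the 20-character threshold are arbitrary assumptions, and wiring it into a commit hook or pull request check is left to whatever process you already use.

```python
def missing_intent(message, min_body_chars=20):
    """Return True if a commit message has no meaningful explanation.

    The first line is treated as the summary; everything after it is
    the body where the reason for the change belongs. The threshold
    is an arbitrary assumption for this sketch.
    """
    lines = message.strip().splitlines()
    body = " ".join(line.strip() for line in lines[1:] if line.strip())
    return len(body) < min_body_chars
```

A message such as "Update zone" would be flagged, while one that goes on to say why the record changed would pass.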

Lee:

Number four.

Lee:

Automate the configuration deploy process.

Lee:

Once you have the process in place to manage your configuration files,

Lee:

establish a process to automate the deployment of those configuration file

Lee:

updates to your production DNS system.

Lee:

By automating this process, you reduce the likelihood of an incorrect

Lee:

change being pushed to production or a simple human error causing your DNS

Lee:

system to fail or produce bad results.

Lee:

If you find yourself copying and pasting changes from one configuration file to

Lee:

another during a deployment process, you're much more likely to make a mistake

Lee:

and introduce a bug into the DNS system.

Lee:

Automatically deploying changes using scripts will make sure

Lee:

the changes are applied in a consistent and reliable manner.

Lee:

Part of the automated system should include an automated rollback mechanism.

Lee:

This may be a natural extension of your revision control process or a separate

Lee:

deployment rollback process, but being able to quickly and effectively undo a

Lee:

change may make the difference between a mistake being a small inconvenience

Lee:

or a massive product outage.
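
To make the rollback idea concrete, here is a minimal sketch of a deploy step that undoes itself when a health check fails. The `apply_config` and `is_healthy` callables are hypothetical stand-ins for however you push configuration to your DNS servers and verify the answers afterwards. They are assumptions for this example, not a real provider API.

```python
def deploy_with_rollback(new_config, current_config, apply_config, is_healthy):
    """Apply new_config, then restore current_config if the check fails.

    apply_config and is_healthy are hypothetical callables supplied by
    the caller. Returns True if the new configuration stayed in place.
    """
    apply_config(new_config)
    try:
        healthy = is_healthy()
    except Exception:
        # A health check that crashes is treated the same as a failure.
        healthy = False
    if healthy:
        return True
    # Roll back to the last known good configuration.
    apply_config(current_config)
    return False
```

Keeping the previous configuration as an explicit argument means the rollback never depends on someone remembering what was deployed before.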

Lee:

Number five.

Lee:

Grow into a more sophisticated change management system.

Lee:

As your DNS system grows in complexity, you may want to consider putting an

Lee:

entire change management system on top of the simple version control system

Lee:

that you've already established.

Lee:

This might include using change request forms, requests

Lee:

for authorization, multi-team sign-offs and other such processes.

Lee:

These changes may seem onerous, but DNS configuration is not a

Lee:

place for slacking off on process.

Lee:

A simple DNS change can impact many teams within your organization.

Lee:

Allowing those teams input before the change is made, or even before the

Lee:

proposal for the change is accepted, can save you many headaches later on.

Lee:

The size and complexity of your change management system will naturally be

Lee:

tied to the size and complexity of your organization, and other software

Lee:

management processes that you employ.

Lee:

Number six.

Lee:

Use an independent DNS provider.

Lee:

A high quality DNS system requires more than configuration management.

Lee:

It requires a high quality operational environment as well.

Lee:

Many of your existing service providers may provide DNS services that you can

Lee:

easily and inexpensively leverage.

Lee:

In particular, most cloud providers naturally provide DNS services and

Lee:

usually rather high quality DNS services.

Lee:

However, be careful using a DNS service that is provided by a company

Lee:

that provides you any other services, including other cloud services.

Lee:

The reason why?

Lee:

Well, during a service outage, the most critical tool you need to be

Lee:

operating normally is your DNS system.

Lee:

You need it to help you diagnose and repair most other outages.

Lee:

If your DNS system is also down, the length of your outage

Lee:

will extend significantly.

Lee:

The reverse is also true.

Lee:

If you are dealing with a DNS issue, the last thing you also want to be dealing

Lee:

with is an outage caused by another service in your application ecosystem.

Lee:

Avoid these problems by using a high quality DNS provider that only provides

Lee:

DNS services to you and nothing else.

Lee:

This allows you to isolate your DNS and problems with your DNS system,

Lee:

from any other service in your application, reducing the likelihood

Lee:

of a DNS related extended outage.

Lee:

And be careful: make sure the provider you select isn't dependent on service

Lee:

providers, such as cloud providers, that you are also already relying on.

Lee:

If AWS has an outage, you want your independent DNS

Lee:

provider to keep operating.

Lee:

That doesn't happen if that service provider is also depending on AWS.

Lee:

Now, some people run their own DNS systems.

Lee:

If you decide to run your own DNS, make sure you operate it

Lee:

using independent resources from the rest of your application.

Lee:

This means operating it in different data centers, availability zones,

Lee:

and even cloud regions than the rest of your application.

Lee:

Number seven.

Lee:

Separate internal and external DNS.

Lee:

Let's take that last point one step further.

Lee:

You have DNS needs that are internal to your company and external DNS

Lee:

needs that your customers depend on.

Lee:

Your internal DNS provides access to internal documentation, internal systems

Lee:

including email and communications tools and other internal processes and systems.

Lee:

Your external DNS provides access to your company's applications, products, and

Lee:

services that your customers depend on.

Lee:

Make sure these two DNS needs are handled by different providers.

Lee:

If your external DNS goes down, fixing that problem will be substantially

Lee:

harder if your internal DNS is also down.

Lee:

This is part of what took Facebook so long to fix their application when

Lee:

they went down in October of 2021.

Lee:

Their external DNS went down and they couldn't diagnose and

Lee:

fix the problem easily because their internal DNS was also down.

Lee:

And conversely, if your internal DNS goes down, you don't want that problem

Lee:

to bleed out to your external customers.

Lee:

Using different providers, along with different DNS configurations and

Lee:

configuration processes, is extremely valuable to avoid these sorts of problems.

Lee:

And lastly, number eight.

Lee:

Duplicate your DNS in another provider.

Lee:

Let's go one final step further.

Lee:

Set up your production DNS using two different providers. Use one

Lee:

as a primary provider and the second as a backup provider.

Lee:

This way, if your primary provider goes down for some reason, you may be able

Lee:

to switch your production DNS over to your backup provider quickly.

Lee:

The backup provider should have a complete, operational and

Lee:

fully tested copy of your DNS configuration set up and operating,

Lee:

so it can be put into play quickly if needed.

Lee:

This process will be easier if you have implemented the automated deployment

Lee:

processes we talked about previously.

Lee:

This automated process can help assure that you keep your changes in sync

Lee:

between your primary and backup providers.

Lee:

The worst thing that can happen is for your primary provider to go down, you

Lee:

switch to your backup provider, but you end up with an incomplete or incorrect

Lee:

DNS configuration because you haven't tested your backup provider setup.
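
A simple drift check can help catch this before you need the backup. The sketch below assumes you can export the records from each provider as `(name, type, value)` tuples. How you fetch them is provider specific and not shown, and the `find_drift` name is just an illustration.

```python
def find_drift(primary, backup):
    """Report records that differ between two providers.

    Both arguments are iterables of (name, type, value) tuples, an
    assumed export format for this sketch.
    """
    primary, backup = set(primary), set(backup)
    return {
        "missing_from_backup": sorted(primary - backup),
        "extra_in_backup": sorted(backup - primary),
    }
```

Running a comparison like this on a schedule, and after every deployment, gives you some confidence that the backup provider would actually serve the same answers as the primary.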

Lee:

DNS is a critical system that should be designed for high availability

Lee:

and reliability from the start.

Lee:

You also need to think about security when designing your DNS infrastructure.

Lee:

Make sure you have redundant systems in place, and that access to your

Lee:

DNS system is tightly controlled.

Lee:

Finally, monitoring DNS is critical to ensuring your

Lee:

system continues to run smoothly.

Lee:

You need tools that will alert you if problems occur so you

Lee:

can take steps to mitigate the impact as quickly as possible.
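
As a rough illustration of that kind of monitoring, the sketch below resolves a list of hostnames and reports any that fail or return unexpected addresses. The `check_resolution` function and the shape of the `expected` mapping are assumptions for this example. By default it uses the standard library's `socket.gethostbyname_ex`, which only covers IPv4 lookups through the system resolver, so a real monitor would likely query your authoritative servers directly.

```python
import socket


def check_resolution(expected, resolve=None):
    """Return a list of alert strings for hostnames that look wrong.

    expected maps each hostname to the set of IP addresses it should
    resolve to. resolve is an optional callable taking a hostname and
    returning its addresses, which makes the check easy to test.
    """
    if resolve is None:
        # Default: IPv4 lookup through the system resolver.
        resolve = lambda host: set(socket.gethostbyname_ex(host)[2])
    alerts = []
    for host, want in expected.items():
        try:
            got = set(resolve(host))
        except OSError as exc:
            alerts.append(f"{host}: lookup failed ({exc})")
            continue
        if got != set(want):
            alerts.append(f"{host}: expected {sorted(want)}, got {sorted(got)}")
    return alerts
```

An empty list means everything resolved as expected. Anything else is something to send to whatever alerting tool you already use.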

Lee:

DNS outages are common occurrences, but they don't have to bring your

Lee:

entire company to a standstill.

Lee:

By using the proper processes and tools,

Lee:

you can minimize the impact of any outages and keep your business running smoothly.