Enterprise Data Center Redundancy: Best Practices
To begin, a reminder: many companies do not need three nines of uptime or complete redundancy. Many will be perfectly content to restore from backup when things go awry or to live with a few hours of downtime every year; for them, the cost of that downtime doesn’t outweigh the cost of maintaining redundancy.
For others, however, maximum uptime is mission-critical. Data centers in particular need to be on top of their game when it comes to redundancy. They often process so much data, much of it very time-sensitive, that more than an hour or so of cumulative downtime a year is unacceptable. In fact, it’s not unheard of for data centers to shoot for six nines of uptime.
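To put those "nines" in perspective, here is a minimal Python sketch that converts an availability target into allowed downtime per year. The function name and the 365-day year are assumptions made for illustration; real SLAs may define the measurement window differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(nines: int) -> float:
    """Return the downtime budget, in hours per year, for `nines` nines of uptime.

    Three nines means 99.9% availability, i.e. an unavailability of 10^-3.
    """
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 7):
        hours = downtime_hours_per_year(n)
        print(f"{n} nines: {hours:.4f} hours/year ({hours * 3600:.1f} seconds)")
```

Under these assumptions, three nines allows about 8.76 hours of downtime a year, while six nines leaves only about 31.5 seconds.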
So how do you go about redundancy? Check out some of the best practices below for ways to have that crucial uptime!
1. Plan Out Your Needs From the Beginning
Too often, data centers try to add redundancy after the fact. If you’re building a data center, long-term planning and meticulous attention to detail up front will be much easier and more efficient than overhauling the facility later (and often, the overhaul may prove difficult or impossible). Does the space you’re looking at have the potential to be cooled easily, or is it a nightmarish heat trap? Does it have space for you to grow, expanding not only your servers but your redundant units as well? Is the cabling installed efficiently and tidily, or are there breaks, gaps, or odd design decisions? These are all things a prospective data center operator has to consider, and planning ahead is perhaps the most important best practice on this list.
2. Power
Too many data center operators are so concerned with their server and networking systems that they neglect the most critical part of the data center: power. Make sure that you will always have power, no matter what! Your servers should have two hot-swappable PSUs, your UPSes should be set up redundantly so that one can fail without dropping the load, and you should have generators on-site to kick in the moment utility power is gone. You should also have more than one, and preferably several, independent power feeds linked to different substations in case one of them blacks out. While this measure may seem extraordinary, it is not uncommon for large data centers to have five or more dedicated electrical feeds to as many different substations! Power is the core of all your services, and its importance cannot be overstated.
3. Internet Access/Networking
Much in the same vein as power, a data center is only as effective as its ability to serve data to the rest of the world. Don’t trust one fiber line to deliver all your networking needs; spread your traffic across several and install the appropriate equipment to make sure that service is uninterrupted. As with power, don’t settle for just one or two outside lines; five or more should be installed, lending redundancy to your data center as well as spare bandwidth for unexpected loads.
Don’t just protect your outside lines, though; make sure that every piece of switching and routing equipment has a backup so that traffic keeps flowing if a switch or router fails. At no point should data fail to route properly across your network due to faulty hardware. This also applies to server NICs, which should be duplicated in case one fails (or, as a last resort, backed up by a wireless connection).
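One way to check the "no single piece of hardware takes you down" rule is to model the network as a graph and remove each device in turn. The sketch below is a simplified illustration, not a real network audit tool; the device names and topology are hypothetical.

```python
from collections import deque


def build_links(edges):
    """Build an undirected adjacency map from (device, device) pairs."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, goal, removed=None):
    """Breadth-first search from start to goal, pretending `removed` has failed."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, start, goal):
    """List every intermediate device whose failure alone cuts start off from goal."""
    candidates = set(links) - {start, goal}
    return sorted(n for n in candidates if not reachable(links, start, goal, removed=n))


# Hypothetical topology: two switches and two ISPs, but only one router.
EDGES = [
    ("server", "switch-a"), ("server", "switch-b"),
    ("switch-a", "router-1"), ("switch-b", "router-1"),
    ("router-1", "isp-1"), ("router-1", "isp-2"),
    ("isp-1", "internet"), ("isp-2", "internet"),
]

if __name__ == "__main__":
    print(single_points_of_failure(build_links(EDGES), "server", "internet"))
```

In this made-up layout the lone router is the weak link; adding a second router cabled to both switches and both ISPs makes the list come back empty.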
4. Maintain Your Redundancy
Redundancy is not a set-it-and-forget-it deal. With good redundancy comes constant vigilance: always make sure that the backup hardware you have in place is actually working. It’s easy to forget about that backup switch or PSU you put in until the primary one fails; it’s another thing altogether to remember to maintain even the most obscure link in the chain. Make sure things are working, and set up a schedule to test, replace, and reconfigure redundant services when necessary. When the time comes for them to perform, you’ll be very glad you did.
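A test schedule can be as simple as a list of components and the date each was last exercised. The following is a minimal sketch of that idea; the component names, dates, and the 90-day interval are all hypothetical and should be replaced with whatever your own maintenance policy dictates.

```python
from datetime import date, timedelta


def overdue_components(last_tested, interval_days, today):
    """Return the names of components not tested within the last `interval_days` days."""
    cutoff = today - timedelta(days=interval_days)
    return sorted(name for name, tested in last_tested.items() if tested < cutoff)


# Hypothetical inventory of redundant components and their last test dates.
LAST_TESTED = {
    "backup-switch-2": date(2024, 1, 15),
    "psu-b-rack-4": date(2024, 5, 20),
    "generator-1": date(2023, 11, 2),
}

if __name__ == "__main__":
    for name in overdue_components(LAST_TESTED, interval_days=90, today=date.today()):
        print(f"Overdue for testing: {name}")
```

Running something like this from a scheduled job and alerting on the result is one way to keep the obscure links in the chain from being forgotten.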
5. 2N Redundancy
What we’ve been describing so far is N+1 redundancy; that is, having one component more than required for operation, so that when one of these components goes down the backup can take its place seamlessly. There are, however, situations where N+1 redundancy is insufficient. Take, for example, an appliance that requires 2 PSUs to power it. N+1 dictates that there should be 3 PSUs in the appliance, but if 2 PSUs fail, it will cease to function. In this case, you would need four PSUs to stay up even if both original components fail; this is “2N” redundancy (also called N+N). While 2N redundancy is optimal for maximum uptime, it can also be prohibitively expensive and difficult to keep in place rigorously. Before opting for 2N redundancy, make sure you have the budget and organizational framework in place to attempt such an undertaking in your own data center.
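The PSU arithmetic above can be sketched in a few lines of Python. The function names are made up for this illustration, and the model only counts units; it ignores shared failure causes such as a common power feed.

```python
def units_installed(required: int, scheme: str) -> int:
    """Number of units to install when `required` units are needed to run."""
    if scheme == "N+1":
        return required + 1
    if scheme == "2N":
        return 2 * required
    raise ValueError(f"unknown redundancy scheme: {scheme}")


def tolerated_failures(required: int, scheme: str) -> int:
    """How many units can fail before fewer than `required` remain."""
    return units_installed(required, scheme) - required


if __name__ == "__main__":
    required_psus = 2  # the appliance from the example needs 2 PSUs to run
    for scheme in ("N+1", "2N"):
        print(
            f"{scheme}: install {units_installed(required_psus, scheme)} PSUs, "
            f"survive {tolerated_failures(required_psus, scheme)} failure(s)"
        )
```

For the two-PSU appliance, N+1 installs three units and survives one failure, while 2N installs four and survives two.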
And there you have some best practices for enterprise data centers; following them will lay a very good groundwork for your future exploits in data center redundancy. Note, however, that there is no one-size-fits-all approach to redundancy. While these best practices are certainly something to take seriously, each and every IT professional must tailor their redundancy plans to the budget, needs, and operational capacity of their individual institution!