data center resiliency

Contributor(s): Stephen J. Bigelow

Resiliency is the ability of a server, network, storage system or an entire data center to recover quickly and continue operating even when there has been an equipment failure, power outage or other disruption.

Data center resiliency is a planned part of a facility’s architecture and is usually associated with other data center disaster-recovery considerations, such as data protection. The adjective resilient means "having the ability to spring back."

Data center resiliency is often achieved through the use of redundant components, subsystems, systems or facilities. When one element fails or experiences a disruption, the redundant element takes over seamlessly and continues to support computing services to the user base. Ideally, users of a resilient system never know that a disruption has even occurred.
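The effect of redundancy can be estimated with simple arithmetic. The following is a minimal sketch, not a sizing tool: the function name `combined_availability` and the 99% figure are illustrative assumptions, and the formula assumes that component failures are independent, which real hardware only approximates.

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Estimate availability of `copies` redundant components.

    Assumes any single working copy is enough to keep the service up,
    and that failures are independent of one another (an assumption).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    # Hypothetical power supply that is available 99% of the time.
    for copies in (1, 2, 3):
        print(copies, round(combined_availability(0.99, copies), 6))
```

Under these assumptions, a second copy of a 99%-available component raises the estimate to roughly 99.99%. Shared causes of failure, such as a common power feed, would lower the real figure.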

For example, if an ordinary server’s power supply fails, the server fails, and all of the workloads on that server become unavailable until the server is repaired and restarted (or until the workloads can be restarted on another suitable server). If the server incorporates a redundant power supply, the backup supply keeps the server running until a technician can replace the failed power supply. Techniques such as server clustering support redundant workloads on multiple physical servers. When one server in the cluster fails, another node takes over with its redundant workloads.
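The cluster behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular clustering product works: the `Node` class and `fail_over` function are hypothetical names, and the takeover is modeled simply as moving each workload to the least-loaded healthy node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True
    workloads: List[str] = field(default_factory=list)


def fail_over(cluster: List[Node], failed_name: str) -> None:
    """Mark one node as failed and move its workloads to healthy nodes."""
    failed = next(node for node in cluster if node.name == failed_name)
    failed.healthy = False
    survivors = [node for node in cluster if node.healthy]
    if not survivors:
        # Without redundancy, the workloads stay offline until repair.
        raise RuntimeError("no healthy node left; workloads stay offline")
    # Place each orphaned workload on the least-loaded surviving node.
    while failed.workloads:
        workload = failed.workloads.pop(0)
        target = min(survivors, key=lambda node: len(node.workloads))
        target.workloads.append(workload)


if __name__ == "__main__":
    cluster = [
        Node("node-a", workloads=["database", "web"]),
        Node("node-b", workloads=["cache"]),
        Node("node-c"),
    ]
    fail_over(cluster, "node-a")
    for node in cluster:
        print(node.name, node.healthy, node.workloads)
```

A single-node "cluster" raises the error path, which mirrors the ordinary server in the example: with no redundant element, nothing can take over.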

The same concept holds true all the way up to entire data center facilities. For example, an organization may power its data center with two separate utility feeds from different utility providers so that a backup provider is available when the first utility provider fails. As another example, organizations can support data center collocation, shifting an entire operation from one facility to another in response to any kind of local disruption or regional disaster.

The resiliency techniques employed in a data center can vary with the importance of the respective workloads. Organizations with mission-critical workloads will use more resiliency techniques at more levels within the data center, because the cost of losing critical computing services during a prolonged outage typically exceeds the cost of protecting them. For example, critical business services, such as transaction processing software or database systems, may be designed with comprehensive data center resiliency, including clustering and off-site redundancy. Conversely, nonessential workloads that can tolerate some level of disruption may receive little resiliency or simply remain offline until they can be restored.

This was last updated in February 2012
