Data Center Redundancy vs Data Center Resilience

neuCentrIX - 31/12/2021 09:00

When talking about data centers, redundancy is a very common term. But have you heard of data center resilience? Are redundancy and resilience different? How are they connected? In this article, we take a closer look at each term to see how they differ and how they are connected.

How are redundancy and resilience different?
Reliability is one of the most critical attributes of a data center. Redundancy and resilience are two factors that play a major role in ensuring a data center's reliability. Both make a data center fault-tolerant, allowing it to keep operating despite system failures, power outages, cyberattacks, and other issues. However, despite their similar nature, redundancy and resilience are not the same thing and shouldn't be used interchangeably.

As discussed in our previous article, Data Center Redundancy, redundancy essentially means duplication. It refers to the backup components a data center has in place to take over when the primary components or systems fail. In data center operations, redundancy is usually required for power supply and cooling components, which are vital to keeping the system up and running.

When we talk about redundancy, we are talking about the capacity of specific equipment, measured by how many backup components a system has beyond what it needs to operate. That's why data center redundancy levels are expressed as N ratings, where N is the number of components required to carry the full load. N+1 redundancy means there is one backup unit beyond the N components in use; for example, a data center that needs four cooling units would install five. 2N redundancy means a data center has a complete duplicate system, with a backup unit for every component in use. Meanwhile, 2N+1, the highest of these levels, means a data center has a duplicate unit for every component in use plus one additional unit.
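To make the N ratings concrete, here is a minimal sketch that counts how many units are installed under each level. It assumes a hypothetical facility that needs N = 4 cooling units to carry its load, and the names `Redundancy`, `total_units`, and `spare_units` are illustrative only. Real designs also depend on how units are split across independent power or cooling paths, which this simple count ignores.

```python
from enum import Enum


class Redundancy(Enum):
    """Common data center redundancy levels (N ratings)."""
    N = "N"                # exactly the capacity needed, no backup
    N_PLUS_1 = "N+1"       # one backup unit beyond N
    TWO_N = "2N"           # a full duplicate of every unit
    TWO_N_PLUS_1 = "2N+1"  # a full duplicate plus one extra unit


def total_units(n: int, level: Redundancy) -> int:
    """Units installed when n units are required to carry the load."""
    if n < 1:
        raise ValueError("n must be at least 1")
    installed = {
        Redundancy.N: n,
        Redundancy.N_PLUS_1: n + 1,
        Redundancy.TWO_N: 2 * n,
        Redundancy.TWO_N_PLUS_1: 2 * n + 1,
    }
    return installed[level]


def spare_units(n: int, level: Redundancy) -> int:
    """Units available beyond the n needed for normal operation."""
    return total_units(n, level) - n


if __name__ == "__main__":
    # Hypothetical facility that needs 4 cooling units to carry its load.
    for level in Redundancy:
        print(f"{level.value}: {total_units(4, level)} installed, "
              f"{spare_units(4, level)} spare")
```

For N = 4, this gives 4, 5, 8, and 9 installed units for N, N+1, 2N, and 2N+1 respectively, which shows how quickly the equipment count grows at the higher levels.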

Resilience, on the other hand, refers to a data center’s ability to continue operating when there is equipment or system failure or anything else which disrupts normal operation. While redundancy is about specific equipment’s capacity, resilience is about the data center as a whole being able to maintain the consistency of its services in the face of faults.

The factors that build data center resilience are diverse and constantly evolving, and redundancy is only one of them. Others include forecasting and preventing equipment failures, protecting systems, isolating and removing failed components, recovering systems, and restoring system performance. Factors beyond the equipment itself also contribute to resilience, such as having staff on site 24 hours a day.

More and more data centers today are shifting their focus from redundancy to various strategies, including load balancing and virtualization, to implement fault tolerance. Cloud computing also plays a critical role here: as more resources are hosted in the cloud, there is less need for redundant data center systems. Data center managers are also using more data center infrastructure management (DCIM) tools to monitor operations, including the electrical infrastructure, and to identify unnecessary redundancy and potential points of failure. Ultimately, what matters is how well a data center can keep data available and provide consistent services to its customers in a range of scenarios.

How are redundancy and resilience connected?
Based on the explanation above, we can see why both redundancy and resilience are important for a data center. Redundancy is one of the factors that enhance a system’s resilience, and resilience ensures a data center’s availability. Therefore, when choosing a data center provider, it’s essential to choose one that implements not just one, but both.