Understanding RAID Performance at Various Levels

FEBRUARY 14TH, 2024
Aftab Alam
Executive Vice President, Product Management

Choosing a RAID level is an exercise in balancing many factors, including cost, reliability, capacity, and performance. RAID performance can be challenging to understand, mainly because different RAID levels use different techniques and behave somewhat differently in practice.

This article will explore the standard RAID levels of RAID 0, 5 (now deprecated), 6, and 10 to see how their performance differs. For this article, RAID 1 will be treated as a subset of RAID 10. Put simply, a RAID 1 array is the same as a RAID 10 array that contains only a single mirrored pair. Because RAID 1 behaves exactly like one pair of a RAID 10 array, it simply maps onto the RAID 10 performance curve, which keeps RAID performance easy to understand.

RAID Reading, Writing 101

There are two types of performance to look at with all storage: reading and writing. Regarding RAID, reading is straightforward, and writing is rather complex. With solid-state drives (SSDs) now becoming more common in RAID arrays, the performance dynamics, especially for write operations, have significantly improved, reducing write penalties and rebuild times for RAID levels that benefit from faster access times.

Read performance is effectively stable across all types. Writing, however, is not. To make discussing performance easier, we need to define a few terms as we will work with some equations. In our discussions, we will use “N” to represent our array's total number of drives, often called spindles. We will use “X” to refer to the performance of each drive individually. This allows us to talk about relative performance as a factor of the drive performance.

This lets us abstract away the RAID array and avoid thinking about raw IOPS (input/output operations per second). This is important, as IOPS are often very hard to define. But we can compare performance meaningfully by expressing it in relation to the individual drives within the array.

It’s also important to remember that we are only talking about the array's performance, not an entire storage subsystem. Artifacts such as memory caches and solid-state caches will do amazing things to alter the overall performance of a storage subsystem. But they will not fundamentally change the array's performance under the hood.

There is no simple formula for determining how different cache options will impact the overall performance. Suffice it to say that the effect can be dramatic, depending heavily on the cache choices and workload. But even the biggest, fastest, most robust cache options cannot change an array's long-term, sustained performance.

RAID is complex, and many factors influence the final performance. One is the implementation of the system itself. A poor implementation might cause latency, or it may fail to use the available spindles (such as having a RAID 1 array read only from a single disk instead of from both simultaneously).

There is no easy way to account for deficiencies in specific implementations. We must assume that all are working to the limits of the specification. Any enterprise RAID system will do this. It is primarily hobby and consumer RAID systems that fail in this aspect.

The CPU's Role in RAID Performance

Some types of RAID also have surprising amounts of computational overhead associated with them, while others do not. Primarily, parity RAID levels require heavy processing to handle write operations, with different levels needing different amounts of computation for each operation. Advances in CPU technology have mitigated this performance impact, particularly for software RAID, making parity RAID a more viable option for many applications.

Parity introduces latency but does not curtail throughput. This latency will vary, however, based on the implementation of the RAID level as well as on the processing capability of the system. 

Hardware RAID will use a general-purpose CPU (often a Power or ARM RISC processor) or a custom ASIC to handle this. ASICs can be very fast but are expensive to produce. Software RAID hands this off to the server's CPU. Usually, the server CPU is faster here but consumes system resources.

This latency impacts storage performance but is difficult to predict and can vary from nominal to dramatic. So, while we'll look at the relative latency impact with each RAID level, we won't attempt to measure it. In most RAID performance calculations, this latency is ignored. However, it is still present. Depending on the array's configuration, it could noticeably impact a workload. There is, it should be mentioned, also a small performance impact on read operations due to inefficiencies in the layout of data on the disk itself.

Parity RAID stores parity data on the disks that is useless during a healthy read operation and cannot be used to speed it up. This makes reads slightly slower. However, this impact is minimal and is typically not measured. It can be ignored.

Factors such as stripe size also impact performance, of course. But as that is configurable and not an intrinsic artifact on any level, we will ignore it here. It is not a factor when choosing a RAID level but does come into play when it's time for configuration.

Read/Write Ratio for Storage

The final factor we want to mention is the read-to-write ratio of storage operations. Some RAID arrays will be used almost purely for read operations, some almost purely for write operations. Most will blend the two, likely around 80 percent read and 20 percent write.

This ratio is critical to understanding the performance you will get from your specific array and how each RAID level will impact you. We refer to this as the read/write blend. We measure storage performance primarily in IOPS (input/output operations per second).

We use the terms RIOPS for Read IOPS, WIOPS for Write IOPS, and BIOPS for Blended IOPS, which would come with a ratio such as 80/20. Many people talk about storage performance with a single IOPS number. When this is done, they usually mean Blended IOPS at 50/50.

However, workloads rarely run at exactly 50/50, so that number can be highly misleading. We need two numbers, RIOPS and WIOPS, to understand performance. We can use these two together to find any IOPS blend we need. For example, a 50/50 blend is as simple as (RIOPS * .5) + (WIOPS * .5). The more common 80/20 blend would be (RIOPS * .8) + (WIOPS * .2).
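The blend formula above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `blended_iops` and the `read_fraction` parameter are names chosen here, not terms from any RAID tool.

```python
def blended_iops(riops: float, wiops: float, read_fraction: float) -> float:
    """Weight Read IOPS and Write IOPS by the workload's read/write blend.

    read_fraction is the share of operations that are reads,
    e.g. 0.8 for an 80/20 blend (an illustrative parameter name).
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    return riops * read_fraction + wiops * write_fraction


# Hypothetical array rated at 1,000 RIOPS and 500 WIOPS.
print(blended_iops(1000, 500, 0.5))  # 50/50 blend
print(blended_iops(1000, 500, 0.8))  # 80/20 blend
```

Keeping RIOPS and WIOPS as separate inputs means the same two measurements can be reused for whatever blend a given workload actually exhibits.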

Now that we have established some criteria and background understanding, we will delve into our RAID levels and see how performance varies. For all RAID levels, we calculate the Read IOPS number using NX. This does not address the nominal overhead numbers mentioned above, of course. This is a "best case" number. 

But the real-world number is so close that it is practical to use this formula. Take the number of spindles (N) and multiply by the IOPS performance of an individual drive (X). Keep in mind that drives often have different read and write performance. 

So be sure to use the drive's Read IOPS rating or tested speed for the Read IOPS calculation and the Write IOPS rate or tested speed for the Write IOPS calculation.

RAID 0 Performance

RAID 0 is the easiest level to understand because there is effectively no overhead to worry about or resources consumed to power it, and both read and write get the full benefit of every spindle. So for RAID 0, our formula for write performance is straightforward: NX.

RAID 0 is always the highest-performing level. An example would be an eight-spindle RAID 0 array. If an individual drive in the array delivers 125 IOPS, our calculation would be done with N = 8 and X = 125, so 8 * 125 yields 1,000 IOPS. 

Both read and write IOPS are the same here. So, it is elementary: we get 1K RIOPS, 1K WIOPS, and 1K Blended IOPS regardless of the blend. If we didn't know the absolute IOPS of an individual spindle, we could refer to an eight-spindle RAID 0 as delivering 8X Blended IOPS.
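As a rough sketch of the NX formula for RAID 0, the Python below keeps separate per-drive read and write ratings, since drives often differ between the two. The function and parameter names are illustrative assumptions.

```python
from typing import Tuple


def raid0_iops(n: int, read_x: float, write_x: float) -> Tuple[float, float]:
    """Return (RIOPS, WIOPS) for a RAID 0 array using the NX formula.

    n       -- number of spindles in the array
    read_x  -- Read IOPS of one drive (rated or tested)
    write_x -- Write IOPS of one drive (rated or tested)
    """
    if n < 1:
        raise ValueError("an array needs at least one spindle")
    # RAID 0 has no write penalty, so both directions scale with N.
    return n * read_x, n * write_x


# The eight-spindle example from the text: 125 IOPS per drive.
riops, wiops = raid0_iops(8, 125, 125)
print(riops, wiops)
```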

RAID 10 Performance

RAID 10 is the second simplest level when it comes to calculations. Because RAID 10 is a RAID 0 stripe of mirror sets, we have no overhead to worry about from the stripe, but each mirror has to write the same data twice to create the mirroring. 

This cuts our write performance in half compared to a RAID 0 array of the same number of drives. That gives us a simple write performance formula: NX/2 or .5NX.

We should note that at the same capacity, rather than the same number of spindles, RAID 10 has the same write performance as RAID 0 but double the read performance, because it requires twice as many spindles to match the same capacity. Staying with the same spindle count, an eight-spindle RAID 10 array would be N = 8 and X = 125, and our resulting calculation comes out to be (8 * 125)/2, which is 500 WIOPS or 4X WIOPS. A 50/50 blend would result in 750 Blended IOPS (1,000 Read IOPS * .5 plus 500 Write IOPS * .5).

This formula applies equally to RAID 1, RAID 10, RAID 100, and RAID 01. Uncommon options, such as triple mirroring in RAID 10, would alter this write penalty. RAID 10 with triple mirroring would be NX/3, for example.
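The mirrored write formula generalizes naturally to the number of copies kept of each block. The sketch below assumes a hypothetical `copies` parameter (2 for ordinary mirroring, 3 for triple mirroring) and is not taken from any particular RAID implementation.

```python
def mirrored_write_iops(n: int, x: float, copies: int = 2) -> float:
    """Write IOPS for RAID 1/10-style arrays: NX divided by the copy count.

    n      -- total number of spindles
    x      -- Write IOPS of one drive
    copies -- how many drives hold each block (illustrative parameter)
    """
    if copies < 2:
        raise ValueError("mirroring needs at least two copies")
    if n < copies or n % copies != 0:
        raise ValueError("spindle count must be a multiple of the copy count")
    # Every logical write lands on `copies` drives, so the penalty equals copies.
    return n * x / copies


print(mirrored_write_iops(8, 125))            # eight-spindle RAID 10
print(mirrored_write_iops(2, 125))            # RAID 1, a single mirrored pair
print(mirrored_write_iops(9, 125, copies=3))  # triple mirroring, NX/3
```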

RAID 5 Performance

RAID 5 is deprecated and not recommended for new arrays due to its vulnerability during rebuilds with large-capacity drives. It introduces a significant write penalty because of the need to manage parity data. The consensus among IT professionals is that RAID 5's vulnerability to disk failures during rebuilds and its performance under large drive sizes make it less suitable for modern storage needs than RAID 6 or RAID 10.

Parity RAID adds a somewhat complicated need to verify and re-write parity with every write that goes to disk. This means that a RAID 5 array will have to read the data, read the parity, write the data, and finally write the parity. Four operations for each effective one. This gives us a write penalty on RAID 5 of four. So, the formula for RAID 5 write speed is NX/4. 

Following the eight-spindle example where the write IOPS of an individual spindle is 125, we would get the following calculation: (8 * 125)/4 or 2X Write IOPS, which comes to 250 WIOPS. In a 50/50 blend, this would result in 625 Blended IOPS.
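One way to see where the write penalty of four comes from is to list the disk operations behind a single logical write and count them. This is a simplified model of the arithmetic described above, with illustrative names, rather than a description of how any specific controller schedules I/O.

```python
# The four disk operations behind one logical RAID 5 write, as described above.
RAID5_WRITE_STEPS = ("read data", "read parity", "write data", "write parity")


def raid5_write_iops(n: int, x: float) -> float:
    """Write IOPS for RAID 5: NX divided by the number of operations per write."""
    if n < 3:
        raise ValueError("RAID 5 needs at least three spindles")
    penalty = len(RAID5_WRITE_STEPS)  # 4
    return n * x / penalty


# Eight spindles at 125 Write IOPS each.
wiops = raid5_write_iops(8, 125)
riops = 8 * 125
print(wiops)                      # Write IOPS
print(riops * 0.5 + wiops * 0.5)  # 50/50 Blended IOPS
```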

RAID 6 Performance

RAID 6, after RAID 10, is probably the most common and useful RAID level in use today, especially considering its increased reliability over RAID 5 by incorporating another level of parity. That makes it better suited for today's large-capacity storage environments. 

The second level of parity makes RAID 6 dramatically safer than RAID 5, which is important, but it also imposes a substantial write penalty. Each write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity, and finally write the second parity. This comes out to be a write penalty of six, which is pretty dramatic. Our formula is NX/6.

Continuing with our example, we get (8 * 125)/6, which comes to ~167 Write IOPS or 1.33X. In our 50/50 blend example, this is a performance of roughly 583 Blended IOPS. As you can see, parity writes cause a very rapid decrease in write performance and a noticeable drop in blended performance.
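To compare the four levels side by side for the same spindle count, the write penalties discussed so far (1, 2, 4, and 6) can be placed in a small lookup table. The sketch below is illustrative; `WRITE_PENALTY` and `array_iops` are names invented for this example.

```python
# Write penalty per RAID level, as derived in the sections above.
WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def array_iops(level: str, n: int, x: float, read_fraction: float = 0.5):
    """Return (RIOPS, WIOPS, Blended IOPS) for n spindles of x IOPS each."""
    riops = n * x                         # reads use every spindle: NX
    wiops = n * x / WRITE_PENALTY[level]  # writes pay the level's penalty
    biops = riops * read_fraction + wiops * (1.0 - read_fraction)
    return riops, wiops, biops


# Eight spindles at 125 IOPS each, 50/50 blend.
for level in WRITE_PENALTY:
    riops, wiops, biops = array_iops(level, n=8, x=125)
    print(f"{level:<8} RIOPS={riops:7.1f}  WIOPS={wiops:7.1f}  BIOPS={biops:7.1f}")
```

Changing `read_fraction` to 0.8 shows how much less the parity penalty matters for read-heavy workloads.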

Performance as a Factor of Capacity

When producing RAID performance formulae, we think of these in terms of the number of spindles, which is incredibly sensible. This is very useful in determining the performance of a proposed array or even an existing one where measurement is not possible and allows us to compare the relative performance between different proposed options.

We universally think of RAID performance in these terms. However, this is not always a good approach, because we typically size a RAID array by capacity rather than by performance or spindle count. It would be very rare, but certainly possible, for someone to consider an eight-drive RAID 6 array versus an eight-drive RAID 10 array. Once in a while, this will occur due to a chassis limitation or some other similar reason. But typically, we view RAID arrays from the standpoint of total array capacity (i.e., the capacity we can use) rather than spindle count, performance, or any other factor.

Therefore, it is odd that we should switch to viewing RAID performance as a function of spindle count. Suppose we change our viewpoint and pivot to capacity as the common factor while still assuming that individual drive capacity and performance (X) remain constant between comparators. In that case, we arrive at a completely different performance landscape. In doing this, we see, for example, that RAID 0 is no longer the most performant RAID level and that read performance varies dramatically instead of being a constant.

Capacity is a fickle thing, but we can distill it to the number of spindles necessary to reach the desired capacity. This makes this discussion far easier. So, our first step is determining the spindle count needed for raw capacity. If we need a capacity of 10TB and are using 1TB drives, we would need ten spindles, for example. Or, if we need 3.2TB and are using 600GB drives, we would need six spindles.
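The spindle count for a given raw capacity is simply the capacity divided by the drive size, rounded up. A minimal sketch, assuming capacities are expressed in whole gigabytes to avoid floating-point rounding (the function name is illustrative):

```python
def spindles_for_capacity(capacity_gb: int, drive_gb: int) -> int:
    """Number of drives needed to reach capacity_gb of raw capacity."""
    if capacity_gb <= 0 or drive_gb <= 0:
        raise ValueError("capacities must be positive")
    # Integer ceiling division: round up to the next whole drive.
    return -(-capacity_gb // drive_gb)


print(spindles_for_capacity(10_000, 1_000))  # 10TB from 1TB drives
print(spindles_for_capacity(3_200, 600))     # 3.2TB from 600GB drives
```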

Unlike before, we will refer to our spindle count as “R.” (We use "R" here to denote that this is the Raw Capacity Count rather than the total number of spindles.) As before, the performance of the individual drive is represented as “X.” RAID 0 remains simple. Because there are no additional drives, both read and write IOPS are simply RX.

RAID 10 has RX Write IOPS but 2RX Read IOPS. This is dramatic. Suddenly, when viewing performance as a factor of stable capacity, we find that RAID 10 has double the read performance over RAID 0!

RAID 5 gets slightly trickier. Write IOPS would be expressed as ((R + 1) * X)/4. The Read IOPS is expressed as (R + 1) * X. RAID 6, as we expect, follows the pattern that RAID 5 projects. The Write IOPS for RAID 6 are ((R + 2) * X)/6. And the Read IOPS is expressed as (R + 2) * X.

This vantage point changes the way we think about performance. When looking purely at read performance, RAID 0 becomes the slowest RAID level rather than the fastest. RAID 10 is at least as fast as every other level for both read and write (it matches RAID 0 on writes), no matter what the values are for R and X!

Let's take a real-world example: we need 20TB of usable capacity from 2TB drives, so R = 10, with each drive delivering 100 IOPS, and we assume a 50/50 blend. The resultant IOPS would be: RAID 0 with 1,000 Blended IOPS, RAID 10 with 1,500 Blended IOPS (2,000 RIOPS / 1,000 WIOPS), RAID 5 with 687.5 Blended IOPS (1,100 RIOPS / 275 WIOPS), and RAID 6 with 700 Blended IOPS (1,200 RIOPS / 200 WIOPS). RAID 10 is a dramatic winner here.
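The capacity-based comparison can be reproduced with a short sketch. For each level it derives the actual spindle count from R (R for RAID 0, 2R for RAID 10, R + 1 for RAID 5, R + 2 for RAID 6) and then applies the usual write penalty. The table and function names are assumptions made for this illustration.

```python
# level -> (spindles needed for raw capacity count r, write penalty)
LEVELS = {
    "RAID 0": (lambda r: r, 1),
    "RAID 10": (lambda r: 2 * r, 2),
    "RAID 5": (lambda r: r + 1, 4),
    "RAID 6": (lambda r: r + 2, 6),
}


def iops_at_capacity(level: str, r: int, x: float, read_fraction: float = 0.5):
    """Return (RIOPS, WIOPS, Blended IOPS) for a fixed usable capacity."""
    spindles_for, penalty = LEVELS[level]
    n = spindles_for(r)    # actual number of drives in the array
    riops = n * x
    wiops = n * x / penalty
    biops = riops * read_fraction + wiops * (1.0 - read_fraction)
    return riops, wiops, biops


# R = 10 (20TB from 2TB drives), 100 IOPS per drive, 50/50 blend.
for level in LEVELS:
    riops, wiops, biops = iops_at_capacity(level, r=10, x=100)
    print(f"{level:<8} RIOPS={riops:6.0f}  WIOPS={wiops:6.0f}  BIOPS={biops:7.1f}")
```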

Latency and System Impact with Software RAID

As we noted earlier, RAID 0 and RAID 10 effectively have no system overhead to consider. With the integration of SSDs, the latency and system impact, even in software RAID configurations, have become negligible for many applications. This further broadens the scope and usability of RAID configurations in performance-critical environments.

The mirroring operation requires essentially no computational effort; for all intents and purposes, its overhead is immeasurably small. Parity RAID does have computational overhead, resulting in latency at the storage layer and system resources being consumed.

Of course, those resources are dedicated to the RAID array if we use hardware RAID. They have no function but to be consumed in this role. However, if we use software RAID, these are general-purpose system resources (primarily CPU) consumed for the RAID array processing. 

The impact on a very small system with a large amount of RAID is still minimal, but it can be measured and should be considered, if only lightly. Latency and system impact are directly related to one another. There is no simple way to state latency and system impact for different levels. Here's one way we can put it:

  • RAID 0 and RAID 10 have effectively no latency or impact.
  • RAID 5 has some latency and impact.
  • RAID 6 has roughly twice as much computational latency and impact as RAID 5.

In many cases, this latency and system impact will be so small that they cannot be measured with standard system tools. As modern processors become increasingly powerful, the latency and system impact will continue to diminish. 

The impact has been considered negligible for RAID 5 and RAID 6 systems, even on low-end commodity hardware, since approximately 2001. Even so, on heavily loaded systems with a large amount of parity RAID activity, there could be contention between the RAID subsystem and other processes requiring system resources.