Do SSD failures follow the bathtub curve? Ask Backblaze – The Register

Cloud-based storage and backup provider Backblaze has published the latest report on usage data gathered from its solid state drives (SSDs), asking if they show the same failure pattern as hard drives.

Backblaze uses SSDs as boot drives in the server infrastructure for its Cloud Storage platform, while high-capacity rotating drives are typically used for storing and serving up data.

However, they do more than just boot the storage servers, holding log files and temporary files produced by each server. The volume of data a boot drive will read, write, and delete thus depends on the activity of the storage server itself.

The company previously reported that its SSDs appeared to be at least as reliable as hard drives, but warned this could change as it has not collected SSD data for as long as hard drives and the accumulation of more data could alter the statistics.

Backblaze says it has added 238 SSDs to its infrastructure since its last SSD report, which covered the period to the end of Q4 2022. These included 110 Crucial drives (model: CT250MX500SSD1), 62 WDC drives (WD Blue SA510 2.5), and 44 Seagate drives (ZA250NM1000).

Looking at the Q1 2023 and Q2 2023 figures, Backblaze notes that some drives appear to have exceptionally high annualized failure rates, with the Seagate model SSDSCKKB240GZR listed with an annualized failure rate (AFR) of over 800 percent, for example.

This is a fluke of the statistics because of the low number of drives; in Q1 there were just two of this model, one of which failed shortly after being installed. During Q2, the remaining drive did not fail and thus the AFR for that period was zero.
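To see how one early failure can produce such a figure, here is a minimal sketch of an AFR calculation. It assumes the usual drive-days formulation (failures divided by drive-years of operation, times 100). The function name and the sample numbers are illustrative only and are not taken from Backblaze's data.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return AFR as a percentage: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical small sample: two drives, one fails early, 45 drive days in total.
small_sample = annualized_failure_rate(failures=1, drive_days=45)

# Hypothetical large sample: 1,000 drives running a 91-day quarter, 2 failures.
large_sample = annualized_failure_rate(failures=2, drive_days=1000 * 91)

print(f"small sample AFR: {small_sample:.0f}%")
print(f"large sample AFR: {large_sample:.2f}%")
```

With so few drive days in the denominator, a single failure extrapolates to several failures per drive per year, which is why the result can exceed 100 percent.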

These figures illustrate why Backblaze considers at least 100 instances of a specific drive model and 10,000 drive days of operation in a specific quarter as a minimum before the calculated AFR can be considered to be reasonable, according to Backblaze storage cloud evangelist Andy Klein.

Looking at the AFR over time, Backblaze reports that the AFR across its SSDs was 0.96 percent during Q1 of 2023 and 1.05 percent during Q2. This failure rate is thus up from the previous quarter, but down slightly from the same quarter a year ago. In fact, a chart of the AFR per quarter over the past three years shows that it has fluctuated between 0.36 percent and 1.72 percent, with no apparent underlying pattern.

However, Backblaze says that the quarterly data is still vital as it can reveal issues such as one particular drive model that was the primary cause of a jump in AFR from 0.58 percent in Q1 2021 to 1.51 percent in Q2 then 1.72 percent in Q3.

"It happens from time to time that a given drive model is not compatible with our environment, and we will moderate or even remove that drive's effect on the system as a whole," Klein said.

Backblaze earlier this year calculated the average age at which failure occurred for its entire collection of hard drives, and has repeated the calculation for SSDs in this latest report.

This involved collecting the SMART data for the 63 failed SSD drives the company has had to date, which is not a great dataset size for statistical analysis, as Klein admitted. The resulting figure calculated from the data is 14 months, compared with two years and seven months across all hard drives.

But Backblaze cautions this figure is likely to be unrepresentative, as the average age of the entire fleet of SSDs it has in operation is just 25 months.

Looking at three drive models for which the company has a reasonable amount of data, Klein found that the average age of the failed drives increases as the average age of drives in operation increases, and it is therefore reasonable to expect that the average age for an SSD failure will increase with time.

Turning to the lifetime annualized failure rate for all of its SSDs, Backblaze reports a figure of 0.9 percent, covering a period from Q4 2018 through to the end of Q2 2023. This figure is up slightly from the 0.89 percent it found at the end of Q4 2022, but down from the same quarter a year ago, when the figure was 1.08 percent.

However, this includes those drives which have high apparent failure rates because there is just not enough data to make the calculation reliable.

If the calculation is limited to just those drive models for which there are at least 100 units in operation and over 10,000 drive days, and also with a confidence interval of 1 percent or lower between the low and the high values, then it cuts the data down to just three drive models and an AFR of just 0.6 percent.
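The filtering described above could be sketched roughly as follows. The record layout, the model names, and every number in `fleet` are hypothetical, and the confidence bounds are taken as given inputs because the article does not say how Backblaze computes them.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    model: str
    drive_count: int
    drive_days: int
    failures: int
    ci_low: float   # lower confidence bound on AFR, in percent (assumed input)
    ci_high: float  # upper confidence bound on AFR, in percent (assumed input)


def qualifies(stats, min_drives=100, min_drive_days=10_000, max_ci_width=1.0):
    """Apply the three thresholds: unit count, drive days, and interval width."""
    return (
        stats.drive_count >= min_drives
        and stats.drive_days > min_drive_days
        and (stats.ci_high - stats.ci_low) <= max_ci_width
    )


def pooled_afr(models):
    """AFR in percent over the combined failures and drive days of the models."""
    failures = sum(m.failures for m in models)
    drive_days = sum(m.drive_days for m in models)
    if drive_days == 0:
        raise ValueError("no drive days to compute an AFR from")
    return failures / (drive_days / 365) * 100


# Made-up fleet for illustration only.
fleet = [
    ModelStats("model-A", 500, 400_000, 6, 0.2, 0.9),
    ModelStats("model-B", 150, 60_000, 1, 0.0, 1.4),   # interval too wide
    ModelStats("model-C", 2, 45, 1, 20.0, 4000.0),     # too few drives and days
    ModelStats("model-D", 300, 330_000, 5, 0.3, 1.1),
]

trusted = [m for m in fleet if qualifies(m)]
print("qualifying models:", [m.model for m in trusted])
print(f"AFR, all models: {pooled_afr(fleet):.2f}%")
print(f"AFR, qualifying models only: {pooled_afr(trusted):.2f}%")
```

Pooling failures and drive days before dividing, rather than averaging per-model rates, keeps a two-drive model from carrying the same weight as one with hundreds of units.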

Meanwhile, Backblaze has also produced a graph of SSD failures over time to see how well the data matches the classic bathtub curve used in reliability engineering, as the comparable graph for its hard drives does.

According to Klein, while the actual curve (blue line) showing the SSD failures over each quarter is a bit "lumpy," the trend line (red) does have "a definite bathtub curve look to it."

The trend line is about a 70 percent match to the actual data, so Backblaze says it cannot be totally confident at this point, but for the limited amount of data available, it would appear that the occurrences of SSD failures are on a path to conform to the tried-and-true bathtub curve.
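A "percent match" of this kind is presumably a coefficient of determination (R squared) for the trend line, although the article does not say so. The sketch below assumes a second-order polynomial trend, which is one simple way to get a bathtub-like shape, fits it by ordinary least squares, and reports R squared. The quarterly values are invented for illustration and are not Backblaze's figures.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Power sums of x, and moment sums of x against y.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [rv - factor * cv for rv, cv in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def r_squared(ys, fitted):
    """Share of the variance in ys that the fitted values explain."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical failure rates (percent) by quarter of drive age: high, low, high.
quarters = list(range(10))
rates = [1.8, 1.1, 1.3, 0.6, 0.9, 0.5, 1.0, 0.8, 1.4, 1.7]

a, b, c = fit_quadratic(quarters, rates)
trend = [a * x * x + b * x + c for x in quarters]
r2 = r_squared(rates, trend)

print(f"trend: {a:.3f}x^2 + {b:.3f}x + {c:.3f}")
print(f"R^2: {r2:.2f}")
```

A positive leading coefficient means the fitted curve opens upward, which is the bathtub signature; how close R squared is to 1 indicates how much of the quarter-to-quarter lumpiness the trend line leaves unexplained.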

As ever, Backblaze makes the raw data used in its report available on a Drive Stats Data page for anyone to download and analyze, provided they cite Backblaze as the source and don't sell the data.
