A recent FAST 2016 paper titled Flash Reliability in Production reached an eye-catching conclusion: MLC drives are as reliable as the more costly SLC "enterprise" drives. Whether a value-added reseller (VAR) is helping end users make SSD choices for client needs or for enterprise needs, both end-user groups see reliability as the aspect that matters most. That doesn’t mean that all other considerations are inconsequential, but across all use scenarios, reliability is the aspect that can ultimately advance or halt the business.
Educating clients on SSD reliability is a constant task for VARs, and it always has to be grounded in knowledge of specific use cases. For example, cost differences can raise the question of whether a client SSD, rather than an enterprise SSD, will be sufficient for a particular use scenario. Many end users understand that over-provisioning, which provides ample spare capacity for bad-block replacement and helps eliminate write slowdowns, contributes to the higher cost of enterprise SSDs.
Of course, that brings us back to the FAST 2016 paper, which concluded that age, not use, correlates with increasing error rates. To many users, that suggests that over-provisioning for fear of flash wear-out is not needed.
The difficulty of predicting when an HDD will fail is one of the reasons that so many users in the client and enterprise markets have made the switch to SSDs. The reliability argument doesn’t get any more basic than knowing that an SSD will continue to be readable even after it reaches its write endurance limit.
That truth, coupled with the fact that SSDs have significantly more life and longer read-write cycles, has generally fueled the SSD reliability perspective. Adding in the ability of SSDs to work in a broad temperature range lends even more credence to that perspective.
Recent advances in SSDs have delivered higher IOPS, greater capacity and lower costs, and together these chart the course of where SSD technology is going. That doesn’t change the fact that not all SSDs are designed for, or capable of handling, workloads that demand high write endurance. Until recently, it was an accepted fact that SLC flash (while being more expensive) is more reliable than MLC. If that greater reliability is no longer unquestioned, what measures of reliability will matter to the end user?
SSD manufacturers define reliability in one of two ways. The first is mean time between failures (MTBF), which expresses the predicted average number of operating hours between failures for a product. The other is annualized failure rate (AFR), which estimates the probability of failure over a year of use, expressed as a percentage. Generally, MTBF for consumer SSDs falls between 1 million and 1.5 million hours, while enterprise SSDs provide a higher MTBF of 1.2 million to 2 million hours.
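To make the relationship between the two metrics concrete, the following is a minimal sketch that converts an MTBF figure into an approximate AFR. It assumes a constant failure rate (the exponential model) and a drive powered on around the clock; the function name afr_from_mtbf and the labels are illustrative only and do not come from any vendor tool.

```python
import math

# Assumption: the drive is powered on 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 8760


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Approximate annualized failure rate (as a fraction) from MTBF.

    Assumes a constant failure rate, so the probability of at least one
    failure within a year is 1 - exp(-hours_per_year / MTBF).
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# MTBF ranges quoted in the article, in hours.
examples = [
    ("consumer, low end", 1_000_000),
    ("consumer, high end", 1_500_000),
    ("enterprise, low end", 1_200_000),
    ("enterprise, high end", 2_000_000),
]

for label, mtbf in examples:
    print(f"{label}: MTBF {mtbf:,} h -> AFR {afr_from_mtbf(mtbf):.2%}")
```

Under these assumptions, an MTBF of 1 million hours works out to an AFR of roughly 0.87 percent, and 2 million hours to roughly 0.44 percent. Real-world failure rates depend on workload and environment, so these figures are a way to compare datasheets, not a prediction for any particular drive.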
Given how much those ranges overlap, it’s fairly easy to see why an end user would be less inclined to pay a premium for the difference. Of course, other factors beyond MTBF and AFR, as well as the MLC-versus-SLC difference, can also point to greater reliability for enterprise SSDs.
Enterprise SSDs include elaborate built-in power loss protection features, which contribute to that reliability. Power loss protection works by monitoring the power being supplied to the SSD; if the drive senses a loss of power, it automatically stops I/O and makes sure that all in-flight data is written to the storage media.
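The sequence described above (monitor the supply, stop I/O, flush in-flight data) can be illustrated with a toy model. This is only a sketch: the class ToySSD, its method names and the 4.5-volt threshold are all hypothetical, and real drives implement this logic in hardware and firmware rather than in host software.

```python
class ToySSD:
    """Hypothetical model of a drive with power loss protection."""

    def __init__(self, min_voltage: float = 4.5):
        self.min_voltage = min_voltage  # assumed threshold, illustrative only
        self.write_buffer = []          # in-flight data held in volatile memory
        self.flash = []                 # data persisted to the storage media
        self.accepting_io = True

    def write(self, block: str) -> None:
        """Accept a write into the volatile buffer while I/O is allowed."""
        if not self.accepting_io:
            raise RuntimeError("drive stopped I/O after detecting power loss")
        self.write_buffer.append(block)

    def flush(self) -> None:
        """Move all buffered (in-flight) data to the storage media."""
        self.flash.extend(self.write_buffer)
        self.write_buffer.clear()

    def on_voltage_sample(self, volts: float) -> None:
        """React to a supply-voltage reading from the power monitor."""
        if volts < self.min_voltage:
            self.accepting_io = False  # step 1: stop accepting new I/O
            self.flush()               # step 2: persist in-flight data


ssd = ToySSD()
ssd.write("A")
ssd.write("B")
ssd.on_voltage_sample(5.0)  # healthy supply: data stays buffered
ssd.on_voltage_sample(3.9)  # supply dropping: stop I/O and flush
print(ssd.flash)            # ['A', 'B']
```

The point of the model is the ordering: new writes are refused first, so the set of in-flight data stops growing, and only then is the buffer committed to the media.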
While enterprise SSDs have higher reliability ratings than consumer SSDs for a variety of unchallenged reasons, it’s up to the VAR to understand the use scenarios of the end user. When SSD reliability is the overriding concern with a potential end-user client, other factors like cost can be less relevant.
Ultimately, VARs need to understand each particular use environment in order to help the client make the right decision. Any decision that is ultimately perceived as the wrong choice in terms of SSD reliability could effectively close the door to an ongoing business relationship.