Methods to Improve the Reliability of Solid-state Drives

As mentioned earlier, flash memory chips can endure only a limited number of program/erase (P/E) cycles; beyond that limit, the reliability of data stored in the SSD degrades. The following measures are all intended to improve SSD reliability.

Capacity redundancy for SSDs

To keep partial flash damage from causing the entire SSD to fail, SSDs are designed with capacity redundancy. For example, an SSD with a nominal capacity of 100GB generally contains more than 110GB of physical flash. The ratio of the excess capacity to the nominal capacity is called the redundancy ratio. Generally speaking, the higher the redundancy ratio, the better the reliability, lifespan, performance, and stability of the SSD. The redundancy ratio of enterprise-level storage products generally needs to exceed 28%.
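The redundancy ratio defined above can be worked through as a short calculation. This is a minimal sketch; the function name `redundancy_ratio` and its parameters are illustrative and not taken from any vendor tool.

```python
def redundancy_ratio(physical_gb: float, nominal_gb: float) -> float:
    """Return excess capacity divided by nominal capacity."""
    if nominal_gb <= 0:
        raise ValueError("nominal capacity must be positive")
    return (physical_gb - nominal_gb) / nominal_gb


# The 100GB nominal drive from the text, with 110GB of physical flash.
ratio = redundancy_ratio(physical_gb=110, nominal_gb=100)
print(f"redundancy ratio: {ratio:.0%}")  # redundancy ratio: 10%
```

By the same formula, a 100GB nominal drive would need more than 128GB of physical flash to exceed the 28% figure cited for enterprise-level products.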

Bad block management in SSDs

Although SSDs use various mechanisms and algorithms to extend their lifespan as much as possible, flash damage during use is unavoidable. Capacity redundancy provides the basic guarantee for coping with it. Flash damage occurs at page granularity: a block contains multiple pages, some of which may be healthy while others are damaged. In practice, once a block contains several damaged pages, most of its remaining pages are also close to failing. SSD firmware therefore generally manages flash damage at block granularity: when the number of pages in a block that can no longer be read because of damage exceeds a threshold, the block is judged to be damaged. The valid data in the damaged block is then migrated to other available blocks, and the block is marked as bad so that it is no longer used to store any business data.
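The block-level policy described above can be sketched as follows. This is a simplified illustration rather than real firmware logic: the `Block` class, the `retire_if_bad` function, and the threshold value are all assumptions made for the example, and actual thresholds are vendor-specific.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for this example.
BAD_PAGE_THRESHOLD = 2


@dataclass
class Block:
    pages: list           # page payloads; None stands for an unreadable page
    is_bad: bool = False  # set once the block is retired


def unreadable_pages(block: Block) -> int:
    """Count the pages in the block that can no longer be read."""
    return sum(1 for page in block.pages if page is None)


def retire_if_bad(block: Block, spare: Block,
                  threshold: int = BAD_PAGE_THRESHOLD) -> bool:
    """Retire the block if its unreadable pages exceed the threshold.

    Readable data is copied to the spare block first, then the block is
    marked bad. Returns True if the block was retired.
    """
    if block.is_bad or unreadable_pages(block) <= threshold:
        return False
    spare.pages = [page for page in block.pages if page is not None]
    block.pages = []
    block.is_bad = True
    return True


worn = Block(pages=[b"a", None, None, None, b"e"])
spare = Block(pages=[])
retired = retire_if_bad(worn, spare)  # three unreadable pages exceed the threshold
```

In this sketch the spare block stands in for the "other available blocks" that capacity redundancy provides.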

SSD firmware detects bad blocks in two ways: detection triggered by host I/O and internal inspection. Over the lifespan of an SSD, about 1.5% of blocks become bad blocks.

Data redundancy protection for SSDs

SSDs use several redundancy and checking mechanisms to protect user data from bit flips, corruption, and loss. In the controller's memory, error correction codes (ECC) and cyclic redundancy checks (CRC) guard against data being altered or corrupted. In the flash memory chips, low-density parity-check (LDPC) codes and CRC guard against data loss caused by errors within a chip. XOR redundancy across flash memory chips guards against data loss caused by the failure of a chip.
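The XOR redundancy idea can be illustrated with a small sketch: a parity chunk is the byte-wise XOR of the data chunks stored on different chips, so any single lost chunk can be rebuilt from the survivors plus the parity. The function names and the chunk layout here are assumptions for illustration; real controllers organize stripes in vendor-specific ways.

```python
from functools import reduce
from operator import xor


def xor_parity(chunks: list) -> bytes:
    """Return the byte-wise XOR of equal-length byte chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    length = len(chunks[0])
    if any(len(chunk) != length for chunk in chunks):
        raise ValueError("all chunks must have the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(xor, column) for column in zip(*chunks))


def rebuild_missing(surviving: list, parity: bytes) -> bytes:
    """Rebuild the single missing chunk from the survivors and the parity."""
    return xor_parity(surviving + [parity])


# Three data chunks, each imagined to live on a different flash chip.
chunks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(chunks)

# Suppose the chip holding chunks[1] fails; recover its data.
recovered = rebuild_missing([chunks[0], chunks[2]], parity)
```

This scheme recovers from the loss of one chunk per stripe; it does not by itself detect which chunk is wrong, which is where checks such as CRC come in.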

