A small new study has uncovered a significant, previously unknown problem. Simply put, on some SSD models, data held in the drive's cache can be lost in a power failure even after the drive has been told to flush that cache and has reported success. This raises the obvious question: how can an SSD lose data from its cache on power loss after a flush has supposedly completed?
Some manufacturers have anticipated this situation, so there is no need for alarm: the problem does not affect every model. So far only four SSD models have been tested, and two of them showed the failure. It was discovered by Apple programmer Russ Bishop, who now wants to test a wider range of drives to get to the bottom of it.
SSDs, cache, and data loss on power failure
Fun story: I tested a random selection of four NVMe SSDs from four vendors. Half of them lose FLUSHed data on power loss. That is, the flush went to the drive, was confirmed, and success was reported to userspace. Then I manually pulled the cable. Boom, the data is gone.
— Russ Bishop (@xenadu02) February 21, 2022
This point deserves emphasis: we are not talking about data already stored in the NAND Flash cells, which is your normal storage, but about data still sitting in the cache, which matters just as much whether you are working or gaming. Naturally, the tests were run on Apple systems, but the mechanism is no different on a PC: the SSD works the same way, and the drive and its cache are driven through the same NVMe interface.
As Bishop notes, any programmer can easily reproduce the problem, and the results are genuinely concerning.
The other half never lost confirmed data after a flush (F_FULLFSYNC on macOS), no matter how much I abused them. All four took a performance hit from the flush, so they are doing some work.
The two best flush performers? One lost data 40% of the time. The other never lost any.
— Russ Bishop (@xenadu02) February 21, 2022
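For context, here is roughly what the "success reported to userspace" step looks like from a program's side. This is a minimal sketch, assuming a Unix-like system; the function name `durable_write` is hypothetical and not taken from Bishop's test code. On macOS it asks for `F_FULLFSYNC`, the flush Bishop mentions, and elsewhere it falls back to `fsync()`. Note that the program only sees the return value: if the drive acknowledges the flush without actually committing its cache to NAND, nothing in this code can detect it, which is exactly the failure the test exposed.

```python
import fcntl
import os
import sys


def durable_write(path: str, data: bytes) -> str:
    """Hypothetical helper: write data and ask the OS (and drive) to persist it.

    Returns the name of the sync mechanism used, for illustration.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # push Python's userspace buffer into the kernel
        fd = f.fileno()
        full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
        if sys.platform == "darwin" and full_fsync is not None:
            # On macOS, fsync() alone may leave data in the drive's cache;
            # F_FULLFSYNC additionally asks the drive to flush that cache.
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        # Elsewhere, fall back to fsync(); on Linux this normally also
        # sends a cache-flush command to the storage device.
        os.fsync(fd)
        return "fsync"
```

A successful return here means only that the OS and the drive *reported* the data as durable; whether it really survives a pulled cable depends on the drive honoring the flush.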
What Bishop means is that data was lost even after the cache was flushed, because it had never actually been transferred to NAND Flash storage. Why does this happen? First, the cache is volatile memory (typically DRAM), so when Bishop cut the power by pulling the cable, whatever was still in it was lost. That should not have been possible: once the drive acknowledges a flush, the data is supposed to already be safe in NAND.
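The difference between a drive that honors a flush and one that only acknowledges it can be shown with a deliberately simplified toy model. Everything below (`ToySSD`, `honest_flush`, `run_trial`) is hypothetical and illustrative only; real controllers are far more complex, and a failing drive may lose data only some of the time rather than on every power cut.

```python
from __future__ import annotations


class ToySSD:
    """Hypothetical model of an SSD with a volatile write cache."""

    def __init__(self, honest_flush: bool) -> None:
        self.honest_flush = honest_flush
        self.cache: dict[int, bytes] = {}  # volatile: lost on power cut
        self.nand: dict[int, bytes] = {}   # non-volatile: survives power cut

    def write(self, lba: int, data: bytes) -> None:
        # Writes land in the fast volatile cache first.
        self.cache[lba] = data

    def flush(self) -> bool:
        # An honest drive commits the cache to NAND before acknowledging.
        if self.honest_flush:
            self.nand.update(self.cache)
            self.cache.clear()
        # Either way, the host is told the flush succeeded.
        return True

    def power_loss(self) -> None:
        # Anything still in the volatile cache disappears.
        self.cache.clear()

    def read(self, lba: int) -> bytes | None:
        # Prefer the cached copy, otherwise fall back to NAND.
        return self.cache.get(lba, self.nand.get(lba))


def run_trial(honest_flush: bool) -> bool:
    """Write, flush, cut power; return True if the acknowledged data survived."""
    ssd = ToySSD(honest_flush)
    ssd.write(0, b"important")
    acknowledged = ssd.flush()
    ssd.power_loss()
    return acknowledged and ssd.read(0) == b"important"


print(run_trial(True))   # True: data reached NAND before the power cut
print(run_trial(False))  # False: flush was acknowledged, data was lost
```

In both cases the host receives the same "success" from `flush()`; only the drive's internal behavior decides whether the data survives, which is why the problem is invisible until the power actually goes out.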
Which models are free from this defect?
As we said, this error should not occur, yet some manufacturers have clearly solved the problem and others have not. The four SSDs analyzed were the Samsung 970 EVO Plus, WD Red SN700 1TB, SK Hynix Gold P31 2TB, and Sabrent Rocket 512 with a Phison controller (PH-SBT-RKT-303).
The Samsung and the WD kept the data intact, handling the flush flawlessly, while the SK Hynix and the Sabrent were the ones affected. Bishop hopes to have data on more SSDs later this week:
Tomorrow I will have the results for:
Intel 670p
SAMSUNG 980
WD Black SN750
WD Green SN350
Kingston NV1
Seagate Firecuda 530
Crucial P2
Crucial P5 Plus
— Russ Bishop (@xenadu02) February 23, 2022
This is particularly interesting because some of these models, such as the Samsung 980 and the Kingston NV1, have no DRAM cache of their own and instead use HMB (Host Memory Buffer), borrowing part of the system's RAM to serve the same purpose. In principle, that should expose them to the same data loss whenever the PC loses power, for whatever reason. It will be interesting to see the results in a few days.