Introduction – FDP and Latency Monitor are additional ingredients of a high-resiliency SSD architecture.
In two of my recent blogs, I focused on topics that seemed somewhat unrelated. After some recent dialog with industry colleagues, however, I've concluded they are more related than I realized, and this blog explores that connection further: latency mitigation, and how strategies such as NVM Express' Flexible Data Placement (FDP) and the OCP Storage Workgroup's Latency Monitor (LM) can both be considered vertically integrated high-resiliency ingredients.
Recap – Vertically Integrated Resiliency
In my blog "Working to revolutionize SSD resiliency with shift-left approach," I gave an overview of the host and storage device ingredients associated with creating high resiliency. From the Version 1 OCP SSD specification to today's third release (Version 2.5), there has been a progression of incremental resiliency ingredients: panic detection, panic recovery and standardized telemetry were all discussed there. However, there are two I'd like to dive deeper into: 1) Flexible Data Placement and 2) Latency Monitor.
Figure 1 - Summary of features for Vertically Integrated Resiliency
NVM Express™ Flexible Data Placement (FDP) enhances Vertically Integrated Resiliency through tail Latency Mitigation
FDP as an ingredient of enhanced SSD resiliency seems reasonable and even obvious: mitigating write amplification extends the endurance available to host applications. This has been discussed extensively in the industry and in other Micron blog posts by my colleagues, which also show multiple avenues where FDP can be used to improve performance. That isn't a surprise: with reduced garbage collection, performance on the real-life workloads observed in the industry can improve.
But can latency outlier mitigation be considered a Vertically Integrated Resiliency feature, and can FDP help? My prior blog, "Why latency in data center SSDs matters and how Micron became best in class," explored why latency outliers are so painful in scale-out solutions and the techniques we at Micron implemented for a world-class solution. However, it made no assumption of vertical integration, nor did it consider how a vertically integrated FDP solution could further improve latency.
If FDP's role is to reduce the amount of garbage collection (GC), then it seems clear that FDP would also reduce latency outliers. Of course, even with no GC and a write amplification factor (WAF) of 1, there will still be latency variation as host-initiated reads, programs and erases collide across channels, dies and planes. However, when the write amplification reaches around 2-3 or more, the likelihood of collisions rises, and with it the impact of latency outliers. But how much can FDP reduce latency outliers?
Figure 2 - High-level summary of host and internal operations impacting NAND operations. These internal operations increase traffic and are commonly measured as "write amplification."
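To see why higher write amplification raises collision probability, it helps to count the NAND operations generated per host I/O. The sketch below is a back-of-envelope model only; the 70/30 mix matches the workload discussed in this blog, but the WAF values and the simple GC-read accounting are illustrative assumptions, not measurements.

```python
# Back-of-envelope: relative NAND operations per 100 host I/Os for a
# 70% read / 30% write mix at different write amplification factors (WAF).
# Illustrative only; real GC behavior is more complex than this model.
host_reads, host_writes = 70, 30

for waf in (1.0, 2.0, 3.0):
    nand_programs = host_writes * waf        # host programs plus GC programs
    gc_reads = host_writes * (waf - 1)       # valid data read back for relocation
    total_ops = host_reads + nand_programs + gc_reads
    print(f"WAF={waf:.0f}: ~{total_ops:.0f} NAND ops per 100 host I/Os "
          f"({total_ops / 100:.2f}x the host request rate)")
```

More NAND operations per host I/O means more opportunities for a host read to queue behind a program or erase on the same die or plane, which is exactly where tail latency comes from.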
Using the methodology common in the industry and discussed in detail in my latency blog, let's look at a full-pressure, 70% read / 30% write, 4KiB random workload and plot the read latency distributions. Covering the non-FDP case is straightforward and follows industry practice. To measure a WA of 1 as a proxy for the FDP best case, a fresh namespace was created and only a portion of the total capacity was preconditioned before running the 70/30 workload. In this case, 1TB of an 8TB Micron 7500 SSD was preconditioned, and IOs were constrained to that 1TB (to prevent reading unmapped LBAs) until the physical media filled and GC started.
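For readers who want to reproduce a similar measurement, the sketch below launches an approximation of that workload with fio. The device path, queue depth, job count and runtime are my own illustrative assumptions, not the exact lab settings, and it presumes the drive has already been preconditioned as described above (the run is destructive, so use a test drive).

```python
import subprocess

# Approximate the 70% read / 30% write, 4 KiB random workload described above,
# constrained to the first 1 TB of an already preconditioned drive.
# Device path, queue depth, job count and runtime are illustrative assumptions.
cmd = [
    "fio",
    "--name=7030-4k-random",
    "--filename=/dev/nvme0n1",   # assumed device; destructive - use a test drive
    "--direct=1",
    "--ioengine=io_uring",
    "--rw=randrw", "--rwmixread=70",
    "--bs=4k",
    "--iodepth=32", "--numjobs=4",
    "--size=1T",                 # keep I/O within the 1 TB preconditioned region
    "--time_based", "--runtime=1800",
    "--group_reporting",
    "--percentile_list=99.0:99.9:99.99:99.999",  # report the tail percentiles of interest
]
subprocess.run(cmd, check=True)
```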
Figure 3 below displays the read latency distribution, first with no write pressure and then with sustained write pressure. Many prior results have suggested that FDP achieves a WA of around 1, and thus one can expect up to a 30% reduction in tail latency variation.
Figure 3 - Reducing garbage collection pressure through FDP can significantly reduce tail latency distributions.
OCP Storage Workgroup’s Latency Monitor enhances SSD resiliency
In my previous blog on latency mitigation, I explained that when database queries are split into multiple sub-queries that run in parallel, tail latency has a low-probability but high-impact effect on the total latency. That blog discussed how hard it is to find and fix the causes of latency outliers. What it didn't discuss is how one demonstrates that latency outliers are behaving as expected. Tools like FIO can, at the system level, timestamp every transaction and compute the needed latency histograms. However, with today's drives capable of millions of IOPS, creating system-level, real-time block traces and scanning them for outliers would be computationally too expensive in real applications. And even if we could, by the time the host detects the outlier and sends a Telemetry Host-Initiated request for the needed debug data, the SSD's debug information might already be gone.
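A rough calculation shows the scale of the problem; the IOPS figure and per-record size below are assumptions for illustration only.

```python
# Rough cost of host-side per-command tracing at data center SSD rates.
# Both values below are illustrative assumptions.
iops = 2_000_000            # assumed sustained 4 KiB random read rate
bytes_per_record = 32       # e.g., timestamps, LBA, opcode, latency
seconds_per_day = 86_400

records_per_day = iops * seconds_per_day
trace_bytes_per_day = records_per_day * bytes_per_record
print(f"{records_per_day:.2e} records/day -> "
      f"{trace_bytes_per_day / 1e12:.1f} TB/day of trace data per drive")
```

Even at these modest per-record sizes, a single drive would generate terabytes of trace data per day, which is why pushing the measurement down into the SSD itself is so attractive.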
I personally ran into this as part of the work in the FMS 2017 paper: the SSD controller was demonstrated to have enough computational power to record the time of arrival and departure of every command, measure its latency, create detailed histograms and trigger asserts if values were too high. In that case, the data was collected internally for later extraction and analysis; everything presented was generated by the SSD itself.
Building on that work, plus some critical needs from Meta, the concept of an SSD Latency Monitor was formalized and first included in version 2 of the OCP Datacenter NVMe SSD Specification. At a high level, each egressing command is timestamped, that timestamp is compared against the associated ingress timestamp, and a latency is calculated. The latencies are accumulated into four host-configurable histogram buckets, along with timestamps. To address the delay associated with the host requesting a debug log, if a command exceeds a latency threshold, an internal debug log is saved immediately at command egress for later vendor analysis. And as OCP Storage's Standardized Telemetry is deployed, even more debug information on latency outliers becomes possible.
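As a conceptual sketch of the flow just described, consider the pseudocode-style model below. This is not firmware code and not the OCP data structures; the bucket boundaries and debug-capture threshold are placeholder values standing in for the host-configured settings.

```python
from time import monotonic_ns

# Conceptual model of the Latency Monitor flow described above.
# Bucket boundaries and the debug-capture threshold are host-configured;
# the values here are placeholders, not the OCP defaults.
BUCKET_BOUNDS_US = [100, 1_000, 10_000]   # 4 buckets: <100us, <1ms, <10ms, >=10ms
DEBUG_THRESHOLD_US = 50_000               # capture internal debug data past this

bucket_counts = [0, 0, 0, 0]
debug_snapshots = []

def on_command_ingress():
    """Record an ingress timestamp when the command is fetched."""
    return monotonic_ns()

def on_command_egress(ingress_ns, command_id):
    """At completion, compute latency, bin it, and capture debug data if needed."""
    latency_us = (monotonic_ns() - ingress_ns) / 1_000
    for i, bound in enumerate(BUCKET_BOUNDS_US):
        if latency_us < bound:
            bucket_counts[i] += 1
            break
    else:
        bucket_counts[-1] += 1
    if latency_us >= DEBUG_THRESHOLD_US:
        # In a real SSD this would snapshot internal state into a debug log
        # for later vendor analysis; here we simply note the event.
        debug_snapshots.append((command_id, latency_us))
```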
At the Storage Developer Conference in 2022, Vineet Parekh and Venkat Ramesh from Meta discussed the deployment of latency monitoring into their fleet and an example of a key issue they identified. They pointed out that, hypothetically, if their entire fleet at 1,000 IOPS had one-second, 9-nines outliers, that would amount to more than 5,000 latency events per day! Parekh and Ramesh then showed an example of a problematic SSD where the deployed Latency Monitor made it possible to efficiently debug a latency outlier issue that had gone unresolved for months. Figure 4 below summarizes the Latency Monitor architecture and the results of that difficult outlier debug.
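The arithmetic behind that figure is worth spelling out. The sketch below assumes the 1,000 IOPS figure is per drive and a fleet on the order of 100,000 drives; both are my own assumptions for illustration, not numbers Meta disclosed.

```python
# How a one-in-a-billion (9-nines) outlier still becomes thousands of events/day.
# Per-drive IOPS interpretation and fleet size are illustrative assumptions.
iops_per_drive = 1_000
outlier_probability = 1e-9          # 9-nines tail
seconds_per_day = 86_400
fleet_drives = 100_000              # assumed fleet size for illustration

ios_per_drive_per_day = iops_per_drive * seconds_per_day            # 8.64e7
outliers_per_drive_per_day = ios_per_drive_per_day * outlier_probability
fleet_outliers_per_day = outliers_per_drive_per_day * fleet_drives
print(f"{outliers_per_drive_per_day:.3f} outliers/drive/day, "
      f"{fleet_outliers_per_day:,.0f} outliers/day across the fleet")
```

Individually rare events, multiplied across a large fleet, become a daily operational burden, which is exactly why per-drive latency monitoring matters.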
An additional benefit of the Latency Monitor is the ability to identify host latency issues that were previously blamed on the storage device. There are two recent examples. One was related to a problem with Linux that, because of a latency outlier, resulted in an extension of an NVMe specification enhancement. The other was discussed in a blog by my Micron colleague Sayali Shirode, who used the Latency Monitor to show that a measured latency outlier WAS NOT caused by the SSD in question.
Figure 4 - Example latency outlier debug presented by Meta at Storage Developer Conference 2022
Conclusion – Micron is embracing High Resiliency SSD architecture and design
In my November 2023 blog, I talked about how a resiliency revolution requires a "shift left": an ecosystem that builds in measurement and detection and that integrates vertically to cover host and vendor interactions. I have already mentioned some of the ingredients, like panic reporting, panic recovery and a strong reduction of tail latencies. Here I want to highlight two ingredients that combine these two ideas:
- The OCP Latency Monitor feature is crucial for eliminating latency outliers, not only in SSD devices but also in the software ecosystem.
- Data placement, especially FDP, helps with endurance and performance and can reduce latency outliers by up to 30%.
Two final thoughts:
- At FMS, I'm chairing a session on Hyperscale Applications where Meta will be adding further detail about their experience deploying the Latency Monitor at scale. Please join us. I expect some great conversations around FDP at this fall's conferences as well.
- OCP Storage's plugin for NVMe-CLI provides a vendor-agnostic and seamless way to configure and report on both FDP and the Latency Monitor, as well as other key capabilities such as decoding standardized telemetry; an example invocation is sketched below.
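Here is a minimal sketch of what that looks like in practice. The command names follow recent nvme-cli releases that ship the OCP and FDP plugins, and the device path is an assumption, so verify against the version installed on your system (`nvme ocp help`, `nvme fdp help`).

```python
import subprocess

# Illustrative invocation only: plugin and command names follow recent
# nvme-cli releases and may differ in yours; device path is assumed.
DEVICE = "/dev/nvme0"

# Dump the OCP Latency Monitor log page for offline analysis.
subprocess.run(["nvme", "ocp", "latency-monitor-log", DEVICE], check=True)

# The companion `nvme fdp` plugin exposes FDP reporting, e.g. `nvme fdp configs`
# and `nvme fdp usage` (names assumed from recent releases).
```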
Acknowledgement
- I’d like to thank Micron’s Chandra Guda for the FDP experiment design and John Mazie for test execution support.
Further Reading
Working to revolutionize SSD resiliency with shift-left approach | Micron Technology Inc.
Why latency in data center SSDs matters and how Micron became best in class | Micron Technology Inc.
Identifying latency outliers in workload testing | Micron Technology Inc.
Avoiding Costly Read Latency Variations in SSDs Through I/O Determinism (flashmemorysummit.com)
Benefits of flexible data placement on real workloads using Aerospike | Micron Technology Inc.
Eliminating the I/O blender: The promise of flexible data placement | Micron Technology Inc.