Latency is one of the most important metrics for characterizing storage performance: it is the time a storage device takes to respond to a request for data. Lower latency means faster access to data, which can significantly improve application performance and user experience. Factors affecting latency include hardware components, the network stack, workload characteristics, storage architecture, and the software stack.
RocksDB is a storage-focused key-value database that is the backbone of many operations at Meta. Here’s their description from RocksDB.org:
“RocksDB builds on LevelDB to be scalable to run on servers with many CPU cores, to efficiently use fast storage, to support IO-bound, in-memory and write-once workloads, and to be flexible to allow for innovation.”
The goal of the Yahoo! Cloud Serving Benchmark (YCSB) project is to develop a framework and a common set of workloads for evaluating the performance of different “key-value” and “cloud” serving stores. The project comprises two major components:
- The YCSB Client: an extensible workload generator
- The Core workloads: a set of workload scenarios to be executed by the generator
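To make the split between the client and the workloads concrete, here is a minimal, hypothetical Python sketch of a read-heavy workload generator. It is not YCSB code: the `Operation` type, the `user<N>` key format, and the uniform key choice are assumptions for illustration (YCSB's core workloads typically draw keys from a Zipfian distribution). The 95/5 read/update mix mirrors YCSB's read-mostly Workload B.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Operation:
    kind: str  # "READ" or "UPDATE"
    key: str


def read_heavy_workload(record_count: int, op_count: int,
                        read_proportion: float = 0.95,
                        seed: int = 42) -> Iterator[Operation]:
    """Yield op_count operations against keys user0 .. user{record_count-1}.

    Hypothetical sketch: keys are drawn uniformly at random, unlike YCSB's
    default Zipfian request distribution.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    for _ in range(op_count):
        key = f"user{rng.randrange(record_count)}"
        kind = "READ" if rng.random() < read_proportion else "UPDATE"
        yield Operation(kind, key)


if __name__ == "__main__":
    ops = list(read_heavy_workload(record_count=1000, op_count=10_000))
    reads = sum(op.kind == "READ" for op in ops)
    print(f"{reads / len(ops):.1%} reads")
```

Separating the generator from whatever executes the operations is what lets the same workload be replayed against different stores.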
While running the RocksDB YCSB read-heavy workload, we saw large latency spikes multiple times. Read latencies were expected to stay below 5ms, in line with earlier runs, but after several back-to-back runs we started seeing latencies of about 113ms. To emulate the issue, we used FIO and were able to reproduce it. The maximum latency observed in FIO was 18ms which, while different from the latency observed in RocksDB, is still higher than expected.
In the FIO case, the transaction log for the job file showed a period with no activity whose length matched the observed latency. However, at the NVMe™ driver layer, no transaction has a latency anywhere near 18ms, so the source of the latency must be the application.
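The check for a period with no activity can be approximated by scanning a log for gaps between consecutive entries. This is a hedged sketch, assuming a comma-separated fio-style log whose first column is the entry time in milliseconds; `read_timestamps_ms`, `find_idle_gaps`, and the sample log lines are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def read_timestamps_ms(log_text: str) -> List[int]:
    """Return the first column (entry time in ms) of a comma-separated log."""
    return [int(row[0]) for row in csv.reader(io.StringIO(log_text)) if row]


def find_idle_gaps(timestamps_ms: List[int],
                   threshold_ms: int) -> List[Tuple[int, int, int]]:
    """Return (start_ms, end_ms, length_ms) for every gap between consecutive
    log entries longer than threshold_ms, i.e. periods where nothing was logged."""
    ts = sorted(timestamps_ms)
    return [(prev, cur, cur - prev)
            for prev, cur in zip(ts, ts[1:])
            if cur - prev > threshold_ms]


# Hypothetical log excerpt: time, value, direction, block size, offset.
SAMPLE_LOG = """\
1, 250, 0, 4096, 0
2, 240, 0, 4096, 4096
3, 260, 0, 4096, 8192
21, 18000, 0, 4096, 12288
22, 255, 0, 4096, 16384
"""

if __name__ == "__main__":
    print(find_idle_gaps(read_timestamps_ms(SAMPLE_LOG), threshold_ms=5))  # [(3, 21, 18)]
```

A gap like this only says that nothing was logged; lower-layer traces are still needed to decide whether the device or the host was responsible.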
The latency excursion is not a drive issue; it is likely caused by the behavior of a layer above the drive.
- Even simple tools like FIO can be affected by system effects that cause high latencies to be reported.
- Noise on the system can affect the maximum latency reported by FIO.
- QoS latencies out to seven 9s (the 99.99999th percentile) look consistent.
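The contrast between a noisy maximum and high-percentile QoS can be illustrated with a short sketch. `nines_percentile` is a hypothetical helper that computes a nearest-rank percentile with k nines (k=7 gives the 99.99999th) using integer arithmetic, and the sample latencies are made up for illustration. A single outlier sets the maximum while the 99.99th percentile ignores it. Under this nearest-rank definition, a k-nines percentile can only separate from the maximum once there are at least 10^k samples, so seven 9s needs at least ten million I/Os.

```python
from typing import Sequence


def nines_percentile(samples: Sequence[float], nines: int) -> float:
    """Nearest-rank percentile with `nines` nines (nines=7 -> 99.99999th).

    The rank ceil((1 - 10**-nines) * n) is computed as n - n // 10**nines,
    which is exact and avoids floating-point rounding.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = n - n // 10**nines  # 1-based rank, always >= 1 here
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical data: 99,999 reads between 0.10 and 1.09 ms plus one
    # 18 ms excursion attributed to system noise.
    samples = [0.10 + (i % 100) * 0.01 for i in range(99_999)] + [18.0]
    print("max:", max(samples))                          # set by the single outlier
    print("99.99th:", nines_percentile(samples, 4))      # unaffected by the outlier
    print("99.99999th:", nines_percentile(samples, 7))   # fewer than 10**7 samples, so equals max
```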
The 113ms RocksDB latency was seen at the block layer, as measured by bpftrace scripts. Bpftrace is a tracing framework for Linux that lets the user trace and profile various aspects of the system at runtime. We used the biosnoop script from Brendan Gregg’s Performance Tools, which prints details of each storage device I/O, including its latency, so that time-ordered patterns can be spotted.
It uses kprobes (kernel probes), a Linux kernel mechanism for dynamic tracing that allows the user to insert a breakpoint at almost any kernel function, invoke a handler, and then continue execution.
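The per-I/O latency that biosnoop reports comes from pairing two probe hits for each request: the handler at I/O start records a timestamp, and the handler at completion reports the elapsed time. The following user-space Python sketch mimics that pairing on a hypothetical list of (timestamp, event, request id, device) tuples. It is an illustration, not the bpftrace script: there, the equivalent state lives in BPF maps keyed by the kernel's request, and the 5 ms threshold here simply reuses the expected read latency mentioned earlier.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, int, str]  # (timestamp_ns, "start" | "done", request_id, device)


@dataclass
class IoRecord:
    start_ns: int
    device: str
    latency_ms: float


def pair_block_events(events: Iterable[Event]) -> List[IoRecord]:
    """Match each 'done' event with its 'start' event and emit one record per
    completed I/O, in completion order (like a biosnoop-style trace)."""
    in_flight: Dict[int, Tuple[int, str]] = {}
    records: List[IoRecord] = []
    # Process in time order; at equal timestamps handle 'start' before 'done'.
    for ts, kind, req, dev in sorted(events, key=lambda e: (e[0], e[1] != "start")):
        if kind == "start":
            in_flight[req] = (ts, dev)
        elif kind == "done" and req in in_flight:
            start_ns, start_dev = in_flight.pop(req)
            records.append(IoRecord(start_ns, start_dev, (ts - start_ns) / 1e6))
    return records


def slow_ios(records: List[IoRecord], threshold_ms: float = 5.0) -> List[IoRecord]:
    """Return I/Os whose latency exceeds threshold_ms (hypothetical 5 ms default)."""
    return [r for r in records if r.latency_ms > threshold_ms]


if __name__ == "__main__":
    trace = [
        (0, "start", 1, "nvme0n1"),
        (100_000, "start", 2, "nvme0n1"),
        (300_000, "done", 1, "nvme0n1"),
        (113_100_000, "done", 2, "nvme0n1"),
    ]
    for rec in slow_ios(pair_block_events(trace)):
        print(rec)
```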
To find the root cause, we collected statistics at several layers to understand where the high latencies were coming from:
- iostat
  - System input/output statistics for devices and partitions
  - Measured at the kernel layer
- Bus analyzer
  - Captures and analyzes the data transmitted on the PCIe bus
  - Individual transaction information, such as transaction type, address, and data payload
  - Shows the signal timing
- OCP latency monitor
  - Measured at the hardware and firmware layers
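iostat derives its per-device latency figures from cumulative kernel counters. As a rough sketch of that calculation (not iostat's source code), the following computes average read and write latency over an interval from two `/proc/diskstats` snapshots: the change in milliseconds spent on reads (or writes) divided by the change in completed reads (or writes). The function names and sample lines are hypothetical. Because these are interval averages, they summarize the I/Os in each interval rather than isolating individual ones.

```python
from typing import Dict, Tuple

# (reads completed, ms spent reading, writes completed, ms spent writing)
Counters = Tuple[int, int, int, int]


def parse_diskstats(text: str) -> Dict[str, Counters]:
    """Parse /proc/diskstats-formatted text into per-device counters.

    Fields after major, minor and device name: reads completed, reads merged,
    sectors read, ms reading, writes completed, writes merged, sectors
    written, ms writing, ...
    """
    stats: Dict[str, Counters] = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue  # skip blank or malformed lines
        stats[f[2]] = (int(f[3]), int(f[6]), int(f[7]), int(f[10]))
    return stats


def await_ms(before: Dict[str, Counters], after: Dict[str, Counters],
             device: str) -> Tuple[float, float]:
    """Average (read, write) latency in ms for I/Os completed between snapshots."""
    r0, rms0, w0, wms0 = before[device]
    r1, rms1, w1, wms1 = after[device]
    reads, writes = r1 - r0, w1 - w0
    r_await = (rms1 - rms0) / reads if reads else 0.0
    w_await = (wms1 - wms0) / writes if writes else 0.0
    return r_await, w_await


if __name__ == "__main__":
    # Hypothetical snapshots; on Linux these could come from reading
    # /proc/diskstats twice, some seconds apart.
    t0 = parse_diskstats("259 0 nvme0n1 1000 0 8000 250 500 0 4000 400 0 600 650")
    t1 = parse_diskstats("259 0 nvme0n1 3000 0 24000 750 1500 0 12000 1400 0 1800 2150")
    print(await_ms(t0, t1, "nvme0n1"))  # (0.25, 1.0)
```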
In conclusion, the observed high latencies originate in the system stack: none of the latencies measured at the kernel layer, on the PCIe bus, or at the hardware and firmware layers come close to the latency spikes seen in the workload traces.