Ceph performance

[[File:Warning icon.svg|32px|link=]] A useful habit is to leave an empty partition on each SSD you deploy Ceph OSDs on, so you can benchmark it later, because some SSDs tend to slow down when filled.
 
==== Lyrical digression ====
 
Why use this approach in benchmarking? After all, disk performance depends on many parameters, such as:
* Block size;
* Mode — read, write, or various mixed read/write modes;
* Parallelism — queue depth and the number of threads, in other words, the number of parallel I/O requests;
* Test duration;
* Initial disk state — empty, filled linearly, filled randomly, randomly written over a specific period of time;
* Data distribution — for example, 10% of hot data and 90% of cold data or hot data located in a certain place (e.g., at the beginning of the disk);
* Other mixed test modes, e.g., benchmarking using different block sizes at the same time.
 
The results can also be presented with varying levels of detail: in addition to the mere average operation count or megabytes per second, you can provide graphs, histograms, percentiles, and so on. This, of course, can reveal more information about the behavior of the disk under test.
 
Benchmarking also involves a bit of philosophy. For example, some manufacturers of server SSDs argue that you must do preconditioning by randomly overwriting the disk at least twice to fill the translation tables before testing. I tend to believe that this puts the SSD into unrealistically bad conditions rarely seen in real life.
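For reference, the preconditioning such vendors describe usually amounts to something like the following fio job. This is only a sketch: /dev/sdX is a placeholder, and the run overwrites the entire device twice, destroying all data on it.

<pre>
# Two full passes of random 4 KB writes over the whole device
# (by default fio tracks written blocks, so each pass covers every block once)
fio -ioengine=libaio -direct=1 -name=precondition -bs=4k -iodepth=32 -rw=randwrite -loops=2 -filename=/dev/sdX
</pre>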
 
Others say you should plot a graph of latency against the number of operations per second. In my opinion this is also a bit strange, because both latency and iops are functions of the queue depth «q», so it implies plotting F1(q) against F2(q) instead of plotting them against «q» itself.
 
In short, benchmarking can be a never-ending process. It can take quite a few days to get a complete picture. This is usually what resources like 3dnews do in their SSD reviews. But we don’t want to waste several days. We need a test that allows us to estimate performance quickly.
 
Therefore we isolate a few «extreme» modes, check the disk in them, and pretend that the other results lie somewhere between these «extreme points», forming some kind of smooth function of the parameters. It’s also handy that each of these modes corresponds to a valid use case (example fio commands for each mode are sketched after the list):
 
* Applications that mainly use linear or large-block access. For such applications, the crucial characteristic is the linear I/O speed in megabytes per second. Therefore, the first test mode is linear read/write with 4 MB blocks and a medium queue depth of 16-32 operations. Test results should be in MB/s.
* Applications that use random small-block access and support parallelism. This leads us to 4 KB random I/O modes with a large queue depth of at least 128 operations. 4 KB is the standard block size for most filesystems and DBMSs. Multiple (2-4-8) CPU threads should be used if a single thread can’t saturate the drive during the test. Test results should include iops (I/O operations per second), but not latency. Latency is meaningless in this test because it can be increased arbitrarily just by increasing the queue depth: latency is directly related to iops by the formula latency = queue depth / iops.
* Applications that use random small-block access and DO NOT support parallelism. There are more such applications than you might think; regarding writes, all transactional DBMSs are a notable example. This leads us to a 4 KB random I/O test with a queue depth of 1 and, for writes, an fsync after each operation to prevent the disk or storage system from «cheating» by writing the data into a volatile cache. Results should include either iops or latency, but not both, because, as already said, they are directly related to each other.
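To make the three modes concrete, here is roughly how they map onto fio command lines. This is only a sketch assuming fio with the libaio engine on Linux; the target /dev/sdX and the 60-second runtime are placeholders, and the write tests destroy data on the target device, so run them only against a disk or partition you can afford to wipe (for example, the spare partition mentioned above).

<pre>
# Linear write / read, 4 MB blocks, queue depth 32 (results in MB/s)
fio -ioengine=libaio -direct=1 -name=test -bs=4M -iodepth=32 -rw=write -runtime=60 -filename=/dev/sdX
fio -ioengine=libaio -direct=1 -name=test -bs=4M -iodepth=32 -rw=read -runtime=60 -filename=/dev/sdX

# Random 4 KB write / read with a large queue depth (results in iops)
fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=128 -rw=randwrite -runtime=60 -filename=/dev/sdX
fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=128 -rw=randread -runtime=60 -filename=/dev/sdX

# Random 4 KB write, queue depth 1, fsync after each write (results in iops or latency)
fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=1 -fsync=1 -rw=randwrite -runtime=60 -filename=/dev/sdX
# Random 4 KB read, queue depth 1 (results in iops or latency)
fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=1 -rw=randread -runtime=60 -filename=/dev/sdX
</pre>

If a single fio process can’t saturate the drive in the queue-depth-128 tests, add -numjobs (2, 4 or 8) together with -group_reporting to run several threads against the same device.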
=== Test your Ceph cluster ===
