Ceph performance

Latency doesn't scale with the number of servers, with the number of OSDs per SSD, or with tricks like striping two RBD images in RAID0. When you benchmark your cluster with iodepth=1 you're benchmarking only ONE placement group at a time (a PG is a triplet or a pair of OSDs), so the result is determined only by how fast a single OSD processes a single request. In fact, with iodepth=1, IOPS = 1/latency. There is Nick Fisk's presentation titled "Low-latency Ceph"; by "low-latency" he means 0.7 ms, which is only ~1500 iops.
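
To make the iodepth=1 relation concrete, here is a minimal Python sketch; the 0.7 ms figure comes from the paragraph above, while the 1.0 ms value is only an illustrative assumption roughly matching the QD=1 write numbers listed below:

<pre>
# With QD=1 each request waits for the previous one to complete,
# so iops is simply the inverse of the average per-request latency.
def qd1_iops(latency_ms: float) -> float:
    return 1000.0 / latency_ms

print(qd1_iops(0.7))  # ~1428 iops ("low-latency" per Nick Fisk)
print(qd1_iops(1.0))  # 1000 iops (a typical all-flash Bluestore QD=1 write)
</pre>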
 
=== Expected performance ===
 
Estimating cluster performance by extrapolating from the raw performance of its disks is completely wrong.
 
The realistic expected performance for Bluestore is as follows (the iops figures refer to random 4 KB reads/writes):
 
One HDD (ordinary SATA, 7200 rpm, non-SMR, without SSD cache):
* ~100-120 iops with QD=128
* ~66 iops with QD=1
* ~40 MB/s with linear read/write
* The numbers will be worse if you're short on available RAM, because you'll get a lot of metadata cache misses
 
One fast SSD or NVMe SSD with capacitors (see below), rated for >= 25000 write iops:
* ~1000 write iops with QD=1. May vary between 300 and, in the best possible case, ~2500 iops depending on CPU frequency and settings.
* Up to ~10000-20000 write iops with QD=128 per 1 OSD.
* Read iops are around 2-2.5 times better: QD=1 ~2000 iops (up to ~4000), QD=128 ~20000 (up to ~50000).
* Of course, the QD=128 iops number is limited by the performance of the disk itself :). However, as good SSDs usually perform great in parallel mode, they're usually not a bottleneck.
* By running multiple OSDs on a single drive, you can multiply your parallel (QD=128) iops number by the number of OSDs, as long as the drive itself can keep up (see the sketch after this list). Of course, CPU load grows by the same factor, and that increase is huge.
* Linear reads and writes are almost as fast as raw disk reads and writes.
* The difference between SATA SSDs and NVMe SSDs in terms of random I/O in Ceph is negligible as long as both have capacitors. Server NVMe drives are still the best choice and you should try to get them instead of SATA and SAS, but with Ceph it's hard to notice the difference in random I/O.
* Modern SSDs often have slower QD=1 random reads than writes, just because they write into a fast capacitor-protected cache, but they can't serve all random reads from it. The difference is usually like 8000 QD=1 read iops compared to 40000 QD=1 write iops.
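
As a rough illustration of the multiple-OSDs-per-drive point, here is a small sketch; the per-OSD and per-drive iops values are assumptions picked from the ranges listed above, not measurements:

<pre>
# Parallel (QD=128) write iops when running several OSDs on one fast SSD/NVMe:
# it scales almost linearly until the drive itself becomes the bottleneck,
# while CPU usage grows by roughly the same factor.
def qd128_write_iops(osds_per_drive, per_osd_iops=15000, drive_iops=80000):
    # per_osd_iops: assumed ~10000-20000 per OSD (see above)
    # drive_iops: assumed raw QD=128 limit of the drive itself
    return min(osds_per_drive * per_osd_iops, drive_iops)

for n in (1, 2, 4, 8):
    print(n, "OSDs:", qd128_write_iops(n), "iops, CPU ~", n, "x one OSD")
</pre>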
 
Aggregate performance (see the estimator sketch after this list):
* Linear read from the cluster = OSD number * MB/s of one OSD
* Linear write to a replicated pool = OSD number / Replica number * MB/s of one OSD
* Linear write to an EC pool = OSD number / (K+M) * K * MB/s of one OSD
* Random QD=1 performance is roughly the average across all OSDs (treat it like latency); QD=128 iops is roughly the sum
* Random IOPS are also limited by the client: one RBD client can squeeze out up to ~30000 read iops and up to ~15000 write iops
* Linear I/O is, of course, also limited by the network bandwidth
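
The formulas above can be collected into a small estimator. This is only a sketch: all per-OSD numbers are placeholders taken from this section, so substitute your own measurements.

<pre>
# Rough aggregate estimates built from the formulas above.
n_osd      = 24       # total OSD count in the cluster
osd_mbps   = 40       # linear MB/s of one OSD (HDD example above)
osd_qd128  = 15000    # parallel (QD=128) iops of one OSD (assumed)
replicas   = 3        # replicated pool size
k, m       = 4, 2     # EC profile K+M

linear_read      = n_osd * osd_mbps                # MB/s
linear_write_rep = n_osd / replicas * osd_mbps     # MB/s, replicated pool
linear_write_ec  = n_osd / (k + m) * k * osd_mbps  # MB/s, EC pool
random_qd128     = n_osd * osd_qd128               # iops, before the per-client
                                                   # (~30000 read / ~15000 write)
                                                   # and network limits

print(linear_read, linear_write_rep, linear_write_ec, random_qd128)
</pre>
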
=== Micron setup example ===
