Ceph performance

* CPU is the main bottleneck for Ceph running on good SSDs.
* As Nick Fisk said in his presentation, Ceph is a Software-Defined Storage '''and every piece of Ceph «Software»''' will run faster with every GHz of CPU clock speed.
* Server CPUs often have NUMA (Non-Uniform Memory Access), which means that some CPU cores don’t have direct access to the RAM and/or part of the hardware and have to forward requests to other cores (a quick topology check is sketched at the end of this section).
* You should try to use CPUs with higher clock speed and without NUMA to maximize performance. That is, a slower CPU with more cores is probably worse than a faster CPU with a smaller number of cores…
* …but within reason, as one Bluestore OSD can eat up to ~6 cores under full load.
* «Clock speed» means nominal, not Turbo Boost speed, because Turbo Boost is only beneficial for single-threaded workloads.
* CPU pinning recommendations (taskset) are mostly outdated, because Ceph OSDs are multi-threaded. At least 4 threads are active during writes, so you’ll only slow your OSDs down if you allocate fewer than 4-6 cores for each of them.
* There are 2 parameters responsible for OSD worker thread count: osd_op_num_shards and osd_op_num_threads_per_shard…
* …But trying to tune them is pointless: the default configuration (1x5 for HDDs and 2x8 for SSDs) is optimal. The problem is that all worker threads still serialize writes into a single kv_sync_thread, and the whole scheme only scales up to ~6 worker threads. (A sketch of how to check the values your OSDs actually run with is at the end of this section.)
* There is one thing that decreases latency 2-3 times at once: disabling all CPU power-saving features (see the verification sketch at the end of this section).
** <tt>cpupower idle-set -D 1</tt> disables C-States (or you can pass <tt>processor.max_cstate=1 intel_idle.max_cstate=0</tt> on the kernel command line)
** <tt>for i in $(seq 0 $((`nproc`-1))); do cpufreq-set -c $i -g performance; done</tt> disables frequency scaling.
* With power-saving disabled the CPU heats up like a GTX, but you get 2-3 times more iops.
* High CPU requirements are one of the reasons NOT to use Ceph in a «hyperconverged setup», that is, a setup in which storage and compute nodes are combined.
* You can also disable all hardware vulnerability mitigations: <tt>noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier</tt> (or just <tt>mitigations=off</tt> for newer kernels)
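A quick way to see whether a node is NUMA and how cores map to memory domains (as mentioned in the NUMA bullet above). <tt>lscpu</tt> ships with util-linux; <tt>numactl</tt> usually comes from a separate package:

<pre>
# Show the number of NUMA nodes and which CPU cores belong to each of them
lscpu | grep -i numa
# More detail: per-node memory sizes and inter-node distances
numactl --hardware
</pre>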
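To check which shard/thread values your OSDs actually run with (the defaults mentioned above), the OSD admin socket can be queried. <tt>osd.0</tt> is just an example id; run this on the host where that OSD lives:

<pre>
# Print the effective worker thread settings of a running OSD
ceph daemon osd.0 config show | grep osd_op_num
</pre>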
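A sketch of how to verify that the power-saving tweaks above actually took effect (<tt>cpupower</tt> is usually packaged as linux-tools or kernel-tools):

<pre>
# C-states: the deep idle states should be listed as disabled
cpupower idle-info
# Frequency scaling: the governor should be "performance" on every core
cpupower frequency-info
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
</pre>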
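The <tt>cpupower</tt>/<tt>cpufreq-set</tt> calls are not persistent across reboots; a common way to make the kernel command-line options above (the C-state limits and <tt>mitigations=off</tt>) permanent is via GRUB. This is only a sketch for a Debian/Ubuntu-style layout; RHEL-like systems use <tt>grub2-mkconfig</tt> instead of <tt>update-grub</tt>:

<pre>
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="processor.max_cstate=1 intel_idle.max_cstate=0 mitigations=off"

# regenerate the bootloader config and reboot
update-grub
</pre>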