Ceph performance

* The high CPU requirement is one of the reasons NOT to use Ceph in a «hyperconverged setup», i.e. a setup in which storage and compute nodes are combined.
* You can also disable all hardware vulnerability mitigations: <tt>noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier</tt> (or just <tt>mitigations=off</tt> for newer kernels)
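A minimal sketch of how these flags are usually applied, assuming a Debian-style system booted with GRUB (the file path and the regeneration command differ on other distributions):
<pre>
# /etc/default/grub: append the flags to the kernel command line
# (keep whatever options you already have there)
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"

# regenerate the GRUB config, then reboot for it to take effect
update-grub
reboot
</pre>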
 
== Network, DPDK and SPDK ==
 
* A fast network mostly matters for linear read/write and rebalancing. Yes, you need 10G or more, but 0.05-0.1 ms of network latency is totally enough for Ceph; improving it further won’t improve your random read/write performance. Jumbo frames (mtu=9000) also only matter for linear read/write.
* DPDK = Data Plane Development Kit, a fast Intel library for working with network and RDMA (InfiniBand) devices in userspace, without kernel context switches
* SPDK = Storage Performance Development Kit, a companion Intel library for working with NVMe SSDs in userspace, also very fast. There is also libnvme, a fork of SPDK with the DPDK dependency removed.
* There is DPDK and SPDK support in Ceph:
** DPDK is enabled with <tt>ms_type=async+dpdk</tt>
** SPDK is enabled for NVMes by passing <tt>spdk:PCI_serial_number</tt> as the device name and deploying OSDs using the Manual Deployment documentation (see the config sketch after this list)
** But…
* DPDK support is broken: the build scripts don’t work, and even if you fix them by hand, bugs make the OSD crash after processing ~50 packets.
* SPDK build scripts are OK and Ceph is even built with SPDK by default. There are even some reports that it works; however, my OSDs just hung when I tried to start them with SPDK.
* Both are pointless to use, because Ceph itself isn’t that fast: it doesn’t matter whether your network latency is 0.05 ms or 0.005 ms when the Ceph software itself takes 0.5-1 ms. There was an experiment reported on the mailing list where one person isolated the AsyncMessenger from all other Ceph code and benchmarked it alone (https://www.spinics.net/lists/ceph-devel/msg43555.html), and he only got ~80000 iops.
* SPDK is unneeded in the long term even for NVMes, because Linux 5.1 finally got a proper asynchronous I/O implementation, io_uring (https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/): it gives you almost the same latency as SPDK with a lot less complexity.
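For illustration, a rough sketch of what enabling both could look like in <tt>ceph.conf</tt>, based only on the option names mentioned above; treat the exact keys (in particular <tt>bluestore_block_path</tt>) and their section placement as assumptions to verify against your Ceph version’s documentation:
<pre>
[global]
# DPDK messenger (option name from the text above; broken in practice)
ms_type = async+dpdk

[osd]
# SPDK: BlueStore block device given as "spdk:" plus the NVMe serial number
# (PCI_serial_number is a placeholder, substitute your drive's serial)
bluestore_block_path = spdk:PCI_serial_number
</pre>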
== Drive cache is slowing you down ==