= Ceph performance =

* The first recommended tool is again fio, now with -ioengine=rbd -pool=<your pool> -rbdname=<your image>. All of the above tests for raw drives can be repeated for RBD and they mean the same things:
*# fio -ioengine=rbd -direct=1 -name=test -bs=4M -iodepth=16 -rw=write -pool=rpool_hdd -runtime=60 -rbdname=testimg
*# fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -pool=rpool_hdd -runtime=60 -rbdname=testimg
*# fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=128 -rw=randwrite -pool=rpool_hdd -runtime=60 -rbdname=testimg
*: Reading from an empty RBD image is very fast :) so pre-fill it before testing (see the example after this list for creating and removing a test image).
*: Run tests from the node(s) where your actual RBD users will reside. The results are usually slightly better when you run tests from a separate physical server.
* The same from inside a VM or through the kernel RBD driver (krbd):
*# fio -ioengine=libaio -direct=1 -name=test -bs=4M -iodepth=16 -rw=write -runtime=60 -filename=/dev/rbdX
*# fio -ioengine=libaio -direct=1 -sync=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60 -filename=/dev/rbdX
*# fio -ioengine=libaio -direct=1 -name=test -bs=4k -iodepth=128 -rw=randwrite -runtime=60 -filename=/dev/rbdX
*: Don't miss the added -sync=1 option. It is added on purpose, to match the ioengine=rbd test: ioengine=rbd has no concept of sync, everything is always «sync» with it, and there is no page cache involved either, so «direct» doesn't change anything for it. Overall this write pattern (transactional single-threaded write) corresponds to a DBMS.
*: Note that regardless of the supposed overhead of moving data in and out of the kernel, the kernel client is actually faster.
* ceph-gobench (https://github.com/rumanzo/ceph-gobench) is especially useful for hunting performance problems.
*: Or https://github.com/vitalif/ceph-bench. The original idea comes from «Mark's bench» from the Russian Ceph chat ([https://github.com/socketpair/ceph-bench the original, now outdated, tool was here]). Both use a non-replicated Ceph pool (size=1), create several 4MB objects (16 by default) in each separate OSD and do random single-thread 4kb writes to randomly selected objects within one OSD. This mimics random writes to RBD and allows you to determine problematic OSDs by benchmarking them separately.
*: To create the non-replicated benchmark pool, use {{Cmd|ceph osd pool create bench 128 replicated; ceph osd pool set bench size 1; ceph osd pool set bench min_size 1}}. Just note that 128 (the PG count) should be enough for all OSDs to get at least one PG each.
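If you don't have a test image yet, you can create a 10 GiB one and later remove it roughly like this (rpool_hdd and testimg are just the example names used throughout this section; never point a write test at an image with real data, it will be overwritten):

{{Cmd|rbd create --size 10240 rpool_hdd/testimg}}

Run the 4M linear write test first so that the image gets pre-filled, then the other tests, and clean up afterwards:

{{Cmd|rbd rm rpool_hdd/testimg}}

The non-replicated bench pool can also be removed after testing with {{Cmd|ceph osd pool delete bench bench --yes-i-really-really-mean-it}} (pool deletion has to be allowed with mon_allow_pool_delete=true).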

Notes:
* Never use dd to test disk performance.
* Don't use rados bench. It creates a small number of objects (1-2 per thread), so all of them always reside in cache and improve the results far beyond what they should be.
* You can use rbd bench, but fio is better.
* When testing through the kernel RBD driver, you may have to disable some features of the image first, because the kernel client still lacks support for them (see the example after this list).
* When testing from inside a VM, the results also depend on the storage driver being used: virtio is the fastest, virtio-scsi is slightly slower, and everything else (like LSI emulation) is terribly slow. Results are also considerably affected by whether the RBD cache is enabled (the cache=writeback qemu option turns RBD cache on automatically); for random reads and writes the results are usually better with the RBD cache disabled.
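To run the krbd tests above you need a /dev/rbdX device. A minimal sequence, again assuming the example rpool_hdd/testimg image, could look like this (the exact set of features that has to be disabled depends on your kernel version):

{{Cmd|rbd feature disable rpool_hdd/testimg object-map fast-diff deep-flatten; rbd map rpool_hdd/testimg}}

rbd map prints the created device name (for example /dev/rbd0); substitute it for /dev/rbdX in the fio commands and unmap it with {{Cmd|rbd unmap /dev/rbd0}} when you are done.
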
== Why is it so slow ==
