Ceph performance

[[Category:VitaliPrivate]]
== General benchmarking principles ==
The main test patterns for benchmarking are:
* Linear read and write (big blocks, big queue), in MB/s
* Highly parallel random read and write of small blocks (4-8kb, iodepth=32-128), in IOPS (input/output operations per second)
* Single-threaded transactional random write (4-8kb, iodepth=1) and read (though single-threaded reads are rarer), in IOPS
Single-threaded random reads and writes are where the latency matters, and the latency doesn’t scale with the number of servers. Whenever you benchmark your cluster with iodepth=1 you’re benchmarking only ONE placement group (a triplet or pair of OSDs) at a time. The result is only affected by how fast a single OSD responds to a single request. In fact, with only one parallel request, IOPS = 1/latency.
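For example (hypothetical numbers): if a single write takes 0.5 ms end-to-end, then at iodepth=1 you can never get more than 1/0.0005 = 2000 iops, no matter how many OSDs or servers the cluster has; at 2 ms per write it drops to 500 iops.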
The latency really matters because not many applications can do random writes with high parallelism/iodepth. For example, a DBMS can’t, because it’s transactional and it needs to serialize its writes to the journal.
There is Nick Fisk’s presentation titled «Low-latency Ceph». By «low-latency» he means just 0.7 ms, which is only ~1500 iops.
 
=== Test your disks ===
 
Run <tt>fio</tt> on your drives (a script sketch that chains all of these tests together is given after the list below):
 
{{Box|[[File:Warning icon.svg|32px|link=]] {{red|WARNING!}} For those under a rock — fio write test is DESTRUCTIVE. Don’t dare to run it on disks which have important data… for example, OSD journals (I’ve seen such cases).}}
 
* Try to disable drive cache before testing: {{Cmd|hdparm -W 0 /dev/sdX}} (SATA drives), {{Cmd|1=sdparm --set WCE=0 /dev/sdX}} (SAS drives). This is usually ABSOLUTELY required for server SSDs like Micron 5100 or Seagate Nytro (see [[#Drive cache is slowing down]]), as it increases random write iops ''by more than two orders of magnitude'' (from 288 iops to 18000 iops!). In some cases it may not improve anything, so try both options, -W0 and -W1.
* Linear read: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -iodepth=32 -rw=read -runtime=60 -filename=/dev/sdX}}
* Linear write: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -iodepth=32 -rw=write -runtime=60 -filename=/dev/sdX}}
* Peak parallel random read: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128 -rw=randread -runtime=60 -filename=/dev/sdX}}
* Single-threaded read latency: {{Cmd|1=fio -ioengine=libaio -sync=1 -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=1 -rw=randread -runtime=60 -filename=/dev/sdX}}
* Peak parallel random write: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128 -rw=randwrite -runtime=60 -filename=/dev/sdX}}
* Journal write latency: {{Cmd|1=fio -ioengine=libaio -sync=1 -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=1 -rw=write -runtime=60 -filename=/dev/sdX}}. Also try it with <tt>-fsync=1</tt> instead of <tt>-sync=1</tt> and write down the worst result, because sometimes one of sync or fsync is ignored by messy hardware.
* Single-threaded random write latency: {{Cmd|1=fio -ioengine=libaio -sync=1 -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60 -filename=/dev/sdX}}
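For convenience, the tests above can be chained into one script. A minimal sketch, assuming bash and fio are available and that a disposable test device is passed as the first argument; the block sizes, queue depths and runtimes simply mirror the list above, and the write tests are just as destructive as the individual commands:

<pre>
#!/bin/bash
# DESTRUCTIVE: the write tests overwrite the device given as $1.
set -e
DEV=${1:?usage: $0 /dev/sdX}

run() {
    desc=$1; shift
    echo "=== $desc ==="
    fio -ioengine=libaio -direct=1 -invalidate=1 -name=test \
        -runtime=60 -filename="$DEV" "$@"
}

run "linear read"                    -bs=4M -iodepth=32  -rw=read
run "linear write"                   -bs=4M -iodepth=32  -rw=write
run "peak parallel random read"      -bs=4k -iodepth=128 -rw=randread
run "single-threaded read latency"   -sync=1 -bs=4k -iodepth=1 -rw=randread
run "peak parallel random write"     -bs=4k -iodepth=128 -rw=randwrite
run "journal write latency"          -sync=1 -bs=4k -iodepth=1 -rw=write
# as noted above, also repeat the journal test with -fsync=1 instead of -sync=1
run "single-threaded write latency"  -sync=1 -bs=4k -iodepth=1 -rw=randwrite
</pre>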
 
WHY SO SLOW? See below.
 
[[File:Warning icon.svg|32px|link=]] A useful habit is to leave an empty partition for later benchmarking on each SSD you deploy Ceph OSDs on, because some SSDs tend to slow down when filled.
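One minimal sketch of how such a partition could be reserved at deployment time, assuming a GPT label, <tt>sgdisk</tt> and an otherwise empty <tt>/dev/sdX</tt>; the ~10 GiB size and the partition numbers are arbitrary, and whether your deployment tooling accepts a pre-partitioned disk is a separate question:

<pre>
# main partition for the OSD, ending 10 GiB before the end of the disk
sgdisk -n 1:0:-10G /dev/sdX
# the remaining ~10 GiB as a spare partition kept only for future fio runs
sgdisk -n 2:0:0 /dev/sdX
</pre>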
== Bluestore vs Filestore ==
