Ceph performance

[[Category:VitaliPrivate]]
== General benchmarking principles ==
The main test patterns for benchmarking are:
* Linear read and write (big blocks, big queue), in MB/s
* Highly parallel random read and write of small blocks (4-8kb, iodepth=32-128), in IOPS (input/output operations per second)
* Single-threaded transactional random write (4-8kb, iodepth=1) and read (though single-threaded reads are rarer), in IOPS
Single-threaded random reads and writes are where the latency matters, and the latency doesn’t scale with the number of servers. Whenever you benchmark your cluster with iodepth=1 you’re benchmarking only ONE placement group (a triplet or pair of OSDs) at a time. The result is only affected by how fast a single OSD responds to a single request. In fact, with only one parallel request, IOPS = 1/latency.
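For example (hypothetical numbers): if a single write takes 0.5 ms end-to-end, then at iodepth=1 you can never get more than 1/0.0005 = 2000 iops, no matter how many OSDs or servers the cluster has; at 2 ms per write it drops to 500 iops.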
The latency really matters because not many applications can do random writes with high parallelism/iodepth. For example, a DBMS can’t, because it’s transactional and it needs to serialize its writes to the journal.
There is Nick Fisk’s presentation titled «Low-latency Ceph». By «low-latency» he means just 0.7 ms, which is only ~1500 iops.
 
=== Test your disks ===
 
Run <tt>fio</tt> on your drives (a script sketch that chains all of these tests together is given after the list below):
 
{{Box|[[File:Warning icon.svg|32px|link=]] {{red|WARNING!}} For those under a rock — fio write test is DESTRUCTIVE. Don’t dare to run it on disks which have important data… for example, OSD journals (I’ve seen such cases).}}
 
* Try to disable drive cache before testing: {{Cmd|hdparm -W 0 /dev/sdX}} (SATA drives), {{Cmd|1=sdparm --set WCE=0 /dev/sdX}} (SAS drives). This is usually ABSOLUTELY required for server SSDs like Micron 5100 or Seagate Nytro (see [[#Drive cache is slowing down]]), as it increases random write iops ''by more than two orders of magnitude'' (from 288 iops to 18000 iops!). In some cases it may not improve anything, so try both options, -W0 and -W1.
* Linear read: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -iodepth=32 -rw=read -runtime=60 -filename=/dev/sdX}}
* Linear write: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -iodepth=32 -rw=write -runtime=60 -filename=/dev/sdX}}
* Peak parallel random read: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128 -rw=randread -runtime=60 -filename=/dev/sdX}}
* Single-threaded read latency: {{Cmd|1=fio -ioengine=libaio -sync=1 -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=1 -rw=randread -runtime=60 -filename=/dev/sdX}}
* Peak parallel random write: {{Cmd|1=fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128 -rw=randwrite -runtime=60 -filename=/dev/sdX}}
* Journal write latency: {{Cmd|1=fio -ioengine=libaio -sync=1 -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=1 -rw=write -runtime=60 -filename=/dev/sdX}}. Also try it with <tt>-fsync=1</tt> instead of <tt>-sync=1</tt> and write down the worst result, because sometimes one of sync or fsync is ignored by messy hardware.
* Single-threaded random write latency: {{Cmd|1=fio -ioengine=libaio -sync=1 -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60 -filename=/dev/sdX}}
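For convenience, the tests above can be chained into one script. A minimal sketch, assuming bash and fio are available and that a disposable test device is passed as the first argument; the block sizes, queue depths and runtimes simply mirror the list above, and the write tests are just as destructive as the individual commands:

<pre>
#!/bin/bash
# DESTRUCTIVE: the write tests overwrite the device given as $1.
set -e
DEV=${1:?usage: $0 /dev/sdX}

run() {
    desc=$1; shift
    echo "=== $desc ==="
    fio -ioengine=libaio -direct=1 -invalidate=1 -name=test \
        -runtime=60 -filename="$DEV" "$@"
}

run "linear read"                    -bs=4M -iodepth=32  -rw=read
run "linear write"                   -bs=4M -iodepth=32  -rw=write
run "peak parallel random read"      -bs=4k -iodepth=128 -rw=randread
run "single-threaded read latency"   -sync=1 -bs=4k -iodepth=1 -rw=randread
run "peak parallel random write"     -bs=4k -iodepth=128 -rw=randwrite
run "journal write latency"          -sync=1 -bs=4k -iodepth=1 -rw=write
# as noted above, also repeat the journal test with -fsync=1 instead of -sync=1
run "single-threaded write latency"  -sync=1 -bs=4k -iodepth=1 -rw=randwrite
</pre>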
 
WHY SO SLOW? See below.
 
[[File:Warning icon.svg|32px|link=]] A useful habit is to leave an empty partition for later benchmarking on each SSD you deploy Ceph OSDs on, because some SSDs tend to slow down when filled.
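One minimal sketch of how such a partition could be reserved at deployment time, assuming a GPT label, <tt>sgdisk</tt> and an otherwise empty <tt>/dev/sdX</tt>; the ~10 GiB size and the partition numbers are arbitrary, and whether your deployment tooling accepts a pre-partitioned disk is a separate question:

<pre>
# main partition for the OSD, ending 10 GiB before the end of the disk
sgdisk -n 1:0:-10G /dev/sdX
# the remaining ~10 GiB as a spare partition kept only for future fio runs
sgdisk -n 2:0:0 /dev/sdX
</pre>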
== Bluestore vs Filestore ==
