Ceph performance

You say: OK, I don’t care. I’ll just read from both drives, and if I encounter different data I’ll simply pick one of the copies and get either the old data or the new.
But then imagine that you have RAID 5. Now you have three drives: two for data and one for parity. Suppose that you overwrite a sector again. Before the write, your disks contain (A1), (B1) and (A1 XOR B1). You want to overwrite (B1) with (B2). To do so you write (B2) to the second disk and (A1 XOR B2) to the third. A power failure occurs again… And then, at the next boot, you also find out that disk 1 (the one that you didn’t write anything to) is dead. You might think that you can still reconstruct your data, because you have RAID 5 and 2 disks out of 3 are still alive.
But imagine that disk 2 succeeded in writing the new data while disk 3 failed (or vice versa). Now you have: (lost disk), (B2) and (A1 XOR B1). If you try to reconstruct A from these copies, you’ll get (A1 XOR B1 XOR B2), which is obviously not equal to A1. Bang! Your RAID 5 corrupted data that you weren’t even writing at the time of the power loss.
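To make the arithmetic concrete, here is a minimal Python sketch of the scenario above. The block values and the three-disk layout are invented for illustration (this is not how mdadm actually lays out stripes): disk 2 receives the new data block, disk 3 still holds the old parity, and after disk 1 dies the “reconstructed” block no longer equals A1.

<syntaxhighlight lang="python">
# Toy model of the RAID 5 write hole described above.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

A1 = b"\xAA" * 4          # data block on disk 1 (never rewritten)
B1 = b"\x11" * 4          # old data block on disk 2
B2 = b"\x22" * 4          # new data block we try to write to disk 2

disk1 = A1
disk2 = B1
disk3 = xor(A1, B1)       # parity, consistent before the write

# Partial update: disk 2 gets the new data, but power fails before
# the new parity (A1 XOR B2) reaches disk 3.
disk2 = B2                # disk3 still holds xor(A1, B1)

# After reboot disk 1 is dead; "reconstruct" A from the survivors:
A_reconstructed = xor(disk2, disk3)   # = A1 XOR B1 XOR B2

print(A_reconstructed == A1)          # False: A1 is silently corrupted
</syntaxhighlight>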
Because of this problem, Linux `mdadm` refuses to start an incomplete array at all after an unclean shutdown. There’s no solution to this problem except full data journaling at the level of each disk drive. And this is… exactly what Ceph does! So, Ceph is actually safer than RAID. Slower, but safer :)
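For comparison, below is a toy sketch of per-device data journaling, the approach referred to above. It is not Ceph’s actual journal or BlueStore code; the file names, JSON record format and checksum are made up for the example. It only shows why replaying a journal after a crash leaves every block either fully old or fully new, never a mix.

<syntaxhighlight lang="python">
# Toy write-ahead data journal (illustrative only, not Ceph's format).
import os, json, hashlib

JOURNAL = "journal.log"
DATA = "data.img"

# Pretend 4 KiB "device" so the sketch is self-contained.
if not os.path.exists(DATA):
    with open(DATA, "wb") as d:
        d.write(b"\0" * 4096)

def apply_to_data(offset: int, block: bytes) -> None:
    with open(DATA, "r+b") as d:
        d.seek(offset)
        d.write(block)
        d.flush()
        os.fsync(d.fileno())

def journal_write(offset: int, block: bytes) -> None:
    # 1. Make the whole new block durable in the journal first.
    rec = {"offset": offset, "data": block.hex(),
           "csum": hashlib.sha256(block).hexdigest()}
    with open(JOURNAL, "a") as j:
        j.write(json.dumps(rec) + "\n")
        j.flush()
        os.fsync(j.fileno())
    # 2. Only then overwrite the block in place.
    apply_to_data(offset, block)

def replay() -> None:
    # After a crash, re-apply every complete record; a torn record at the
    # tail of the journal fails to parse or to checksum and is ignored.
    if not os.path.exists(JOURNAL):
        return
    with open(JOURNAL) as j:
        for line in j:
            try:
                rec = json.loads(line)
            except ValueError:
                break
            block = bytes.fromhex(rec["data"])
            if hashlib.sha256(block).hexdigest() != rec["csum"]:
                break
            apply_to_data(rec["offset"], block)
</syntaxhighlight>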
== Quick insight into SSD and flash memory organization ==
