== Bluestore vs Filestore ==
TODO: This section lacks random read performance comparisons.
Bluestore is the «new» storage layer of Ceph. All presentations and documents say it’s better in all ways, which in fact seems reasonable for something «new».
Bluestore is also more feature-rich: it has checksums, compression, erasure-coded overwrites and virtual clones. Checksums allow 2x-replicated pools self-heal better, erasure-coded overwrites make EC usable for RBD and CephFS, and virtual clones make VMs run faster after taking a snapshot.
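These features map to per-pool settings in the standard Ceph CLI. A minimal sketch (the pool names <code>mypool</code> and <code>myecpool</code> are hypothetical placeholders):

```
# Enable inline Bluestore compression on a pool
ceph osd pool set mypool compression_mode aggressive
ceph osd pool set mypool compression_algorithm zstd

# Allow overwrites on an erasure-coded pool so RBD and CephFS can use it
ceph osd pool set myecpool allow_ec_overwrites true
```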
In HDD-only (or bad-SSD-only) setups Bluestore is 2x faster than Filestore for random writes. This is again because it can do 1 commit per write, at least if you apply this patch: https://github.com/ceph/ceph/pull/26909 and turn bluefs_preextend_wal_files on. In fact it’s OK to say that Bluestore’s deferred write implementation is really optimal for transactional writes to slow drives.

Bluestore uses a lot more RAM, though, because it uses RocksDB for all metadata, additionally caches some of it by itself and also tries to cache some data blocks to compensate for the lack of page cache usage. The general rule of thumb is 1 GB of RAM per 1 TB of storage, but not less than 2 GB per OSD.
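The rule of thumb above is easy to turn into a sizing helper. A sketch (the function name is ours, not a Ceph API):

```python
def bluestore_osd_ram_gb(osd_capacity_tb):
    # Rule of thumb from the text: 1 GB of RAM per 1 TB of storage,
    # but not less than 2 GB per OSD.
    return max(2, osd_capacity_tb)

# e.g. an 8 TB HDD OSD -> 8 GB; a 1 TB OSD still gets the 2 GB floor
```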
Filestore writes everything to the journal and only starts to flush it to the data device when the journal fills up to a configured percentage. This is very convenient because it makes the journal act as a «temporary buffer» that absorbs random write bursts.
But it’s still a shame that the increase is only 5-10 % for that amount of architectural effort.
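The buffering behavior described above can be sketched as a toy model (this is an illustration, not Ceph code; all names and numbers are made up):

```python
# Toy model of the journal described above: writes land in the journal
# (fast path), and a flush to the slow data device starts only once the
# journal fills past a configured fraction of its capacity.

class JournalModel:
    def __init__(self, capacity_mb, flush_threshold):
        self.capacity = capacity_mb       # journal size, MB
        self.threshold = flush_threshold  # fill fraction that triggers a flush
        self.used = 0                     # MB currently buffered
        self.flushed = 0                  # MB already moved to the data device

    def write(self, size_mb):
        """Fast path: absorb the write into the journal."""
        self.used += size_mb
        if self.used / self.capacity >= self.threshold:
            self.flush()

    def flush(self):
        """Slow path: drain the journal to the data device."""
        self.flushed += self.used
        self.used = 0

j = JournalModel(capacity_mb=100, flush_threshold=0.5)
for _ in range(4):
    j.write(10)   # a 40 MB burst is absorbed entirely by the journal
# the fifth write crosses the 50% threshold and triggers the flush
j.write(10)
```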
=== RAM usage ===
=== About the sizing of block.db ===