Commit Graph

644 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov a8d744ca0e Fix wording 2021-03-16 12:48:36 +03:00
Vitaliy Filippov b5ff44fb6f Change Telegram chat link 2021-03-16 12:48:36 +03:00
Vitaliy Filippov f918bc4543 Fix Russian README for CMake build 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 6875a838e0 Capture all by value in qemu_proxy 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 20781abd3d Add LICENSE 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 1f02f645c0 Add Russian version of the README 2021-03-16 12:48:36 +03:00
Vitaliy Filippov ee44f64927 Introduce image names and metadata storage in etcd
Each inode has: image name, parent inode number & pool, size and readonly flag

Snapshots are created by switching image name to a different inode number
while using the older inode as parent.
2021-03-16 12:48:36 +03:00
Vitaliy Filippov abf0611d93 Use clean_entry_bitmap_size instead of entry_attr_size back because of changed bitmap handling 2021-03-16 12:48:36 +03:00
Vitaliy Filippov edbf0eb040 Add a test for snapshots, fix bugs. Now the test passes 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 09725038e7 Begin snapshot test 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 18f71b059a Fix part bitmap addresses 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 2db2ed22ea Fix several snapshot I/O bugs 2021-03-16 12:48:36 +03:00
Vitaliy Filippov aa7699da24 Fix subop generation for snapshot implementation 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 853ecba780 Actual snapshot support (untested) 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 2f9c76b8fc Report inode I/O statistics, aggregate it in the monitor 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 8da7f26459 Report inode space usage statistics to etcd, aggregate it in the monitor 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 9998b50c7e Add inode space usage statistics tracking to blockstore 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 0422d94a70 Send bitmaps with primary-reads, actually read bitmaps for READ ops 2021-03-16 12:48:36 +03:00
Vitaliy Filippov ff2208ae70 Allocate bitmaps along with stripes to avoid memory fragmentation 2021-03-16 12:48:36 +03:00
Vitaliy Filippov ae54dddb0c Remove cryptic bitmap inlining from bs_op_t and osd_op_t, use bitmap in primary OSD code 2021-03-16 12:48:36 +03:00
Vitaliy Filippov bfc175fe0f Add "external" bitmap support to the secondary OSD protocol 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 07e10210b6 Use bitmap granularity for alignment checks 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 221b728fc9 Add "external" bitmap support to blockstore 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 6625aaae00 Add "external" bitmap support to osd_rmw 2021-03-16 12:48:36 +03:00
Vitaliy Filippov 7e6e1a5a82 Release 0.5.10
The version seems to be stable after this bunch of fixes :)

- Fix delete & write operation ordering during rebalance to not lose objects in the immediate_commit=off mode
- Fix a possible crash caused by very high iodepths
- Re-distribute PG primaries over OSDs that come up after a short downtime
- Allow to specify etcd URLs for OSDs with http://, do not die with a strange error if -etcd option is missing for fio
- Fix a journal flushing deadlock which sometimes occurred in the immediate_commit=off mode
- Fix a bug where OSDs could hang if the data device filled up
- Fix an allocator bug where it was unable to allocate up to last (n%64) data device blocks
- Fix monitor crash that occurred on removal of some etcd keys
- Fix a bug where PGs could remain incomplete due to incorrect PG history with just zeroes in osd_sets
2021-03-16 12:48:26 +03:00
Vitaliy Filippov 435045751d Delete objects only after a SYNC during rebalance in the non-immediate_commit mode
Previously OSDs could commit deletes before writes during recovery or rebalance
in the "lazy fsync" (immediate_commit=off) mode which could result in lost objects
2021-03-16 12:48:26 +03:00
Vitaliy Filippov c5fb1d5987 Do not duplicate blockstore operations when io_uring fills up
This bug was leading to OSDs dying with "Assertion `fulfilled == read_op->len' failed"
when testing fio -rw=randread -numjobs=8 -iodepth=128
2021-03-16 12:48:26 +03:00
Vitaliy Filippov 9f59381bea Re-distribute PG primaries over OSDs that come up after a short downtime 2021-03-16 12:48:26 +03:00
Vitaliy Filippov 9ac7e75178 Allow to specify etcd URLs for OSDs with http://, do not die with a strange error if -etcd option is missing for fio 2021-03-16 12:48:26 +03:00
Vitaliy Filippov 88671cf745 Fix a bug causing all flushers to wait for an fsync without actually trying to do it
This happened because flusher_count became dynamic and fsync_batch() was comparing the number
of flushers currently ready to do an fsync with the maximum number of flushers. Also the number
wasn't rechecked on every loop which was also incorrect.

Now the interrupted_rebalance test passes even without IMMEDIATE_COMMIT=1.
2021-03-13 17:27:29 +03:00
Vitaliy Filippov fe1749c427 Fix the multiple_interrupted_rebalance test 2021-03-13 17:19:45 +03:00
Vitaliy Filippov ceb9c28de7 Set default log_level before passing config to etcd_state_client 2021-03-13 17:19:45 +03:00
Vitaliy Filippov 299d7d7c95 Use common macro for get_sqe 2021-03-13 17:19:45 +03:00
Vitaliy Filippov d1526b415f Correctly resume writes when OSD is full to return an error 2021-03-13 17:19:45 +03:00
Vitaliy Filippov f49fd53d55 Fix a bug where allocator was unable to allocate up to last (n%64) blocks, add tests for it 2021-03-13 02:19:02 +03:00
Vitaliy Filippov dd76eda5e5 Test multiple interrupted rebalancings
Currently only passes with immediate_commit=all configuration
(env variable IMMEDIATE_COMMIT=1 for the bash script)
2021-03-12 12:55:44 +03:00
Vitaliy Filippov 87dbd8fa57 Use empty hash as the default value for some etcd keys in the monitor 2021-03-12 12:40:15 +03:00
Vitaliy Filippov b44f49aab2 Ignore zero OSDs in history osd_sets 2021-03-12 12:40:15 +03:00
Vitaliy Filippov 036555638e Release 0.5.9
- Fix two monitor bugs which led to objects being "logically lost" (physically
  present on some secondary OSDs while primary doesn't know about it) after multiple
  interrupted rebalancings
- Implement "no_recovery" and "no_rebalance" flags
2021-03-11 00:39:10 +03:00
Vitaliy Filippov af5155fcd9 Implement "no_recovery" and "no_rebalance" flags 2021-03-11 00:36:31 +03:00
Vitaliy Filippov 0d2efbecc9 Preserve previous PG history when changing PG distribution
Fixes incorrect PG history in case when a new rebalance is started
before the finish of the previous one which could make primary OSDs unable
to locate some objects on some secondaries.
2021-03-11 00:16:10 +03:00
Vitaliy Filippov e62e8b6bae Use real pg configuration instead of the "last clean" one for generating PG history
Basically fixes the bug introduced in 0.5.7 where an rebalance interrupted
by the monitor could result in forgetting objects moved to the new place
2021-03-10 02:01:44 +03:00
Vitaliy Filippov c4ba24c305 Do not print ping op latency 2021-03-10 02:01:44 +03:00
Vitaliy Filippov 19e47a0279 Release 0.5.8
- Add heartbeats (fixes failover in case of network issues or offline nodes)
- Fix a bug where a PG could incorrectly become listed as 'incomplete' if historical osd_sets
  included a set with the the PG's primary OSD as the only alive one
- Use osd_out_time = 10 minutes by default instead of 30 minutes
- Make monitors stick to a single selected etcd URL on start and not try to select random ones
  on every request - this was leading to etcd interaction errors when some etcds were unavailable
2020-03-09 02:38:17 +03:00
Vitaliy Filippov bd178ac20f Fix history osd_set check - local OSD is always available! 2021-03-09 02:18:18 +03:00
Vitaliy Filippov 7006875a24 Make monitor stick to one etcd until the restart 2021-03-09 02:15:38 +03:00
Vitaliy Filippov ad577c4aac Add PING operation and timeouts to detect OSD failures when a host goes down 2021-03-09 02:15:38 +03:00
Vitaliy Filippov 836635c518 Use osd_out_time = 10 minutes by default 2021-03-09 02:15:38 +03:00
Vitaliy Filippov 88a03f4e98 Release 0.5.7
- Fix multiple bugs leading to OSDs sometimes being unable to correctly activate PGs
  when a lot of PG peering events occurred in a small amount of time
- Fix a bug where OSDs could list incomplete object versions during peering. The bug
  manifested with "local rollback operation failed" messages in OSD logs
- Fix a bug where misplaced chunks for degraded and incomplete objects were not removed
  from extra OSDs during recovery
- Fix incorrect PG history configuration resulting in OSDs being unable to find some
  of the objects after a PG count change
- Simplify block layer write ordering logic
- Avoid extra data move when a lot of OSDs are first stopped for long time and then restarted
- Fix incorrect degraded & misplaced object statistics after a completed rebalance
- Fix incorrect usage of pg_minsize instead of the minimal possible object chunk count in EC pools
2021-03-08 23:37:02 +03:00
Vitaliy Filippov 2a5036669d Fix PG count change procedure
In previous versions PG histories were calculated incorrectly during
PG count change which led to objects being lost on OSDs not in PG's osd set.
2021-03-08 23:15:58 +03:00