Commit Graph

1299 Commits (hier-failure-domains)

Author SHA1 Message Date
Vitaliy Filippov 72f0cff79d WIP Use random_hier_combinations
Test (push): all 30 checks successful
2023-05-18 17:44:00 +03:00
Vitaliy Filippov c1d470522c Replace flatten_tree with extract_tree_levels 2023-05-18 17:44:00 +03:00
Vitaliy Filippov 57feb7f390 Implement multi-level tree extractor for hierarchical failure domains 2023-05-18 17:44:00 +03:00
Vitaliy Filippov 431f780347 Implement a PG generator for hierarchical failure domains 2023-05-18 17:44:00 +03:00
Vitaliy Filippov 98077a1712 Remove unused dependencies from CSI 2023-05-18 11:54:47 +03:00
Vitaliy Filippov 1c7d53996d Reweight only 2 OSDs to zero in test_rebalance_verify, otherwise the test does not pass with EC 3+2
Test (push): all 30 checks successful
2023-05-18 00:42:40 +03:00
Vitaliy Filippov 2ca07b1ea7 Raise timeout in test_rebalance_verify
Test (push): 27 of 30 checks successful; failing: test_rebalance_verify_ec, test_rebalance_verify_ec_imm, test_heal_pg_size_2
2023-05-17 01:58:01 +03:00
Vitaliy Filippov 022176aa98 Fix NaN during PG optimisation if there are nonexisting OSDs in node_placement
Test (push): 28 of 30 checks successful; failing: test_rebalance_verify_ec, test_rebalance_verify_ec_imm
2023-05-17 01:20:30 +03:00
Vitaliy Filippov 120e3fa7bc Fix pool deletion
Test (push): 28 of 30 checks successful; failing: test_minsize_1, test_move_reappear
2023-05-17 00:45:59 +03:00
Vitaliy Filippov 629999f789 Clear journal_device and meta_device before initialising the next OSD in automatic mode 2023-05-15 23:58:55 +03:00
Vitaliy Filippov 93eca11ba2 Fix rhel 9 installation docs 2023-05-15 13:09:18 +03:00
Vitaliy Filippov 5a9e1ede52 Release 0.8.9
Test (push): all 30 checks successful
- The tests are now stable and run in a CI system based on Gitea CI
- The release includes final bug fixes for EC:
  - Implement missing EC recovery of allocation bitmap when built with ISA-L
  - Fix broken snapshot export with EC (allocation bitmap reads were giving incorrect results previously)
- Also fixed bugs manifesting under heavy load:
  - Fix monitor possibly applying incorrect PG history on retries
  - Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number
  - Allow writes to wait for free space again, but now correctly (previously dropped in 0.8.2)
  - Fix a rare segfault in client (handle client stop during incoming stream handling in 1 more place)
  - Make monitor correctly handle etcd connection errors - it could die instead of connecting to another etcd
  - Fix OSD rarely being unable to report PG states after a PG was taken over by another OSD
- Fixed return code for incomplete EC objects (now EIO) and made cluster client retry this error
- Made other small changes for tests: timeouts, nice/ionice for etcd, waiting conditions, NBD device checks and so on
2023-05-14 01:25:09 +03:00
Vitaliy Filippov 1c9a188600 Add tests to CI
Test (push): all 30 checks successful
2023-05-14 00:06:09 +03:00
Vitaliy Filippov de3e609166 Add a FIXME about QEMU driver thread safety 2023-05-14 00:06:09 +03:00
Vitaliy Filippov 11481170f5 Add a FIXME about ENOSPC 2023-05-13 23:59:44 +03:00
Vitaliy Filippov e69d459d43 Allow rebalance to start in test_interrupted_rebalance, raise etcd start timeout 2023-05-13 15:16:28 +03:00
Vitaliy Filippov da82754baa Wait for conditions in test_move_reappear instead of waiting a fixed amount of time 2023-05-12 23:18:07 +03:00
Vitaliy Filippov d356aca030 Add missing $NO_SAME OSD argument to test_splitbrain 2023-05-12 23:18:07 +03:00
Vitaliy Filippov 04a273d213 Raise NBD timeout in tests 2023-05-12 23:18:07 +03:00
Vitaliy Filippov 6442010f93 Skip offline PGs during state reporting when the state is already deleted or taken over by another OSD
This fixes OSDs being unable to report PG states in rare conditions
2023-05-12 23:17:45 +03:00
Vitaliy Filippov 6f4dc16c59 Handle etcd connection errors correctly in mon (unhandled error events) 2023-05-11 11:02:44 +03:00
Vitaliy Filippov ce4a8067b5 Handle client stop during incoming stream handling in 1 more place 2023-05-11 01:53:41 +03:00
Vitaliy Filippov e431ecb715 Make tests more stable in CI 2023-05-11 01:53:41 +03:00
Vitaliy Filippov 8cac795445 Return EIO instead of EINVAL for incomplete EC objects 2023-05-11 01:15:23 +03:00
Vitaliy Filippov a409598b16 Wait for free space again, but count on big_write flushes instead of just flusher activity 2023-05-10 01:51:02 +03:00
Vitaliy Filippov f4c6765522 Ignore ENOENT in epoll_ctl 2023-05-08 20:39:20 +03:00
Vitaliy Filippov ad2916068a Fix test_add_osd rebalance timeout check 2023-05-08 20:39:20 +03:00
Vitaliy Filippov 321cb435a6 Fix monitor incorrectly changing PG count when last_clean_pgs contains less PGs than the new number 2023-05-08 20:39:20 +03:00
Vitaliy Filippov cfcf4f4355 Support checking /dev/nbdX nodes in Docker 2023-05-08 20:39:20 +03:00
Vitaliy Filippov e0fb17bfee Make etcd more stable in tests (add ionice and raise timeout) 2023-05-08 20:36:00 +03:00
Vitaliy Filippov 5b9031fecc Fix monitor possibly applying incorrect PG history under heavy load
The monitor could deceive itself by immediately saving in memory PG configuration
changes that had not yet been applied to etcd, and then apply incorrect PG history
changes on the next attempt if the first update failed.

This usually only happened under heavy load and was caught in CI. :-)
2023-05-07 23:23:00 +03:00
Vitaliy Filippov 5da1d8e1b5 Fix EC just-bitmap reads (len=0) (fixes SCHEME=ec test_snapshot.sh) 2023-05-07 14:00:08 +03:00
Vitaliy Filippov 44f86f1999 Add a basic EC 2+2 recovery test (not really required, but let it be there) 2023-05-07 11:26:27 +03:00
Vitaliy Filippov 2d9a80c6f6 Implement missing bitmap recovery with ISA-L \(°□°)/ 2023-05-07 11:25:51 +03:00
Vitaliy Filippov 5e295e346e Do not make vitastor-mon part of vitastor.target 2023-04-29 00:17:47 +03:00
Vitaliy Filippov d9c0898b7c Notes about config and vitastor-disk cache status 2023-04-29 00:08:24 +03:00
Vitaliy Filippov 04cfb48361 Add a note about PVE 7.4 2023-04-28 11:37:11 +03:00
Vitaliy Filippov ab615849d6 Release 0.8.8
- Fix vitastor-cli rm/rm-data broken in 0.8.6 (missing messenger initialization)
- Prepare OSD read handler for upcoming version with scrub - allow "secondary reads" to return errors
- Fix OSDs re-peering PGs infinitely with a big number of PGs (reproduced in test_add_osd)
- Fix another variant of flusher sync-waiting stall (reproduced in test_write)
- Fix other tests in tests/ (will add them to Gitea CI soon)
- Add patches for QEMU 6.2-8.0
- Fix QEMU driver compatibility with QEMU 8.0
- Build packages for RHEL 9 clones (based on AlmaLinux 9)
2023-04-28 11:22:00 +03:00
Vitaliy Filippov 38be9a49c0 Add AlmaLinux 9 build to documentation 2023-04-28 02:00:52 +03:00
Vitaliy Filippov 7d6bf84a3e Add scripts/meson-buildoptions.sh to QEMU patches 2023-04-28 01:43:22 +03:00
Vitaliy Filippov 41a40a4123 Add QEMU spec patch for Alma/Rocky/RH 9 2023-04-28 01:32:06 +03:00
Vitaliy Filippov b94587ef0e Fix some build warnings 2023-04-28 00:44:27 +03:00
Vitaliy Filippov 2a2f4f6738 Add Almalinux 9 build 2023-04-28 00:40:50 +03:00
Vitaliy Filippov c768a9015f Fix QEMU driver compatibility with QEMU 8.0 2023-04-25 11:20:21 +03:00
Vitaliy Filippov 0d9e10cf96 Add patches for QEMU 6.2-8.0 2023-04-25 11:20:21 +03:00
Vitaliy Filippov b74ccb613c Fix another variant of flusher sync-waiting stall 2023-04-24 00:44:41 +03:00
Vitaliy Filippov 5052174918 Fix test_write_no_same (too large image) 2023-04-24 00:44:41 +03:00
Vitaliy Filippov eec9cf5575 Fix test_snapshot.sh - qemu-img requires explicit backing_fmt 2023-04-24 00:44:41 +03:00
Vitaliy Filippov a04dab0840 Initialize messenger in cluster_client listings 2023-04-24 00:44:41 +03:00
Vitaliy Filippov 160863f707 Print op pointer values in slow log 2023-04-23 17:54:00 +03:00