Commit Graph

851 Commits (master)

Author SHA1 Message Date
Vitaliy Filippov ecfc753e93 Add basic NFS tests, fix bugs 2024-03-16 13:24:36 +03:00
Vitaliy Filippov a574f9ad71 Return block NFS implementation back as an option too 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 7c235c9103 Move KV FS header into a separate file 2024-03-16 13:24:36 +03:00
Vitaliy Filippov e5bb986164 Implement packing small files into shared inodes 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 181795d748 Split new NFS proxy implementation into multiple files 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 8cdc38805b WIP VitastorFS with metadata storage in VitastorKV 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 0cd455d17f First just recheck version without actually re-reading block in vitastor-kv 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 32ba653ba6 Fix vitastor-kv hang on reopen & unfinished closed listing 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 231d4b15fc Add loadable dump format to vitastor-kv (dump) 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 9dc4d5fd7b Fix freeing r/w buffers on errors in kv_db 2024-03-16 13:24:36 +03:00
Vitaliy Filippov e58538fa47 Fix eviction when random_pos selects the end 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 11ac9e7024 Implement min/max list_count to make listings during performance test reasonable 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 511bc3df1c Fix and improve parallel allocation
- Do not try to allocate more DB blocks in an inode block until it's "confirmed" and "locked" by the first write
- Do not recheck for new zero DB blocks on first write into an inode block - a CAS failure means someone else is already writing into it
- Throw new allocation blocks away regardless of whether the known_version is 0 on a CAS failure
2024-03-16 13:24:36 +03:00
Vitaliy Filippov a64f0d1f73 Implement key_prefix for K/V stress test 2024-03-16 13:24:36 +03:00
Vitaliy Filippov ec5f7c6b87 More fixes
- do not overwrite a block with older version if known version is newer
  (read may start before update and end after update)
- invalidated block versions can't be remembered and trusted
- right boundary for split blocks is right_half when diving down, not key_lt
- restart update also when block is "invalidated", not just on version mismatch
- copy callback in listings to avoid closure destruction bugs too
2024-03-16 13:24:36 +03:00
Vitaliy Filippov 3ebed9a749 Add logging and one more assert 2024-03-16 13:24:36 +03:00
Vitaliy Filippov eab67a6e8f Make get_block() wait for updating when unrelated block is found along the path 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 20993d9b7a Fix a race condition where changed blocks were parsed over existing cached blocks and getting a mix of data 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 5cf9b343c0 Simplify code by removing an unneeded "optimisation" 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 79ae0aadcd Add kv_log_level, print warnings on level 1, trace ops on level 10 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 605afc3583 Fix duplicate keys in listings on parallel updates -- do not rewind key "iterator position" 2024-03-16 13:24:36 +03:00
Vitaliy Filippov c0681d8242 Implement key suffix to avoid collisions of multiple test workers 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 763e77b4f4 Do not complain on empty first block 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 19426aa4c5 Add JSON output for stress-tester 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 08f586bcec Print total stats 2024-03-16 13:24:36 +03:00
Vitaliy Filippov f1cd87473a Do not send more than op_count operations (fix segfault on finish) 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 1bd8d2da56 Add some more resiliency to serialize() 2024-03-16 13:24:36 +03:00
Vitaliy Filippov a7396d2baf Invalidate blocks being updated too 2024-03-16 13:24:36 +03:00
Vitaliy Filippov e98a38810d Change new block allocation method: make each writer choose multiple empty PG blocks and place blocks in them 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 28c4324c36 Remove blocks from cache on unsuccessful updates 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 31ec3fa8f5 Allow to track multiple updates per block (it should never happen though) 2024-03-16 13:24:36 +03:00
Vitaliy Filippov e4fa26f60a Do not call stop_updating after failed write_new_block and after clear_block (both delete the item) 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 59ae27f9e5 Track versions of parent blocks and recheck if changed during update 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 2c6a301d9b Fix resume_split condition (key_lt can also be "") 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 01558349f8 Experiment: transform offsets for better sharding 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 36f4717d0d More post-stress-test fixes
- Prevent _split types of new blocks
- Stop updating new blocks only after the whole update, otherwise pointers
  may become invalid
- Use recheck_none for updates initially
- Use UINT64_MAX as initial block version when postponing ops, otherwise the
  check fails when the block is initially empty. This for example leads to
  writing both leaf items & block pointers (which is incorrect) into the root
  block when starting stress-test with --parallelism 32
- Fix -EINTR comparison
2024-03-16 13:24:36 +03:00
Vitaliy Filippov babaf2a0ce Print operation statistics 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 5773f1a375 K/V fixes after stress-test :-)
- track block versions correctly - per inode block (128kb) instead of tree block (4kb)
- prevent multiple parallel CAS writes of the same inode block
- add logging for EILSEQ which means invalid data in the tree
- fix get_block updated flag which was true for blocks already in cache and was leading to infinite loops on "unrelated block" errors
- apply changes to blocks in cache only after successful writes (using "virtual changes")
- do not replace cached block with an older version from disk
- recheck "unrelated blocks" (read/update collisions) until data stops changing
- track tree path correctly - do not treat split block as parent of its right half
- correctly move blocks when finding new empty place on disk
- restart updates from the beginning when one of blocks is changed by a parallel update
- fix delete using SET opcode and setting key to the empty value instead
- prevent changing the same key more than 1 time in parallel
- fix listing verification
- resume continue_updates in update_find (required because it uses continue_update itself)
- add allow_old_cached parameter to get()
2024-03-16 13:24:36 +03:00
Vitaliy Filippov 57222a9f79 Implement K/V DB stress tester 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 61ef000c6e Evict blocks based on memory limit & block usage 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 7d5e1cc393 Track blocks per level 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 5e7f27a02d Track block level 2024-03-16 13:24:36 +03:00
Vitaliy Filippov fd1d8a8520 Experimental B-Tree Vitastor embedded K/V database implementation! 2024-03-16 13:24:36 +03:00
Vitaliy Filippov c364e14c40 Stop then retry, not retry then stop 2024-03-16 13:24:36 +03:00
Vitaliy Filippov 3ebbfa0428 Fix another rare OSD hang on zeroing out entries on start 2024-03-16 13:24:36 +03:00
Vitaliy Filippov aa79d1db1c Fix incorrect "changing scheme" message in modify-pool
Test / test_rm (push) Successful in 14s Details
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m32s Details
Test / test_move_reappear (push) Successful in 20s Details
Test / test_snapshot_down (push) Successful in 29s Details
Test / test_snapshot_down_ec (push) Successful in 29s Details
Test / test_splitbrain (push) Successful in 28s Details
Test / test_snapshot_chain (push) Successful in 2m5s Details
Test / test_snapshot_chain_ec (push) Successful in 3m3s Details
Test / test_rebalance_verify_imm (push) Successful in 4m0s Details
Test / test_rebalance_verify (push) Successful in 4m40s Details
Test / test_switch_primary (push) Successful in 38s Details
Test / test_write (push) Successful in 41s Details
Test / test_write_no_same (push) Successful in 17s Details
Test / test_write_xor (push) Successful in 1m2s Details
Test / test_rebalance_verify_ec (push) Successful in 5m34s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 5m34s Details
Test / test_heal_pg_size_2 (push) Successful in 3m22s Details
Test / test_heal_ec (push) Successful in 4m58s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m37s Details
Test / test_heal_csum_32k_dj (push) Successful in 6m21s Details
Test / test_heal_csum_32k (push) Successful in 7m1s Details
Test / test_scrub (push) Successful in 1m37s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m59s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m26s Details
Test / test_scrub_xor (push) Successful in 1m3s Details
Test / test_heal_csum_4k_dj (push) Successful in 7m20s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m7s Details
Test / test_scrub_ec (push) Successful in 36s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m37s Details
Test / test_heal_csum_4k (push) Successful in 6m23s Details
2024-03-06 00:41:35 +03:00
Vitaliy Filippov a1fecb7eff Move callback away when calling it in cluster_client 2024-03-06 00:41:35 +03:00
Vitaliy Filippov ff74b19423 Fix rare OSD hang on zeroing out bad entries on start 2024-03-06 00:41:35 +03:00
Vitaliy Filippov 4cf6dceed7 Merge branch 'rel-1.4'
Test / test_minsize_1 (push) Has been cancelled Details
Test / test_move_reappear (push) Has been cancelled Details
Test / test_rm (push) Has been cancelled Details
Test / test_snapshot_chain (push) Has been cancelled Details
Test / test_snapshot_chain_ec (push) Has been cancelled Details
Test / test_snapshot_down (push) Has been cancelled Details
Test / test_snapshot_down_ec (push) Has been cancelled Details
Test / test_splitbrain (push) Has been cancelled Details
Test / test_rebalance_verify (push) Has been cancelled Details
Test / test_rebalance_verify_imm (push) Has been cancelled Details
Test / test_rebalance_verify_ec (push) Has been cancelled Details
Test / test_rebalance_verify_ec_imm (push) Has been cancelled Details
Test / test_switch_primary (push) Has been cancelled Details
Test / test_write (push) Has been cancelled Details
Test / test_write_xor (push) Has been cancelled Details
Test / test_write_no_same (push) Has been cancelled Details
Test / test_heal_pg_size_2 (push) Has been cancelled Details
Test / test_heal_ec (push) Has been cancelled Details
Test / test_heal_csum_32k_dmj (push) Has been cancelled Details
Test / test_heal_csum_32k_dj (push) Has been cancelled Details
Test / test_heal_csum_32k (push) Has been cancelled Details
Test / test_heal_csum_4k_dmj (push) Has been cancelled Details
Test / test_heal_csum_4k_dj (push) Has been cancelled Details
Test / test_heal_csum_4k (push) Has been cancelled Details
Test / test_scrub (push) Has been cancelled Details
Test / test_scrub_zero_osd_2 (push) Has been cancelled Details
Test / test_scrub_xor (push) Has been cancelled Details
Test / test_scrub_pg_size_3 (push) Has been cancelled Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Has been cancelled Details
Test / test_scrub_ec (push) Has been cancelled Details
2024-02-29 09:59:01 +03:00
Vitaliy Filippov 38b8963330 Release 1.4.8
Test / test_rm (push) Successful in 19s Details
Test / test_move_reappear (push) Successful in 26s Details
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m40s Details
Test / test_snapshot_down (push) Successful in 31s Details
Test / test_snapshot_down_ec (push) Successful in 34s Details
Test / test_splitbrain (push) Successful in 27s Details
Test / test_snapshot_chain (push) Successful in 2m18s Details
Test / test_snapshot_chain_ec (push) Successful in 2m59s Details
Test / test_rebalance_verify_imm (push) Successful in 5m32s Details
Test / test_rebalance_verify (push) Successful in 6m11s Details
Test / test_switch_primary (push) Successful in 41s Details
Test / test_write (push) Successful in 45s Details
Test / test_write_no_same (push) Successful in 23s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 5m2s Details
Test / test_write_xor (push) Successful in 55s Details
Test / test_rebalance_verify_ec (push) Successful in 6m22s Details
Test / test_heal_pg_size_2 (push) Successful in 5m41s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m59s Details
Test / test_heal_csum_32k_dj (push) Successful in 7m19s Details
Test / test_heal_csum_32k (push) Successful in 7m17s Details
Test / test_heal_csum_4k_dmj (push) Successful in 7m14s Details
Test / test_scrub (push) Successful in 1m12s Details
Test / test_heal_ec (push) Successful in 9m2s Details
Test / test_scrub_xor (push) Successful in 56s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m8s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 2m1s Details
Test / test_heal_csum_4k_dj (push) Successful in 4m45s Details
Test / test_scrub_pg_size_3 (push) Successful in 2m31s Details
Test / test_heal_csum_4k (push) Successful in 4m54s Details
Test / test_scrub_ec (push) Successful in 46s Details
- Do not use \r if output is not a terminal (should fix unexpected job output in proxmox)
- Fix rm/rm-data error return code, add --down-ok option to bypass the error
- Add EIO retry timeout and allow to disable these retries, rename up_wait_retry_interval to client_retry_interval
- Add ubuntu jammy build
- Wait for blockstore initialisation before starting OSD (prevent timeouts when init takes time)
- Fix a rare use-after-free in automatic sync after delete in blockstore
2024-02-29 09:58:34 +03:00
Vitaliy Filippov 77167e2920 Do not use \r if output is not a terminal 2024-02-29 00:21:17 +03:00
Vitaliy Filippov 5af23672d0 Fix rm/rm-data error return code, add --down-ok option to bypass the error 2024-02-29 00:20:10 +03:00
Vitaliy Filippov 6bf1f539a6 Add EIO retry timeout and allow to disable these retries, rename up_wait_retry_interval to client_retry_interval 2024-02-28 13:10:02 +03:00
Vitaliy Filippov 4eab26f968 Add documentation and a very basic test for pool management commands
Test / test_snapshot_ec (push) Successful in 31s Details
Test / test_rm (push) Successful in 17s Details
Test / test_move_reappear (push) Successful in 24s Details
Test / test_snapshot_down (push) Successful in 27s Details
Test / test_snapshot_down_ec (push) Successful in 33s Details
Test / test_splitbrain (push) Successful in 20s Details
Test / test_snapshot_chain (push) Successful in 2m15s Details
Test / test_snapshot_chain_ec (push) Successful in 2m58s Details
Test / test_rebalance_verify_imm (push) Successful in 5m3s Details
Test / test_rebalance_verify (push) Successful in 5m36s Details
Test / test_switch_primary (push) Successful in 37s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m3s Details
Test / test_write_no_same (push) Successful in 21s Details
Test / test_write (push) Successful in 58s Details
Test / test_write_xor (push) Successful in 1m31s Details
Test / test_rebalance_verify_ec (push) Successful in 6m20s Details
Test / test_heal_pg_size_2 (push) Successful in 4m7s Details
Test / test_heal_ec (push) Successful in 4m33s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m53s Details
Test / test_heal_csum_32k_dj (push) Successful in 6m17s Details
Test / test_heal_csum_32k (push) Successful in 7m23s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m56s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m26s Details
Test / test_scrub (push) Successful in 1m29s Details
Test / test_heal_csum_4k_dj (push) Successful in 7m1s Details
Test / test_scrub_xor (push) Successful in 1m1s Details
Test / test_heal_csum_4k (push) Successful in 6m34s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 32s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m19s Details
Test / test_scrub_ec (push) Successful in 24s Details
2024-02-28 13:08:04 +03:00
Vitaliy Filippov 86243b7101 Rework & fix pool-create / pool-modify / pool-ls 2024-02-28 13:08:04 +03:00
idelson dc92851322 vitastor-cli: add commands to control pools: pool-create, pool-ls, pool-modify, pool-rm
PR #59 - https://github.com/vitalif/vitastor/pull/58/commits

By MIND Software LLC

By submitting this pull request, I accept Vitastor CLA
2024-02-28 13:08:04 +03:00
Vitaliy Filippov fc413038d1 Wait for blockstore initialisation before starting OSD
Test / test_cas (push) Has been cancelled Details
Test / test_change_pg_count (push) Has been cancelled Details
Test / test_change_pg_count_ec (push) Has been cancelled Details
Test / test_change_pg_size (push) Has been cancelled Details
Test / test_create_nomaxid (push) Has been cancelled Details
Test / test_etcd_fail (push) Has been cancelled Details
Test / test_interrupted_rebalance (push) Has been cancelled Details
Test / test_interrupted_rebalance_imm (push) Has been cancelled Details
Test / test_interrupted_rebalance_ec (push) Has been cancelled Details
Test / test_interrupted_rebalance_ec_imm (push) Has been cancelled Details
Test / test_failure_domain (push) Has been cancelled Details
Test / test_snapshot (push) Has been cancelled Details
Test / test_snapshot_ec (push) Has been cancelled Details
Test / test_minsize_1 (push) Has been cancelled Details
Test / test_move_reappear (push) Has been cancelled Details
Test / test_rm (push) Has been cancelled Details
Test / test_snapshot_chain (push) Has been cancelled Details
Test / test_snapshot_chain_ec (push) Has been cancelled Details
Test / test_snapshot_down (push) Has been cancelled Details
Test / test_snapshot_down_ec (push) Has been cancelled Details
Test / test_splitbrain (push) Has been cancelled Details
Test / test_rebalance_verify (push) Has been cancelled Details
Test / test_rebalance_verify_imm (push) Has been cancelled Details
Test / test_rebalance_verify_ec (push) Has been cancelled Details
Test / test_rebalance_verify_ec_imm (push) Has been cancelled Details
Test / test_switch_primary (push) Has been cancelled Details
Test / test_write (push) Has been cancelled Details
Test / test_write_xor (push) Has been cancelled Details
Test / test_write_no_same (push) Has been cancelled Details
Test / test_heal_pg_size_2 (push) Has been cancelled Details
2024-02-27 02:20:04 +03:00
Vitaliy Filippov 1bc0b5aab3 Fix a rare use-after-free in automatic sync after delete in blockstore
Test / test_interrupted_rebalance_ec (push) Successful in 2m49s Details
Test / test_rm (push) Successful in 14s Details
Test / test_move_reappear (push) Successful in 21s Details
Test / test_snapshot_down (push) Successful in 31s Details
Test / test_snapshot_down_ec (push) Successful in 30s Details
Test / test_splitbrain (push) Successful in 23s Details
Test / test_snapshot_chain (push) Successful in 2m29s Details
Test / test_snapshot_chain_ec (push) Successful in 2m48s Details
Test / test_rebalance_verify_imm (push) Successful in 4m9s Details
Test / test_rebalance_verify (push) Successful in 4m42s Details
Test / test_switch_primary (push) Successful in 41s Details
Test / test_write (push) Successful in 43s Details
Test / test_write_no_same (push) Successful in 21s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m37s Details
Test / test_write_xor (push) Successful in 1m11s Details
Test / test_rebalance_verify_ec (push) Successful in 7m14s Details
Test / test_heal_pg_size_2 (push) Successful in 4m3s Details
Test / test_heal_ec (push) Successful in 4m18s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m5s Details
Test / test_heal_csum_32k_dj (push) Successful in 6m52s Details
Test / test_heal_csum_32k (push) Successful in 6m23s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m23s Details
Test / test_scrub (push) Successful in 1m30s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m18s Details
Test / test_heal_csum_4k_dj (push) Successful in 7m9s Details
Test / test_scrub_xor (push) Successful in 57s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m5s Details
Test / test_scrub_ec (push) Successful in 1m6s Details
Test / test_scrub_pg_size_3 (push) Successful in 2m3s Details
Test / test_heal_csum_4k (push) Successful in 4m54s Details
ASan report: [0] READ of size 16 at operator() /root/vitastor/src/blockstore_write.cpp:100
...[5] blockstore_impl_t::ack_sync(blockstore_op_t*) /root/vitastor/src/blockstore_sync.cpp:232
2024-02-24 00:06:36 +03:00
Vitaliy Filippov 5e934264cf Release 1.4.7
- Fix another old "BUG: Attempt to overwrite used offset" in a very simple
  case: bs=4k rw=write iodepth=16 from OSD start; add this case to tests
- Fix a rare crash with "unexpected state during flush: 0x51" possible with
  EC since 1.4.2 during rebalance and OSD outages
- Fix a rare write stall with EC & immediate_commit=none caused by sync
  operations reserving unneeded space in the journal
- Fix 32-bit build warnings, most in printf/scanf format strings
2024-02-22 12:45:52 +03:00
Vitaliy Filippov f20564b44b Fix 32-bit build warnings (99.9% in printf) 2024-02-22 12:22:16 +03:00
Vitaliy Filippov b3c15db331 32M journal by default in simple-offsets
Test / test_snapshot_ec (push) Successful in 30s Details
Test / test_rm (push) Successful in 18s Details
Test / test_move_reappear (push) Successful in 24s Details
Test / test_snapshot_down (push) Successful in 26s Details
Test / test_snapshot_down_ec (push) Successful in 30s Details
Test / test_splitbrain (push) Successful in 23s Details
Test / test_snapshot_chain (push) Successful in 2m17s Details
Test / test_snapshot_chain_ec (push) Successful in 2m55s Details
Test / test_rebalance_verify_imm (push) Successful in 2m46s Details
Test / test_rebalance_verify (push) Successful in 3m9s Details
Test / test_switch_primary (push) Successful in 39s Details
Test / test_write (push) Successful in 43s Details
Test / test_write_no_same (push) Successful in 19s Details
Test / test_write_xor (push) Successful in 55s Details
Test / test_rebalance_verify_ec (push) Successful in 3m35s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m37s Details
Test / test_heal_pg_size_2 (push) Successful in 3m36s Details
Test / test_heal_ec (push) Successful in 5m47s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m21s Details
Test / test_heal_csum_32k_dj (push) Successful in 6m16s Details
Test / test_heal_csum_32k (push) Successful in 6m45s Details
Test / test_scrub (push) Successful in 1m56s Details
Test / test_heal_csum_4k_dj (push) Successful in 6m39s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m42s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m16s Details
Test / test_scrub_xor (push) Successful in 47s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m26s Details
Test / test_heal_csum_4k (push) Successful in 6m32s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 48s Details
Test / test_scrub_ec (push) Successful in 49s Details
2024-02-21 15:25:02 +03:00
Vitaliy Filippov 685bcd6ef9 Do not reserve extra space for big_writes during sync - sync itself is needed to commit and clear them 2024-02-21 13:00:14 +03:00
Vitaliy Filippov 3eb389b321 Supposed fix for "unexpected state during flush: 0x51" with EC
Test / test_move_reappear (push) Successful in 22s Details
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m32s Details
Test / test_rm (push) Successful in 16s Details
Test / test_snapshot_down (push) Successful in 31s Details
Test / test_snapshot_down_ec (push) Successful in 32s Details
Test / test_splitbrain (push) Successful in 25s Details
Test / test_snapshot_chain (push) Successful in 2m4s Details
Test / test_snapshot_chain_ec (push) Successful in 2m51s Details
Test / test_rebalance_verify_imm (push) Successful in 2m47s Details
Test / test_rebalance_verify (push) Successful in 3m30s Details
Test / test_switch_primary (push) Successful in 38s Details
Test / test_write (push) Successful in 51s Details
Test / test_write_no_same (push) Successful in 16s Details
Test / test_write_xor (push) Successful in 52s Details
Test / test_rebalance_verify_ec (push) Successful in 3m32s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m7s Details
Test / test_scrub_zero_osd_2 (push) Successful in 59s Details
Test / test_scrub (push) Successful in 1m2s Details
Test / test_scrub_xor (push) Successful in 36s Details
Test / test_scrub_ec (push) Successful in 38s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 40s Details
Test / test_scrub_pg_size_3 (push) Successful in 49s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m12s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m8s Details
Test / test_heal_csum_32k (push) Successful in 4m55s Details
Test / test_heal_ec (push) Failing after 10m14s Details
Test / test_heal_csum_4k_dmj (push) Successful in 4m59s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m5s Details
Test / test_heal_pg_size_2 (push) Successful in 3m54s Details
Test / test_heal_csum_4k (push) Successful in 3m49s Details
2024-02-21 01:32:06 +03:00
Vitaliy Filippov 3d16cde23c Fix assertions, add small sequential write test
Test / test_snapshot_down_ec (push) Successful in 32s Details
Test / test_splitbrain (push) Successful in 22s Details
Test / test_snapshot_chain (push) Successful in 2m8s Details
Test / test_snapshot_chain_ec (push) Successful in 2m48s Details
Test / test_rebalance_verify_imm (push) Successful in 2m57s Details
Test / test_rebalance_verify (push) Successful in 3m29s Details
Test / test_switch_primary (push) Successful in 36s Details
Test / test_write (push) Successful in 54s Details
Test / test_write_xor (push) Successful in 51s Details
Test / test_write_no_same (push) Successful in 16s Details
Test / test_rebalance_verify_ec (push) Successful in 3m40s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m20s Details
Test / test_scrub (push) Successful in 1m1s Details
Test / test_scrub_zero_osd_2 (push) Successful in 46s Details
Test / test_scrub_xor (push) Successful in 41s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m0s Details
Test / test_scrub_ec (push) Successful in 58s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m45s Details
Test / test_heal_pg_size_2 (push) Failing after 4m52s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m36s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m33s Details
Test / test_interrupted_rebalance_imm (push) Successful in 1m35s Details
Test / test_interrupted_rebalance (push) Successful in 2m28s Details
Test / test_interrupted_rebalance_ec (push) Successful in 2m30s Details
Test / test_interrupted_rebalance_ec_imm (push) Successful in 2m41s Details
Test / test_heal_ec (push) Failing after 10m20s Details
Test / test_heal_csum_4k_dmj (push) Successful in 4m21s Details
Test / test_heal_csum_32k (push) Successful in 5m15s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m48s Details
Test / test_heal_csum_4k (push) Successful in 5m32s Details
2024-02-20 19:41:48 +03:00
Vitaliy Filippov c6406d67fc Fix journal space_check incorrectly checking for space at the beginning 2024-02-20 19:40:56 +03:00
Vitaliy Filippov f87964861d Release 1.4.6
Test / test_snapshot_ec (push) Successful in 29s Details
Test / test_rm (push) Successful in 18s Details
Test / test_move_reappear (push) Successful in 26s Details
Test / test_snapshot_down (push) Successful in 28s Details
Test / test_snapshot_down_ec (push) Successful in 32s Details
Test / test_splitbrain (push) Successful in 23s Details
Test / test_snapshot_chain (push) Successful in 2m3s Details
Test / test_snapshot_chain_ec (push) Successful in 2m46s Details
Test / test_rebalance_verify_imm (push) Successful in 3m1s Details
Test / test_rebalance_verify (push) Successful in 3m30s Details
Test / test_switch_primary (push) Successful in 38s Details
Test / test_write (push) Successful in 32s Details
Test / test_write_no_same (push) Successful in 17s Details
Test / test_write_xor (push) Successful in 38s Details
Test / test_rebalance_verify_ec (push) Successful in 4m38s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m57s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m14s Details
Test / test_heal_csum_32k_dmj (push) Successful in 5m21s Details
Test / test_heal_csum_32k (push) Successful in 5m45s Details
Test / test_heal_csum_4k_dmj (push) Successful in 5m27s Details
Test / test_scrub (push) Successful in 1m30s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m26s Details
Test / test_scrub_zero_osd_2 (push) Successful in 38s Details
Test / test_scrub_xor (push) Successful in 40s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m8s Details
Test / test_scrub_ec (push) Successful in 1m5s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m49s Details
Test / test_heal_csum_4k (push) Successful in 5m41s Details
Test / test_heal_ec (push) Successful in 4m11s Details
Test / test_heal_pg_size_2 (push) Successful in 4m22s Details
Unwavering stabilization of 1.4.x, continued :-)

- Include the accidentally lost part of 1.4.5 journal trimming fix
- Fix a possible OSD crash with "BUG: Attempt to overwrite used offset"
  which was probably present for long time, but became apparent after
  fixing flapping tests in CI
- Fix remaining flapping tests in CI. It was the first time when tests
  actually passed without retries :-)
2024-02-20 17:01:26 +03:00
Vitaliy Filippov 7048228678 Supposed fix for "BUG: Attempt to overwrite used offset" 2024-02-20 15:56:48 +03:00
Vitaliy Filippov ea73857450 Add asserts to catch "BUG: Attempt to overwrite used offset" 2024-02-20 15:56:48 +03:00
Vitaliy Filippov 6cfe38ec04 Followup to empty cur.oid as stop condition for forced trim fix 2024-02-20 15:56:38 +03:00
Vitaliy Filippov f882c7dd87 Release 1.4.5
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m23s Details
Test / test_rm (push) Successful in 15s Details
Test / test_move_reappear (push) Successful in 21s Details
Test / test_snapshot_down (push) Successful in 26s Details
Test / test_snapshot_down_ec (push) Successful in 30s Details
Test / test_splitbrain (push) Successful in 29s Details
Test / test_snapshot_chain (push) Successful in 2m17s Details
Test / test_snapshot_chain_ec (push) Successful in 3m14s Details
Test / test_rebalance_verify_imm (push) Successful in 3m24s Details
Test / test_rebalance_verify (push) Successful in 3m59s Details
Test / test_switch_primary (push) Successful in 35s Details
Test / test_write_xor (push) Successful in 32s Details
Test / test_write_no_same (push) Successful in 13s Details
Test / test_rebalance_verify_ec (push) Successful in 3m46s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m13s Details
Test / test_heal_pg_size_2 (push) Successful in 3m52s Details
Test / test_heal_ec (push) Successful in 5m25s Details
Test / test_heal_csum_32k_dj (push) Successful in 4m24s Details
Test / test_heal_csum_4k_dmj (push) Successful in 4m23s Details
Test / test_heal_csum_4k_dj (push) Successful in 4m17s Details
Test / test_scrub (push) Successful in 38s Details
Test / test_scrub_zero_osd_2 (push) Successful in 29s Details
Test / test_scrub_xor (push) Successful in 30s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 43s Details
Test / test_scrub_ec (push) Successful in 32s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m46s Details
Test / test_heal_csum_4k (push) Successful in 4m4s Details
Test / test_write (push) Successful in 1m38s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m5s Details
Test / test_heal_csum_32k (push) Successful in 4m15s Details
- Fix a write stall caused by incorrect journal trimming introduced in 1.4.4 :)
- Fix PGs sometimes hanging in "starting" state on mass OSD restarts
- Fix a rare crash with "map::at" during OSD pings
- Use new defaults for non-capacitor (desktop) SSDs - improves T1Q256 random write from ~6k iops to ~45k iops
- Make journal_trim_interval configurable
2024-02-16 10:13:33 +03:00
Vitaliy Filippov 26dd863c8d Fix sometimes possible crash on clients.at() during pings 2024-02-16 10:13:33 +03:00
Vitaliy Filippov 2ae859fbc6 Use min/max_flusher_count=32/256, 128M journal and autosync_writes=512 for non-capacitor SSDs by default 2024-02-16 10:13:33 +03:00
Vitaliy Filippov 8389c0f33b Fix PGs sometimes hanging in "starting" state on mass OSD restarts 2024-02-15 23:38:52 +03:00
Vitaliy Filippov 9db2196aef Make journal_trim_interval configurable 2024-02-15 23:38:51 +03:00
Vitaliy Filippov 8d6ae662fe Use empty cur.oid as stop condition for forced trim, not journal_trim_counter 2024-02-15 23:27:17 +03:00
Vitaliy Filippov c777a0041a Release 1.4.4
Test / test_interrupted_rebalance_ec_imm (push) Successful in 1m23s Details
Test / test_move_reappear (push) Successful in 21s Details
Test / test_rm (push) Successful in 16s Details
Test / test_snapshot_down (push) Successful in 30s Details
Test / test_snapshot_down_ec (push) Successful in 30s Details
Test / test_splitbrain (push) Successful in 25s Details
Test / test_snapshot_chain (push) Successful in 2m18s Details
Test / test_snapshot_chain_ec (push) Successful in 3m13s Details
Test / test_rebalance_verify_imm (push) Successful in 3m8s Details
Test / test_rebalance_verify (push) Successful in 3m41s Details
Test / test_switch_primary (push) Successful in 36s Details
Test / test_write (push) Successful in 40s Details
Test / test_write_no_same (push) Successful in 16s Details
Test / test_write_xor (push) Successful in 39s Details
Test / test_rebalance_verify_ec (push) Successful in 4m56s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m21s Details
Test / test_heal_pg_size_2 (push) Successful in 4m15s Details
Test / test_heal_ec (push) Successful in 5m1s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m32s Details
Test / test_heal_csum_32k (push) Successful in 5m38s Details
Test / test_heal_csum_4k_dmj (push) Successful in 5m43s Details
Test / test_scrub (push) Successful in 1m31s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m17s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m57s Details
Test / test_scrub_xor (push) Successful in 30s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m7s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 41s Details
Test / test_scrub_ec (push) Successful in 24s Details
Test / test_heal_csum_32k_dmj (push) Successful in 3m56s Details
Test / test_heal_csum_4k (push) Successful in 3m16s Details
A couple of fixes for EC pools

- Fix a segfault possible on partial EC overwrite in 1234 -> 5030 rebalance scenario
- Fix two problems leading to EC pools stalling on rebalance & parallel sudden stops
  of OSDs, for example during a sudden poweroff of a host:
  - Recovery auto-tuning (1.4.0 feature) could apply too large delays and stall
    the EC journal - fixed by limiting delays with a new recovery_tune_sleep_cutoff_us
    parameter (10 seconds by default) and applying recovery pauses before write
    operations, not after them, to not occupy space in the journal for long time
  - Dynamic journal space reservation (1.3.0 feature) wasn't accounting new writes
    when checking the limit so OSDs could still fill the journal fully and stall -
    fixed by including new writes into the limit
- Print etcd dbSize instead of dbSizeInUse in status
2024-02-11 16:23:08 +03:00
Vitaliy Filippov 978bdc128a Apply recovery pause before writes, after commits, and do not apply it to syncs to not block EC pools from functioning 2024-02-11 16:13:52 +03:00
Vitaliy Filippov bb2f395f1e Add cutoff threshold for recovery auto-tuning 2024-02-11 16:13:52 +03:00
Vitaliy Filippov b127da40f7 Add a FIXME about incomplete PGs 2024-02-11 13:42:51 +03:00
Vitaliy Filippov ca34a6047a Fix dynamic journal space reservation: include the new write itself, too 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 38ba76e893 Fix flusher sometimes being unable to trim journal when the flush queue is empty 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 1e3c4edea0 Print etcd dbSize instead of dbSizeInUse in status 2024-02-11 13:42:51 +03:00
Vitaliy Filippov e7ac855b07 Fix that EC segfault (1234 -> 5030 partial overwrite) 2024-02-11 13:42:51 +03:00
Vitaliy Filippov c53357ac45 Add a test for EC segfault with partial overwrite in 1234 -> 5030 rebalance scenario 2024-02-11 13:42:51 +03:00
Vitaliy Filippov 27e9f244ec Release 1.4.3
Test / test_move_reappear (push) Successful in 22s Details
Test / test_rm (push) Successful in 15s Details
Test / test_snapshot_down (push) Successful in 36s Details
Test / test_snapshot_down_ec (push) Successful in 30s Details
Test / test_interrupted_rebalance (push) Successful in 5m3s Details
Test / test_splitbrain (push) Successful in 20s Details
Test / test_snapshot_chain (push) Successful in 3m1s Details
Test / test_snapshot_chain_ec (push) Successful in 3m13s Details
Test / test_rebalance_verify_imm (push) Successful in 3m0s Details
Test / test_rebalance_verify (push) Successful in 3m29s Details
Test / test_switch_primary (push) Successful in 37s Details
Test / test_write (push) Successful in 44s Details
Test / test_write_xor (push) Successful in 39s Details
Test / test_write_no_same (push) Successful in 16s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m13s Details
Test / test_rebalance_verify_ec (push) Successful in 5m31s Details
Test / test_heal_ec (push) Successful in 4m54s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m25s Details
Test / test_heal_csum_32k (push) Successful in 6m8s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m17s Details
Test / test_scrub (push) Successful in 1m8s Details
Test / test_scrub_zero_osd_2 (push) Successful in 55s Details
Test / test_scrub_xor (push) Successful in 45s Details
Test / test_heal_csum_4k_dj (push) Successful in 6m22s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m11s Details
Test / test_scrub_ec (push) Successful in 46s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m39s Details
Test / test_heal_csum_4k (push) Successful in 6m8s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m15s Details
Test / test_heal_pg_size_2 (push) Successful in 4m41s Details
Hotfix for hotfix O:-)

- "Write stall fix" was incomplete and EC write stalls could
  continue even on 1.4.2. Now they're finally fixed O:-)
- Make monitor ignore statistics of stopped OSDs. Previously if you stopped all
  OSDs the last total I/O numbers would remain the same indefinitely
2024-02-09 00:29:31 +03:00
Vitaliy Filippov 5d3317e4f2 Followup to 1.4.2 write stall fix - sadly, the previous version was not working correctly :)
Test / test_move_reappear (push) Successful in 19s Details
Test / test_snapshot_chain (push) Successful in 1m21s Details
Test / test_snapshot_down (push) Successful in 23s Details
Test / test_snapshot_chain_ec (push) Successful in 1m50s Details
Test / test_snapshot_down_ec (push) Successful in 22s Details
Test / test_splitbrain (push) Successful in 16s Details
Test / test_etcd_fail (push) Successful in 6m42s Details
Test / test_rebalance_verify_imm (push) Successful in 2m19s Details
Test / test_rebalance_verify (push) Successful in 4m7s Details
Test / test_switch_primary (push) Successful in 36s Details
Test / test_write (push) Successful in 35s Details
Test / test_rebalance_verify_ec (push) Successful in 4m6s Details
Test / test_write_no_same (push) Successful in 22s Details
Test / test_write_xor (push) Successful in 1m34s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 6m7s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m7s Details
Test / test_heal_csum_32k_dj (push) Successful in 4m59s Details
Test / test_heal_csum_32k (push) Successful in 5m4s Details
Test / test_heal_csum_4k_dmj (push) Successful in 5m59s Details
Test / test_scrub (push) Successful in 1m9s Details
Test / test_scrub_zero_osd_2 (push) Successful in 37s Details
Test / test_scrub_xor (push) Successful in 52s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m5s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m12s Details
Test / test_heal_csum_4k (push) Successful in 5m1s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m48s Details
Test / test_scrub_ec (push) Successful in 19s Details
Test / test_interrupted_rebalance (push) Successful in 1m38s Details
Test / test_heal_pg_size_2 (push) Successful in 3m20s Details
Test / test_heal_ec (push) Successful in 3m3s Details
2024-02-08 19:34:29 +03:00
Vitaliy Filippov 016115c0d4 Release 1.4.2
Test / test_rm (push) Successful in 16s Details
Test / test_move_reappear (push) Successful in 20s Details
Test / test_snapshot_down (push) Successful in 25s Details
Test / test_snapshot_down_ec (push) Successful in 39s Details
Test / test_interrupted_rebalance (push) Successful in 4m52s Details
Test / test_splitbrain (push) Successful in 20s Details
Test / test_snapshot_chain (push) Successful in 3m11s Details
Test / test_rebalance_verify_imm (push) Successful in 3m16s Details
Test / test_rebalance_verify (push) Successful in 3m45s Details
Test / test_switch_primary (push) Successful in 36s Details
Test / test_write (push) Successful in 40s Details
Test / test_write_xor (push) Successful in 40s Details
Test / test_write_no_same (push) Successful in 17s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m8s Details
Test / test_rebalance_verify_ec (push) Successful in 5m57s Details
Test / test_heal_pg_size_2 (push) Successful in 4m22s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m20s Details
Test / test_heal_ec (push) Successful in 5m54s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m24s Details
Test / test_heal_csum_32k (push) Successful in 6m3s Details
Test / test_heal_csum_4k_dmj (push) Successful in 5m54s Details
Test / test_scrub_zero_osd_2 (push) Successful in 53s Details
Test / test_scrub (push) Successful in 55s Details
Test / test_heal_csum_4k_dj (push) Successful in 6m14s Details
Test / test_scrub_xor (push) Successful in 1m1s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m50s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 57s Details
Test / test_scrub_ec (push) Successful in 52s Details
Test / test_heal_csum_4k (push) Successful in 5m47s Details
Test / test_snapshot_chain_ec (push) Successful in 1m24s Details
- Log to systemd by default
- Fix excessive autosyncs after every operation with disabled immediate_commit (introduced in 1.1.0)
- Fix a possible write stall with EC due to the lack of OSD wakeup after stabilizing previous writes
- Change sync operation semantics as a final fix to possible write stalls with EC and disabled immediate_commit
- Sync after deleting data in CLI rm / rm-data if immediate_commit is disabled
- Fix OSDs ignoring syncs & autosyncs for delete operations
- Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools)
- Speed up monitor failover - change default etcd_mon_ttl from 30 to 5 seconds
- Speed up operation retries - change default up_wait_retry_interval to 50 ms
- Add patch for libvirt 9.10
2024-02-04 02:23:49 +03:00
Vitaliy Filippov 77c10fd1f8 In fact, do not autosync blockstore when autosync_writes=0
Test / test_move_reappear (push) Successful in 19s Details
Test / test_rm (push) Successful in 14s Details
Test / test_snapshot_down (push) Successful in 24s Details
Test / test_snapshot_down_ec (push) Successful in 25s Details
Test / test_splitbrain (push) Successful in 17s Details
Test / test_snapshot_chain (push) Successful in 1m57s Details
Test / test_snapshot_chain_ec (push) Successful in 2m41s Details
Test / test_rebalance_verify (push) Successful in 3m5s Details
Test / test_rebalance_verify_imm (push) Successful in 2m26s Details
Test / test_switch_primary (push) Successful in 45s Details
Test / test_write (push) Successful in 33s Details
Test / test_write_xor (push) Successful in 33s Details
Test / test_rebalance_verify_ec (push) Successful in 3m42s Details
Test / test_write_no_same (push) Successful in 14s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m57s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m24s Details
Test / test_heal_csum_32k_dj (push) Successful in 4m29s Details
Test / test_heal_csum_4k_dmj (push) Successful in 5m10s Details
Test / test_heal_csum_32k (push) Successful in 5m13s Details
Test / test_scrub (push) Successful in 1m5s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m1s Details
Test / test_scrub_xor (push) Successful in 1m2s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m2s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 57s Details
Test / test_scrub_ec (push) Successful in 50s Details
Test / test_scrub_pg_size_3 (push) Successful in 2m1s Details
Test / test_heal_csum_4k (push) Successful in 4m40s Details
Test / test_interrupted_rebalance (push) Successful in 1m38s Details
Test / test_heal_pg_size_2 (push) Successful in 4m2s Details
Test / test_heal_ec (push) Successful in 5m17s Details
2024-02-03 20:37:36 +03:00
Vitaliy Filippov 581d02e581 Mark secondary OSDs with deletions as dirty to not forget to sync & autosync them
Test / test_change_pg_count (push) Has been cancelled Details
Test / test_rm (push) Has been cancelled Details
Test / test_snapshot_chain (push) Has been cancelled Details
Test / test_snapshot_chain_ec (push) Has been cancelled Details
Test / test_snapshot_down (push) Has been cancelled Details
Test / test_snapshot_down_ec (push) Has been cancelled Details
Test / test_splitbrain (push) Has been cancelled Details
Test / test_rebalance_verify (push) Has been cancelled Details
Test / test_rebalance_verify_imm (push) Has been cancelled Details
Test / test_rebalance_verify_ec (push) Has been cancelled Details
Test / test_rebalance_verify_ec_imm (push) Has been cancelled Details
Test / test_switch_primary (push) Has been cancelled Details
Test / test_write (push) Has been cancelled Details
Test / test_write_xor (push) Has been cancelled Details
Test / test_write_no_same (push) Has been cancelled Details
Test / test_heal_pg_size_2 (push) Has been cancelled Details
Test / test_heal_ec (push) Has been cancelled Details
Test / test_heal_csum_32k_dmj (push) Has been cancelled Details
Test / test_cas (push) Has been cancelled Details
Test / test_heal_csum_32k_dj (push) Has been cancelled Details
Test / test_heal_csum_32k (push) Has been cancelled Details
Test / test_heal_csum_4k_dmj (push) Has been cancelled Details
Test / test_heal_csum_4k_dj (push) Has been cancelled Details
Test / test_heal_csum_4k (push) Has been cancelled Details
Test / test_scrub (push) Has been cancelled Details
Test / test_scrub_zero_osd_2 (push) Has been cancelled Details
Test / test_scrub_xor (push) Has been cancelled Details
Test / test_scrub_pg_size_3 (push) Has been cancelled Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Has been cancelled Details
Test / test_scrub_ec (push) Has been cancelled Details
2024-02-03 20:31:08 +03:00
Vitaliy Filippov f03a9db4d9 Fix OSD space reporting sometimes adding garbage zeros for deleted inodes (causing extra pool/stats etcd keys for deleted pools) 2024-02-03 20:31:08 +03:00
Vitaliy Filippov cb9c30bc31 Sync after sending all deletes to each PG in cli rm-data 2024-02-03 20:31:08 +03:00
Vitaliy Filippov a86a380d20 Fix invalid parsing of autosync_writes in blockstore leading to autosyncs after every operation with disabled immediate_commit :D 2024-02-03 20:31:08 +03:00
Vitaliy Filippov 1cec62d25d Sync only completed writes
Test / test_move_reappear (push) Successful in 21s Details
Test / test_rm (push) Successful in 16s Details
Test / test_snapshot_down (push) Successful in 25s Details
Test / test_snapshot_down_ec (push) Successful in 35s Details
Test / test_splitbrain (push) Successful in 24s Details
Test / test_interrupted_rebalance (push) Successful in 5m14s Details
Test / test_snapshot_chain (push) Successful in 2m50s Details
Test / test_rebalance_verify_imm (push) Successful in 2m47s Details
Test / test_rebalance_verify (push) Successful in 3m42s Details
Test / test_switch_primary (push) Successful in 33s Details
Test / test_write (push) Successful in 42s Details
Test / test_write_xor (push) Successful in 44s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 2m52s Details
Test / test_write_no_same (push) Successful in 15s Details
Test / test_rebalance_verify_ec (push) Successful in 4m19s Details
Test / test_heal_ec (push) Successful in 6m20s Details
Test / test_heal_csum_32k (push) Successful in 3m29s Details
Test / test_scrub (push) Successful in 1m24s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m11s Details
Test / test_heal_csum_4k_dmj (push) Successful in 4m23s Details
Test / test_scrub_xor (push) Successful in 1m9s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m29s Details
Test / test_heal_csum_4k (push) Successful in 5m36s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m53s Details
Test / test_scrub_ec (push) Successful in 29s Details
Test / test_heal_pg_size_2 (push) Successful in 3m9s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m13s Details
Test / test_heal_csum_32k_dj (push) Successful in 4m17s Details
Test / test_snapshot_chain_ec (push) Successful in 1m25s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Failing after 24s Details
Should be a final remaining fix to EC + non-capacitor (non-immediate-commit) write hangs :).

First it was breaking non-EC ("instantly stable") writes because they sometimes
complete out of order which was leading to the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  BUG: Unexpected dirty_entry 1000000000001:29480000 v65540 unstable state during flush: 0x151

But it is easily fixed by scanning previous and next dirty_entries in mark_stable.
2024-01-27 15:17:22 +03:00
Vitaliy Filippov 1c322b33ed Change default up_wait_retry_interval to 50 ms
Test / test_rm (push) Successful in 14s Details
Test / test_interrupted_rebalance_ec (push) Successful in 3m59s Details
Test / test_snapshot_chain (push) Successful in 1m34s Details
Test / test_snapshot_down (push) Successful in 25s Details
Test / test_snapshot_down_ec (push) Successful in 29s Details
Test / test_splitbrain (push) Successful in 19s Details
Test / test_snapshot_chain_ec (push) Successful in 2m35s Details
Test / test_interrupted_rebalance (push) Successful in 8m15s Details
Test / test_rebalance_verify_imm (push) Successful in 3m54s Details
Test / test_switch_primary (push) Successful in 36s Details
Test / test_write (push) Successful in 35s Details
Test / test_rebalance_verify_ec (push) Successful in 4m48s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 2m51s Details
Test / test_write_no_same (push) Successful in 14s Details
Test / test_write_xor (push) Failing after 3m9s Details
Test / test_heal_pg_size_2 (push) Successful in 3m55s Details
Test / test_heal_ec (push) Successful in 3m50s Details
Test / test_rebalance_verify (push) Failing after 9m30s Details
Test / test_heal_csum_32k_dmj (push) Failing after 5m40s Details
Test / test_heal_csum_32k_dj (push) Successful in 6m12s Details
Test / test_heal_csum_32k (push) Successful in 6m25s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m56s Details
Test / test_scrub (push) Successful in 1m4s Details
Test / test_scrub_zero_osd_2 (push) Successful in 55s Details
Test / test_scrub_xor (push) Successful in 56s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m19s Details
Test / test_scrub_pg_size_3 (push) Failing after 2m14s Details
Test / test_heal_csum_4k_dj (push) Successful in 5m53s Details
Test / test_scrub_ec (push) Successful in 1m1s Details
Test / test_heal_csum_4k (push) Successful in 5m17s Details
2024-01-26 01:51:08 +03:00
Vitaliy Filippov ba55f91409 Release 1.4.1
Test / test_move_reappear (push) Successful in 22s Details
Test / test_snapshot_chain (push) Successful in 1m27s Details
Test / test_interrupted_rebalance_ec (push) Successful in 4m41s Details
Test / test_snapshot_down (push) Successful in 25s Details
Test / test_snapshot_chain_ec (push) Successful in 2m0s Details
Test / test_splitbrain (push) Successful in 18s Details
Test / test_snapshot_down_ec (push) Successful in 25s Details
Test / test_rebalance_verify_ec (push) Failing after 2m21s Details
Test / test_rebalance_verify_imm (push) Successful in 2m30s Details
Test / test_switch_primary (push) Successful in 39s Details
Test / test_write (push) Successful in 35s Details
Test / test_interrupted_rebalance (push) Failing after 10m8s Details
Test / test_write_xor (push) Successful in 36s Details
Test / test_write_no_same (push) Successful in 17s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 4m4s Details
Test / test_heal_pg_size_2 (push) Successful in 3m55s Details
Test / test_rebalance_verify (push) Successful in 8m31s Details
Test / test_heal_ec (push) Successful in 5m9s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m27s Details
Test / test_heal_csum_32k (push) Successful in 5m42s Details
Test / test_heal_csum_32k_dj (push) Successful in 6m1s Details
Test / test_scrub (push) Successful in 59s Details
Test / test_scrub_zero_osd_2 (push) Successful in 38s Details
Test / test_heal_csum_4k_dmj (push) Successful in 7m5s Details
Test / test_scrub_xor (push) Successful in 58s Details
Test / test_heal_csum_4k_dj (push) Successful in 6m25s Details
Test / test_scrub_ec (push) Failing after 42s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m32s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m38s Details
Test / test_heal_csum_4k (push) Successful in 5m38s Details
- Fix a monitor crash on primary OSD switching introduced in 1.4.0
- Fix "partly outside array bounds" warnings for GCC 12 in cpp-btree
- Fix a realloc memory leak in theory possible with too large listings (OSD_OP_LIST)
2024-01-18 02:31:42 +03:00
Vitaliy Filippov d00d4dbac0 Initialize mod_revision field in etcd_state_client
Test / test_interrupted_rebalance_ec (push) Successful in 2m28s Details
Test / test_rm (push) Successful in 17s Details
Test / test_move_reappear (push) Successful in 29s Details
Test / test_snapshot_down (push) Successful in 26s Details
Test / test_snapshot_down_ec (push) Successful in 26s Details
Test / test_splitbrain (push) Successful in 16s Details
Test / test_snapshot_chain (push) Successful in 2m0s Details
Test / test_rebalance_verify_imm (push) Successful in 2m28s Details
Test / test_rebalance_verify (push) Successful in 3m0s Details
Test / test_rebalance_verify_ec (push) Successful in 3m14s Details
Test / test_write_no_same (push) Successful in 13s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m7s Details
Test / test_heal_pg_size_2 (push) Successful in 3m33s Details
Test / test_heal_ec (push) Successful in 4m40s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m40s Details
Test / test_heal_csum_32k (push) Successful in 6m8s Details
Test / test_scrub (push) Successful in 1m4s Details
Test / test_scrub_zero_osd_2 (push) Successful in 47s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m33s Details
Test / test_heal_csum_4k_dj (push) Successful in 6m28s Details
Test / test_scrub_xor (push) Successful in 44s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m2s Details
Test / test_scrub_ec (push) Successful in 42s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m38s Details
Test / test_heal_csum_4k (push) Successful in 5m56s Details
Test / test_interrupted_rebalance (push) Successful in 1m53s Details
Test / test_snapshot_chain_ec (push) Failing after 3m17s Details
Test / test_write (push) Failing after 3m15s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m6s Details
Test / test_write_xor (push) Failing after 3m11s Details
2024-01-13 01:30:28 +03:00
Vitaliy Filippov 5d9d6f32a0 Fix common realloc memory leak mistakes found by cppcheck 2024-01-13 01:30:28 +03:00
Vitaliy Filippov 5280d1d561 Release 1.4.0
Test / test_snapshot (push) Successful in 26s Details
Test / test_snapshot_ec (push) Successful in 26s Details
Test / test_rm (push) Successful in 16s Details
Test / test_move_reappear (push) Successful in 24s Details
Test / test_snapshot_down (push) Successful in 26s Details
Test / test_snapshot_down_ec (push) Successful in 30s Details
Test / test_splitbrain (push) Successful in 28s Details
Test / test_snapshot_chain (push) Successful in 2m41s Details
Test / test_rebalance_verify_imm (push) Successful in 2m48s Details
Test / test_rebalance_verify (push) Successful in 3m28s Details
Test / test_write (push) Successful in 47s Details
Test / test_write_no_same (push) Successful in 14s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 3m5s Details
Test / test_rebalance_verify_ec (push) Successful in 3m41s Details
Test / test_heal_pg_size_2 (push) Successful in 3m45s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m52s Details
Test / test_heal_ec (push) Successful in 5m11s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m42s Details
Test / test_heal_csum_32k (push) Successful in 5m56s Details
Test / test_scrub (push) Successful in 1m25s Details
Test / test_scrub_zero_osd_2 (push) Successful in 1m18s Details
Test / test_scrub_xor (push) Successful in 42s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m49s Details
Test / test_heal_csum_4k_dj (push) Successful in 6m32s Details
Test / test_heal_csum_4k (push) Successful in 5m31s Details
Test / test_scrub_ec (push) Successful in 50s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m2s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m5s Details
Test / test_snapshot_chain_ec (push) Successful in 1m21s Details
Test / test_write_xor (push) Successful in 36s Details
New features:
- Intelligent recovery/rebalance speed auto-tuning to reduce its impact on clients (see README -> Features)
- Auto-restoration of dead VDUSE daemons in CSI plugin
- Add vitastor-disk update-sb command
- Update QEMU for Debian Bookworm to 8.1 and use it for CSI plugin

Bug fixes:
- Fix pools SOMETIMES staying inactive after stopping a node due to OSDs not reacting
  to PG state changes caused by incorrect full reload of state from etcd on reconnection
- Make monitors retry pool configuration changes quickier which fixes them being unable
  to apply changes when an ongoing rebalance is quickly making a lot of PGs clean
- Fix CSI plugin not accepting array of strings as etcd address in /etc/vitastor/vitastor.conf
- Allow multiple interfaces with the same IP address, for "simple routed" full mesh network
- Do not ignore loopback addresses for OSD network (to make ECMP setups with frr possible)
- Fix a rare client crash during OSD reconnections
- Only treat data partitions as existing OSDs in vitastor-disk prepare
- Remove etcd parameter from default command examples
- Fix reported free space sometimes changing non-immediately after deletion of data from OSDs
- Fix a possible OSD crash on print_slow when bs_op is NULL
- Use the same etcd_ws_keepalive_interval in mon as in OSD
- Fix mon not using values from config when /config/global is not present
- Remove pve-storage-portal-dns-list format for vitastor_etcd_address
- Parse log_level in cluster_client
- Fix vitastor-nbd image existence check not working because of non-zeroed inode_watch fields
- Do not warn on EPIPE in client unless log_level is raised explicitly
- Fix incorrect error in CSI when searching for the device in /sys
- Remove 2 last prints to stdout in etcd_state_client
- Fix a possible OSD crash when checking corrupted journal entries
2024-01-12 01:28:33 +03:00
Vitaliy Filippov 2f228fa96a Only treat data partitions as existing OSDs in vitastor-disk prepare
Test / test_interrupted_rebalance_ec (push) Successful in 2m40s Details
Test / test_rm (push) Successful in 31s Details
Test / test_move_reappear (push) Successful in 39s Details
Test / test_snapshot_down (push) Successful in 26s Details
Test / test_interrupted_rebalance (push) Successful in 4m42s Details
Test / test_snapshot_down_ec (push) Successful in 26s Details
Test / test_splitbrain (push) Successful in 22s Details
Test / test_snapshot_chain (push) Failing after 3m17s Details
Test / test_snapshot_chain_ec (push) Failing after 3m13s Details
Test / test_rebalance_verify_imm (push) Successful in 2m51s Details
Test / test_write (push) Successful in 37s Details
Test / test_rebalance_verify_ec_imm (push) Successful in 2m37s Details
Test / test_write_no_same (push) Successful in 18s Details
Test / test_rebalance_verify_ec (push) Successful in 3m20s Details
Test / test_write_xor (push) Failing after 3m8s Details
Test / test_rebalance_verify (push) Successful in 8m20s Details
Test / test_heal_pg_size_2 (push) Successful in 4m17s Details
Test / test_heal_ec (push) Successful in 4m59s Details
Test / test_heal_csum_32k_dmj (push) Successful in 4m15s Details
Test / test_heal_csum_32k_dj (push) Successful in 5m35s Details
Test / test_heal_csum_32k (push) Successful in 6m47s Details
Test / test_heal_csum_4k_dmj (push) Successful in 6m49s Details
Test / test_scrub (push) Successful in 1m2s Details
Test / test_scrub_zero_osd_2 (push) Successful in 45s Details
Test / test_scrub_xor (push) Successful in 40s Details
Test / test_heal_csum_4k_dj (push) Successful in 7m16s Details
Test / test_scrub_pg_size_6_pg_minsize_4_osd_count_6_ec (push) Successful in 1m9s Details
Test / test_scrub_ec (push) Successful in 45s Details
Test / test_heal_csum_4k (push) Successful in 5m26s Details
Test / test_scrub_pg_size_3 (push) Successful in 1m38s Details
2023-12-31 11:46:47 +03:00
Vitaliy Filippov a6ab54b1ba Do not allow negative util_low/high 2023-12-31 01:23:17 +03:00