Commit Graph

497 Commits (e41bee72a56b84be54666fff450962400b05bd47)

Author SHA1 Message Date
Vitaliy Filippov caa01c6aaf Acquire etcd leases, prevent starting two OSDs with the same number 2020-04-25 01:35:52 +03:00
Vitaliy Filippov d398ddfd3b Use snake_case for etcd requests 2020-04-25 01:35:52 +03:00
Vitaliy Filippov 0f2b8dbf6f Use a single timerfd_manager for all timers 2020-04-25 01:35:49 +03:00
Vitaliy Filippov 4f42e9659e Use etcd instead of Consul 2020-04-24 01:03:55 +03:00
Vitaliy Filippov 7cf71a8031 Fix timerfd_manager: remove timer, then call callback 2020-04-21 12:45:18 +03:00
Vitaliy Filippov 9d22559bcf Start peering immediately when loading PGs 2020-04-21 02:27:13 +03:00
Vitaliy Filippov 8c03e3ebab Lock Blockstore devices exclusively by default 2020-04-21 01:59:11 +03:00
Vitaliy Filippov 2a640ba2e8 Remove range port selection (leads to races) 2020-04-21 00:10:59 +03:00
Vitaliy Filippov 6a21ea207e Check peer config (at least, number) after connecting 2020-04-21 00:08:54 +03:00
Vitaliy Filippov 642802b595 Auto-select port numbers 2020-04-20 17:45:27 +03:00
Vitaliy Filippov ff38b464a5 Add consul & connect timeouts, report state before loading PGs, move init_primary to osd_cluster 2020-04-20 15:43:07 +03:00
Vitaliy Filippov 663153713b Reconnect to peers after connecting drops 2020-04-19 01:01:26 +03:00
Vitaliy Filippov dc57c5c362 Report PG states again, clear PG history on reaching active+clean 2020-04-19 00:48:23 +03:00
Vitaliy Filippov f95299b769 Take PG history into account when starting PGs 2020-04-19 00:20:18 +03:00
Vitaliy Filippov 9126ffb0f9 Fix PG loading - now it works, at least once 2020-04-17 02:33:44 +03:00
Vitaliy Filippov 2a8e40835e Fix reporting to Consul, report even if we are purely secondary 2020-04-17 01:59:06 +03:00
Vitaliy Filippov 309486d746 Implement loading PGs from Consul (in theory) 2020-04-16 23:22:32 +03:00
Vitaliy Filippov 582f485578 Extract http & getifaddr_list into a separate file 2020-04-15 15:47:06 +03:00
Vitaliy Filippov 089b4eb208 Retry consul connection attempts and then die 2020-04-15 15:33:18 +03:00
Vitaliy Filippov d78ce509c6 Add simple timer manager 2020-04-15 13:41:44 +03:00
Vitaliy Filippov f3a7ccff50 Use 4K blockstore block by default, use MEM_ALIGNMENT in osd code 2020-04-14 19:19:56 +03:00
Vitaliy Filippov 37b27c3025 Implement basic OSD status reporting to Consul 2020-04-14 14:52:06 +03:00
Vitaliy Filippov edf6d6f897 Fix http_request 2020-04-12 02:08:00 +03:00
Vitaliy Filippov d11e8dcb5e Do not flush or recover in readonly mode 2020-04-11 12:06:18 +03:00
Vitaliy Filippov dd02bc1c44 Add base64 implementation 2020-04-11 12:06:18 +03:00
Vitaliy Filippov 298b013eae Add simple http request function 2020-04-11 12:05:58 +03:00
Vitaliy Filippov 0880a77c1a 2 FIXME for the future 2020-04-06 00:55:47 +03:00
Vitaliy Filippov aa849ea07b Add a test for missing chunk overwrite 2020-04-05 16:14:03 +03:00
Vitaliy Filippov d59be0e8b4 Delete misplaced chunks after moving the object, reset object state in primary_write 2020-04-05 15:51:22 +03:00
Vitaliy Filippov cf7de0f181 (Almost) Implement misplaced recovery, integrating it into calc_rmw() 2020-04-05 15:50:53 +03:00
Vitaliy Filippov 6212195440 Implement parallel recovery 2020-04-04 19:23:12 +03:00
Vitaliy Filippov dfb6e15eaa Implement graceful stopping of PGs 2020-04-03 13:03:42 +03:00
Vitaliy Filippov afe2e76c87 Implement regular automatic syncs, split osd_t constructor into some methods 2020-04-02 22:16:46 +03:00
Vitaliy Filippov 0f43f6d3f6 Fix crashes, print some stats
Notably:
- fix the `delete op` inside lambda callback crash (it frees the lambda itself
  which results in use-after-free with g++)
- fix stop_client() reenterability
- fix a bug in the blockstore layer which resulted in always returning version=0
  for zero-length reads
- change error codes for blockstore_stabilize
2020-03-31 17:55:31 +03:00
Vitaliy Filippov 92c800bb64 Forget unstable writes when re-peering, rename parity_block_size -> pg_stripe_size, pg_parity_size -> pg_block_size 2020-03-31 02:09:25 +03:00
Vitaliy Filippov 8a8b619875 Handle secondary OSD connection errors [in theory] 2020-03-30 19:51:34 +03:00
Vitaliy Filippov 43fe1d88e7 Fix memory leaks with subops, fix recovery crashes 2020-03-28 19:09:20 +03:00
Vitaliy Filippov 1b30120918 Fix stripe reconstruction in recovery, only write modified object parts 2020-03-28 13:58:42 +03:00
Vitaliy Filippov c0a22d825d Fix degraded object recovery (it seems to work now) 2020-03-25 02:17:41 +03:00
Vitaliy Filippov 7acfc95f75 CONFIG_HAVE_GETTID 2020-03-25 01:20:20 +03:00
Vitaliy Filippov 250f22c0b6 Implement basic degraded object recovery (integrated into primary_write) 2020-03-25 01:17:50 +03:00
Vitaliy Filippov dbd8418798 Reply using a single finish_op() method, allow to call OSD ops from inside the OSD 2020-03-24 00:18:52 +03:00
Vitaliy Filippov 036f4c5bf3 Fix unstable flushing, include extra OSDs with old object versions in osd_set 2020-03-23 20:28:47 +03:00
Vitaliy Filippov fd8e1a8418 Slightly reorganize object state check code 2020-03-23 00:42:17 +03:00
Vitaliy Filippov a08e0bfacd Treat misplaced and degraded as separate state parts 2020-03-23 00:40:31 +03:00
Vitaliy Filippov ddc3e927d3 Solve it in integers 2020-03-20 13:58:54 +03:00
Vitaliy Filippov 2aa605f2bb Do not check 2020-03-20 13:38:35 +03:00
Vitaliy Filippov 18915b264a Extract to .pm + fix all_combinations 2020-03-19 21:35:47 +03:00
Vitaliy Filippov 60f795e7eb Add lp_solve based data distribution optimizer 2020-03-19 17:23:24 +03:00
Vitaliy Filippov 3a4279adbf Hash-based PG distribution experiments 2020-03-17 18:52:39 +03:00