519 Commits (4a17a61d1faefff272ecc59572647b0ea8e832f4)

Author SHA1 Message Date
Vitaliy Filippov 226f5a2945 Allow overriding block_size in fio_sec_osd 2020-06-07 00:10:13 +03:00
Vitaliy Filippov 2187d06eac Add a parameter to pass the initial config to client 2020-06-07 00:10:12 +03:00
Vitaliy Filippov c573bc6bb3 (Probably almost) implement cluster client 2020-06-07 00:09:36 +03:00
Vitaliy Filippov 2f6cf605a1 Rename cluster_client to osd_messenger 2020-06-04 12:57:54 +03:00
Vitaliy Filippov 05ea97119f Fix BS_OP_LIST to account for deleted objects: only list the newest stable entry of each object
This allows list responses to be unaffected by journal flushes, which, in turn,
fixes PG peering when a peer OSD is replaying its journal and the journal contains deletions
2020-06-02 23:52:48 +03:00
Vitaliy Filippov 571be0f380 Make deletions instantly stable
The "2-phase" (write->stabilize) process is pointless for deletions because it
doesn't protect us from incomplete objects. This happens because it removes
the version information from metadata after stabilization. Deletions require
a "3-phase" process with a potentially very long 3rd phase.

So, deletions will be allowed to generate degraded and incomplete objects.
To keep this from affecting users' ability to delete something, the cluster
will allow deleting whole inodes while storing a list of them in etcd.
Proper TRIM will be impossible until the implementation of the aforementioned
"3-phase" process, though.

By the way, this change also fixes a possible write stall after rebalancing
which was caused by the lack of "stabilize delete" operations.
2020-06-02 23:45:22 +03:00
Vitaliy Filippov 985c309d7f Remove duplicate code between blockstore_{rollback,stable} and blockstore_init 2020-06-02 20:37:00 +03:00
Vitaliy Filippov a56f8cd14e Simplify handle_primary_subop() arguments 2020-06-02 18:44:23 +03:00
Vitaliy Filippov 46e111272f Replace assert(this_it == cur_op) with if() for the case of PG repeering 2020-06-02 14:30:57 +03:00
Vitaliy Filippov 165c204555 Fix BS_OP_DELETE (the implementation was untested up to this point) 2020-06-02 14:26:01 +03:00
Vitaliy Filippov af5cd45071 Oh crap, got SIGPIPE. Add MSG_NOSIGNAL 2020-06-02 11:41:08 +03:00
Vitaliy Filippov c3fe9ad0d1 Fix rebalancing writes (add a forgotten state resume) 2020-06-02 01:26:14 +03:00
Vitaliy Filippov 0fcdeae18b Do not die if a peer is already stopped on flush error 2020-06-01 23:07:08 +03:00
Vitaliy Filippov e6a4b634f8 Fix possible write stall
The stall occurred during fio Q=128 random write tests with low flusher_count (4).
It was caused by flushers being unable to flush the beginning of the journal
because it contained older writes to an object that also had writes at the very end
of the journal, after dirty_start.
2020-06-01 16:18:23 +03:00
Vitaliy Filippov c22e096943 Output journal offsets in debug trace in hex, add detailed "still waiting" messages 2020-06-01 16:18:19 +03:00
Vitaliy Filippov 45b1c2fbf1 Fix canceling of write operations on PG re-peer (which led to use-after-free, too...) 2020-06-01 16:18:14 +03:00
Vitaliy Filippov 3469bead67 Protect "delete this" with a stack refcounter
(to fix use-after-free, too, but "delete this" was a time bomb anyway)
2020-06-01 16:18:09 +03:00
Vitaliy Filippov 3a5d488f19 Fix use-after-free in osd_flush.cpp 2020-06-01 01:56:24 +03:00
Vitaliy Filippov 73e4e30b1f Auto-generate C++ header dependencies 2020-06-01 00:25:25 +03:00
Vitaliy Filippov 5feff1ffb9 Slightly cleanup socket send/receive code 2020-05-31 15:03:27 +03:00
Vitaliy Filippov b466e215f0 Fix queued OP_SYNC execution 2020-05-27 13:55:25 +03:00
Vitaliy Filippov 36f995367f Fix bind_address reporting 2020-05-27 10:58:40 +03:00
Vitaliy Filippov 0aca6e9ca8 Extract peer connect and read-write loop into a separate file (to be shared with the client library) 2020-05-26 22:11:30 +03:00
Vitaliy Filippov fa98be6bc0 Allow specifying multiple etcd addresses 2020-05-25 16:30:05 +03:00
Vitaliy Filippov 256a7f2667 Free op->bs_op manually 2020-05-25 15:31:22 +03:00
Vitaliy Filippov 79bf57b6e2 Allow overriding pg_stripe_size 2020-05-25 15:31:22 +03:00
Vitaliy Filippov 53f6aba3e6 Die when journal_sector_buffer_count is too small 2020-05-24 17:26:47 +03:00
Vitaliy Filippov 36595eb669 Print "Ran out of journal sector buffers" warning 2020-05-24 16:48:50 +03:00
Vitaliy Filippov e09d0e0678 Several bug fixes
- Do not block flock() requests
- Fix stop_client(0) attempts leading to std::bad_function_call
- Fix degraded writes crashing due to an unset stripes[i].missing (at least with a missing parity device)
- Fix recovery B/W reporting
2020-05-24 01:51:35 +03:00
Vitaliy Filippov d1602b50b3 Fix BS_OP_ROLLBACK removing an incorrect version
Instead of only removing versions with oid == X and version > Y, it was
also removing the previous version in the list (with the previous oid or
with version == Y)
2020-05-24 01:51:28 +03:00
Vitaliy Filippov 7df384031a Re-peer PGs after stopping the peer
Fixes the bug where two peers killed at once led to PG state PG_DEGRADED|PG_HAS_INCOMPLETE instead of PG_INCOMPLETE
2020-05-23 18:45:12 +03:00
Vitaliy Filippov e614a98543 Add a sad FIXME :-) 2020-05-23 15:43:37 +03:00
Vitaliy Filippov 01dd3ef89e Fix timerfd_manager triggering of multiple timers at the same time 2020-05-23 15:43:37 +03:00
Vitaliy Filippov cdccc23aff Print [OSD $osd_num] in stats, print B/W only for ops that log bytes 2020-05-23 15:43:37 +03:00
Vitaliy Filippov 700428829a Fix autosync_interval default not being applied when autosync_interval is omitted from config 2020-05-23 15:43:37 +03:00
Vitaliy Filippov 6488d0044a Ignore EPOLL_CTL_DEL ENOENT, fix detection of the rollback version 2020-05-23 15:43:37 +03:00
Vitaliy Filippov 393fe75900 Fix creepy (osd_op_t*)(long) casts 2020-05-23 15:43:37 +03:00
Vitaliy Filippov f036eecf1c Fix osd_rmw object recovery case (len==0) 2020-05-23 15:43:37 +03:00
Vitaliy Filippov e56909fb45 Remove tv_send (unused) and timerfd_interval from blockstore 2020-05-22 15:57:08 +03:00
Vitaliy Filippov fac75b0b57 Handle reweights in mon 2020-05-22 12:52:27 +03:00
Vitaliy Filippov 9f842ec9a5 Remove connect callback because it is always the same 2020-05-22 12:45:12 +03:00
Vitaliy Filippov f6a01a4819 Extract "state-watching" etcd client into a separate file 2020-05-22 12:38:40 +03:00
Vitaliy Filippov 6202260018 Extract HTTP client functions from osd_t 2020-05-21 11:39:01 +03:00
Vitaliy Filippov a61ede9951 Remove io_uring usage from osd_http and timerfd_manager
For better future interoperability with external event loops such as QEMU's
2020-05-21 01:25:38 +03:00
Vitaliy Filippov f57731f8ca Calculate total stats in the monitor 2020-05-15 01:37:17 +03:00
Vitaliy Filippov 19f25c7cd5 Handle integer overflow of the op_stat_count 2020-05-15 01:37:17 +03:00
Vitaliy Filippov 2c3e84cc41 Implement stop_all_pgs() 2020-05-15 01:37:17 +03:00
Vitaliy Filippov 7bda66b866 Do not crash when optimising PGs in an undersized cluster 2020-05-15 01:29:15 +03:00
Vitaliy Filippov b467d0559f Begin node.js storage monitor service 2020-05-15 01:29:15 +03:00
Vitaliy Filippov c2c2eefea4 Duplicate host in osd/state and osd/stats, take PGs from /config/pgs.items 2020-05-15 01:29:15 +03:00
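
The BS_OP_ROLLBACK fix in d1602b50b3 above describes removing only the versions with oid == X and version > Y. A minimal C++ sketch of that range removal over an ordered index follows. The obj_ver key, the dirty_index map, and rollback_newer() are hypothetical names chosen for illustration; they are not taken from the blockstore code, and the real data structures may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <tuple>

// Hypothetical simplified key: ordered by object id first, then by version.
struct obj_ver
{
    uint64_t oid;
    uint64_t version;
    bool operator<(const obj_ver &other) const
    {
        return std::tie(oid, version) < std::tie(other.oid, other.version);
    }
};

// Hypothetical stand-in for an in-memory index of not-yet-stable versions.
using dirty_index = std::map<obj_ver, int>;

// Erase every entry with the given oid and a version strictly greater than
// keep_version. Entries of other objects and the entry with
// version == keep_version are left untouched.
// Returns the number of erased entries.
inline std::size_t rollback_newer(dirty_index &idx, uint64_t oid, uint64_t keep_version)
{
    // First entry ordered strictly after (oid, keep_version).
    auto first = idx.upper_bound(obj_ver{oid, keep_version});
    // Advance only while the entries still belong to the same object.
    auto last = first;
    while (last != idx.end() && last->first.oid == oid)
    {
        ++last;
    }
    std::size_t removed = static_cast<std::size_t>(std::distance(first, last));
    idx.erase(first, last);
    return removed;
}
```

Starting from upper_bound() rather than stepping back from the end of the object's range is what keeps the previous entry (the same oid with version == Y, or the last version of the previous oid) out of the erased range.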