Commit Graph

442 Commits (f460d8c1c8c39cf9cd328c5ed13deeec67bcd360)

Author SHA1 Message Date
Vitaliy Filippov 6022f28dc9 Add pseudo-random PG generation 2020-07-07 23:13:07 +03:00
Vitaliy Filippov 9d10a4d057 Support arbitrary pg_size in LPOptimizer 2020-07-05 20:28:05 +03:00
Vitaliy Filippov ec7acc8f3a Add WRITE_STABLE operation for future replication support 2020-07-05 01:48:02 +03:00
Vitaliy Filippov 416a80b099 Make blockstore object state a combination of type and workflow 2020-07-04 22:20:32 +03:00
Vitaliy Filippov a7929931eb Implement PG epochs to prevent the "version split"
The "version split" is when:
- A block is written to 1 OSD out of 3, all of them die
- OSDs 2 and 3 come up, the same block is written to both of them
- The remaining OSD comes up. Now all 3 OSDs have the same version of the same object,
  but with different data.
2020-07-04 00:55:27 +03:00
Vitaliy Filippov e680d6c1c3 Rename reconstruct_stripe and calc_rmw_parity to indicate that they are only for XOR N+1 2020-06-30 10:40:43 +03:00
Vitaliy Filippov 9b33f598d3 Fix two more cluster client bugs
1) Sync could delete an unfinished write due to the lack of ordering (fixed by introducing syncing_writes)
2) Writes could be postponed indefinitely due to bad resuming of operations after a sync
2020-06-27 02:13:35 +03:00
Vitaliy Filippov 592bcd3699 Fix QEMU driver bugs (QEMU and qemu-img now work! hooray!) 2020-06-26 18:25:43 +03:00
Vitaliy Filippov 5e1e39633d Implement QEMU block driver 2020-06-25 11:59:43 +03:00
Vitaliy Filippov 41c2655edd Disconnect sockets when read returns zero 2020-06-24 01:32:19 +03:00
Vitaliy Filippov d68370304e Support iovecs in cluster_client_t 2020-06-24 01:31:48 +03:00
Vitaliy Filippov a22d9f38aa Only use EPOLLOUT while connecting 2020-06-23 20:18:31 +03:00
Vitaliy Filippov 8736b3ad32 Add destructors, make ringloop optional in cluster_client_t 2020-06-23 20:10:33 +03:00
Vitaliy Filippov 62343c8022 Allow to turn synchronous recvmsg/sendmsg on with a config option 2020-06-23 01:15:07 +03:00
Vitaliy Filippov 9abaf5b735 Use epoll_manager in osd 2020-06-20 01:28:18 +03:00
Vitaliy Filippov badf68c039 Support iovecs for read operations 2020-06-19 19:47:05 +03:00
Vitaliy Filippov 0f6d193d73 Postpone op callbacks to the end of handle_read(), fix a bug where primary OSD could reply -EPIPE with data to a read operation 2020-06-16 01:36:38 +03:00
Vitaliy Filippov 27ee14a4e6 Fix bugs in cluster_client 2020-06-16 00:08:45 +03:00
Vitaliy Filippov 64afec03ec In theory, implement syncs and replay for the non-immediate commit mode 2020-06-15 00:04:16 +03:00
Vitaliy Filippov 4dde8b8a42 Oops, fix fio_sec_osd block_order parsing 2020-06-09 00:52:00 +03:00
Vitaliy Filippov f5ccb154af Benchmark reads in stub_bench, too 2020-06-08 01:54:44 +03:00
Vitaliy Filippov 73c80e2c39 Move accept_connections() to osd_messenger_t, add a simple uring OSD stub 2020-06-08 01:32:16 +03:00
Vitaliy Filippov 437dc5b630 Implement a FIO engine for testing cluster I/O 2020-06-07 00:30:15 +03:00
Vitaliy Filippov 226f5a2945 Allow to override block_size in fio_sec_osd 2020-06-07 00:10:13 +03:00
Vitaliy Filippov 2187d06eac Add a parameter to pass the initial config to client 2020-06-07 00:10:12 +03:00
Vitaliy Filippov c573bc6bb3 (Probably almost) implement cluster client 2020-06-07 00:09:36 +03:00
Vitaliy Filippov 2f6cf605a1 Rename cluster_client to osd_messenger 2020-06-04 12:57:54 +03:00
Vitaliy Filippov 05ea97119f Fix BS_OP_LIST to account for deleted objects: only list the newest stable entry of each object
This allows list responses to be unaffected by journal flushes, which, in turn,
fixes PG peering when a peer OSD is replaying journal and journal contains deletions
2020-06-02 23:52:48 +03:00
Vitaliy Filippov 571be0f380 Make deletions instantly stable
"2-phase" (write->stabilize) process is pointless for deletions because it
doesn't protect us from incomplete objects. This happens because it removes
the version information from metadata after stabilization. Deletions require
"3-phase" process with a potentially very long 3rd phase.

So, deletions will be allowed to generate degraded and incomplete objects,
and for it to not affect users' ability to delete something, the cluster
will allow to delete whole inodes while storing a list of them in etcd.
Proper TRIM will be impossible until the implementation of the aforementioned
"3-phase" process, though.

By the way, this change also fixes a possible write stall after rebalancing
which was caused by the lack of "stabilize delete" operations.
2020-06-02 23:45:22 +03:00
Vitaliy Filippov 985c309d7f Remove duplicate code between blockstore_{rollback,stable} and blockstore_init 2020-06-02 20:37:00 +03:00
Vitaliy Filippov a56f8cd14e Simplify handle_primary_subop() arguments 2020-06-02 18:44:23 +03:00
Vitaliy Filippov 46e111272f Replace assert(this_it == cur_op) with if() for the case of PG repeering 2020-06-02 14:30:57 +03:00
Vitaliy Filippov 165c204555 Fix BS_OP_DELETE (the implementation was untested up to this point) 2020-06-02 14:26:01 +03:00
Vitaliy Filippov af5cd45071 Oh crap, got SIGPIPE. Add MSG_NOSIGNAL 2020-06-02 11:41:08 +03:00
Vitaliy Filippov c3fe9ad0d1 Fix rebalancing writes (add a forgotten state resume) 2020-06-02 01:26:14 +03:00
Vitaliy Filippov 0fcdeae18b Do not die if a peer is already stopped on flush error 2020-06-01 23:07:08 +03:00
Vitaliy Filippov e6a4b634f8 Fix possible write stall
The stall occurred during fio Q=128 random write tests with low flusher_count (4).
It was caused by flushers being unable to flush the beginning of the journal
because it contained older writes to an object that also had writes in the very end
of the journal, after dirty_start.
2020-06-01 16:18:23 +03:00
Vitaliy Filippov c22e096943 Output journal offsets in debug trace in hex, add detailed "still waiting" messages 2020-06-01 16:18:19 +03:00
Vitaliy Filippov 45b1c2fbf1 Fix canceling of write operations on PG re-peer (which led to use-after-free, too...) 2020-06-01 16:18:14 +03:00
Vitaliy Filippov 3469bead67 Protect "delete this" with a stack refcounter
(to fix use-after-free, too, but "delete this" was a time bomb anyway)
2020-06-01 16:18:09 +03:00
Vitaliy Filippov 3a5d488f19 Fix use-after-free in osd_flush.cpp 2020-06-01 01:56:24 +03:00
Vitaliy Filippov 73e4e30b1f Auto-generate C++ header dependencies 2020-06-01 00:25:25 +03:00
Vitaliy Filippov 5feff1ffb9 Slightly cleanup socket send/receive code 2020-05-31 15:03:27 +03:00
Vitaliy Filippov b466e215f0 Fix queued OP_SYNC execution 2020-05-27 13:55:25 +03:00
Vitaliy Filippov 36f995367f Fix bind_address reporting 2020-05-27 10:58:40 +03:00
Vitaliy Filippov 0aca6e9ca8 Extract peer connect and read-write loop into a separate file (to be shared with the client library) 2020-05-26 22:11:30 +03:00
Vitaliy Filippov fa98be6bc0 Allow to specify multiple etcd addresses 2020-05-25 16:30:05 +03:00
Vitaliy Filippov 256a7f2667 Free op->bs_op manually 2020-05-25 15:31:22 +03:00
Vitaliy Filippov 79bf57b6e2 Allow to override pg_stripe_size 2020-05-25 15:31:22 +03:00
Vitaliy Filippov 53f6aba3e6 Die when journal_sector_buffer_count is too small 2020-05-24 17:26:47 +03:00