Basic naive implementation works, but it's highly non-optimal as
RNR retransmissions occur all the time. RDMA expects the receiver
to always have place for incoming WRs...
The new protocol is almost compatible - it has bitmaps, but also it has
a "bitmap_length" field. It's not hard to make 0.5-0.6 OSDs and clients
compatible, but for now I just assume nobody needs it.
If I'm wrong and anybody requests to upgrade their production 0.5.x system
to 0.6.x I'll fix it.
The bug reproduced if fio was temporarily stopped with SIGSTOP
during write test and then resumed after 10 seconds. In this case
"pings" were failed for all clients and fio process crashed with
'use-after-free' in keepalive_timer. It happened because it called
stop_client while having a live iterator to the map.
Previous implementation didn't respect write ordering and could lead
to corrupted data when restarting writes after an OSD outage
Also rework cluster_client queueing logic and add tests for it to verify the correct behaviour