Vitaliy Filippov
28bd94d2c2
Make diagnostics slightly better
2021-07-18 01:24:38 +03:00
Vitaliy Filippov
148ff04aa8
Do not lose flusher queue entries when an "older object rescan" happens in parallel with flushing of an older version of another object
2021-07-18 01:20:54 +03:00
JiangYu
e86df4a2a2
fix BLOCKSTORE_DEBUG, error: ‘dirty_it’ was not declared in this scope
...
Signed-off-by: JiangYu <lnsyyj@hotmail.com>
2021-07-18 00:46:05 +08:00
Vitaliy Filippov
e74af9745e
Print journal flusher diagnostics on slow ops
2021-07-17 16:13:41 +03:00
Vitaliy Filippov
0e0509e3da
Dump op states in slow operation log
2021-07-16 01:58:50 +03:00
Vitaliy Filippov
cb282d25e0
Release 0.6.5
...
- Basic support for OpenStack: Cinder driver, patches for Nova and libvirt
- Add missing "image" and "config_path" QEMU options
- Calculate aggregate per-pool statistics in monitor
- Implement writes with Check-And-Set semantics
- Add a C wrapper library with public header
2021-07-10 11:01:21 +03:00
Vitaliy Filippov
b52dd6843a
Rename qemu_rbd_unescape and qemu_rbd_next_tok to *_vitastor_*
2021-07-03 23:14:44 +03:00
Vitaliy Filippov
b66160a7ad
Aggregate per-pool statistics in mon
2021-07-03 23:14:44 +03:00
Vitaliy Filippov
aad7792d3f
Check for loops in parent inode chains
2021-06-20 00:23:03 +03:00
Vitaliy Filippov
6ca8afffe5
Add CAS version parameter to the C wrapper
2021-06-19 01:00:52 +03:00
Vitaliy Filippov
511a89948b
Rework qemu_proxy into a C wrapper library with public header
2021-06-19 00:39:11 +03:00
Vitaliy Filippov
3de553ecd7
Add a test for CAS write operation
2021-06-15 00:12:35 +03:00
Vitaliy Filippov
891250d355
Implement CAS writes
...
From now on, reads will return the server-side object version numbers
and writes and deletes will have an additional "version" parameter
which, if set to a non-zero value, will be atomically compared with
the current version of the object plus 1 and the modification will
fail if it doesn't match.
This feature opens the road to correct online flattening of snapshot
layers and other interesting things.
2021-06-15 00:12:35 +03:00
Vitaliy Filippov
f9fe72d40a
Release 0.6.4
...
- Implement a basic Kubernetes CSI driver
- Minor fixes for vitastor-nbd
- Fix build without RDMA broken in 0.6.3
2021-05-16 01:38:01 +03:00
Vitaliy Filippov
eaac1fc5d1
Log to stderr in etcd_state_client, too
2021-05-16 01:09:25 +03:00
Vitaliy Filippov
57be1923d3
Daemonize NBD_DO_IT process, correctly cleanup unmounted NBD clients
2021-05-16 01:09:25 +03:00
Vitaliy Filippov
c467acc388
Fix /v3 appendage to etcd URLs without /v3
2021-05-15 19:22:24 +03:00
Vitaliy Filippov
bf591ba3ee
Fix nbd module load check
2021-05-15 19:22:24 +03:00
Vitaliy Filippov
699a0fbbc7
Log to stderr instead of stdout in client
2021-05-15 19:22:24 +03:00
Vitaliy Filippov
6b2dd50f27
Fix build without RDMA
2021-05-08 18:20:43 +03:00
Vitaliy Filippov
caf2f3c56f
Release 0.6.3
...
- RDMA support
- Client performance optimisations (4k randread ~120k -> ~180k on 1 core)
- JSON configuration file (/etc/vitastor/vitastor.conf) support
- Bug fixes
2021-05-02 17:47:43 +03:00
Vitaliy Filippov
9174f188b1
Build packages with libibverbs
...
For CentOS 7 it also requires newer rdma-core as CentOS 7's native version doesn't have
implicit ODP support. The updated version is already uploaded into the vitastor repo.
2021-05-02 17:47:16 +03:00
Vitaliy Filippov
d3978c6d0e
Do not print RDMA connection messages when log_level=0
...
By the way, it's 1 by default in the OSD, so these messages will still be there in OSD logs
2021-05-01 00:26:09 +03:00
Vitaliy Filippov
4a7365660d
Do not wait for down OSDs during sync
...
Fixes a hang introduced in 0.5.11 in the non-immediate_commit mode
2021-05-01 00:26:07 +03:00
Vitaliy Filippov
818ae5d61d
Some config parsing fixes
2021-05-01 00:20:01 +03:00
Vitaliy Filippov
f6f35f4127
Pass options correctly to not override /etc/vitastor/vitastor.conf
2021-04-30 01:17:44 +03:00
Vitaliy Filippov
72aa2fd819
Make OSD and client read common configuration from /etc/vitastor/vitastor.conf
2021-04-30 01:11:27 +03:00
Vitaliy Filippov
5010b0dd75
Use json11 instead of blockstore_config_t
2021-04-30 00:52:46 +03:00
Vitaliy Filippov
483c5ab380
Negotiate max_msg instead of max_sge, make buffer settings more conservative :-)
2021-04-29 11:10:35 +03:00
Vitaliy Filippov
6a6fd6544d
Add RDMA options to the QEMU driver
2021-04-29 11:02:49 +03:00
Vitaliy Filippov
971aa4ae4f
Implement RDMA receive with memory copying (send remains zero-copy)
...
This is the simplest and, as usual, the best implementation :)
100% zero-copy implementation is also possible (see rdma-zerocopy branch),
but it requires to create A LOT of queues (~128 per client) to use QPN as a 'tag'
because of the lack of receive tags and the server may simply run out of queues.
Hardware limit is 262144 on Mellanox ConnectX-4 which amounts to only 2048
'connections' per host. And even with that amount of queues it's still less optimal
than the non-zerocopy one.
In fact, newest hardware like Mellanox ConnectX-5 does have Tag Matching
support, but it's still unsuitable for us because it doesn't support scatter/gather
(tm_caps.max_sge=1).
2021-04-29 02:34:45 +03:00
Vitaliy Filippov
9e6cbc6ebc
Negotiate max_sge between RDMA client & server
2021-04-29 02:15:20 +03:00
Vitaliy Filippov
ce777319c3
WIP RDMA support
...
Basic naive implementation works, but it's highly non-optimal as
RNR retransmissions occur all the time. RDMA expects the receiver
to always have place for incoming WRs...
2021-04-29 02:03:54 +03:00
Vitaliy Filippov
f8ff39b0ab
Rework continue_ops() to remove a CPU hot spot
...
This rework increases fio -rw=randread -iodepth=128 result from ~120k to ~180k iops :)
2021-04-29 01:50:13 +03:00
Vitaliy Filippov
d749159585
Linked list experiment
...
Rework client operation queue from a vector to a linked list.
This is required to rework continue_ops() as its current implementation
consumes ~25% of client process CPU.
2021-04-29 01:47:33 +03:00
Vitaliy Filippov
9703773a63
Fix has_flushes setting
2021-04-28 23:40:44 +03:00
Vitaliy Filippov
5d8d486f7c
Add SOVERSION
2021-04-20 01:01:32 +03:00
Vitaliy Filippov
2b546cdd55
Link vitastor_blk with vitastor_common for timerfd_manager_t
...
Not really required to operate, but fixes a verify-elf error
2021-04-20 00:51:53 +03:00
Vitaliy Filippov
bd7b177707
Report sensitive configuration values instead of the configuration source
2021-04-17 23:11:16 +03:00
Vitaliy Filippov
82e6aff17b
Support mapping NBD by the image name
2021-04-17 17:39:55 +03:00
Vitaliy Filippov
57e2c503f7
Rename osd_t::c_cli to msgr
2021-04-17 16:32:09 +03:00
Vitaliy Filippov
715bc8d53d
Release 0.6.2
...
- Fix a possible crash during SYNC when journal fsyncs are enabled
- Fix a memory leak in the chained read implementation
2021-04-15 23:40:06 +03:00
Vitaliy Filippov
0af077701c
Fix a possible crash during SYNC when journal fsyncs are enabled
2021-04-15 02:01:50 +03:00
Vitaliy Filippov
cac976ce25
Fix a memory leak in the chained read implementation
2021-04-15 01:42:18 +03:00
Vitaliy Filippov
acf0646542
Build common sources once
2021-04-15 01:13:34 +03:00
Vitaliy Filippov
ede1c1d667
Release 0.6.1
...
A bugfix for the new "chained read from snapshot" feature
2021-04-14 22:32:23 +03:00
Vitaliy Filippov
38bd51c97f
Remove aio_context assertion, it seems it is unneeded
2021-04-14 22:32:15 +03:00
Vitaliy Filippov
966fb763ca
Oooops, fix chained reads
2021-04-13 16:19:21 +03:00
Vitaliy Filippov
0b41ffc08d
Release 0.6.0
...
Warning: upgrading from 0.5.x is currently not supported!
Please create an issue if you really need upgrade capability.
New features:
- Snapshots and Copy-on-Write clones
- Inode (image) names
- Inode I/O and space statistics
- Write throttling for smoothing random write workloads in SSD+HDD configurations
2021-04-11 00:49:18 +03:00
Vitaliy Filippov
64eeb79051
Prevent 0.6.x OSDs from talking to 0.5.x
...
The new protocol is almost compatible - it has bitmaps, but also it has
a "bitmap_length" field. It's not hard to make 0.5-0.6 OSDs and clients
compatible, but for now I just assume nobody needs it.
If I'm wrong and anybody requests to upgrade their production 0.5.x system
to 0.6.x I'll fix it.
2021-04-10 22:26:17 +03:00
Vitaliy Filippov
2a02f3c4c7
Add metadata superblock and check it on start
...
Refuse to start if the superblock is missing or bad version;
zero out the metadata area when initializing superblock.
2021-04-10 22:26:17 +03:00
Vitaliy Filippov
f684d9101a
Refuse to start with old journal version
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
a1f2f19489
Do not increment inode statistics if the object already exists
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
82c1a7ec67
Fix statistics reporting, split inode number into pool & inode
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2ab423d4ef
Implement journaled write throttling for the SSD+HDD case
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
4694811eab
Add microsecond accuracy to set_timer
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6b988de17d
Remove timerfd_interval
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
37efdc2a83
Fix bitmap_set for replicated pools
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
591cad09c9
Fix bitmaps for objects larger than 128K
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
b907ad50aa
Oops, forgot to add external bitmaps to blockstore in some places
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
5f5b6ef150
Enable chained reads in the client
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
38a3df4a0e
Implement chained (optimized) read in the primary OSD code
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6950b8e3a0
Watch inode metadata revisions
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
0cea3576fb
Add "read bitmaps" operation to secondary OSD protocol
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
f01eea07d3
Add simplified interface to read blockstore bitmaps synchronously
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2c2f08aca2
Shorten some structure names
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
d6524670e1
Introduce data distribution locality
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
7aeb2cbac7
Capture all by value in qemu_proxy
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
2612d3198a
Introduce image names and metadata storage in etcd
...
Each inode has: image name, parent inode number & pool, size and readonly flag
Snapshots are created by switching image name to a different inode number
while using the older inode as parent.
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ab39ce2bbb
Use clean_entry_bitmap_size instead of entry_attr_size back because of changed bitmap handling
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
d0c2e31312
Add a test for snapshots, fix bugs. Now the test passes
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
9038d42327
Fix several snapshot I/O bugs
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
691f066055
Actual snapshot support (untested)
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
ffe1cd4c79
Report inode I/O statistics, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
4ae1b84c67
Report inode space usage statistics to etcd, aggregate it in the monitor
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
c35963967f
Add inode space usage statistics tracking to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
0aa2dd2890
Send bitmaps with primary-reads, actually read bitmaps for READ ops
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6bf88883ac
Allocate bitmaps along with stripes to avoid memory fragmentation
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
004f265393
Remove cryptic bitmap inlining from bs_op_t and osd_op_t, use bitmap in primary OSD code
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
860ac24762
Add "external" bitmap support to the secondary OSD protocol
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6107a4d07b
Add "external" bitmap support to blockstore
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
95c29b9dc3
Add "external" bitmap support to osd_rmw
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
6909807068
Allow to start the OSD just to flush the journal completely
2021-04-10 17:44:12 +03:00
Vitaliy Filippov
18c72f4835
Correct reenterability fix (now verified with a test)
...
It's rather funny but 0.5.12 has to be re-published again
2021-04-09 12:10:16 +03:00
Vitaliy Filippov
40b7c21fb1
Followup to 307c1731c1
- fix mark_stable
2021-04-08 15:47:18 +03:00
Vitaliy Filippov
efb3678606
Fix qemu-img broken in 0.5.11
...
Caused by the lack of reenterability of the main cluster_client function
2021-04-08 14:59:20 +03:00
Vitaliy Filippov
8d87e32175
Fix msgr_op.h includes
2021-04-08 01:18:46 +03:00
Vitaliy Filippov
b0b2e7df3c
Fix use-after-free in keepalive_timer and rework stop_client()
...
The bug reproduced if fio was temporarily stopped with SIGSTOP
during write test and then resumed after 10 seconds. In this case
"pings" were failed for all clients and fio process crashed with
'use-after-free' in keepalive_timer. It happened because it called
stop_client while having a live iterator to the map.
2021-04-07 11:06:31 +03:00
Vitaliy Filippov
97efb9e299
Do not crash on PG re-peering events when operations are in progress
2021-04-07 11:06:31 +03:00
Vitaliy Filippov
f6d705383a
Fix client connection recovery bugs, add dirty_ops limit
2021-04-07 11:06:31 +03:00
Vitaliy Filippov
68567c0e1f
Fix messenger possibly trying to connect to the same OSD twice
2021-04-07 01:30:38 +03:00
Vitaliy Filippov
04b00003e9
Log ping failures
2021-04-07 01:30:38 +03:00
Vitaliy Filippov
307c1731c1
Forget all dirty_entries before stable big_write or delete during initialisation
...
This fixes a 'double_alloc' assertion in the following case:
- big_write object #1 v1 to block #100
- big_write object #1 v2 to block #101
- big_write object #2 v1 to block #100
2021-04-07 01:30:38 +03:00
Vitaliy Filippov
a48e2bbf18
Fix write replay ordering when immediate_commit != all
...
Previous implementation didn't respect write ordering and could lead
to corrupted data when restarting writes after an OSD outage
Also rework cluster_client queueing logic and add tests for it to verify the correct behaviour
2021-04-03 14:51:52 +03:00
Vitaliy Filippov
688821665a
Remove stoull_full() from etcd_state_client.cpp
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
3e162d95a0
Remove http_client.h include from etcd_state_client.h
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
829381b335
Extract some definitions to msgr_op.{cpp,h}
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
54f2353f24
Use bitmap granularity for alignment checks
2021-04-03 14:36:04 +03:00
Vitaliy Filippov
e47f6fba60
Remove cluster_client_t::stop()
2021-04-03 14:35:42 +03:00
Vitaliy Filippov
883bf84a16
Fix build
2021-04-03 01:47:15 +03:00