InfluxDB (as of version 2.6) does not seem to recover normal operation after running out of space on a device, even once space has been freed. New data is kept in memory, which suggests normal operation, while nothing is written to disk anymore. Moreover, a (potentially) partially written WAL (write-ahead log) file will be detected as corrupt on every subsequent startup of InfluxDB and ignored for further writes.
The solution is simple: delete the corrupt WAL file. To identify the right one, it helps to check the logs with

```shell
journalctl -u influxdb.service
```

and to analyze the WAL segments with

```shell
influxd inspect verify-wal --wal-path ...
influxd inspect dump-wal <wal_file_path>
```
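If there are many WAL segments, a quick way to shortlist candidates is to sort them by modification time: the most recently written segment is usually the one truncated by the disk-full incident. A minimal sketch, assuming the default InfluxDB 2.x engine layout under `~/.influxdbv2/engine/wal` (adjust the path to your setup):

```python
import pathlib

def wal_segments_newest_first(root: pathlib.Path) -> list:
    # Collect all *.wal files under root, most recently modified first.
    return sorted(root.rglob("*.wal"),
                  key=lambda p: p.stat().st_mtime, reverse=True)

# Assumption: default 2.x engine path; change it if you moved the data dir.
wal_root = pathlib.Path.home() / ".influxdbv2" / "engine" / "wal"
if wal_root.exists():
    for seg in wal_segments_newest_first(wal_root):
        print(seg.stat().st_size, seg)
```

The top entry is the first one worth feeding to `influxd inspect verify-wal`.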
But how do you restore the data out of the WAL file? I could not find a command-line tool that does exactly that, but with the help of In-memory indexing with TSM I was able to write a simple script that, depending on your needs, can serve as a base for a more complex restoration.
Since the data is compressed with Google's Snappy algorithm, we need a library for that:

```shell
pip install python-snappy
```
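For orientation, the entry framing that the recovery script below relies on (a 1-byte op code, a 4-byte big-endian payload length, then the Snappy-compressed payload) can be sketched with the standard library alone. This layout is an observation from my file, not a documented guarantee:

```python
import struct

def frame_entry(op: int, payload: bytes) -> bytes:
    # 1-byte op code + 4-byte big-endian payload length + payload
    return bytes([op]) + struct.pack(">I", len(payload)) + payload

def read_entry(buf: bytes, offset: int = 0):
    # Inverse of frame_entry: returns (op, payload, offset of next entry).
    op = buf[offset]
    (length,) = struct.unpack_from(">I", buf, offset + 1)
    payload = buf[offset + 5:offset + 5 + length]
    return op, payload, offset + 5 + length

entry = frame_entry(1, b"snappy-compressed bytes would go here")
op, payload, nxt = read_entry(entry)
print(op, len(payload), nxt)
```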
```python
import snappy

in_path = "PATH TO YOUR CORRUPTED WAL FILE"
out_path = "PATH TO A NEW WAL FILE"

count = 0
with open(in_path, mode='rb') as in_file:
    with open(out_path, mode='wb') as out_file:
        while True:
            op_type = in_file.read(1)  # the first byte is an operation code
            if op_type == b"":
                print('file end', in_file.tell())
                break
            if op_type in (b'\x00', b'\x01'):  # anything else means we are out of sync
                count += 1  # just for statistics
                length_b = in_file.read(4)  # length of the entry
                length = int.from_bytes(length_b, "big")  # in my case big-endian
                print('id', count, 'op_type', op_type, 'length', length)
                d_raw = in_file.read(length)
                try:
                    d = snappy.uncompress(d_raw)  # the real test whether the data is corrupt
                    # print(d)
                    # copy the good entry to the new file
                    out_file.write(op_type)
                    out_file.write(length_b)
                    out_file.write(d_raw)
                except Exception as e:
                    # the current entry wasn't readable, skip it
                    print('exception', e)
            else:
                print('id', count, 'unexpected op type', op_type, 'at file position', in_file.tell())
                # if you expect more valid data further in the file, you may try to re-sync here
                break
print('total entries found', count)
```
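The script above stops at the first unexpected op byte. If you suspect more valid entries further in the file, a heuristic re-sync can scan forward for the next byte that looks like an entry start: op code 0 or 1 followed by a sane length. This is only a sketch under the same layout assumptions as the script; `max_len` is an arbitrary sanity bound:

```python
def resync(buf: bytes, pos: int, max_len: int = 1 << 24) -> int:
    """Scan forward from pos for the next plausible entry start:
    an op byte of 0 or 1 followed by a big-endian length that fits
    inside the buffer.  Returns the candidate offset, or -1."""
    while pos + 5 <= len(buf):
        op = buf[pos]
        if op in (0, 1):
            length = int.from_bytes(buf[pos + 1:pos + 5], "big")
            if 0 < length <= max_len and pos + 5 + length <= len(buf):
                return pos
        pos += 1
    return -1

# synthetic buffer: two garbage bytes, then a well-formed 10-byte entry
payload = b"x" * 10
buf = b"\xff\xfe" + bytes([1]) + len(payload).to_bytes(4, "big") + payload
print(resync(buf, 0))  # → 2
```

False positives are possible (any 0x00/0x01 byte with a plausible length will match), so let the `snappy.uncompress` call in the main loop be the final judge of each candidate.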