InfluxDB (as of version 2.6) does not seem to recover normal operation after running out of space on a device, even once space has been freed. New data is kept in memory, which suggests normal operation, while nothing is written to disk anymore. Moreover, a (potentially) partially written WAL (write-ahead log) file will be detected as corrupt on every subsequent startup of InfluxDB and ignored for further writes.
The solution is simple: delete the corrupt WAL file. To identify the right one, it helps to check the logs with

```shell
journalctl -u influxdb.service
```

and to analyze the WAL segments with

```shell
influxd inspect verify-wal --wal-path ...
influxd inspect dump-wal <wal_file_path>
```
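If there are many WAL segments, a quick way to shortlist candidates is to sort them by modification time: the most recently written segment is usually the one truncated by the disk-full incident. A minimal sketch, assuming the default InfluxDB 2.x engine layout under `~/.influxdbv2/engine/wal` (adjust the path to your setup):

```python
import pathlib

def wal_segments_newest_first(root: pathlib.Path) -> list:
    # Collect all *.wal files under root, most recently modified first.
    return sorted(root.rglob("*.wal"),
                  key=lambda p: p.stat().st_mtime, reverse=True)

# Assumption: default 2.x engine path; change it if you moved the data dir.
wal_root = pathlib.Path.home() / ".influxdbv2" / "engine" / "wal"
if wal_root.exists():
    for seg in wal_segments_newest_first(wal_root):
        print(seg.stat().st_size, seg)
```

The top entry is the first one worth feeding to `influxd inspect verify-wal`.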
But how do you restore the data out of the WAL file? I could not find a command-line tool that does exactly that, but with the help of In-memory indexing with TSM I was able to write a simple script that, depending on your needs, can serve as a base for a more complex restoration.
Since the data is compressed with Google's Snappy algorithm, we need a library for that:

```shell
pip install python-snappy
```
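For orientation, the entry framing that the recovery script below relies on (a 1-byte op code, a 4-byte big-endian payload length, then the Snappy-compressed payload) can be sketched with the standard library alone. This layout is an observation from my file, not a documented guarantee:

```python
import struct

def frame_entry(op: int, payload: bytes) -> bytes:
    # 1-byte op code + 4-byte big-endian payload length + payload
    return bytes([op]) + struct.pack(">I", len(payload)) + payload

def read_entry(buf: bytes, offset: int = 0):
    # Inverse of frame_entry: returns (op, payload, offset of next entry).
    op = buf[offset]
    (length,) = struct.unpack_from(">I", buf, offset + 1)
    payload = buf[offset + 5:offset + 5 + length]
    return op, payload, offset + 5 + length

entry = frame_entry(1, b"snappy-compressed bytes would go here")
op, payload, nxt = read_entry(entry)
print(op, len(payload), nxt)
```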
```python
import snappy

in_path = "PATH TO YOUR CORRUPTED WAL FILE"
out_path = "PATH TO A NEW WAL FILE"

count = 0
with open(in_path, mode='rb') as in_file:
    with open(out_path, mode='wb') as out_file:
        while True:
            op_type = in_file.read(1)  # the first byte is an operation code
            if op_type == b"":
                print('file end', in_file.tell())
                break
            if op_type in (b'\x00', b'\x01'):  # anything else means we are out of sync
                count += 1  # just for statistics
                length_b = in_file.read(4)  # length of the entry
                length = int.from_bytes(length_b, "big")  # in my case big-endian
                print('id', count, 'op_type', op_type, 'length', length)
                d_raw = in_file.read(length)
                try:
                    d = snappy.uncompress(d_raw)  # the real test whether the data is corrupt
                    # print(d)
                    # copy the good entry to the new file
                    out_file.write(op_type)
                    out_file.write(length_b)
                    out_file.write(d_raw)
                except Exception as e:
                    # the current entry wasn't readable, skip it
                    print('exception', e)
            else:
                print('id', count, 'unexpected op type', op_type, 'at file position', in_file.tell())
                # if you expect more valid data further in the file, you may try to re-sync here
                break
print('total entries found', count)
```
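The script above stops at the first unexpected op byte. If you suspect more valid entries further in the file, a heuristic re-sync can scan forward for the next byte that looks like an entry start: op code 0 or 1 followed by a sane length. This is only a sketch under the same layout assumptions as the script; `max_len` is an arbitrary sanity bound:

```python
def resync(buf: bytes, pos: int, max_len: int = 1 << 24) -> int:
    """Scan forward from pos for the next plausible entry start:
    an op byte of 0 or 1 followed by a big-endian length that fits
    inside the buffer.  Returns the candidate offset, or -1."""
    while pos + 5 <= len(buf):
        op = buf[pos]
        if op in (0, 1):
            length = int.from_bytes(buf[pos + 1:pos + 5], "big")
            if 0 < length <= max_len and pos + 5 + length <= len(buf):
                return pos
        pos += 1
    return -1

# synthetic buffer: two garbage bytes, then a well-formed 10-byte entry
payload = b"x" * 10
buf = b"\xff\xfe" + bytes([1]) + len(payload).to_bytes(4, "big") + payload
print(resync(buf, 0))  # → 2
```

False positives are possible (any 0x00/0x01 byte with a plausible length will match), so let the `snappy.uncompress` call in the main loop be the final judge of each candidate.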