Complete Influx TICK Stack Disaster Recovery

My entire system became corrupt one day and while it was technically booting it was not functioning. I did not have proper backups so the road to recovery was long & painful. I now have better emphasis on backups.

Typically all Influx data is backed up by:

influxd backup -portable /media/usb/drive

and restored with

influxd restore -portable /media/usb/drive

I did not have this luxury so I started with copying all the main files to and external drive, these were:

/var/lib/influxdb/data
/var/lib/influxdb/wal
/var/lib/influxdb/meta
/var/lib/kapacitor/kapacitor.db

Okay we are now finished with the corrupted image, do a full fresh install of your system. (tutorial)

Great we are now all setup, insert USB where files were backed up to before, we need to tell Influx config to look at memory stick, edit the below file with:

sudo nano /etc/influxdb/influxdb.conf 
[meta]
  #dir = "/var/lib/influxdb/meta"
  dir = "/media/usb/drive/meta"

[data]
  #dir = "/var/lib/influxdb/data"
  dir = "/media/usb/drive/data"

  #wal-dir = "/var/lib/influxdb/wal"
  wal-dir = "/media/usb/drive/wal"

I also needed to change the user of the files on the USB by:

sudo chown -R influxdb:influxdb /media/usb/drive

We will revert some of the above changes later on.
Note: My original plan was to have all files on the USB drive permanently but as soon as I added the data source in Chronograf everything broke so I undid this. I just used this step to export the data properly.

Reboot system.

Now all your old data should be loaded.

Now we will create a proper backup of the data with the below:

influxd backup -portable /media/usb/drive

Revert all changes in the influxdb.conf file:

sudo nano /etc/influxdb/influxdb.conf 

Now restore all data back to the default locations by:

influxd restore -portable /media/usb/drive

Since it took me a few days to figure out how to restore data I already had the system back up recording data, the above restore does not work if a database is already created so I had to side-load all databases in with:

influxd
CREATE my_data_bak
USE my_data_bak
SELECT * INTO my_data..:MEASUREMENT FROM /.*/ GROUP BY *
DROP DATABASE my_data_bak
exit

Finally add back in your Chronograf alerts etc. by:

sudo mv /var/lib/kapacitor/kapacitor.db /var/lib/kapacitor/kapacitor_orig.db 
sudo mv /media/usb/drive/kapacitor.db /var/lib/kapacitor/
sudo chown -R kapacitor:kapacitor /var/lib/kapacitor/kapacitor.db

Future planning would be to keep regular backups with: (you need to do this individually for all databases). See my other post on this.

influxd backup -portable /media/usb-influx/backup
kapacitor backup /media/usb/drive/kapacitor.db

Reboot and we are done!

Installing TICK Stack on RPi4

Nothing complicated this time, just commands I use to setup my Influx TICK stack from fresh install.

sudo apt-get update
sudo apt-get upgrade
wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/os-release
test $VERSION_ID = "10" && echo "deb https://repos.influxdata.com/debian buster stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
sudo apt-get install influxdb
sudo apt install influxdb-client
sudo apt-get update
sudo apt-get install telegraf
sudo apt-get install chronograf
sudo apt-get install kapacitor
sudo systemctl unmask influxdb.service 
sudo systemctl start influxdb 
sudo apt-get install fail2ban
sudo apt-get install ntp
sudo apt-get install ntpstat
systemctl stop systemd-timesyncd
systemctl disable systemd-timesyncd
/etc/init.d/ntp stop
/etc/init.d/ntp start
sudo reboot

Confirm everything is working:

sudo service kapacitor status
sudo service chronograf status
sudo service influxdb status
sudo service telegraf status
ntpstat

You can also head to the Chronograf configuration page on: http://192.168.1.xxx:8888

That’s it!

PiHole logging to InfluxDB & Grafana Dash

Building on the work of others before me, below you will find a tutorial to get PiHole logging to InfluxDB using a python script and then to a Grafana Dashboard. All required code available on my GitHub.

SSH into your PiHole: ssh pi@xxx.xxx.xxx.xxx and run the below:

Install python dependencies:

sudo apt-get install python-influxdb

Create the below python file:

sudo nano influx_scripts/piholestats.py
#! /usr/bin/python

# History:
# 2016: Script originally created by JON HAYWARD: https://fattylewis.com/Graphing-pi-hole-stats/
# 2016 (December) Adapted to work with InfluxDB by /u/tollsjo
# 2016 (December) Updated by Cludch https://github.com/sco01/piholestatus
# 2020 (March) Updated by http://cactusprojects.com/pihole-logging-to-influxdb-&-grafana-dash

import requests
import time
from influxdb import InfluxDBClient

HOSTNAME = "pihole" # Pi-hole hostname to report in InfluxDB for each measurement
PIHOLE_API = "http://192.168.1.XXX/admin/api.php"
INFLUXDB_SERVER = "192.168.1.XXX" # IP or hostname to InfluxDB server
INFLUXDB_PORT = 8086 # Port on InfluxDB server
INFLUXDB_USERNAME = ""
INFLUXDB_PASSWORD = ""
INFLUXDB_DATABASE = "dev_pihole"
DELAY = 10 # seconds

def send_msg(domains_blocked, dns_queries_today, ads_percentage_today, ads_blocked_today):

	json_body = [
	    {
	        "measurement": "piholestats." + HOSTNAME.replace(".", "_"),
	        "tags": {
	            "host": HOSTNAME
	        },
	        "fields": {
	            "domains_blocked": int(domains_blocked),
                    "dns_queries_today": int(dns_queries_today),
                    "ads_percentage_today": float(ads_percentage_today),
                    "ads_blocked_today": int(ads_blocked_today)
	        }
	    }
	]

	client = InfluxDBClient(INFLUXDB_SERVER, INFLUXDB_PORT, INFLUXDB_USERNAME, INFLUXDB_PASSWORD, INFLUXDB_DATABASE) # InfluxDB host, InfluxDB port, Username, Password, database
	# client.create_database(INFLUXDB_DATABASE) # Uncomment to create the database (expected to exist prior to feeding it data)
	client.write_points(json_body)

api = requests.get(PIHOLE_API) # URI to pihole server api
API_out = api.json()

#print (API_out) # Print out full data, there are other parameters not sent to InfluxDB

domains_blocked = (API_out['domains_being_blocked'])#.replace(',', '')
dns_queries_today = (API_out['dns_queries_today'])#.replace(',', '')
ads_percentage_today = (API_out['ads_percentage_today'])#
ads_blocked_today = (API_out['ads_blocked_today'])#.replace(',', '')

send_msg(domains_blocked, dns_queries_today, ads_percentage_today, ads_blocked_today)

Save and Exit.

I have the file run on a cron job every minute. Others set it up as a service but cron job works just fine for me:

crontab -e
*/1 * * * * /usr/bin/python /home/pi/influx_scripts/piholestats.py

We need to create Influx database next, I carried this out through the Chronograf web interface but add it through the terminal by the below if required:

influx
create database dev_pihole
exit

Now onto Grafana Dash:

Add the “dev_pihole” database to the Grafana Data Sources list.

Next go to “Import dashboard” and paste in the JSON code on my Github. I tweaked a previous dashboard slightly.

All done!

OpenWRT logging to InfluxDB & Grafana Dash

Building on the work of others before me, below you will find a complete tutorial to get OpenWRT logging to InfluxDB using the “connectd” plugin. All required code available on my GitHub.

SSH into your router console: ssh root@xxx.xxx.xxx.xxx and run the below:

opkg update
opkg install luci-app-statistics collectd collectd-mod-cpu \
collectd-mod-interface collectd-mod-iwinfo \
collectd-mod-load collectd-mod-memory collectd-mod-network collectd-mod-uptime collectd-mod-thermal collectd-mod-openvpn collectd-mod-dns collectd-mod-wireless
/etc/init.d/luci_statistics enable
/etc/init.d/collectd enable

Go to router Web Interface and there is a new “Statistics” tab, its mostly setup but quick configuration: (also see screenshot below)

  • Go to Statistics -> Setup -> add ‘Hostname’ field and populate it. (doesn’t exist by default for some reason)
  • Go to Statistics -> Setup -> Output plugins -> add the details of your InfuxDB server. (leave the port as 25826)

We are finished with the router now, I rebooted it, not sure if was 100% necessary.

Next SSH into your InfluxDB console: ssh xxx@xxx.xxx.xxx.xxx

Create file: /usr/local/share/collectd/types.db (add file from my Github)

sudo nano /usr/local/share/collectd/types.db

We now need to enable the “collectd” plugin in InfluxDB config:

sudo nano /etc/influxdb/influxdb.conf

Configure it so it is the same as below:

[[collectd]]
   enabled = true
   bind-address = ":25826"
   database = "dev_collectd"
   retention-policy = ""
  #
  # The collectd service supports either scanning a directory for multiple types
  # db files, or specifying a single db file.
   typesdb = "/usr/local/share/collectd/types.db"
  #
   security-level = "none"
   auth-file = "/etc/collectd/auth_file"

  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.

  # Flush if this many points get buffered
   batch-size = 5000

  # Number of batches that may be pending in memory
   batch-pending = 10

  # Flush at least this often even if we haven't hit buffer limit
   batch-timeout = "10s"

  # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
   read-buffer = 0

  # Multi-value plugins can be handled two ways.
  # "split" will parse and store the multi-value plugin data into separate measurements
  # "join" will parse and store the multi-value plugin as a single multi-value measurement.
  # "split" is the default behavior for backward compatibility with previous versions of influxdb.
  # parse-multivalue-plugin = "split"

Exit & Save.

Add new database in InfluxDB, I carried this out through the Chronograf web interface but add in through the terminal by the below if required:

    influx
    create database dev_collectd
    exit

Restart InfluxDB to activate the new config:

sudo service influxd restart

Now onto Grafana Dash:

Add the “dev_collectd” database to the Grafana Data Sources list.

Next go to “Import dashboard” and paste in the JSON code on my Github. I tweaked a previous dashboard slightly.

All done!

References I used:
https://blog.christophersmart.com/2019/09/09/monitoring-openwrt-with-collectd-influxdb-and-grafana/
https://wiki.opnfv.org/display/fastpath/Installing+and+configuring+InfluxDB+and+Grafana+to+display+metrics+with+collectd

Notes on what doesn’t work:
Can’t see amount of connected wireless devices.
OpenVPN stats also not working.
Its on the to do list if I can get this going again.

InfluxDB Backup Database (2 methods)

It makes sense to periodically backup InfluxDB to an external drive in-case of corruption of onboard memory. I am using a USB memory stick.

A simple cronjob can take care of this (every night 2am), open Crontab:

sudo crontab -e

and insert the below line: (change for your storage device)

0 2 * * * influxd backup -portable /media/usb/drive

Backup names start with the date it was generated but it can get messy after a few weeks.

Long term its better to run a backup script to put backups in individual directories and catch errors etc., create a python file for this and use the below example:

nano /home/pi/influx_scripts/influx_backup.py
import os
from datetime import date

today = date.today()

d1 = today.strftime("%Y_%m_%d")
print("Date", d1)

command = "mkdir /media/usb-backup/" + d1
#print(command)
os.system(command)

command = "influxd backup -portable /media/usb-backup/" + d1
os.system(command)

command = "kapacitor backup /media/usb-backup/"+d1+"/kapacitor.db"
os.system(command)
os.system("echo Backups Done!")
sudo crontab -e
0 2 * * * python /home/pi/influx_scripts/influx_backup.py

You can keep an eye on the USB memory stick size by the below snip of script which can be logged to InfluxDB. A Grafana alarm keeps an eye on the size and alerts if getting close to capacity.

DIRECTORY="/media/usb/drive"
if [ -d "$DIRECTORY" ]; then
    usb_mem_usage=$(du -s $DIRECTORY | awk 'NR==1{print $1}')
else
    usb_mem_usage="-1"
fi
echo $usb_mem_usage

All done!

ESP8266 logging to InfluxDB

The ESP8266 is a $5 IOT device with huge capabilities. In this post we will log data to a remote Influx database running on a RaspberryPi.

I am programming the ESP8266 in the Arduino IDE, the ESP8266 library is required, you can find it here. I have a test code file (of copy from below) that you can upload after entering your InfluxDB I.P. Address, SSID & Password and it will start logging data immediately.

#include <ESP8266WiFi.h>
#include <ESP8266WiFiMulti.h>
#include <InfluxDb.h>

#define INFLUXDB_HOST "192.168.1.1"   //Enter IP of device running Influx Database
#define WIFI_SSID "SSID"              //Enter SSID of your WIFI Access Point
#define WIFI_PASS "PASSWORD"          //Enter Password of your WIFI Access Point

ESP8266WiFiMulti WiFiMulti;
Influxdb influx(INFLUXDB_HOST);

void setup() {
  Serial.begin(9600);
  WiFiMulti.addAP(WIFI_SSID, WIFI_PASS);
  Serial.print("Connecting to WIFI");
  while (WiFiMulti.run() != WL_CONNECTED) {
    Serial.print(".");
    delay(100);
  }
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());

  influx.setDb("esp8266_test");

  Serial.println("Setup Complete.");
}

int loopCount = 0;

void loop() {
  loopCount++;

  InfluxData row("data");
  row.addTag("Device", "ESP8266");
  row.addTag("Sensor", "Temp");
  row.addTag("Unit", "Celsius");
  row.addValue("LoopCount", loopCount);
  row.addValue("RandomValue", random(10, 40));

  influx.write(row);
  delay(5000);
}

The Arduino Serial Terminal will display something like the below so you can if it is working. (My previous tutorial shows setting up InfluxDB, ensure you have the database “esp8266_test” created as we are going to write to that.)

 --> writing to esp8266_test:
data,Device=ESP8266,Sensor=Temp,Unit=Celsius LoopCount=256.00,RandomValue=37.00
 <-- Response: 204 ""
 --> writing to esp8266_test:
data,Device=ESP8266,Sensor=Temp,Unit=Celsius LoopCount=257.00,RandomValue=20.00
 <-- Response: 204

On the Influx Database we can look at the data by:

influx
USE esp8266_test
select * from data limit 50

Below you can see the export from my database (I have shortened the time field for neatness). You can see I reset the ESP8266 a couple of times due to the LoopCount value.

time        Device  LoopCount RandomValue Sensor Unit    
----        ------  --------- ----------- ------ ----   
52808175073 ESP8266 1         38          Temp   Celsius                              
63108846141 ESP8266 2         35          Temp   Celsius                              
69802517277 ESP8266 1         13          Temp   Celsius                              
79892112240 ESP8266 2         12          Temp   Celsius                              
89961602267 ESP8266 3         14          Temp   Celsius                              
99998928411 ESP8266 4         22          Temp   Celsius                              
10053683452 ESP8266 5         10          Temp   Celsius                              
20120378415 ESP8266 6         28          Temp   Celsius                              
30175745403 ESP8266 7         14          Temp   Celsius                              
40732248123 ESP8266 8         38          Temp   Celsius                              
51232948067 ESP8266 9         15          Temp   Celsius                              
61322347831 ESP8266 10        13          Temp   Celsius                              
71424432515 ESP8266 11        19          Temp   Celsius                              
84740185749 ESP8266 1         18          Temp   Celsius                              
94790343615 ESP8266 2         21          Temp   Celsius                              
04839215465 ESP8266 3         13          Temp   Celsius                              
31864448941 ESP8266 1         32          Temp   Celsius                              
41956355523 ESP8266 2         36          Temp   Celsius                              
52018136222 ESP8266 3         30          Temp   Celsius                              
62083037888 ESP8266 4         22          Temp   Celsius       

That’s it!

Backup InfluxDB

It makes sense to backup the InfluxDB periodically so we don’t loose all our data.

We can do this in the terminal by:

influxd backup -portable /home/pi/influx_backup/

Make it run every night at 2am by opening crontab and adding the below code:

crontab -e
0 2 * * * influxd backup -portable /home/pi/influx_backup/

Now it would make sense for the above location to be a USB drive etc. as if our main drive fails we would loose the backup along with the original data. We can do this by updating the crontab -e to:

0 2 * * * influxd backup -portable /media/YOUR_USB_DRIVE_NAME

I had to instal the below package to allow the RaspberryPi write to the USB drive:

sudo apt-get install ntfs-3g

Another improvement would be to put all this in a script and push to another machine maybe over FTP but this is as far as I got right now and works well.

We can also see how much data is on the USB Drive by the below, maybe we will log this to Influx in future to keep an eye on backup sizes.

du -sh /media/YOUR_USB_DRIVE_NAME

That’s it!

RPi Status Log to InfluxDB

In the last post we setup InfluxDB, now we are going to start storing system parameters every minute. It will work out of the box for Raspberry Pi and probably for some other Linux distros.

We are going to log the system uptime, the CPU & GPU Temperatures, the current CPU usage as well as the average CPU usage since boot.

The code is available here or copy from the end of this post. Put the code in a file called soc.sh, don’t forget to update your IP address in the code and make it executable by:

chmod +x soc.sh

We are going to log to database rpi_01, if you don’t have this created already complete the below:

influx
create database rpi_01
exit

Test our script run:

bash soc.sh

To confirm it works we can check the database:

influx
use rpi_01
select * from system_status

and you should see something like: (type exit when you are done)

name: system_status
time   cpu_temp cpu_usage gpu_temp system   system_model   uptime
15329   39.5     14        40.1     RPI-01   ZeroW_V1.1   1386.68

Now we want the system status to be logged every minute, we do this by adding it to crontab:

crontab -e 

Add this line and save and close: (ensure path is correct to your file)

*/1 * * * * /home/pi/influx_scripts/soc.sh

Check back after a while to ensure the logging is happening. In the next post we are going to show the status in graphical form using Grafana like the below:


soc.sh code:

#!/bin/bash
# Gets SOC GPU Temperatures
gpu_temp_0=$(/opt/vc/bin/vcgencmd measure_temp | tr -cd '0-9.')

# Gets System Uptime
uptime=0
uptime=$(awk '{print $1}' /proc/uptime)

# Gets SOC CPU Temperatures
cpu_temp_0=$(cat /sys/class/thermal/thermal_zone0/temp)
cpu_temp_1=$(($cpu_temp_0/1000))
cpu_temp_2=$(($cpu_temp_0/100))
cpu_temp_3=$(($cpu_temp_2 % $cpu_temp_1))
cpu_temp_4=$cpu_temp_1"."$cpu_temp_3

# Converts the total CPU Usage into %
PREV_TOTAL=0
PREV_IDLE=0
Average=0

  for i in {1..6}
  do
  # Since the CPU fluctuates, it discards the first reading and averages the next 5.
  CPU=(`sed -n 's/^cpu\s//p' /proc/stat`) # Discards the cpu prefix
  IDLE=${CPU[3]} 			  # Just the idle CPU time.

  # Calculate the total CPU time.
  TOTAL=0
  for VALUE in "${CPU[@]}"; do
    let "TOTAL=$TOTAL+$VALUE"
  done

  # Calculate the CPU usage since we last checked.
  let "DIFF_IDLE=$IDLE-$PREV_IDLE"
  let "DIFF_TOTAL=$TOTAL-$PREV_TOTAL"
  let "DIFF_USAGE=(1000*($DIFF_TOTAL-$DIFF_IDLE)/$DIFF_TOTAL+5)/10"

  # Remember the total and idle CPU times for the next check.
  PREV_TOTAL="$TOTAL"
  PREV_IDLE="$IDLE"

if [ $i -gt 1 ] # Ignores 1st reading as this is CPU average since boot
    then
	let Average="$DIFF_USAGE+$Average"
fi

  # Wait 1s before checking again.
  sleep 1
done

let Average="$Average/5"

curl -i -XPOST 'http://your.influxDB.ip.address:8086/write?db=rpi_01' --data-binary 'system_status,system=RPI-01,system_model=Insert_Model_Name cpu_usage='$Average',cpu_temp='$cpu_temp_4',gpu_temp='$gpu_temp_0',uptime='$uptime''

Credit to resources I used:
https://hwwong168.wordpress.com/2015/10/12/raspberry-pi-2-gpu-and-cpu-temperature-logger-with-influxdb/

InfluxDB Setup on RPi Zero

InfluxDB is a time series database. The build is robust and straightforward but first lets start with what didn’t work:

  • I could not get Raspbian Buster image to work for the RPi Zero, I reverted to Raspbian Stretch image. Get the image from the Raspberry Pi Archives. Grafana also showed issues on Buster so avoid the headache for now.
wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/os-release
test $VERSION_ID = "9" && echo "deb https://repos.influxdata.com/debian stretch stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
sudo apt-get update
sudo apt-get install influxdb
sudo service influxdb start
influxd -config /etc/influxdb/influxdb.conf
echo $INFLUXDB_CONFIG_PATH /etc/influxdb/influxdb.conf
influxd
sudo service influxdb restart

Open the file /etc/influxdb/influxdb.conf and ensure the [http] section looks like the below:

sudo nano /etc/influxdb/influxdb.conf
[http]
  # Determines whether HTTP endpoint is enabled.
   enabled = true

  # Determines whether the Flux query endpoint is enabled.
  # flux-enabled = false

  # Determines whether the Flux query logging is enabled.
  # flux-log-enabled = false

  # The bind address used by the HTTP service.
   bind-address = ":8086"

  # Determines whether user authentication is enabled over HTTP/HTTPS.
   auth-enabled = false

InfluxDB is setup and running now but we have no data stored. First we must create a database in InfluxDB, then we can insert data:

influx
create database rpi_01
INSERT system_status,system=RPI-01 cpu_usage=10

We can then view the contents of the database by:

use rpi_01
select * from system_status

You should see and entry like the below in your terminal:

name: system_status
time                cpu_usage system
----                --------- ------
1573332238034530963 10        RPI-01

We are now successfully manually writing to the database, in the next tutorial we will write a script to log the CPU & GPU temperatures of the Raspberry Pi and also the CPU Usage in %.

Credit to resources I used:
https://www.circuits.dk/install-grafana-influxdb-raspberry/
https://github.com/influxdata/docs.influxdata.com/blob/master/content/telegraf/v1.2/introduction/installation.md