Because the goal is a hands-off system, problems other than hard crashes might go unnoticed for quite some time. Therefore the Raspberry Pi (4B) will be monitored, with alerts set for certain events.
For this purpose Telegraf (gather data), InfluxDB (store data) and Grafana (visualize data & alerts) will be used. They were chosen because they seem popular, rather mature, feature-rich and stable. For each of the components there are alternatives available. All of them are free and can be self-hosted. For installation see my previous post.
To interact with InfluxDB (2), I strongly advise using its excellent web UI.
Create a new bucket with a retention period of 1d (or as desired). I have called it “telegraf”.
Create at least one read/write access token, or two if you want separate tokens for Telegraf and Grafana.
Telegraf’s configuration is located at:
Comment out the output plugin [[outputs.influxdb]] (enabled by default) and enable [[outputs.influxdb_v2]]. Fill in the urls (http://127.0.0.1:8086), token (see the InfluxDB web UI), organization and bucket (e.g. “telegraf”).
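As an illustration, the filled-in section could look like the sketch below. The token and organization values are placeholders (assumptions), and writing the section to a separate file is only one option: because Telegraf appends every file in /etc/telegraf/telegraf.d/, the file can be placed there instead of editing telegraf.conf itself, provided [[outputs.influxdb]] stays commented out and [[outputs.influxdb_v2]] is not enabled twice.

```shell
#!/bin/sh
# Write a minimal [[outputs.influxdb_v2]] section to ./influxdb_v2.conf.
# "my-token" and "my-org" are placeholders: replace them with the read/write
# token and the organization from the InfluxDB web UI.
cat > influxdb_v2.conf <<'EOF'
[[outputs.influxdb_v2]]
  ## InfluxDB 2 runs on the same Raspberry Pi
  urls = ["http://127.0.0.1:8086"]
  token = "my-token"
  organization = "my-org"
  bucket = "telegraf"
EOF
```

Afterwards, something like sudo install -m 0640 -g telegraf influxdb_v2.conf /etc/telegraf/telegraf.d/ puts it in place while keeping the token readable only for root and the telegraf group.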
This section only applies if you want to run custom scripts/programs to gather data.
Telegraf runs as a dedicated user (telegraf) which does not have sudo rights. Nor does it have rights to common folders (e.g. home folders of other users). If you want to execute custom scripts, I advise you to mark the script as executable (chmod +x script.sh) and move it to /var/lib/telegraf, the telegraf user’s home folder. That way Telegraf will always have access.
If you need to debug a script, use sudo -u telegraf /full/path/to/script.sh to run it manually as the telegraf user.
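To make this concrete, here is a hypothetical toy script (Telegraf’s system input already reports uptime, so it only demonstrates the mechanics). It assumes the script is run through Telegraf’s [[inputs.exec]] plugin with data_format = "influx", which means it only has to print metrics in InfluxDB line protocol on stdout; the measurement name custom_uptime is made up.

```shell
#!/bin/sh
# Toy custom script for Telegraf: prints InfluxDB line protocol to stdout.
# The measurement name "custom_uptime" is hypothetical.

# /proc/uptime contains "<seconds since boot> <idle seconds>".
# The optional argument allows reading another file (handy for testing).
uptime_metric() {
  # Round the seconds and emit them as an integer field (the "i" suffix).
  awk '{ printf "custom_uptime seconds=%.0fi\n", $1 }' "${1:-/proc/uptime}"
}

# Only emit a metric when /proc/uptime is available (Linux).
if [ -r /proc/uptime ]; then
  uptime_metric
fi
```

Saved as, say, /var/lib/telegraf/uptime.sh and marked executable, running sudo -u telegraf /var/lib/telegraf/uptime.sh should print a single custom_uptime line.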
If telegraf has to execute a script which requires elevated privileges (sudo), add a rule for the script to the sudoers file:
telegraf ALL=(ALL) NOPASSWD: /path/to/script.sh
It must go after all other rules, since the last matching rule takes precedence. Note: this is not secure; if someone can change the script, they effectively have root privileges. For our purposes elevated privileges will not be required.
In order for the telegraf user to be able to gather information about the GPU, one first has to add telegraf to the video group: sudo usermod -aG video telegraf (the -a appends; without it, -G replaces all other supplementary groups of the user). To test whether it works, use sudo -u telegraf /opt/vc/bin/vcgencmd measure_temp.
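To check the group membership without switching users, a small helper such as the hypothetical user_in_group below can be used; it only relies on id from coreutils. Keep in mind that a running process keeps the groups it was started with, so the Telegraf service has to be restarted after the usermod call before it can use the new group.

```shell
#!/bin/sh
# Return success (0) when user $1 is a member of group $2.
# `id -nG USER` prints the names of all groups of USER, separated by spaces.
user_in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}
```

For example, user_in_group telegraf video && echo ok prints ok once the usermod call above has been applied.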
Check the permissions of /dev/vchiq, which might be incorrect:
ls -al /dev/vchiq
crw------- 1 root root 245, 0 Jul 25 17:10 /dev/vchiq
Correct them by using:
sudo chgrp video /dev/vchiq
sudo chmod 0660 /dev/vchiq
ls -al /dev/vchiq
crw-rw---- 1 root video 245, 0 Jul 25 17:10 /dev/vchiq
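The ls check above can also be scripted. The hypothetical helper below compares the group and octal mode of a file (here /dev/vchiq) with the expected values using GNU stat. Since /dev is recreated at every boot, a manual chgrp/chmod does not persist on its own, so running such a check after a reboot is one way to notice that the permissions were reset.

```shell
#!/bin/sh
# Succeed (exit status 0) when file $1 has group $2 and octal mode $3.
# GNU stat: %G = group name, %a = access rights in octal (e.g. 660).
perms_ok() {
  [ "$(stat -c '%G %a' "$1")" = "$2 $3" ]
}
```

On the Pi, perms_ok /dev/vchiq video 660 || echo "fix /dev/vchiq permissions" reports when the correction has to be applied again.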
The Raspberry Pi comes with a tool, /opt/vc/bin/vcgencmd, intended for development, to query various system parameters, for example voltage, temperature, frequency and throttled state. Its output requires some parsing/formatting, which can be done through a shell script whose output is fed to Telegraf. Fortunately robcowart had the same thought and has already created such a script.
Conveniently, on Arch Linux ARM one does not need sudo (root) to run /opt/vc/bin/vcgencmd, so I made slight modifications and combined both scripts. The only thing one has to do is mark it as executable (chmod +x vcgencmd.sh) and move it to /var/lib/telegraf; that’s it.
In the script, uncomment whatever you want to measure.
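Telegraf still has to be told to run the script. Assuming vcgencmd.sh prints InfluxDB line protocol and lives at /var/lib/telegraf/vcgencmd.sh, an [[inputs.exec]] section along the lines of the sketch below should do; the interval and timeout are arbitrary example values.

```shell
#!/bin/sh
# Write an [[inputs.exec]] section that runs vcgencmd.sh and parses its
# stdout as InfluxDB line protocol. interval/timeout are example values.
cat > vcgencmd.conf <<'EOF'
[[inputs.exec]]
  commands = ["/var/lib/telegraf/vcgencmd.sh"]
  interval = "30s"
  timeout = "5s"
  data_format = "influx"
EOF
```

The file can then be moved to /etc/telegraf/telegraf.d/, followed by a reload of the Telegraf service.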
As the original author already noted, one could write a proper plug-in for Telegraf instead of using a shell script.
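To give an idea of the parsing involved, below is a minimal sketch in plain sh. It is neither robcowart’s script nor the modified version used here; the measurement name rpi_vcgencmd and all field names are hypothetical. It relies on the usual vcgencmd output formats (e.g. temp=45.2'C, volt=0.8500V, throttled=0x50005) and on the documented bit layout of get_throttled.

```shell
#!/bin/sh
# Minimal sketch (not robcowart's script): turn vcgencmd output into
# InfluxDB line protocol. Measurement and field names are hypothetical.

# Strip the "name=" prefix and a trailing unit:
#   "temp=45.2'C" -> 45.2, "volt=0.8500V" -> 0.8500,
#   "frequency(48)=1500398464" -> 1500398464
vc_value() {
  printf '%s\n' "$1" | sed -e 's/^[^=]*=//' -e 's/[^0-9.].*$//'
}

# Decode the bit field of "vcgencmd get_throttled" (e.g. "throttled=0x50005").
# Bits 0-3: under-voltage, ARM frequency capped, throttled, soft temperature
# limit (currently active); bits 16-19: the same conditions have occurred.
throttled_fields() {
  v=$(( ${1#throttled=} ))
  printf 'under_voltage=%di,freq_capped=%di,throttled=%di,soft_temp_limit=%di,' \
    $(( v & 1 )) $(( (v >> 1) & 1 )) $(( (v >> 2) & 1 )) $(( (v >> 3) & 1 ))
  printf 'under_voltage_occurred=%di,freq_capped_occurred=%di,throttled_occurred=%di,soft_temp_limit_occurred=%di' \
    $(( (v >> 16) & 1 )) $(( (v >> 17) & 1 )) $(( (v >> 18) & 1 )) $(( (v >> 19) & 1 ))
}

# Combine the raw outputs of measure_temp, measure_volts and get_throttled
# into a single line protocol record.
vcgencmd_line() {
  printf 'rpi_vcgencmd temp=%s,core_volts=%s,%s\n' \
    "$(vc_value "$1")" "$(vc_value "$2")" "$(throttled_fields "$3")"
}

# On the Pi itself, query vcgencmd and print the record for Telegraf.
VCGENCMD=/opt/vc/bin/vcgencmd
if [ -x "$VCGENCMD" ]; then
  vcgencmd_line "$("$VCGENCMD" measure_temp)" \
    "$("$VCGENCMD" measure_volts core)" \
    "$("$VCGENCMD" get_throttled)"
fi
```

Because vcgencmd is only called when /opt/vc/bin/vcgencmd exists, the parsing functions can be tried on any machine, e.g. vcgencmd_line "temp=45.2'C" "volt=0.8500V" "throttled=0x0".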
Telegraf input configuration
Telegraf defines data sources as “inputs”. It has quite a lot of default input plugins available; for example cpu, memory, network, … all work out-of-the-box. To enable them, uncomment/add them in Telegraf’s config file. Since that config file is already quite large by itself, it is advisable to create a separate conf file (e.g. inputs.conf) and add it to /etc/telegraf/telegraf.d/. This way you retain a better overview of your Telegraf config; some people even prefer a separate config file per input plugin. Technically it does not matter whether, or how, you split them up: they are effectively all appended.
Some input plugins require additional software; for example, [[inputs.hddtemp]] requires the hddtemp daemon to be running (install the hddtemp package). Note: not all drives have a temperature sensor, YMMV.
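A possible inputs.conf is sketched below; the selection of plugins and the ignore_fs list are examples rather than the exact configuration behind the dashboard. The hddtemp section assumes the daemon listens on its default address, 127.0.0.1:7634.

```shell
#!/bin/sh
# Write an example inputs.conf with a few default input plugins.
# Which plugins and options to enable is a matter of taste.
cat > inputs.conf <<'EOF'
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "squashfs"]

[[inputs.net]]

[[inputs.system]]

## Requires the hddtemp daemon (default address 127.0.0.1:7634)
[[inputs.hddtemp]]
  address = "127.0.0.1:7634"
EOF
```

After moving it into /etc/telegraf/telegraf.d/, telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --test gathers every input once and prints the metrics, a quick way to spot mistakes before reloading the service.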
Configuring plug-ins is very straightforward. Only for the “topk” processes plugin is some additional configuration done to properly format the values (copied from akavel). Other than that, everything should be self-explanatory (if not, have a look at the official documentation).
Remember to reload the telegraf service when the conf file has been changed.
Grafana’s configuration is located at /etc/grafana/grafana.ini; however, creating and adjusting dashboards is done through its web interface:
http://[Raspberry Pi IP]:4000/
To set up Grafana with InfluxDB as data source: in the web interface go to “Data Sources”, select “Add new Data Source” and find InfluxDB under “Timeseries Databases”. Set the query language to Flux, fill in the URL (http://127.0.0.1:8086) and fill in the “InfluxDB Details” (you can find the token in the InfluxDB web UI).
Create your own dashboard or use one of the many community dashboards available, including mine (see the bottom of this page). Dashboards can be imported as JSON or by Grafana ID. Often, when you load a dashboard for the first time, some variables, like the data source, have to be configured. For more info see the official documentation.
To debug queries it is strongly advised to try them out in the InfluxDB explorer (see web UI).
This post was written for Grafana 7.4.
If the dashboard is not shown 24/7, something might need your attention without you noticing. Grafana has a feature built in for this: alerts. There are many “notification channels” (e.g. Microsoft Teams, Slack, Discord, PagerDuty, …); I will use SMTP (email), whose mail server settings are configured in the config file.
Of course Grafana runs on the Raspberry Pi itself, so if the RPi goes down, Grafana goes down as well. Classic case of “who is watching the watcher”.
Unfortunately there are some non-obvious limitations in Grafana’s alerting system: alerts can only be applied to graph panels, and their queries cannot use any variables. This issue was raised back in 2016, but so far no dice. They recently started working on it, though. Let’s hope I can rewrite this section soon.
Using Grafana: create dedicated graph panels for every metric you want to set an alert on, and do not use any variables in their queries. You can put them all in the same “row” and collapse it. While not ideal, it does work, has few side effects and does not require further attention once set up. If you get
tsdb.HandleRequest() error time: invalid duration $interval
make sure there is no variable used in the query options.
Using an InfluxDB dashboard: the InfluxDB web UI also allows one to create dashboards, in a way very similar to Grafana, and one can copy the queries verbatim. It also supports alerts; however, SMTP (email) as a notification channel is not (yet?) supported.
Therefore I simply did not configure any alerts for the time being.
A few gotchas I have encountered while exploring Grafana:
One cannot adjust the refresh rate of individual panels. This would be useful to fine-tune performance and allow faster refresh rates for certain applicable metrics.
If you want 2 y-axes with different units, do NOT set the “Unit” under “Standard options”, since it will overwrite the individual axis settings. Unfortunately, once set, you can no longer clear it through the UI; you have to open the JSON of the panel and set the standard unit to "". After that you can set a unit per y-axis again.
Using Flux, disjoint graph series (showing nothing where there is no data) seem broken. It should be enough to just set Null values (under “Display”) to null, but that does not seem to work. This affects the Processes graphs.
Without further ado, here is the result. Of course you can always customize it further to your liking. To try it for yourself, import the attached JSON below or the Grafana ID
(the old version of this board, for InfluxDB 1.X, is available under Grafana ID 13044 but is no longer maintained).
Individual rows on the dashboard
- Permalink: //oostens.me/projects/raspberrypiserver/system-monitoring/
- License: The text and content is licensed under CC BY-NC-SA 4.0. All source code I wrote on this page is licensed under The Unlicense; do as you please, I'm not liable nor provide warranty.