Maintenance#
Software#
Database#
PostgreSQL#
Important
All these instructions have been tested with PostgreSQL version 13 only!
Changing collate#
Sometimes you might create a new database without specifying the collation. PostgreSQL will use the default collation and there is no way to change it once the database has been created. The only solution is to dump the database, create a new one with the correct collation, import the dump into it and finally drop the original database.
See also
What is Collation in Databases? [1]
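you can inspect the collation of the existing databases before going through the dump and restore, for example:
psql -U myuser -c "SELECT datname, datcollate, datctype FROM pg_database;"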
backup old database
pg_dump -U myuser olddb > olddb.bak.sql
create new database with correct collate
CREATE DATABASE newdb WITH OWNER myuser TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
import dump into new database
psql -U myuser -d newdb < olddb.bak.sql
rename old database
ALTER DATABASE olddb RENAME TO olddb_bak;
rename the new database to the original database name
ALTER DATABASE newdb RENAME TO olddb;
restart services and check that everything works
drop old database
DROP DATABASE olddb_bak;
Moving data directory#
If the data directory in the root partition is getting too large you can create a new partition (mounted on /postgresql in this example) and let PostgreSQL point to that one instead.
install the dependencies
apt-get install rsync
stop PostgreSQL
systemctl stop postgresql
copy the database directory
rsync -avAX /var/lib/postgresql/13/main /postgresql/13
change the data_directory setting in /etc/postgresql/13/main/postgresql.conf
data_directory = '/postgresql/13/main' # use data in another directory
restart PostgreSQL
systemctl start postgresql
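after the restart you can verify that PostgreSQL picked up the new location; for example, as root via the postgres superuser account:
su - postgres -c "psql -c 'SHOW data_directory;'"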
High availability#
In this example we configure two nodes: when the master node goes offline, the backup node takes over the floating IP address. Services such as Apache and Unbound are replicated on the backup node.
Keepalived is a tool which handles failover at network layers 3 and 4 (VRRP). Below you can find the configuration file for the master node; the backup node only needs minor edits.
A script, also shown below, copies the content of the web servers, DNS server, etc. to the backup node and restarts those services automatically.
Key#
Node | IP | Hostname | Network interface name
---|---|---|---
MASTER | 192.168.0.10 | mst | eno1
BACKUP | 192.168.0.11 | bak | eno1
floating IP | 192.168.0.100 | - | -
Basic setup#
install the dependencies. Keepalived must be installed on all nodes
apt-get install keepalived rsync
create the configuration for the master node (typically /etc/keepalived/keepalived.conf)
global_defs {
    max_auto_priority -1
}

########################
## VRRP configuration ##
########################

# Identify the VRRP instance as, in this case, "failover_link".
vrrp_instance failover_link {

    # Initial state of the keepalived VRRP instance on this host
    # (MASTER or BACKUP). Once started, only priority matters.
    state MASTER

    # Interface this VRRP instance is bound to.
    interface eno1

    # Arbitrary value between 1 and 255 to distinguish this VRRP
    # instance from others running on the same device. It must match
    # with other peering devices.
    virtual_router_id 1

    # Highest priority value takes the MASTER role and the
    # virtual IP (default value is 100).
    priority 110

    # Time, in seconds, between VRRP advertisements. The default is 1,
    # but in some cases you can achieve more reliable results by
    # increasing this value.
    advert_int 2

    use_vmac
    vmac_xmit_base

    # Authentication method: AH indicates IPsec Authentication Header.
    # It offers more security than PASS, which transmits the
    # authentication password in plaintext. Some implementations
    # have complained of problems with AH, so it may be necessary
    # to use PASS to get keepalived's VRRP working.
    #
    # The auth_pass will only use the first 8 characters entered.
    authentication {
        auth_type AH
        auth_pass f5K.*0Bq
    }

    # VRRP advertisements ordinarily go out over multicast. This
    # configuration parameter causes keepalived to send them
    # as unicasts. This specification can be useful in environments
    # where multicast isn't supported or in instances where you want
    # to limit which devices see your VRRP announcements. The IP
    # address(es) can be IPv4 or IPv6, and indicate the real IP of
    # other members.
    unicast_peer {
        192.168.0.11
    }

    # Virtual IP address(es) that will be shared among VRRP
    # members. "dev" indicates the interface the virtual IP will
    # be assigned to, and "label" allows for a clearer description of
    # the virtual IP.
    virtual_ipaddress {
        192.168.0.100 dev eno1 label eno1:vip
    }
}
Note
Copy this file to the backup node as well and change:
state MASTER to state BACKUP
unicast_peer { 192.168.0.11 } to unicast_peer { 192.168.0.10 }
priority 110 to priority 100
restart keepalived on both nodes
systemctl restart keepalived
ping the floating IP address
ping -c1 192.168.0.100
test replication by stopping Keepalived on the master node only and pinging the floating IP address. Finally, restart keepalived
systemctl stop keepalived
ping -c1 192.168.0.100
systemctl start keepalived
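you can also check which node currently holds the floating IP address, for example:
ip -brief address | grep 192.168.0.100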
Service replication#
Make sure you are in a trusted network, because root login via SSH is allowed to simplify operations. In this example we copy files from
Apache
Unbound
dnscrypt-proxy
Certbot (Let's Encrypt)
The enabled_files directory on the master node contains files, each listing the files or directories that rsync copies to the backup server.
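Each list is a plain text file with one path per line. For every list the deploy script shown below effectively runs an rsync command of this shape (apache.txt is only a hypothetical list name):
rsync -avAX -r --delete --files-from=enabled_files/apache.txt / root@192.168.0.11:/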
create the keepalived_deploy.sh script in /home/jobs/scripts/by-user/root/keepalived/ (the script expects the enabled_files directory and systemd_deploy_services.txt in its working directory)
#!/usr/bin/env bash
#
# keepalived_deploy.sh
#
# Copyright (C) 2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

set -euo pipefail

# Source the configuration file passed as the first argument.
. "${1}"

SRC='/'
DST='/'
ENABLED_FILES=$(find enabled_files/* -type f)
SYSTEMD_DEPLOY_SERVICES=$(cat systemd_deploy_services.txt)

# Sync files.
for f in ${ENABLED_FILES}; do
    printf "%s\n" "${RSYNC_BASE} --files-from="${f}" "${SRC}" \
        "${USER}"@"${HOST}":"${DST}""
    ${RSYNC_BASE} --files-from="${f}" "${SRC}" "${USER}"@"${HOST}":"${DST}"
done

# Restart systemd services.
ssh "${USER}"@"${HOST}" "\
    systemctl daemon-reload \
    && systemctl reenable --all "${SYSTEMD_DEPLOY_SERVICES}" \
    && systemctl restart --all "${SYSTEMD_DEPLOY_SERVICES}" \
    && systemctl status --all --no-pager "${SYSTEMD_DEPLOY_SERVICES}""
create a configuration file, keepalived_deploy.conf, in the same directory
# o The --archive (-a) option's behavior does not imply --recursive
# (-r), so specify it explicitly, if you want it.
# Use --dry-run to simulate
RSYNC_BASE='rsync -avAX -r --delete'
USER='root'
HOST='192.168.0.11'
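you can test the synchronization by hand before scheduling it; a sketch, assuming you temporarily add --dry-run to RSYNC_BASE in the configuration file (note that the final SSH step still restarts the services on the backup node):
cd /home/jobs/scripts/by-user/root/keepalived
./keepalived_deploy.sh keepalived_deploy.conf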
create an SSH key and do not set a password for it. When prompted for the file, save it as /root/.ssh/bak_root so that it matches the SSH client configuration below
ssh-keygen -t rsa -b 16384 -C "$(whoami)@$(hostname)-$(date +%F)"
Add the following to the SSH client configuration of the root user on the master node (/root/.ssh/config)
Match host 192.168.0.11 user root
    IdentityFile=/root/.ssh/bak_root
go to the backup node and append the newly created public key to /root/.ssh/authorized_keys
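a sketch of this step, assuming the key pair was saved as /root/.ssh/bak_root on the master node:
cat /root/.ssh/bak_root.pub | ssh root@192.168.0.11 'mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys'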
edit the SSH server configuration (/etc/ssh/sshd_config) on the backup node
# [ ... ]

PermitRootLogin yes
AllowUsers root # [ ... ]
Match user root
    PasswordAuthentication no

# [ ... ]
restart the SSH service in the backup node
systemctl restart ssh
go back to the master node and test if the key is working
ssh root@192.168.0.11
create a Systemd service unit file
[Unit]
Description=Copy files for keepalived
Requires=network-online.target
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/home/jobs/scripts/by-user/root/keepalived
ExecStart=/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf
User=root
Group=root
create a Systemd timer unit file
[Unit]
Description=Once every day copy files for keepalived

[Timer]
OnCalendar=*-*-* 5:30:00
Persistent=true

[Install]
WantedBy=timers.target
Apache replication#
See also
How to redirect all pages to one page? [5]
on the master node, separate replicable services from non-replicable ones. You can do this by splitting the configuration into multiple files and including them in the main file (/etc/apache2/apache2.conf). Then add the following file lists to the enabled_files directory on the master node.
The first one copies the Apache configuration
/etc/apache2/apache2.conf
/etc/apache2/replicated-servers.conf
Important
you must change the VirtualHost directive to use the floating IP, like this:
<VirtualHost 192.168.0.100:443>
The second file copies the server data. You can replicate static data (Jekyll websites, HTML, etc.) but not programs that rely on databases, at least not without extra work
/var/www/franco.net.eu.org
/var/www/assets.franco.net.eu.org
/var/www/blog.franco.net.eu.org
/var/www/docs.franco.net.eu.org
/var/www/keepachangelog.franco.net.eu.org
The third file copies all the HTTPS certificates
/etc/letsencrypt
on the backup node you must “patch” the non-replicable services. You can set up an error message for each server like this:
<IfModule mod_ssl.c>
<VirtualHost 192.168.0.100:443>
    UseCanonicalName on
    Keepalive On
    SSLCompression off
    ServerName software.franco.net.eu.org
    RewriteEngine On

    Include /etc/apache2/standard-servers-outage-text.conf

    Include /etc/letsencrypt/options-ssl-apache.conf
    SSLCertificateFile /etc/letsencrypt/live/software.franco.net.eu.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/software.franco.net.eu.org/privkey.pem
</VirtualHost>
</IfModule>
DocumentRoot "/var/www/standard-servers"
<Directory "/var/www/standard-servers">
    Options -ExecCGI -Includes -FollowSymLinks -Indexes
    AllowOverride None
    Require all granted
</Directory>

# Redirect all requests to the root directory of the virtual server.
RewriteEngine On
RewriteRule \/.+ / [L,R]
Create a file in /var/www/standard-servers/index.html with your outage message
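for example (the message is just a placeholder):
mkdir -p /var/www/standard-servers
printf '%s\n' '<h1>Service temporarily unavailable</h1>' > /var/www/standard-servers/index.html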
DNS replication#
I use dnscrypt-proxy as the DNS server and Unbound as the caching server. The systemd socket file is useful to set dnscrypt-proxy's listening port.
See also
Unbound DNS server behind a VIP - solving reply from unexpected source [6]
use these file lists to replicate the two services.
/etc/dnscrypt-proxy/dnscrypt-proxy.toml
/etc/systemd/system/dnscrypt-proxy.socket
/etc/unbound/unbound.conf
Important
Add interface-automatic: yes to the server section of the Unbound configuration, so that replies are sent from the same (floating) IP address the query was received on.
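after the change you can validate the configuration and reload Unbound, for example:
unbound-checkconf /etc/unbound/unbound.conf
systemctl restart unbound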
Final steps#
run the deploy script
Kernel#
See also
filesystem - Where does update-initramfs look for kernel versions? - Ask Ubuntu [7]
RAID#
Run periodic RAID data scrubs on hard drives and SSDs.
install the dependencies
apt-get install mdadm python3-yaml python3-requests
install fpyutils. See reference
create the jobs directories. See reference
mkdir -p /home/jobs/{scripts,services}/by-user/root
create the script
#!/usr/bin/env python3
#
# Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
#
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# Author: Neil Brown
# Email: <neilb@suse.com>
#
# Copyright (C) 2019-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
r"""Run RAID tests."""

import collections
import multiprocessing
import os
import pathlib
import sys
import time

import fpyutils
import yaml

# Constants.
STATUS_CLEAN = 'clean'
STATUS_ACTIVE = 'active'
STATUS_IDLE = 'idle'


class UserNotRoot(Exception):
    """The user running the script is not root."""


class NoAvailableArrays(Exception):
    """No available arrays."""


class NoSelectedArraysPresent(Exception):
    """None of the arrays in the configuration file exists."""


def get_active_arrays():
    active_arrays = list()
    with open('/proc/mdstat') as f:
        line = f.readline()
        while line:
            if STATUS_ACTIVE in line:
                active_arrays.append(line.split()[0])
            line = f.readline()

    return active_arrays


def get_array_state(array: str):
    return open('/sys/block/' + array + '/md/array_state').read().rstrip()


def get_sync_action(array: str):
    return open('/sys/block/' + array + '/md/sync_action').read().rstrip()


def run_action(array: str, action: str):
    with open('/sys/block/' + array + '/md/sync_action', 'w') as f:
        f.write(action)


def main_action(array: str, config: dict):
    action = devices[array]
    go = True
    while go:
        if get_sync_action(array) == STATUS_IDLE:
            message = 'running ' + action + ' on /dev/' + array + '. pid: ' + str(
                os.getpid())
            run_action(array, action)
            message += '\n\n'
            message += 'finished pid: ' + str(os.getpid())
            print(message)

            if config['notify']['gotify']['enabled']:
                m = config['notify']['gotify']['message'] + ' ' + '\n' + message
                fpyutils.notify.send_gotify_message(
                    config['notify']['gotify']['url'],
                    config['notify']['gotify']['token'], m,
                    config['notify']['gotify']['title'],
                    config['notify']['gotify']['priority'])
            if config['notify']['email']['enabled']:
                fpyutils.notify.send_email(
                    message, config['notify']['email']['smtp_server'],
                    config['notify']['email']['port'],
                    config['notify']['email']['sender'],
                    config['notify']['email']['user'],
                    config['notify']['email']['password'],
                    config['notify']['email']['receiver'],
                    config['notify']['email']['subject'])

            go = False
        if go:
            print('waiting ' + array + ' to be idle...')
            time.sleep(config['generic']['timeout_idle_check'])


if __name__ == '__main__':
    if os.getuid() != 0:
        raise UserNotRoot

    configuration_file = sys.argv[1]
    config = yaml.load(open(configuration_file), Loader=yaml.SafeLoader)

    # The configuration maps RAID array names (without '/dev/') to actions.
    devices = dict(config['devices'])

    active_arrays = get_active_arrays()
    dev_queue = collections.deque()
    if len(active_arrays) > 0:
        for dev in active_arrays:
            if pathlib.Path('/sys/block/' + dev + '/md/sync_action').is_file():
                state = get_array_state(dev)
                if state == STATUS_CLEAN or state == STATUS_ACTIVE or state == STATUS_IDLE:
                    try:
                        if devices[dev] != 'ignore' and dev in devices:
                            dev_queue.append(dev)
                    except KeyError:
                        pass

    if len(active_arrays) == 0:
        raise NoAvailableArrays
    if len(dev_queue) == 0:
        raise NoSelectedArraysPresent

    while len(dev_queue) > 0:
        for i in range(0, config['generic']['max_concurrent_checks']):
            if len(dev_queue) > 0:
                ready = dev_queue.popleft()
                p = multiprocessing.Process(target=main_action,
                                            args=(
                                                ready,
                                                config,
                                            ))
                p.start()
        p.join()
create a configuration file
#
# mdadm_check.yaml
#
# Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
#
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# Author: Neil Brown
# Email: <neilb@suse.com>
#
# Copyright (C) 2019-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)

generic:
  # The maximum number of concurrent processes.
  max_concurrent_checks: 2

  # In seconds.
  timeout_idle_check: 10

# key: RAID array name without '/dev/'.
# value: action.
devices:
  md1: 'check'
  md2: 'ignore'
  md3: 'check'

notify:
  email:
    enabled: true
    smtp_server: 'smtp.gmail.com'
    port: 465
    sender: 'myusername@gmail.com'
    user: 'myusername'
    password: 'my awesome password'
    receiver: 'myusername@gmail.com'
    subject: 'mdadm operation'
  gotify:
    enabled: true
    url: '<gotify url>'
    token: '<app token>'
    title: 'mdadm operation'
    message: 'starting mdadm operation'
    priority: 5
Important
do not prepend /dev to RAID device names
possible values: check, repair, idle, ignore
ignore will make the script skip the device
use repair at your own risk
absent devices are ignored
run these commands to get the names of the RAID arrays
lsblk
cat /proc/mdstat
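these are the same sysfs entries the script writes to; as a manual sketch on a hypothetical md1 array:
cat /sys/block/md1/md/sync_action    # 'idle' means no scrub is running
echo check > /sys/block/md1/md/sync_action    # start a data scrub by hand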
create a Systemd service unit file
[Unit]
Description=mdadm check
Requires=sys-devices-virtual-block-md1.device
Requires=sys-devices-virtual-block-md2.device
Requires=sys-devices-virtual-block-md3.device
After=sys-devices-virtual-block-md1.device
After=sys-devices-virtual-block-md2.device
After=sys-devices-virtual-block-md3.device

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/mdadm_check.py /home/jobs/scripts/by-user/root/mdadm_check.yaml
User=root
Group=root

[Install]
WantedBy=multi-user.target
create a Systemd timer unit file
[Unit]
Description=Once a month check mdadm arrays

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
fix the permissions
chmod 700 /home/jobs/{scripts,services}/by-user/root
run the deploy script
S.M.A.R.T.#
Run periodic S.M.A.R.T. tests on hard drives and SSDs.
The provided script supports only /dev/disk/by-id names.
See also
A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks [10]
install the dependencies
apt-get install hdparm smartmontools python3-yaml python3-requests
install fpyutils. See reference
identify the drives whose S.M.A.R.T. values you want to check
ls /dev/disk/by-id
See also the udev rule file /lib/udev/rules.d/60-persistent-storage.rules. You can also use this command to get more details about a specific drive
hdparm -I /dev/disk/by-id/${drive_name}
# or
hdparm -I /dev/sd${letter}
create the jobs directories. See reference
mkdir -p /home/jobs/{scripts,services}/by-user/root
chmod 700 -R /home/jobs/{scripts,services}/by-user/root
create the script
#!/usr/bin/env python3
#
# smartd_test.py
#
# Copyright (C) 2019-2021 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
r"""Run S.M.A.R.T tests on hard drives."""

import json
import os
import pathlib
import re
import shlex
import subprocess
import sys

import fpyutils
import yaml


class UserNotRoot(Exception):
    """The user running the script is not root."""


def get_disks() -> list:
    r"""Scan all the disks."""
    disks = list()
    for d in pathlib.Path('/dev/disk/by-id').iterdir():
        # Ignore disks ending with part-${integer} to avoid duplicates (names
        # corresponding to partitions of the same disk).
        disk = str(d)
        if re.match('.+-part[0-9]+$', disk) is None:
            try:
                ddict = json.loads(
                    subprocess.run(
                        shlex.split('smartctl --capabilities --json ' +
                                    shlex.quote(disk)),
                        capture_output=True,
                        check=False,
                        shell=False,
                        timeout=30).stdout)
                try:
                    # Check for smart test support.
                    if ddict['ata_smart_data']['capabilities'][
                            'self_tests_supported']:
                        disks.append(disk)
                except KeyError:
                    pass
            except subprocess.TimeoutExpired:
                print('timeout for ' + disk)
            except subprocess.CalledProcessError:
                print('device ' + disk +
                      ' does not support S.M.A.R.T. commands, skipping...')

    return disks


def disk_ready(disk: str, busy_status: int = 249) -> bool:
    r"""Check if the disk is ready."""
    # Raises a KeyError if disk has not S.M.A.R.T. status capabilities.
    ddict = json.loads(
        subprocess.run(shlex.split('smartctl --capabilities --json ' +
                                   shlex.quote(disk)),
                       capture_output=True,
                       check=True,
                       shell=False,
                       timeout=30).stdout)
    if ddict['ata_smart_data']['self_test']['status']['value'] != busy_status:
        return True
    else:
        return False


def run_test(disk: str, test_length: str = 'long') -> str:
    r"""Run the smartd test."""
    return subprocess.run(
        shlex.split('smartctl --test=' + shlex.quote(test_length) + ' ' +
                    shlex.quote(disk)),
        capture_output=True,
        check=True,
        shell=False,
        timeout=30).stdout


if __name__ == '__main__':
    if os.getuid() != 0:
        raise UserNotRoot

    configuration_file = shlex.quote(sys.argv[1])
    config = yaml.load(open(configuration_file), Loader=yaml.SafeLoader)

    # Do not prepend '/dev/disk/by-id/'.
    disks_to_check = shlex.quote(sys.argv[2])
    disks_available = get_disks()

    for d in config['devices']:
        dev = '/dev/disk/by-id/' + d
        if config['devices'][d]['enabled'] and dev in disks_available:
            if disks_to_check == 'all' or disks_to_check == d:
                if disk_ready(dev, config['devices'][d]['busy_status']):
                    print('attempting ' + d + ' ...')
                    message = run_test(
                        dev, config['devices'][d]['test']).decode('utf-8')
                    print(message)
                    if config['devices'][d]['log']:
                        if config['notify']['gotify']['enabled']:
                            m = config['notify']['gotify'][
                                'message'] + ' ' + d + '\n' + message
                            fpyutils.notify.send_gotify_message(
                                config['notify']['gotify']['url'],
                                config['notify']['gotify']['token'], m,
                                config['notify']['gotify']['title'],
                                config['notify']['gotify']['priority'])
                        if config['notify']['email']['enabled']:
                            fpyutils.notify.send_email(
                                message,
                                config['notify']['email']['smtp_server'],
                                config['notify']['email']['port'],
                                config['notify']['email']['sender'],
                                config['notify']['email']['user'],
                                config['notify']['email']['password'],
                                config['notify']['email']['receiver'],
                                config['notify']['email']['subject'])
                else:
                    # Drop test requests if a disk is running a test in a
                    # particular moment. This avoids putting the disks under
                    # too much stress.
                    print('disk ' + d + ' not ready, checking the next...')
create a configuration file
#
# smartd_test.yaml
#
# Copyright (C) 2019-2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

devices:
  ata-disk1:
    enabled: true
    test: 'long'
    log: true
    busy_status: 249
  ata-disk2:
    enabled: true
    test: 'long'
    log: false
    busy_status: 249
  ata-diskn:
    enabled: true
    test: 'long'
    log: true
    busy_status: 249

notify:
  gotify:
    enabled: true
    url: '<gotify url>'
    token: '<app token>'
    title: 'smart test'
    message: 'starting smart test on'
    priority: 5
  email:
    enabled: true
    smtp_server: 'smtp.gmail.com'
    port: 465
    sender: 'myusername@gmail.com'
    user: 'myusername'
    password: 'my awesome password'
    receiver: 'myusername@gmail.com'
    subject: 'smartd test'
Important
absent devices are ignored
devices must be explicitly enabled
do not prepend /dev/disk/by-id/ to drive names
run a short test to get the busy_status value
smartctl -t short /dev/disk/by-id/${drive_name}
You should be able to capture the value while the test is running by looking at the Self-test execution status: line. In my case it is always 249, but this value is not hardcoded in smartmontools’ source code
smartctl --all /dev/disk/by-id/${drive_name}
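this is the same value the script reads from smartctl's JSON output; assuming jq is installed, you can inspect it directly while the short test runs:
smartctl --capabilities --json /dev/disk/by-id/${drive_name} | jq '.ata_smart_data.self_test.status.value'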
use this Systemd service unit file
[Unit]
Description=execute smartd on ata-disk1

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/smartd_test.py /home/jobs/scripts/by-user/root/smartd_test.yaml ata-disk1
User=root
Group=root
use this Systemd timer unit file
[Unit]
Description=Once every two months smart test ata-disk1

[Timer]
OnCalendar=*-01,03,05,07,09,11-01 00:00:00
Persistent=true

[Install]
WantedBy=timers.target
fix the permissions
chmod 700 -R /home/jobs/scripts/by-user/root/smartd_test.*
chmod 700 -R /home/jobs/services/by-user/root
run the deploy script
Important
To avoid interrupting the tests you must not put the disks to sleep; therefore, programs like hd-idle must be stopped before running the tests.
Services#
Notify unit status#
This script is useful to get notified about failed Systemd services.
Some time ago my Gitea instance could not start after an update. Had I used this script, I would have known about the problem immediately instead of several days later.
install the dependencies
apt-get install python3-pip python3-venv
create the jobs directories. See reference
mkdir -p /home/jobs/{scripts,services}/by-user/root
create a new virtual environment
cd /home/jobs/scripts/by-user/root
python3 -m venv .venv_notify_unit_status
. .venv_notify_unit_status/bin/activate
create the requirements_notify_unit_status.txt file
apprise
PyYAML
install the dependencies
pip3 install -r requirements_notify_unit_status.txt
deactivate
create the script
#!/usr/bin/env python3
#
# notify_unit_status.py
#
# Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2020-2024 Franco Masotti
#
# This script is licensed under a
# Creative Commons Attribution-ShareAlike 4.0 International License.
#
# You should have received a copy of the license along with this
# work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.
r"""Send a notification when a Systemd unit fails."""

import shlex
import sys

import apprise
import yaml

if __name__ == '__main__':
    configuration_file = shlex.quote(sys.argv[1])
    config = yaml.load(open(configuration_file), Loader=yaml.SafeLoader)
    failed_unit = shlex.quote(sys.argv[2])

    message = 'Systemd service failure: ' + failed_unit

    # Create an Apprise instance.
    apobj = apprise.Apprise()

    # Add all of the notification services by their server url.
    for uri in config['apprise_notifiers']['dest']:
        apobj.add(uri)

    apobj.notify(body=message, title=config['apprise_notifiers']['title'])
create a configuration file
#
# notify_unit_status.yaml
#
# Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2020-2024 Franco Masotti
#
# This script is licensed under a
# Creative Commons Attribution-ShareAlike 4.0 International License.
#
# You should have received a copy of the license along with this
# work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.

apprise_notifiers:
  dest:
    - 'nctalks://<string>/'
    - 'mailtos://<string>'
  title: 'notify unit status'
use this Systemd service unit file
#
# notify-unit-status@.service
#
# Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This script is licensed under a
# Creative Commons Attribution-ShareAlike 4.0 International License.
#
# You should have received a copy of the license along with this
# work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.

[Unit]
Description=Unit Status Mailer Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
WorkingDirectory=/home/jobs/scripts/by-user/root
ExecStart=/bin/sh -c '. .venv_notify_unit_status/bin/activate && ./notify_unit_status.py ./notify_unit_status.yaml %I; deactivate'
edit the Systemd service you want to monitor. In this example the service to be monitored is Gitea
systemctl edit gitea.service
add this content
# [ ... ]

[Unit]
# [ ... ]
OnFailure=notify-unit-status@%n.service

# [ ... ]
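you can trigger a test notification without waiting for a real failure by starting the template unit with an arbitrary instance name (test is just a placeholder):
systemctl start notify-unit-status@test.service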
Updates#
Update action#
This script can be used to update software not supported by the package manager, for example Docker images.
Important
Any arbitrary command can be configured.
See also
A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks [10]
install the dependencies
apt-get install python3-yaml python3-requests
install fpyutils. See reference
create the script
#!/usr/bin/env python3
#
# update_action.py
#
# Copyright (C) 2021-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
r"""update_action.py."""

import shlex
import sys

import fpyutils
import yaml


def send_notification(message: str, notify: dict):
    m = notify['gotify']['message'] + '\n' + message
    if notify['gotify']['enabled']:
        fpyutils.notify.send_gotify_message(notify['gotify']['url'],
                                            notify['gotify']['token'], m,
                                            notify['gotify']['title'],
                                            notify['gotify']['priority'])
    if notify['email']['enabled']:
        fpyutils.notify.send_email(
            message, notify['email']['smtp_server'], notify['email']['port'],
            notify['email']['sender'], notify['email']['user'],
            notify['email']['password'], notify['email']['receiver'],
            notify['email']['subject'])


if __name__ == '__main__':

    def main():
        configuration_file = shlex.quote(sys.argv[1])
        config = yaml.load(open(configuration_file), Loader=yaml.SafeLoader)

        # Action types. Preserve this order.
        types = ['pre', 'update', 'post']
        services = config['services']

        for service in services:
            for type in types:
                for cmd in services[service]['commands'][type]:
                    for name in cmd:
                        retval = fpyutils.shell.execute_command_live_output(
                            cmd[name]['command'], dry_run=False)
                        if cmd[name]['notify']['success'] and retval == cmd[
                                name]['expected_retval']:
                            send_notification(
                                'command "' + name + '" of service "' +
                                service + '": OK', config['notify'])
                        elif cmd[name]['notify']['error'] and retval != cmd[
                                name]['expected_retval']:
                            send_notification(
                                'command "' + name + '" of service "' +
                                service + '": ERROR', config['notify'])

    main()
create a configuration file
#
# update_action.mypurpose.yaml
#
# Copyright (C) 2021-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

notify:
  email:
    enabled: true
    smtp_server: 'smtp.gmail.com'
    port: 465
    sender: 'myusername@gmail.com'
    user: 'myusername'
    password: 'my awesome password'
    receiver: 'myusername@gmail.com'
    subject: 'update action'
  gotify:
    enabled: true
    url: '<gotify url>'
    token: '<app token>'
    title: 'update action'
    message: 'update action'
    priority: 5

services:
  hello:
    commands:
      pre:
        - stop_service:
            # string
            command: 'systemctl stop docker-compose.hello.service'
            # integer
            expected_retval: 0
            # boolean: {true,false}
            notify:
              success: true
              error: true
      update:
        - pull:
            command: 'pushd /home/jobs/scripts/by-user/root/docker/hello && docker-compose pull'
            expected_retval: 0
            notify:
              success: true
              error: true
        - build:
            command: 'pushd /home/jobs/scripts/by-user/root/docker/hello && docker-compose build --pull'
            expected_retval: 0
            notify:
              success: true
              error: true
      post:
        - start_service:
            command: 'systemctl start docker-compose.hello.service'
            expected_retval: 0
            notify:
              success: true
              error: true
  goodbye:
    commands:
      pre:
        - stop_service:
            command: 'systemctl stop docker-compose.goodbye.service'
            expected_retval: 0
            notify:
              success: true
              error: true
      update:
        - pull_only:
            command: 'pushd /home/jobs/scripts/by-user/root/docker/goodbye && docker-compose pull'
            expected_retval: 0
            notify:
              success: true
              error: true
      post:
        - start_service:
            command: 'systemctl start docker-compose.goodbye.service'
            expected_retval: 0
            notify:
              success: true
              error: true
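you can run the script once by hand to check the configuration and the notifications before scheduling it:
/home/jobs/scripts/by-user/root/update_action.py /home/jobs/scripts/by-user/root/update_action.mypurpose.yaml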
use this Systemd service unit file
[Unit]
Description=Update action mypurpose
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/update_action.py /home/jobs/scripts/by-user/root/update_action.mypurpose.yaml
User=root
Group=root
use this Systemd timer unit file
[Unit]
Description=Update action mypurpose monthly

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
fix the permissions
chmod 700 -R /home/jobs/scripts/by-user/root/update_action.*
chmod 700 -R /home/jobs/services/by-user/root
run the deploy script
Footnotes