Maintenance
Software
Database
PostgreSQL
Important
All these instructions have been tested with PostgreSQL version 13 only!
Changing collate
Sometimes you might create a new database without specifying the collation. PostgreSQL then uses the default collation, and there is no way to change it once the database is created. The only solution is to dump the database, create a new one with the correct collation, import the dump, and finally drop the original database.
See also
What is Collation in Databases?
back up the old database
pg_dump -U myuser olddb > olddb.bak.sql
create a new database with the correct collation
CREATE DATABASE newdb WITH OWNER myuser TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
import the dump into the new database
psql -U myuser -d newdb < olddb.bak.sql
rename the old database (stop the services that use it first: PostgreSQL refuses to rename a database that has active connections)
ALTER DATABASE olddb RENAME TO olddb_bak;
rename the new database to the original database name
ALTER DATABASE newdb RENAME TO olddb;
restart the services and check that everything works
drop the old database
DROP DATABASE olddb_bak;
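The procedure above can also be written down as an ordered list of commands, which makes it easy to review before touching a production database. The Python sketch below only builds strings and runs nothing; the helper name collation_migration_steps and the _bak suffix convention are assumptions for illustration, with defaults chosen to reproduce the commands above.

```python
def collation_migration_steps(db: str, owner: str, new_db: str = 'newdb',
                              locale: str = 'en_US.UTF-8') -> list:
    """Return the dump/recreate/restore procedure as an ordered list of commands.

    Hypothetical helper: it only builds the strings, nothing is executed.
    """
    backup_db = db + '_bak'
    dump_file = db + '.bak.sql'
    return [
        f'pg_dump -U {owner} {db} > {dump_file}',
        (f'CREATE DATABASE {new_db} WITH OWNER {owner} TEMPLATE template0 '
         f"ENCODING UTF8 LC_COLLATE '{locale}' LC_CTYPE '{locale}';"),
        f'psql -U {owner} -d {new_db} < {dump_file}',
        # Stop the services using the database before renaming it.
        f'ALTER DATABASE {db} RENAME TO {backup_db};',
        f'ALTER DATABASE {new_db} RENAME TO {db};',
        # Run this last, only after the services have been checked.
        f'DROP DATABASE {backup_db};',
    ]


if __name__ == '__main__':
    for step in collation_migration_steps('olddb', 'myuser'):
        print(step)
```

After the import you can compare the collations of all databases with SELECT datname, datcollate, datctype FROM pg_database; in psql.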
Moving data directory
If the data directory on the root partition is getting too large, you can create a new partition (mounted on /postgresql in this example) and let PostgreSQL point to that one instead.
install the dependencies
apt-get install rsync
stop PostgreSQL
systemctl stop postgresql
copy the database directory
rsync -avAX /var/lib/postgresql/13/main /postgresql/13
change the data_directory setting in /etc/postgresql/13/main/postgresql.conf
data_directory = '/postgresql/13/main' # use data in another directory
restart PostgreSQL
systemctl start postgresql
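After restarting, it is worth confirming which directory the configuration actually points to. Once the server runs, sudo -u postgres psql -c 'SHOW data_directory;' reports the value in use. The Python sketch below instead reads the configuration text; it assumes the value is single-quoted, as in the line above, and the function name get_data_directory is hypothetical.

```python
import re

# Matches an uncommented data_directory line; the '=' is optional in
# postgresql.conf. Assumes a single-quoted value, as in the example above.
DATA_DIRECTORY_LINE = re.compile(r"^\s*data_directory\s*=?\s*'([^']*)'")


def get_data_directory(conf_text: str):
    """Return the data_directory set in postgresql.conf text, or None.

    If the setting appears more than once, the last occurrence wins,
    which is how PostgreSQL treats repeated parameters.
    """
    value = None
    for line in conf_text.splitlines():
        match = DATA_DIRECTORY_LINE.match(line)
        if match:
            value = match.group(1)
    return value
```

For example, calling get_data_directory on the contents of the postgresql.conf file edited above should return '/postgresql/13/main'.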
High availability
In this example we configure two nodes. When the master node goes offline, the backup node takes over the floating IP address. Services such as Apache and Unbound are replicated on the backup node.
Keepalived provides high availability at layers 3 and 4; in this setup it uses VRRP to move the floating IP address between the nodes. Here you can find the configuration file for the master node (the backup node only needs minor edits).
The script below copies the content of the web servers, the DNS server, etc. to the backup node and restarts those services automatically.
Key
| Node | IP | Hostname | Network interface name |
|---|---|---|---|
| MASTER | 192.168.0.10 | mst | eno1 |
| BACKUP | 192.168.0.11 | bak | eno1 |
| floating IP | 192.168.0.100 | - | - |
Basic setup
install the dependencies. Keepalived must be installed on all nodes
apt-get install keepalived rsync
create the configuration for the master node
/etc/keepalived/keepalived.conf
```
global_defs {
    max_auto_priority -1
}

########################
## VRRP configuration ##
########################

# Identify the VRRP instance as, in this case, "failover_link".
vrrp_instance failover_link {

    # Initial state of the keepalived VRRP instance on this host
    # (MASTER or BACKUP). Once started, only priority matters.
    state MASTER

    # interface this VRRP instance is bound to.
    interface eno1

    # Arbitrary value between 1 and 255 to distinguish this VRRP
    # instance from others running on the same device. It must match
    # with other peering devices.
    virtual_router_id 1

    # Highest priority value takes the MASTER role and the
    # virtual IP (default value is 100).
    priority 110

    # Time, in seconds, between VRRP advertisements. The default is 1,
    # but in some cases you can achieve more reliable results by
    # increasing this value.
    advert_int 2

    use_vmac
    vmac_xmit_base

    # Authentication method: AH indicates ipsec Authentication Header.
    # It offers more security than PASS, which transmits the
    # authentication password in plaintext. Some implementations
    # have complained of problems with AH, so it may be necessary
    # to use PASS to get keepalived's VRRP working.
    #
    # The auth_pass will only use the first 8 characters entered.
    authentication {
        auth_type AH
        auth_pass f5K.*0Bq
    }

    # VRRP advertisements ordinarily go out over multicast. This
    # configuration parameter causes keepalived to send them
    # as unicasts. This specification can be useful in environments
    # where multicast isn't supported or in instances where you want
    # to limit which devices see your VRRP announcements. The IP
    # address(es) can be IPv4 or IPv6, and indicate the real IP of
    # other members.
    unicast_peer {
        192.168.0.11
    }
    # Virtual IP address(es) that will be shared among VRRP
    # members. "Dev" indicates the interface the virtual IP will
    # be assigned to. And "label" allows for clearer description of the
    # virtual IP.
    virtual_ipaddress {
        192.168.0.100 dev eno1 label eno1:vip
    }
}
```
Note
Copy this file to the backup node as well and change:
- state MASTER to state BACKUP
- unicast_peer { 192.168.0.11 } to unicast_peer { 192.168.0.10 }
- priority 110 to priority 100
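The priorities in this note decide which node owns the floating IP. The Python sketch below is a simplified model of the VRRP election, assuming the usual rules: among the nodes still advertising, the highest priority wins, and on a tie the higher primary IP address wins. It ignores preemption settings and advertisement timers, and Node and elect_master are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ip: str
    priority: int
    alive: bool = True  # False once Keepalived stops sending advertisements


def elect_master(nodes: list):
    """Return the node that should hold the floating IP, or None.

    Simplified VRRP model: the highest priority among live nodes wins;
    on a tie, the higher primary IP address wins.
    """
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    return max(candidates,
               key=lambda n: (n.priority, ipaddress.ip_address(n.ip)))


if __name__ == '__main__':
    mst = Node('mst', '192.168.0.10', 110)
    bak = Node('bak', '192.168.0.11', 100)
    print(elect_master([mst, bak]).name)  # mst
    mst.alive = False  # e.g. "systemctl stop keepalived" on the master
    print(elect_master([mst, bak]).name)  # bak
```

Stopping Keepalived on the master, as in the failover test that follows, corresponds to setting mst.alive = False in this model.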
restart keepalived on both nodes
systemctl restart keepalived
ping the floating IP address
ping -c1 192.168.0.100
test failover by stopping Keepalived on the master node only and pinging the floating IP address, which should still reply. Finally, restart Keepalived
systemctl stop keepalived
ping -c1 192.168.0.100
systemctl start keepalived
Service replication
Make sure you are on a trusted network, because root login via SSH is allowed to simplify operations. In this example we copy files from:
- Apache
- Unbound
- dnscrypt-proxy
- Certbot (Let’s Encrypt)
The enabled_files directory on the master node contains files, each listing the files or directories that rsync copies to the backup server.
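Each file in enabled_files becomes one rsync invocation with --files-from, so adding a service to the replication is a matter of dropping a new list file into that directory. The Python sketch below mirrors the loop of keepalived_deploy.sh to preview the commands it would run; the function name rsync_commands is hypothetical, and its defaults are copied from keepalived_deploy.conf.

```python
import pathlib
import shlex


def rsync_commands(enabled_dir: str,
                   rsync_base: str = 'rsync -avAX -r --delete',
                   user: str = 'root', host: str = '192.168.0.11',
                   src: str = '/', dst: str = '/') -> list:
    """Build one rsync argument list per list file under enabled_dir.

    Mirrors the loop in keepalived_deploy.sh: each file passed to
    --files-from names the paths (relative to src) to copy to the backup node.
    """
    commands = []
    # Sorted for a predictable order; find(1) in the shell script does not sort.
    for list_file in sorted(pathlib.Path(enabled_dir).rglob('*')):
        if list_file.is_file():
            commands.append(shlex.split(rsync_base) + [
                '--files-from=' + str(list_file), src, f'{user}@{host}:{dst}'
            ])
    return commands
```

For example, running for command in rsync_commands('enabled_files'): print(shlex.join(command)) from /home/jobs/scripts/by-user/root/keepalived prints one rsync line per list file, without copying anything.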
create the script
/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh
```bash
#!/usr/bin/env bash
#
# keepalived_deploy.sh
#
# Copyright (C) 2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

set -euo pipefail

# Load RSYNC_BASE, USER and HOST from the configuration file.
. "${1}"

SRC='/'
DST='/'
ENABLED_FILES=$(find enabled_files/* -type f)
# systemd_deploy_services.txt, read from the working directory, lists the
# units to restart on the backup node.
SYSTEMD_DEPLOY_SERVICES=$(cat systemd_deploy_services.txt)

# Sync files.
for f in ${ENABLED_FILES}; do
    printf "%s\n" "${RSYNC_BASE} --files-from="${f}" "${SRC}" \
        "${USER}"@"${HOST}":"${DST}""
    ${RSYNC_BASE} --files-from="${f}" "${SRC}" "${USER}"@"${HOST}":"${DST}"
done

# Restart systemd services.
ssh "${USER}"@"${HOST}" "\
    systemctl daemon-reload \
    && systemctl reenable --all "${SYSTEMD_DEPLOY_SERVICES}" \
    && systemctl restart --all "${SYSTEMD_DEPLOY_SERVICES}" \
    && systemctl status --all --no-pager "${SYSTEMD_DEPLOY_SERVICES}""
```
create a configuration file
/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf
```bash
# o With --files-from, the --archive (-a) option's behavior does not
#   imply --recursive (-r), so specify it explicitly, if you want it.
# Use --dry-run to simulate
RSYNC_BASE='rsync -avAX -r --delete'
USER='root'
HOST='192.168.0.11'
```
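create an SSH key without a passphrase and save it as /root/.ssh/bak_root, the file referenced in the SSH configuration of the next step
ssh-keygen -t rsa -b 16384 -f /root/.ssh/bak_root -C "$(whoami)@$(hostname)-$(date +%F)"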
Add the following to the SSH configuration
/root/.ssh/config
```
Match host 192.168.0.11 user root
    IdentityFile=/root/.ssh/bak_root
```
go to the backup node and append the newly created public key (bak_root.pub) to /root/.ssh/authorized_keys
edit the SSH server configuration
/etc/ssh/sshd_config
```
# [ ... ]

PermitRootLogin yes
AllowUsers root # [ ... ]
Match user root
    PasswordAuthentication no

# [ ... ]
```
restart the SSH service in the backup node
systemctl restart ssh
go back to the master node and test if the key is working
ssh root@192.168.0.11
create a Systemd service unit file
/home/jobs/services/by-user/root/keepalived-deploy.service
```ini
[Unit]
Description=Copy files for keepalived
Requires=network-online.target
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/home/jobs/scripts/by-user/root/keepalived
ExecStart=/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf
User=root
Group=root
```
create a Systemd timer unit file
/home/jobs/services/by-user/root/keepalived-deploy.timer
```ini
[Unit]
Description=Once every day copy files for keepalived

[Timer]
OnCalendar=*-*-* 5:30:00
Persistent=true

[Install]
WantedBy=timers.target
```
Apache replication
See also
How to redirect all pages to one page?
on the master node, separate the replicatable services from the non-replicatable ones. You can do this by splitting the configuration into multiple files and then including them in the main file, /etc/apache2/apache2.conf
add these files on the master node.
The first one copies the Apache configuration
/home/jobs/scripts/by-user/root/keepalived/enabled_files/apache2.txt
```
/etc/apache2/apache2.conf
/etc/apache2/replicated-servers.conf
```
Important
you must change the virtual host directive to use the floating IP like this:
<VirtualHost 192.168.0.100:443>
The second file copies the server data. You can replicate static data (Jekyll websites, HTML, etc.) but not programs that rely on databases, at least not without extra work
/home/jobs/scripts/by-user/root/keepalived/enabled_files/replicated_webservers_data.txt
```
/var/www/franco.net.eu.org
/var/www/assets.franco.net.eu.org
/var/www/blog.franco.net.eu.org
/var/www/docs.franco.net.eu.org
/var/www/keepachangelog.franco.net.eu.org
```
The third file copies all the HTTPS certificates
/home/jobs/scripts/by-user/root/keepalived/enabled_files/letsencrypt.txt
```
/etc/letsencrypt
```
on the backup node you must “patch” the non-replicatable services. You can set up an outage message for each server like this:
/etc/apache2/standard-servers.conf
```apache
<IfModule mod_ssl.c>
<VirtualHost 192.168.0.100:443>
    UseCanonicalName on
    Keepalive On
    SSLCompression off
    ServerName software.franco.net.eu.org
    RewriteEngine On

    Include /etc/apache2/standard-servers-outage-text.conf

    Include /etc/letsencrypt/options-ssl-apache.conf
    SSLCertificateFile /etc/letsencrypt/live/software.franco.net.eu.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/software.franco.net.eu.org/privkey.pem
</VirtualHost>
</IfModule>
```
/etc/apache2/standard-servers-outage-text.conf
```apache
DocumentRoot "/var/www/standard-servers"
<Directory "/var/www/standard-servers">
    Options -ExecCGI -Includes -FollowSymLinks -Indexes
    AllowOverride None
    Require all granted
</Directory>

# Redirect all requests to the root directory of the virtual server.
RewriteEngine On
RewriteRule \/.+ / [L,R]
```
create /var/www/standard-servers/index.html containing your outage message
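The rewrite rule in standard-servers-outage-text.conf does not loop: in a virtual-host context RewriteRule matches the URL path, which starts with '/', and the pattern \/.+ needs at least one character after the slash, so '/' itself is served (as index.html) while every other path gets a redirect (302, the default for [R]). The Python sketch below emulates only that single rule with the re module; it is an illustration, not how Apache evaluates rules in general, and outage_redirect is a hypothetical name.

```python
import re

# Pattern of "RewriteRule \/.+ / [L,R]". search() is used because rewrite
# patterns are not anchored unless they contain ^ or $.
OUTAGE_RULE = re.compile(r'\/.+')


def outage_redirect(url_path: str):
    """Return '/' if the rule would redirect url_path, else None."""
    if OUTAGE_RULE.search(url_path):
        return '/'
    # '/' itself is not rewritten, so the DirectoryIndex (index.html) is served.
    return None
```

Note that even /index.html is redirected to /, which then serves the same outage page.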
DNS replication
I use dnscrypt-proxy as the DNS server and Unbound as the caching server. The systemd socket file sets the port dnscrypt-proxy listens on.
See also
Unbound DNS server behind a VIP - solving reply from unexpected source
use these file lists to replicate the two services.
/home/jobs/scripts/by-user/root/keepalived/enabled_files/dnscrypt-proxy.txt
```
/etc/dnscrypt-proxy/dnscrypt-proxy.toml
/etc/systemd/system/dnscrypt-proxy.socket
```
/home/jobs/scripts/by-user/root/keepalived/enabled_files/unbound.txt
```
/etc/unbound/unbound.conf
```
Important
Add interface-automatic: yes to the server section of the Unbound configuration.
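interface-automatic matters because the queries arrive on the floating IP: according to the unbound.conf documentation, this option makes Unbound detect the address a UDP query was sent to and reply from that address, instead of from the node's own one. The Python sketch below checks a configuration file for the option; it assumes clause names such as server: start at column 0 and options are indented, as in typical configurations, and has_interface_automatic is a hypothetical name.

```python
def has_interface_automatic(conf_text: str) -> bool:
    """Return True if 'interface-automatic: yes' appears in a server: clause."""
    clause = None
    for raw in conf_text.splitlines():
        line = raw.split('#', 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        if not line[0].isspace() and line.endswith(':'):
            clause = line[:-1].strip()  # e.g. "server", "remote-control"
            continue
        key, _, value = line.strip().partition(':')
        if (clause == 'server' and key.strip() == 'interface-automatic'
                and value.strip().strip('"') == 'yes'):
            return True
    return False
```

For example, has_interface_automatic(open('/etc/unbound/unbound.conf').read()) should return True on both nodes once the option is in place.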
Final steps
run the deploy script
Kernel
See also
filesystem - Where does update-initramfs look for kernel versions? - Ask Ubuntu
RAID
Run periodic RAID data scrubs on hard drives and SSDs.
install the dependencies
apt-get install mdadm python3-yaml python3-requests
install fpyutils. See reference
create the jobs directories. See reference
mkdir -p /home/jobs/{scripts,services}/by-user/root
create the script
/home/jobs/scripts/by-user/root/mdadm_check.py
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
#
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    Author: Neil Brown
#    Email: <neilb@suse.com>
#
# Copyright (C) 2019-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
r"""Run RAID tests."""

import collections
import multiprocessing
import os
import pathlib
import sys
import time

import fpyutils
import yaml

# Constants.
STATUS_CLEAN = 'clean'
STATUS_ACTIVE = 'active'
STATUS_IDLE = 'idle'


class UserNotRoot(Exception):
    """The user running the script is not root."""


class NoAvailableArrays(Exception):
    """No available arrays."""


class NoSelectedArraysPresent(Exception):
    """None of the arrays in the configuration file exists."""


def get_active_arrays():
    active_arrays = list()
    with open('/proc/mdstat', 'r') as f:
        line = f.readline()
        while line:
            if STATUS_ACTIVE in line:
                active_arrays.append(line.split()[0])
            line = f.readline()

    return active_arrays


def get_array_state(array: str):
    return open('/sys/block/' + array + '/md/array_state', 'r').read().rstrip()


def get_sync_action(array: str):
    return open('/sys/block/' + array + '/md/sync_action', 'r').read().rstrip()


def run_action(array: str, action: str):
    with open('/sys/block/' + array + '/md/sync_action', 'w') as f:
        f.write(action)


def main_action(array: str, action: str, config: dict):
    # The action is passed explicitly so that the child process does not
    # depend on globals inherited through fork().
    go = True
    while go:
        if get_sync_action(array) == STATUS_IDLE:
            message = 'running ' + action + ' on /dev/' + array + '. pid: ' + str(
                os.getpid())
            run_action(array, action)
            message += '\n\n'
            message += 'finished pid: ' + str(os.getpid())
            print(message)

            if config['notify']['gotify']['enabled']:
                m = config['notify']['gotify']['message'] + ' ' + '\n' + message
                fpyutils.notify.send_gotify_message(
                    config['notify']['gotify']['url'],
                    config['notify']['gotify']['token'], m,
                    config['notify']['gotify']['title'],
                    config['notify']['gotify']['priority'])
            if config['notify']['email']['enabled']:
                fpyutils.notify.send_email(
                    message, config['notify']['email']['smtp_server'],
                    config['notify']['email']['port'],
                    config['notify']['email']['sender'],
                    config['notify']['email']['user'],
                    config['notify']['email']['password'],
                    config['notify']['email']['receiver'],
                    config['notify']['email']['subject'])

            go = False
        if go:
            print('waiting ' + array + ' to be idle...')
            time.sleep(config['generic']['timeout_idle_check'])


if __name__ == '__main__':
    if os.getuid() != 0:
        raise UserNotRoot

    configuration_file = sys.argv[1]
    config = yaml.load(open(configuration_file, 'r'), Loader=yaml.SafeLoader)
    # The 'devices' section is a mapping: array name -> action.
    devices = dict(config['devices'])

    active_arrays = get_active_arrays()
    dev_queue = collections.deque()
    if len(active_arrays) > 0:
        for dev in active_arrays:
            if pathlib.Path('/sys/block/' + dev + '/md/sync_action').is_file():
                state = get_array_state(dev)
                if state == STATUS_CLEAN or state == STATUS_ACTIVE or state == STATUS_IDLE:
                    # Check membership first: absent devices are ignored.
                    if dev in devices and devices[dev] != 'ignore':
                        dev_queue.append(dev)

    if len(active_arrays) == 0:
        raise NoAvailableArrays
    if len(dev_queue) == 0:
        raise NoSelectedArraysPresent

    while len(dev_queue) > 0:
        processes = list()
        for i in range(0, config['generic']['max_concurrent_checks']):
            if len(dev_queue) > 0:
                ready = dev_queue.popleft()
                p = multiprocessing.Process(target=main_action,
                                            args=(
                                                ready,
                                                devices[ready],
                                                config,
                                            ))
                p.start()
                processes.append(p)
        # Wait for every process of this batch, not only the last one.
        for p in processes:
            p.join()
```
create a configuration file
/home/jobs/scripts/by-user/root/mdadm_check.yaml
```yaml
#
# mdadm_check.yaml
#
# Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
#
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    Author: Neil Brown
#    Email: <neilb@suse.com>
#
# Copyright (C) 2019-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)

generic:
    # The maximum number of concurrent processes.
    max_concurrent_checks: 2

    # In seconds.
    timeout_idle_check: 10

# key: RAID array name without '/dev/'.
# value: action.
devices:
    md1: 'check'
    md2: 'ignore'
    md3: 'check'

notify:
    email:
        enabled: true
        smtp_server: 'smtp.gmail.com'
        port: 465
        sender: 'myusername@gmail.com'
        user: 'myusername'
        password: 'my awesome password'
        receiver: 'myusername@gmail.com'
        subject: 'mdadm operation'
    gotify:
        enabled: true
        url: '<gotify url>'
        token: '<app token>'
        title: 'mdadm operation'
        message: 'starting mdadm operation'
        priority: 5
```
Important
- do not prepend /dev to RAID device names
- possible values: check, repair, idle, ignore
- ignore will make the script skip the device
- use repair at your own risk
- absent devices are ignored
run these commands to get the names of RAID arrays
lsblk
cat /proc/mdstat
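To see which arrays mdadm_check.py will actually queue, you can combine the names from /proc/mdstat with the devices mapping of mdadm_check.yaml. The Python sketch below reproduces that filtering (absent and ignore devices are skipped) but leaves out the sysfs state checks; unlike the script's substring test, it matches the word active exactly, so inactive arrays are not picked up. The sample /proc/mdstat text and the function name select_arrays are hypothetical.

```python
def select_arrays(mdstat_text: str, devices: dict) -> list:
    """Return the active arrays in /proc/mdstat that the 'devices' mapping
    of mdadm_check.yaml does not ignore (absent arrays are skipped too)."""
    selected = []
    for line in mdstat_text.splitlines():
        fields = line.split()
        # Array lines look like: "md1 : active raid1 sdb1[1] sda1[0]".
        # Matching the whole word avoids picking up "inactive" arrays.
        if len(fields) >= 3 and fields[1] == ':' and fields[2] == 'active':
            name = fields[0]
            if name in devices and devices[name] != 'ignore':
                selected.append(name)
    return selected


# Hypothetical /proc/mdstat content, for illustration only.
SAMPLE_MDSTAT = """Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      976630464 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdd1[1] sdc1[0]
      976630464 blocks super 1.2 [2/2] [UU]

md4 : inactive sde[0](S)
unused devices: <none>
"""
```

With the example configuration (md1: check, md2: ignore, md3: check), only md1 is selected from this sample: md2 is ignored, md3 is absent and md4 is inactive.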
create a Systemd service unit file
/home/jobs/services/by-user/root/mdadm-check.service
```ini
[Unit]
Description=mdadm check
Requires=sys-devices-virtual-block-md1.device
Requires=sys-devices-virtual-block-md2.device
Requires=sys-devices-virtual-block-md3.device
After=sys-devices-virtual-block-md1.device
After=sys-devices-virtual-block-md2.device
After=sys-devices-virtual-block-md3.device

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/mdadm_check.py /home/jobs/scripts/by-user/root/mdadm_check.yaml
User=root
Group=root

[Install]
WantedBy=multi-user.target
```
create a Systemd timer unit file
/home/jobs/services/by-user/root/mdadm-check.timer
```ini
[Unit]
Description=Once a month check mdadm arrays

[Timer]
OnCalendar=Monthly
Persistent=true

[Install]
WantedBy=timers.target
```
fix the permissions
chmod 700 /home/jobs/{scripts,services}/by-user/root
run the deploy script
S.M.A.R.T.
Run periodic S.M.A.R.T. tests on hard drives and SSDs.
The provided script supports only /dev/disk/by-id names.
See also
A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks
install the dependencies
apt-get install hdparm smartmontools python3-yaml python3-requests
install fpyutils. See reference
identify the drives whose S.M.A.R.T. values you want to check
ls /dev/disk/by-id
See also the udev rule file /lib/udev/rules.d/60-persistent-storage.rules. You can also use one of these commands to get more details about a specific drive
hdparm -I /dev/disk/by-id/${drive_name}
hdparm -I /dev/sd${letter}
create the jobs directories. See reference
mkdir -p /home/jobs/{scripts,services}/by-user/root
chmod 700 -R /home/jobs/{scripts,services}/by-user/root
create the script
/home/jobs/scripts/by-user/root/smartd_test.py
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# smartd_test.py
#
# Copyright (C) 2019-2021 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
r"""Run S.M.A.R.T tests on hard drives."""

import json
import os
import pathlib
import re
import shlex
import subprocess
import sys

import fpyutils
import yaml


class UserNotRoot(Exception):
    """The user running the script is not root."""


def get_disks() -> list:
    r"""Scan all the disks."""
    disks = list()
    for d in pathlib.Path('/dev/disk/by-id').iterdir():
        # Ignore disks ending with part-${integer} to avoid duplicates (names
        # corresponding to partitions of the same disk).
        disk = str(d)
        if re.match('.+-part[0-9]+$', disk) is None:
            try:
                ddict = json.loads(
                    subprocess.run(
                        shlex.split('smartctl --capabilities --json ' +
                                    shlex.quote(disk)),
                        capture_output=True,
                        check=False,
                        shell=False,
                        timeout=30).stdout)
                try:
                    # Check for smart test support.
                    if ddict['ata_smart_data']['capabilities'][
                            'self_tests_supported']:
                        disks.append(disk)
                except KeyError:
                    pass
            except subprocess.TimeoutExpired:
                print('timeout for ' + disk)
            except subprocess.CalledProcessError:
                print('device ' + disk +
                      ' does not support S.M.A.R.T. commands, skipping...')

    return disks


def disk_ready(disk: str, busy_status: int = 249) -> bool:
    r"""Check if the disk is ready."""
    # Raises a KeyError if the disk has no S.M.A.R.T. status capabilities.
    ddict = json.loads(
        subprocess.run(shlex.split('smartctl --capabilities --json ' +
                                   shlex.quote(disk)),
                       capture_output=True,
                       check=True,
                       shell=False,
                       timeout=30).stdout)
    if ddict['ata_smart_data']['self_test']['status']['value'] != busy_status:
        return True
    else:
        return False


def run_test(disk: str, test_length: str = 'long') -> bytes:
    r"""Run the smartd test and return the raw smartctl output."""
    return subprocess.run(
        shlex.split('smartctl --test=' + shlex.quote(test_length) + ' ' +
                    shlex.quote(disk)),
        capture_output=True,
        check=True,
        shell=False,
        timeout=30).stdout


if __name__ == '__main__':
    if os.getuid() != 0:
        raise UserNotRoot

    configuration_file = shlex.quote(sys.argv[1])
    config = yaml.load(open(configuration_file, 'r'), Loader=yaml.SafeLoader)

    # Do not prepend '/dev/disk/by-id/'.
    disks_to_check = shlex.quote(sys.argv[2])
    disks_available = get_disks()

    for d in config['devices']:
        dev = '/dev/disk/by-id/' + d
        if config['devices'][d]['enabled'] and dev in disks_available:
            if disks_to_check == 'all' or disks_to_check == d:
                if disk_ready(dev, config['devices'][d]['busy_status']):
                    print('attempting ' + d + ' ...')
                    message = run_test(
                        dev, config['devices'][d]['test']).decode('utf-8')
                    print(message)
                    if config['devices'][d]['log']:
                        if config['notify']['gotify']['enabled']:
                            m = config['notify']['gotify'][
                                'message'] + ' ' + d + '\n' + message
                            fpyutils.notify.send_gotify_message(
                                config['notify']['gotify']['url'],
                                config['notify']['gotify']['token'], m,
                                config['notify']['gotify']['title'],
                                config['notify']['gotify']['priority'])
                        if config['notify']['email']['enabled']:
                            fpyutils.notify.send_email(
                                message,
                                config['notify']['email']['smtp_server'],
                                config['notify']['email']['port'],
                                config['notify']['email']['sender'],
                                config['notify']['email']['user'],
                                config['notify']['email']['password'],
                                config['notify']['email']['receiver'],
                                config['notify']['email']['subject'])
                else:
                    # Drop test requests if a disk is running a test in a particular moment.
                    # This avoids putting the disks under too much stress.
                    print('disk ' + d + ' not ready, checking the next...')
```
create a configuration file
/home/jobs/scripts/by-user/root/smartd_test.yaml
```yaml
#
# smartd_test.yaml
#
# Copyright (C) 2019-2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

devices:
    ata-disk1:
        enabled: true
        test: 'long'
        log: true
        busy_status: 249
    ata-disk2:
        enabled: true
        test: 'long'
        log: false
        busy_status: 249
    ata-diskn:
        enabled: true
        test: 'long'
        log: true
        busy_status: 249

notify:
    gotify:
        enabled: true
        url: '<gotify url>'
        token: '<app token>'
        title: 'smart test'
        message: 'starting smart test on'
        priority: 5
    email:
        enabled: true
        smtp_server: 'smtp.gmail.com'
        port: 465
        sender: 'myusername@gmail.com'
        user: 'myusername'
        password: 'my awesome password'
        receiver: 'myusername@gmail.com'
        subject: 'smartd test'
```
Important
- absent devices are ignored
- devices must be explicitly enabled
- do not prepend /dev/disk/by-id/ to drive names
- run a short test to get the busy_status value
  smartctl -t short /dev/disk/by-id/${drive_name}
  You should be able to capture the value while the test is running by looking at the Self-test execution status: line in the output of
  smartctl --all /dev/disk/by-id/${drive_name}
  In my case it is always 249, but this value is not hardcoded in smartmontools’ source code.
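disk_ready in smartd_test.py compares the self-test status value with busy_status for equality. Assuming the ATA encoding of the self-test execution status byte (upper four bits: status, where 0xF means a test is in progress; lower four bits: remaining work in tens of percent), 249 is 0xF9, that is, a test in progress with 90% remaining. If that assumption holds, the value drops to 248, 247 and so on as the test advances, and checking only the upper four bits would also catch those. The Python sketch below shows the decoding; the JSON sample and the function names are hypothetical.

```python
import json


def self_test_value(smartctl_json: str) -> int:
    """Read the value that smartd_test.py compares with busy_status."""
    data = json.loads(smartctl_json)
    return data['ata_smart_data']['self_test']['status']['value']


def decode_self_test_status(value: int) -> tuple:
    """Split the self-test execution status byte (assumed ATA encoding).

    Upper nibble: status code, where 0xF means "test in progress".
    Lower nibble: remaining work, in tens of percent, while in progress.
    """
    in_progress = (value >> 4) == 0xF
    remaining_percent = (value & 0x0F) * 10 if in_progress else 0
    return in_progress, remaining_percent
```

For example, decode_self_test_status(249) returns (True, 90), which matches what smartctl --all reports early in a running test.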
use this Systemd service unit file
/home/jobs/services/by-user/root/smartd-test.ata_disk1.service
```ini
[Unit]
Description=execute smartd on ata-disk1

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/smartd_test.py /home/jobs/scripts/by-user/root/smartd_test.yaml ata-disk1
User=root
Group=root
```
use this Systemd timer unit file
/home/jobs/services/by-user/root/smartd-test.ata_disk1.timer
```ini
[Unit]
Description=Once every two months smart test ata-disk1

[Timer]
OnCalendar=*-01,03,05,07,09,11-01 00:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
fix the permissions
chmod 700 -R /home/jobs/scripts/by-user/root/smartd_test.*
chmod 700 -R /home/jobs/services/by-user/root
run the deploy script
Important
To avoid interrupting the tests, the disks must not be put to sleep: stop programs such as hd-idle before running the tests.
Services
Notify unit status
This script sends a notification when a Systemd service fails.
Some time ago my Gitea instance could not start after an update. Had I used this script, I would have known about the problem immediately instead of several days later.
See also
linux - get notification when systemd-monitored service enters failed state - Server Fault
install fpyutils. See reference
create the script
/home/jobs/scripts/by-user/root/notify_unit_status.py
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# notify_unit_status.py
#
# Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2020-2021 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This script is licensed under a
# Creative Commons Attribution-ShareAlike 4.0 International License.
#
# You should have received a copy of the license along with this
# work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.
r"""Send a notification when a Systemd unit fails."""

import shlex
import sys

import fpyutils
import yaml

if __name__ == '__main__':
    configuration_file = shlex.quote(sys.argv[1])
    config = yaml.load(open(configuration_file, 'r'), Loader=yaml.SafeLoader)

    failed_unit = shlex.quote(sys.argv[2])

    message = 'service failure: ' + failed_unit
    if config['notify']['gotify']['enabled']:
        m = config['notify']['gotify']['message'] + '\n' + message
        fpyutils.notify.send_gotify_message(
            config['notify']['gotify']['url'],
            config['notify']['gotify']['token'], m,
            config['notify']['gotify']['title'],
            config['notify']['gotify']['priority'])
    if config['notify']['email']['enabled']:
        fpyutils.notify.send_email(message,
                                   config['notify']['email']['smtp server'],
                                   config['notify']['email']['port'],
                                   config['notify']['email']['sender'],
                                   config['notify']['email']['user'],
                                   config['notify']['email']['password'],
                                   config['notify']['email']['receiver'],
                                   config['notify']['email']['subject'])
```
create a configuration file
/home/jobs/scripts/by-user/root/notify_unit_status.yaml
```yaml
#
# notify_unit_status.yaml
#
# Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This script is licensed under a
# Creative Commons Attribution-ShareAlike 4.0 International License.
#
# You should have received a copy of the license along with this
# work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.

notify:
    gotify:
        enabled: true
        url: '<gotify url>'
        token: '<app token>'
        title: 'service failure'
        message: 'service failure'
        priority: 5
    email:
        enabled: true
        smtp server: 'smtp.gmail.com'
        port: 465
        sender: 'myusername@gmail.com'
        user: 'myusername'
        password: 'my awesome password'
        receiver: 'myusername@gmail.com'
        subject: 'service failure'
```
use this Systemd service unit file
/home/jobs/services/by-user/root/notify-unit-status@.service
```ini
#
# notify-unit-status@.service
#
# Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
# Copyright (C) 2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This script is licensed under a
# Creative Commons Attribution-ShareAlike 4.0 International License.
#
# You should have received a copy of the license along with this
# work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.

[Unit]
Description=Unit Status Mailer Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/notify_unit_status.py /home/jobs/scripts/by-user/root/notify_unit_status.yaml %I
```
edit the Systemd service you want to monitor. In this example the service to be monitored is Gitea
systemctl edit gitea.service
add this content
```ini
# [ ... ]
[Unit]
# [ ... ]
OnFailure=notify-unit-status@%n.service
# [ ... ]
```
Updates
Update action
This script can be used to update software not supported by the package manager, for example Docker images.
Important
Any arbitrary command can be configured.
See also
A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks
install the dependencies
apt-get install python3-yaml python3-requests
install fpyutils. See reference
create the script
/home/jobs/scripts/by-user/root/update_action.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# update_action.py
#
# Copyright (C) 2021-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
r"""update_action.py."""

import sys

import fpyutils
import yaml


def send_notification(message: str, notify: dict):
    r"""Send a message through every enabled notification channel."""
    m = notify['gotify']['message'] + '\n' + message
    if notify['gotify']['enabled']:
        fpyutils.notify.send_gotify_message(notify['gotify']['url'],
                                            notify['gotify']['token'], m,
                                            notify['gotify']['title'],
                                            notify['gotify']['priority'])
    if notify['email']['enabled']:
        fpyutils.notify.send_email(
            message, notify['email']['smtp_server'], notify['email']['port'],
            notify['email']['sender'], notify['email']['user'],
            notify['email']['password'], notify['email']['receiver'],
            notify['email']['subject'])


if __name__ == '__main__':

    def main():
        # The path is passed to open() directly, so it must not be
        # shell-quoted: quoting would break paths with special characters.
        configuration_file = sys.argv[1]
        with open(configuration_file, 'r') as f:
            config = yaml.load(f, Loader=yaml.SafeLoader)

        # Action types. Preserve this order.
        types = ['pre', 'update', 'post']
        services = config['services']

        for service in services:
            # 'action_type' avoids shadowing the built-in type().
            for action_type in types:
                for cmd in services[service]['commands'][action_type]:
                    for name in cmd:
                        retval = fpyutils.shell.execute_command_live_output(
                            cmd[name]['command'], dry_run=False)
                        if cmd[name]['notify']['success'] and retval == cmd[
                                name]['expected_retval']:
                            send_notification(
                                'command "' + name + '" of service "' +
                                service + '": OK', config['notify'])
                        elif cmd[name]['notify']['error'] and retval != cmd[
                                name]['expected_retval']:
                            send_notification(
                                'command "' + name + '" of service "' +
                                service + '": ERROR', config['notify'])

    main()
create a configuration file
/home/jobs/scripts/by-user/root/update_action.mypurpose.yaml
#
# update_action.mypurpose.yaml
#
# Copyright (C) 2021-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

notify:
  email:
    enabled: true
    smtp_server: 'smtp.gmail.com'
    port: 465
    sender: 'myusername@gmail.com'
    user: 'myusername'
    password: 'my awesome password'
    receiver: 'myusername@gmail.com'
    subject: 'update action'
  gotify:
    enabled: true
    url: '<gotify url>'
    token: '<app token>'
    title: 'update action'
    message: 'update action'
    priority: 5

services:
  hello:
    commands:
      pre:
        - stop_service:
            # string
            command: 'systemctl stop docker-compose.hello.service'
            # integer
            expected_retval: 0
            # boolean: {true,false}
            notify:
              success: true
              error: true
      update:
        - pull:
            command: 'pushd /home/jobs/scripts/by-user/root/docker/hello && docker-compose pull'
            expected_retval: 0
            notify:
              success: true
              error: true
        - build:
            command: 'pushd /home/jobs/scripts/by-user/root/docker/hello && docker-compose build --pull'
            expected_retval: 0
            notify:
              success: true
              error: true
      post:
        - start_service:
            command: 'systemctl start docker-compose.hello.service'
            expected_retval: 0
            notify:
              success: true
              error: true
  goodbye:
    commands:
      pre:
        - stop_service:
            command: 'systemctl stop docker-compose.goodbye.service'
            expected_retval: 0
            notify:
              success: true
              error: true
      update:
        - pull_only:
            command: 'pushd /home/jobs/scripts/by-user/root/docker/goodbye && docker-compose pull'
            expected_retval: 0
            notify:
              success: true
              error: true
      post:
        - start_service:
            command: 'systemctl start docker-compose.goodbye.service'
            expected_retval: 0
            notify:
              success: true
              error: true
use this Systemd service unit file
/home/jobs/services/by-user/root/update-action.mypurpose.service
[Unit]
Description=Update action mypurpose
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/home/jobs/scripts/by-user/root/update_action.py /home/jobs/scripts/by-user/root/update_action.mypurpose.yaml
User=root
Group=root
use this Systemd timer unit file
/home/jobs/services/by-user/root/update-action.mypurpose.timer
[Unit]
Description=Update action mypurpose monthly

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
fix the permissions
chmod 700 -R /home/jobs/scripts/by-user/root/update_action.*
chmod 700 -R /home/jobs/services/by-user/root
run the deploy script
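The core of `update_action.py` is a loop that runs each service's `pre`, `update` and `post` commands in that order and compares every return value with `expected_retval`. The sketch below restates that logic using only the standard library. `run_shell` is a hypothetical stand-in for `fpyutils.shell.execute_command_live_output`, and the collected strings take the place of the real notifications.

```python
import subprocess
from typing import Callable, List, Optional

# Action types, always processed in this order.
ACTION_TYPES = ('pre', 'update', 'post')


def outcome_message(service: str, name: str, retval: int,
                    params: dict) -> Optional[str]:
    """Return the notification text for one command, or None."""
    ok = retval == params['expected_retval']
    if ok and params['notify']['success']:
        return 'command "' + name + '" of service "' + service + '": OK'
    if not ok and params['notify']['error']:
        return 'command "' + name + '" of service "' + service + '": ERROR'
    return None


def run_shell(command: str) -> int:
    """Hypothetical runner: execute a command and return its exit code."""
    # bash is used because the example configuration relies on pushd,
    # a bash builtin that is not available in every /bin/sh.
    return subprocess.run(command, shell=True,
                          executable='/bin/bash').returncode


def run_services(services: dict,
                 run: Callable[[str], int] = run_shell) -> List[str]:
    """Run all commands in pre/update/post order and collect messages."""
    messages = []
    for service, spec in services.items():
        for action_type in ACTION_TYPES:
            for cmd in spec['commands'][action_type]:
                # Each list item maps a command name to its parameters.
                for name, params in cmd.items():
                    retval = run(params['command'])
                    message = outcome_message(service, name, retval, params)
                    if message is not None:
                        messages.append(message)
    return messages
```

As in the script, a failing command does not stop the run. `post` commands, such as restarting the service, are still executed after a failed `update`.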
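`update_action.py` reads `services[service]['commands'][type]` for all three action types and expects the value types noted in the comments of the example configuration (string, integer, boolean). A check like the following can catch mistakes before the timer fires. `validate_services` is a hypothetical helper, meant to be run on the dictionary returned by `yaml.safe_load`.

```python
from typing import List

# update_action.py reads all three keys, so every service needs them.
ACTION_TYPES = ('pre', 'update', 'post')


def validate_services(services: dict) -> List[str]:
    """Return human-readable problems found in a 'services' mapping."""
    problems = []
    for service, spec in services.items():
        commands = spec.get('commands') if isinstance(spec, dict) else None
        if not isinstance(commands, dict):
            problems.append(f'{service}: "commands" must be a mapping')
            continue
        for action_type in ACTION_TYPES:
            items = commands.get(action_type)
            if not isinstance(items, list):
                problems.append(f'{service}: "{action_type}" must be a list')
                continue
            for item in items:
                if not isinstance(item, dict):
                    problems.append(
                        f'{service}.{action_type}: list items must be mappings')
                    continue
                # Each list item maps a command name to its parameters.
                for name, params in item.items():
                    where = f'{service}.{action_type}.{name}'
                    params = params if isinstance(params, dict) else {}
                    if not isinstance(params.get('command'), str):
                        problems.append(f'{where}: "command" must be a string')
                    retval = params.get('expected_retval')
                    # bool is a subclass of int in Python, so reject it explicitly.
                    if not isinstance(retval, int) or isinstance(retval, bool):
                        problems.append(
                            f'{where}: "expected_retval" must be an integer')
                    notify = params.get('notify')
                    notify = notify if isinstance(notify, dict) else {}
                    for flag in ('success', 'error'):
                        if not isinstance(notify.get(flag), bool):
                            problems.append(
                                f'{where}: "notify.{flag}" must be a boolean')
    return problems
```

An empty list means that every command has the fields the script reads.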
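In the timer, `OnCalendar=monthly` is shorthand for `*-*-01 00:00:00`, i.e. midnight on the first day of each month. `Persistent=true` makes systemd run a missed activation as soon as the timer is active again, for example after the machine was off at that time. The following sketch computes the next monthly activation. It is a simplification that uses naive local datetimes and ignores time zones and DST. On the system itself, `systemd-analyze calendar monthly` reports the next elapse.

```python
from datetime import datetime


def next_monthly(now: datetime) -> datetime:
    """First day of the month following `now`, at 00:00:00."""
    # December rolls over to January of the next year.
    if now.month == 12:
        return datetime(now.year + 1, 1, 1)
    return datetime(now.year, now.month + 1, 1)
```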
Footnotes
- 1
https://database.guide/what-is-collation-in-databases/ unknown license
- 2
https://www.keepalived.org/index.html unknown license
- 3
https://github.com/osixia/docker-keepalived/issues/34 unknown license
- 4
https://docs.syseleven.de/syseleven-stack/en/howtos/l3-high-availability unknown license
- 5
https://forums.digitalpoint.com/threads/how-to-redirect-all-pages-to-one-page.25353/ unknown license
- 6
https://www.claudiokuenzler.com/blog/695/unbound-behind-a-virtual-ip-vip-reply-from-unexpected-source unknown license
- 7
https://askubuntu.com/questions/759802/where-does-update-initramfs-look-for-kernel-versions CC BY-SA 3.0, copyright (c) 2016-2017, askubuntu contributors
- 8
https://unix.stackexchange.com/questions/411206/how-to-wipe-md-raid-meta CC BY-SA 3.0, copyright (c) 2017, stackexchange contributors
- 9
https://blog.franco.net.eu.org/notes/raid-data-scrubbing.html CC BY-SA 4.0, copyright (c) 2019-2021, Franco Masotti
- 10
https://software.franco.net.eu.org/frnmst/automated-tasks GNU GPLv3+, copyright (c) 2019-2022, Franco Masotti
- 11
https://serverfault.com/questions/694818/get-notification-when-systemd-monitored-service-enters-failed-state CC BY-SA 3.0, copyright (c) 2015-2018, serverfault contributors