Maintenance#

Software#

Database#

PostgreSQL#

Important

All these instructions have been tested with PostgreSQL version 13 only!

Changing collate#

Sometimes you might create a new database without specifying collation information. PostgreSQL then uses the default collation settings, and there is no way to change them once the database has been created. The only solution is to dump the database, create a new one with the correct collation, import the dump, and finally drop the original database.

See also

  • What is Collation in Databases? [1]
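Collation determines how strings compare and therefore how they sort. As a rough illustration (plain Python, not PostgreSQL itself): the C collation compares raw byte values, while a linguistic collation such as en_US.UTF-8 orders more like a dictionary:

```python
# Byte-wise ("C" collation) ordering differs from linguistic ordering
# (approximated here with casefolding): uppercase letters have lower
# byte values, so they all sort before lowercase ones.
words = ['apple', 'Banana', 'cherry']

print(sorted(words))                    # ['Banana', 'apple', 'cherry']
print(sorted(words, key=str.casefold))  # ['apple', 'Banana', 'cherry']
```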

  1. backup old database

    pg_dump -U myuser olddb > olddb.bak.sql
    
  2. create new database with correct collate

    CREATE DATABASE newdb WITH OWNER myuser TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';
    
  3. import dump into new database

    psql -U myuser -d newdb < olddb.bak.sql
    
  4. rename old database

    ALTER DATABASE olddb RENAME TO olddb_bak;
    
  5. rename the new database to the original database name

    ALTER DATABASE newdb RENAME TO olddb;
    
  6. restart services and check that everything works

  7. drop old database

    DROP DATABASE olddb_bak;
    
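Before dropping the old database you may want to double-check the collation of the new one. The `pg_database` catalog exposes `datcollate` and `datctype`; the helper below (a hypothetical convenience, it only builds the psql command line) shows one way to query them:

```python
import shlex

def collation_check_command(user: str, dbname: str) -> list:
    """Build a psql command that prints the collation settings of a database."""
    query = ("SELECT datname, datcollate, datctype FROM pg_database "
             "WHERE datname = '" + dbname + "';")
    return ['psql', '-U', user, '-At', '-c', query]

cmd = collation_check_command('myuser', 'newdb')
print(shlex.join(cmd))
```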
Moving data directory#

If the data directory in the root partition is getting too large, you can create a new partition (mounted on /postgresql in this example) and point PostgreSQL to that one instead.

  1. install the dependencies

    apt-get install rsync
    
  2. stop PostgreSQL

    systemctl stop postgresql
    
  3. copy the database directory

    rsync -avAX /var/lib/postgresql/13/main /postgresql/13
    
  4. change the data_directory setting in /etc/postgresql/13/main/postgresql.conf

    data_directory = '/postgresql/13/main'      # use data in another directory
    
  5. restart PostgreSQL

    systemctl start postgresql
    
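Step 4 above is a one-line edit. If you want to script it, a sketch like the following (assuming the stock setting uses single quotes, as in the default file) rewrites the data_directory line:

```python
import re

def set_data_directory(conf_text: str, new_path: str) -> str:
    """Replace the data_directory setting in a postgresql.conf text."""
    return re.sub(r"(?m)^#?\s*data_directory\s*=\s*'[^']*'",
                  "data_directory = '" + new_path + "'",
                  conf_text)

conf = "data_directory = '/var/lib/postgresql/13/main'\nport = 5432\n"
print(set_data_directory(conf, '/postgresql/13/main'))
```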

High availability#

In this example we configure two nodes. When the master node goes offline, the backup node takes over the floating IP address. We replicate services such as Apache and Unbound on the backup node.

Keepalived is a tool that handles network failover at layers 3 and 4. Below you can find the configuration file for the master node (the backup node needs only minor edits).

The script further below copies the content of the webservers, the DNS server, etc. to the backup node and restarts those services automatically.

See also

  • Keepalived for Linux [2]

  • Is it possible to add a static mac address for a vrrp ip? · Issue #34 · osixia/docker-keepalived · GitHub [3]

  • Building Layer 3 High Availability | Documentation [4]

Key#

Node          IP              Hostname   Network interface name
MASTER        192.168.0.10    mst        eno1
BACKUP        192.168.0.11    bak        eno1
floating IP   192.168.0.100   -          -

Basic setup#

  1. install the dependencies. Keepalived must be installed on all nodes

    apt-get install keepalived rsync
    
  2. create the configuration for the master node

    /etc/keepalived/keepalived.conf#
    global_defs {
        max_auto_priority -1
    }

    ########################
    ## VRRP configuration ##
    ########################

    # Identify the VRRP instance as, in this case, "failover_link".
    vrrp_instance failover_link {

        # Initial state of the keepalived VRRP instance on this host
        # (MASTER or BACKUP). Once started, only priority matters.
        state MASTER

        # Interface this VRRP instance is bound to.
        interface eno1

        # Arbitrary value between 1 and 255 to distinguish this VRRP
        # instance from others running on the same device. It must match
        # on the other peering devices.
        virtual_router_id 1

        # The highest priority value takes the MASTER role and the
        # virtual IP (the default value is 100).
        priority 110

        # Time, in seconds, between VRRP advertisements. The default is 1,
        # but in some cases you can achieve more reliable results by
        # increasing this value.
        advert_int 2

        use_vmac
        vmac_xmit_base

        # Authentication method: AH indicates IPsec Authentication Header.
        # It offers more security than PASS, which transmits the
        # authentication password in plaintext. Some implementations
        # have complained of problems with AH, so it may be necessary
        # to use PASS to get keepalived's VRRP working.
        #
        # auth_pass only uses the first 8 characters entered.
        authentication {
            auth_type AH
            auth_pass f5K.*0Bq
        }

        # VRRP advertisements ordinarily go out over multicast. This
        # configuration parameter causes keepalived to send them
        # as unicasts. This can be useful in environments
        # where multicast isn't supported or in instances where you want
        # to limit which devices see your VRRP announcements. The IP
        # address(es) can be IPv4 or IPv6, and indicate the real IPs of
        # the other members.
        unicast_peer {
            192.168.0.11
        }

        # Virtual IP address(es) that will be shared among VRRP
        # members. "dev" indicates the interface the virtual IP will
        # be assigned to, and "label" allows for a clearer description
        # of the virtual IP.
        virtual_ipaddress {
            192.168.0.100 dev eno1 label eno1:vip
        }
    }
    

    Note

    Copy this file in the backup node as well and change:

    • state MASTER to state BACKUP

    • unicast_peer { 192.168.0.11 } to unicast_peer { 192.168.0.10 }

    • priority 110 to priority 100

  3. restart keepalived on both nodes

    systemctl restart keepalived
    
  4. ping the floating IP address

    ping -c1 192.168.0.100
    
  5. test replication by stopping Keepalived on the master node only and pinging the floating IP address. Finally, restart keepalived

    systemctl stop keepalived
    ping -c1 192.168.0.100
    systemctl start keepalived
    
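The failover behaviour configured above boils down to a simple election: the node with the highest priority owns the virtual IP, and when it disappears the next-highest takes over. A simplified model (real VRRP also handles preemption and advertisement timers):

```python
def elect_master(nodes: dict) -> str:
    """Pick the alive node with the highest VRRP priority.

    nodes maps a hostname to a (priority, alive) tuple.
    """
    alive = {name: prio for name, (prio, up) in nodes.items() if up}
    return max(alive, key=alive.get)

nodes = {'mst': (110, True), 'bak': (100, True)}
print(elect_master(nodes))   # mst holds 192.168.0.100

nodes['mst'] = (110, False)  # master goes offline
print(elect_master(nodes))   # bak takes over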

Service replication#

Make sure you are on a trusted network, because we allow root login via SSH to simplify operations. In this example we copy files from

  • Apache

  • Unbound

  • dnscrypt-proxy

  • Certbot (Let’s encrypt)

The enabled_files directory on the master node contains files listing the files and directories that rsync will copy to the backup server.
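The loop at the heart of the deploy script in the next step can be sketched in Python: for each list file, build one rsync invocation (the variable names mirror the script's):

```python
def build_rsync_commands(file_lists, rsync_base, user, host,
                         src='/', dst='/'):
    """One rsync command per list file, like the deploy script's loop."""
    return [
        rsync_base + ' --files-from=' + f + ' ' + src + ' '
        + user + '@' + host + ':' + dst
        for f in file_lists
    ]

cmds = build_rsync_commands(
    ['enabled_files/apache2.txt', 'enabled_files/unbound.txt'],
    'rsync -avAX -r --delete', 'root', '192.168.0.11')
for c in cmds:
    print(c)
```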

  1. create the script

    /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh#
    #!/usr/bin/env bash
    #
    # keepalived_deploy.sh
    #
    # Copyright (C) 2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.

    set -euo pipefail

    . "${1}"

    SRC='/'
    DST='/'
    ENABLED_FILES=$(find enabled_files/* -type f)
    SYSTEMD_DEPLOY_SERVICES=$(cat systemd_deploy_services.txt)

    # Sync files.
    for f in ${ENABLED_FILES}; do
        printf "%s\n" "${RSYNC_BASE} --files-from=${f} ${SRC} ${USER}@${HOST}:${DST}"
        ${RSYNC_BASE} --files-from="${f}" "${SRC}" "${USER}"@"${HOST}":"${DST}"
    done

    # Restart systemd services.
    ssh "${USER}"@"${HOST}" "\
        systemctl daemon-reload \
        && systemctl reenable --all ${SYSTEMD_DEPLOY_SERVICES} \
        && systemctl restart --all ${SYSTEMD_DEPLOY_SERVICES} \
        && systemctl status --all --no-pager ${SYSTEMD_DEPLOY_SERVICES}"
    
  2. create a configuration file

    /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf#
    # The --archive (-a) option's behavior does not imply --recursive (-r)
    # when used with --files-from, so specify it explicitly if you want it.
    # Use --dry-run to simulate.
    RSYNC_BASE='rsync -avAX -r --delete'
    USER='root'
    HOST='192.168.0.11'
    
  3. create an SSH key. Do not set a passphrase for it

    ssh-keygen -t rsa -b 16384 -C "$(whoami)@$(hostname)-$(date +%F)"
    
  4. Add the following to the SSH configuration

    /root/.ssh/config#
    Match host 192.168.0.11 user root
      IdentityFile=/root/.ssh/bak_root
    
  5. go to the backup node and copy the newly created public key into /root/.ssh/authorized_keys

  6. edit the SSH server configuration

    /etc/ssh/sshd_config#
    # [ ... ]

    PermitRootLogin yes
    AllowUsers root    # [ ... ]
    Match user root
        PasswordAuthentication no

    # [ ... ]
    
  7. restart the SSH service in the backup node

    systemctl restart ssh
    
  8. go back to the master node and test if the key is working

    ssh root@192.168.0.11
    
  9. create a Systemd service unit file

    /home/jobs/services/by-user/root/keepalived-deploy.service#
    [Unit]
    Description=Copy files for keepalived
    Requires=network-online.target
    After=network-online.target

    [Service]
    Type=simple
    WorkingDirectory=/home/jobs/scripts/by-user/root/keepalived
    ExecStart=/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf
    User=root
    Group=root
    
  10. create a Systemd timer unit file

    /home/jobs/services/by-user/root/keepalived-deploy.timer#
    [Unit]
    Description=Once every day copy files for keepalived

    [Timer]
    OnCalendar=*-*-* 5:30:00
    Persistent=true

    [Install]
    WantedBy=timers.target
    

Apache replication#

See also

  • How to redirect all pages to one page? [5]

  1. in your master node, separate replicable services from non-replicable ones. You can do this by splitting the configuration into multiple files and then including those files in the main one (/etc/apache2/apache2.conf).

  2. add these files in the master node.

    The first one copies the Apache configuration

    /home/jobs/scripts/by-user/root/keepalived/enabled_files/apache2.txt#
    /etc/apache2/apache2.conf
    /etc/apache2/replicated-servers.conf
    

    Important

    you must change the virtual host directive to use the floating IP, like this: <VirtualHost 192.168.0.100:443>

    The second file copies the server data. You can replicate static data (Jekyll websites, plain HTML, etc.) but not programs that rely on databases, at least not without extra work

    /home/jobs/scripts/by-user/root/keepalived/enabled_files/replicated_webservers_data.txt#
    /var/www/franco.net.eu.org
    /var/www/assets.franco.net.eu.org
    /var/www/blog.franco.net.eu.org
    /var/www/docs.franco.net.eu.org
    /var/www/keepachangelog.franco.net.eu.org
    

    The third file copies all the HTTPS certificates

    /home/jobs/scripts/by-user/root/keepalived/enabled_files/letsencrypt.txt#
    /etc/letsencrypt
    
  3. in the backup node you must "patch" the non-replicable services. You can set up an error message for each server like this:

    /etc/apache2/standard-servers.conf#
    <IfModule mod_ssl.c>
    <VirtualHost 192.168.0.100:443>
        UseCanonicalName on
        KeepAlive On
        SSLCompression      off
        ServerName software.franco.net.eu.org
        RewriteEngine On

        Include /etc/apache2/standard-servers-outage-text.conf

        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/software.franco.net.eu.org/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/software.franco.net.eu.org/privkey.pem
    </VirtualHost>
    </IfModule>
    
    /etc/apache2/standard-servers-outage-text.conf#
    DocumentRoot "/var/www/standard-servers"
    <Directory "/var/www/standard-servers">
        Options -ExecCGI -Includes -FollowSymLinks -Indexes
        AllowOverride None
        Require all granted
    </Directory>

    # Redirect all requests to the root directory of the virtual server.
    RewriteEngine On
    RewriteRule \/.+ / [L,R]
    

    Create a file in /var/www/standard-servers/index.html with your outage message
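The RewriteRule above matches every path except the bare root, so all requests land on the outage page. Its matching behaviour can be emulated with a regular expression (illustration only, not Apache itself):

```python
import re

# Same pattern as the Apache rule: a slash followed by at least one
# character. The bare root path "/" does not match and is served as-is.
rule = re.compile(r'/.+')

for path in ['/', '/index.html', '/docs/page.html']:
    redirected = rule.search(path) is not None
    print(path, '->', 'redirect to /' if redirected else 'serve directly')
```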

DNS replication#

I use dnscrypt-proxy as the DNS server and Unbound as the caching server. The systemd socket file is useful for setting the listening port.

See also

  • Unbound DNS server behind a VIP - solving reply from unexpected source [6]

  1. use these configurations to replicate the two services.

    /home/jobs/scripts/by-user/root/keepalived/enabled_files/dnscrypt-proxy.txt#
    /etc/dnscrypt-proxy/dnscrypt-proxy.toml
    /etc/systemd/system/dnscrypt-proxy.socket
    
    /home/jobs/scripts/by-user/root/keepalived/enabled_files/unbound.txt#
    /etc/unbound/unbound.conf
    

Important

Add interface-automatic: yes to the unbound configuration in the server section.

Final steps#

  1. run the deploy script

Kernel#

See also

  • filesystem - Where does update-initramfs look for kernel versions? - Ask Ubuntu [7]

RAID#

Run periodic RAID data scrubs on hard drives and SSDs.

See also

  • ubuntu - How to wipe md raid meta? - Unix & Linux Stack Exchange [8]

  • RAID data scrubbing [9]

  1. install the dependencies

    apt-get install mdadm python3-yaml python3-requests
    
  2. install fpyutils. See reference

  3. create the jobs directories. See reference

    mkdir -p /home/jobs/{scripts,services}/by-user/root
    
  4. create the script

    /home/jobs/scripts/by-user/root/mdadm_check.py#
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    #
    # Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
    #
    #
    #    This program is free software; you can redistribute it and/or modify
    #    it under the terms of the GNU General Public License as published by
    #    the Free Software Foundation; either version 2 of the License, or
    #    (at your option) any later version.
    #
    #    This program is distributed in the hope that it will be useful,
    #    but WITHOUT ANY WARRANTY; without even the implied warranty of
    #    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #    GNU General Public License for more details.
    #
    #    Author: Neil Brown
    #    Email: <neilb@suse.com>
    #
    # Copyright (C) 2019-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    r"""Run RAID tests."""

    import collections
    import multiprocessing
    import os
    import pathlib
    import sys
    import time

    import fpyutils
    import yaml

    # Constants.
    STATUS_CLEAN = 'clean'
    STATUS_ACTIVE = 'active'
    STATUS_IDLE = 'idle'


    class UserNotRoot(Exception):
        """The user running the script is not root."""


    class NoAvailableArrays(Exception):
        """No available arrays."""


    class NoSelectedArraysPresent(Exception):
        """None of the arrays in the configuration file exists."""


    def get_active_arrays() -> list:
        r"""Get the names of the active arrays from /proc/mdstat."""
        active_arrays = list()
        with open('/proc/mdstat', 'r') as f:
            line = f.readline()
            while line:
                if STATUS_ACTIVE in line:
                    active_arrays.append(line.split()[0])
                line = f.readline()

        return active_arrays


    def get_array_state(array: str) -> str:
        r"""Read the array state from sysfs."""
        with open('/sys/block/' + array + '/md/array_state', 'r') as f:
            return f.read().rstrip()


    def get_sync_action(array: str) -> str:
        r"""Read the current sync action from sysfs."""
        with open('/sys/block/' + array + '/md/sync_action', 'r') as f:
            return f.read().rstrip()


    def run_action(array: str, action: str):
        r"""Start an action on the array by writing to sysfs."""
        with open('/sys/block/' + array + '/md/sync_action', 'w') as f:
            f.write(action)


    def main_action(array: str, config: dict):
        r"""Wait until the array is idle, then run the configured action."""
        # 'devices' is defined at module level: child processes inherit it
        # when multiprocessing forks.
        action = devices[array]
        go = True
        while go:
            if get_sync_action(array) == STATUS_IDLE:
                message = 'running ' + action + ' on /dev/' + array + '. pid: ' + str(
                    os.getpid())
                run_action(array, action)
                message += '\n\n'
                message += 'finished pid: ' + str(os.getpid())
                print(message)

                if config['notify']['gotify']['enabled']:
                    m = config['notify']['gotify']['message'] + ' ' + '\n' + message
                    fpyutils.notify.send_gotify_message(
                        config['notify']['gotify']['url'],
                        config['notify']['gotify']['token'], m,
                        config['notify']['gotify']['title'],
                        config['notify']['gotify']['priority'])
                if config['notify']['email']['enabled']:
                    fpyutils.notify.send_email(
                        message, config['notify']['email']['smtp_server'],
                        config['notify']['email']['port'],
                        config['notify']['email']['sender'],
                        config['notify']['email']['user'],
                        config['notify']['email']['password'],
                        config['notify']['email']['receiver'],
                        config['notify']['email']['subject'])

                go = False
            if go:
                print('waiting for ' + array + ' to be idle...')
                time.sleep(config['generic']['timeout_idle_check'])


    if __name__ == '__main__':
        if os.getuid() != 0:
            raise UserNotRoot

        configuration_file = sys.argv[1]
        config = yaml.load(open(configuration_file, 'r'), Loader=yaml.SafeLoader)

        # 'devices' maps array names to actions (see mdadm_check.yaml).
        devices = dict(config['devices'])

        active_arrays = get_active_arrays()
        dev_queue = collections.deque()
        if len(active_arrays) > 0:
            for dev in active_arrays:
                if pathlib.Path('/sys/block/' + dev + '/md/sync_action').is_file():
                    state = get_array_state(dev)
                    if state in (STATUS_CLEAN, STATUS_ACTIVE, STATUS_IDLE):
                        if dev in devices and devices[dev] != 'ignore':
                            dev_queue.append(dev)

        if len(active_arrays) == 0:
            raise NoAvailableArrays
        if len(dev_queue) == 0:
            raise NoSelectedArraysPresent

        while len(dev_queue) > 0:
            processes = list()
            for i in range(0, config['generic']['max_concurrent_checks']):
                if len(dev_queue) > 0:
                    ready = dev_queue.popleft()
                    p = multiprocessing.Process(target=main_action,
                                                args=(ready, config))
                    p.start()
                    processes.append(p)
            # Wait for the whole batch before starting the next one.
            for p in processes:
                p.join()
    
  5. create a configuration file

    /home/jobs/scripts/by-user/root/mdadm_check.yaml#
    #
    # mdadm_check.yaml
    #
    # Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
    #
    #
    #    This program is free software; you can redistribute it and/or modify
    #    it under the terms of the GNU General Public License as published by
    #    the Free Software Foundation; either version 2 of the License, or
    #    (at your option) any later version.
    #
    #    This program is distributed in the hope that it will be useful,
    #    but WITHOUT ANY WARRANTY; without even the implied warranty of
    #    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    #    GNU General Public License for more details.
    #
    #    Author: Neil Brown
    #    Email: <neilb@suse.com>
    #
    # Copyright (C) 2019-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)

    generic:
        # The maximum number of concurrent processes.
        max_concurrent_checks: 2

        # In seconds.
        timeout_idle_check: 10

    # key:      RAID array name without '/dev/'.
    # value:    action.
    devices:
        md1: 'check'
        md2: 'ignore'
        md3: 'check'

    notify:
        email:
            enabled: true
            smtp_server: 'smtp.gmail.com'
            port: 465
            sender: 'myusername@gmail.com'
            user: 'myusername'
            password: 'my awesome password'
            receiver: 'myusername@gmail.com'
            subject: 'mdadm operation'
        gotify:
            enabled: true
            url: '<gotify url>'
            token: '<app token>'
            title: 'mdadm operation'
            message: 'starting mdadm operation'
            priority: 5
    

    Important

    • do not prepend /dev to RAID device names

    • possible values: check, repair, idle, ignore

      • ignore will make the script skip the device

      • use repair at your own risk

    • absent devices are ignored

    • run these commands to get the names of RAID arrays

      lsblk
      cat /proc/mdstat
      
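The array discovery in the script above (get_active_arrays) boils down to scanning /proc/mdstat for lines whose status is active. A standalone sketch of that parsing, run here on sample text instead of the real file:

```python
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
md2 : active raid1 sdb2[1] sda2[0]
unused devices: <none>
"""

def active_arrays(mdstat_text: str) -> list:
    """Names of active arrays, as parsed from /proc/mdstat contents."""
    return [line.split()[0]
            for line in mdstat_text.splitlines()
            if ' active ' in line]

print(active_arrays(SAMPLE_MDSTAT))  # ['md1', 'md2']
```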
  6. create a Systemd service unit file

    /home/jobs/services/by-user/root/mdadm-check.service#
    [Unit]
    Description=mdadm check
    Requires=sys-devices-virtual-block-md1.device
    Requires=sys-devices-virtual-block-md2.device
    Requires=sys-devices-virtual-block-md3.device
    After=sys-devices-virtual-block-md1.device
    After=sys-devices-virtual-block-md2.device
    After=sys-devices-virtual-block-md3.device

    [Service]
    Type=simple
    ExecStart=/home/jobs/scripts/by-user/root/mdadm_check.py /home/jobs/scripts/by-user/root/mdadm_check.yaml
    User=root
    Group=root

    [Install]
    WantedBy=multi-user.target
    
  7. create a Systemd timer unit file

    /home/jobs/services/by-user/root/mdadm-check.timer#
    [Unit]
    Description=Once a month check mdadm arrays

    [Timer]
    OnCalendar=Monthly
    Persistent=true

    [Install]
    WantedBy=timers.target
    
  8. fix the permissions

    chmod 700 /home/jobs/{scripts,services}/by-user/root
    
  9. run the deploy script

S.M.A.R.T.#

Run periodic S.M.A.R.T. tests on hard drives and SSDs. The provided script supports /dev/disk/by-id names only.

See also

  • A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks [10]

  1. install the dependencies

    apt-get install hdparm smartmontools python3-yaml python3-requests
    
  2. install fpyutils. See reference

  3. identify the drives whose S.M.A.R.T. values you want to check

    ls /dev/disk/by-id
    

    See also the udev rule file /lib/udev/rules.d/60-persistent-storage.rules. You can also use one of these commands to get more details about a specific drive

    hdparm -I /dev/disk/by-id/${drive_name}
    # or
    hdparm -I /dev/sd${letter}
    
  4. create the jobs directories. See reference

    mkdir -p /home/jobs/{scripts,services}/by-user/root
    chmod 700 -R /home/jobs/{scripts,services}/by-user/root
    
  5. create the script

    /home/jobs/scripts/by-user/root/smartd_test.py#
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    #
    # smartd_test.py
    #
    # Copyright (C) 2019-2021 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.
    r"""Run S.M.A.R.T. tests on hard drives."""

    import json
    import os
    import pathlib
    import re
    import shlex
    import subprocess
    import sys

    import fpyutils
    import yaml


    class UserNotRoot(Exception):
        """The user running the script is not root."""


    def get_disks() -> list:
        r"""Scan all the disks."""
        disks = list()
        for d in pathlib.Path('/dev/disk/by-id').iterdir():
            # Ignore disks ending with part-${integer} to avoid duplicates (names
            # corresponding to partitions of the same disk).
            disk = str(d)
            if re.match('.+-part[0-9]+$', disk) is None:
                try:
                    ddict = json.loads(
                        subprocess.run(
                            shlex.split('smartctl --capabilities --json ' +
                                        shlex.quote(disk)),
                            capture_output=True,
                            check=False,
                            shell=False,
                            timeout=30).stdout)
                    try:
                        # Check for smart test support.
                        if ddict['ata_smart_data']['capabilities'][
                                'self_tests_supported']:
                            disks.append(disk)
                    except KeyError:
                        pass
                except subprocess.TimeoutExpired:
                    print('timeout for ' + disk)
                except subprocess.CalledProcessError:
                    print('device ' + disk +
                          ' does not support S.M.A.R.T. commands, skipping...')

        return disks


    def disk_ready(disk: str, busy_status: int = 249) -> bool:
        r"""Check if the disk is ready."""
        # Raises a KeyError if the disk has no S.M.A.R.T. status capabilities.
        ddict = json.loads(
            subprocess.run(shlex.split('smartctl --capabilities --json ' +
                                       shlex.quote(disk)),
                           capture_output=True,
                           check=True,
                           shell=False,
                           timeout=30).stdout)
        if ddict['ata_smart_data']['self_test']['status']['value'] != busy_status:
            return True
        else:
            return False


    def run_test(disk: str, test_length: str = 'long') -> str:
        r"""Run the smartd test."""
        return subprocess.run(
            shlex.split('smartctl --test=' + shlex.quote(test_length) + ' ' +
                        shlex.quote(disk)),
            capture_output=True,
            check=True,
            shell=False,
            timeout=30).stdout


    if __name__ == '__main__':
        if os.getuid() != 0:
            raise UserNotRoot

        configuration_file = shlex.quote(sys.argv[1])
        config = yaml.load(open(configuration_file, 'r'), Loader=yaml.SafeLoader)

        # Do not prepend '/dev/disk/by-id/'.
        disks_to_check = shlex.quote(sys.argv[2])
        disks_available = get_disks()

        for d in config['devices']:
            dev = '/dev/disk/by-id/' + d
            if config['devices'][d]['enabled'] and dev in disks_available:
                if disks_to_check == 'all' or disks_to_check == d:
                    if disk_ready(dev, config['devices'][d]['busy_status']):
                        print('attempting ' + d + ' ...')
                        message = run_test(
                            dev, config['devices'][d]['test']).decode('utf-8')
                        print(message)
                        if config['devices'][d]['log']:
                            if config['notify']['gotify']['enabled']:
                                m = config['notify']['gotify'][
                                    'message'] + ' ' + d + '\n' + message
                                fpyutils.notify.send_gotify_message(
                                    config['notify']['gotify']['url'],
                                    config['notify']['gotify']['token'], m,
                                    config['notify']['gotify']['title'],
                                    config['notify']['gotify']['priority'])
                            if config['notify']['email']['enabled']:
                                fpyutils.notify.send_email(
                                    message,
                                    config['notify']['email']['smtp_server'],
                                    config['notify']['email']['port'],
                                    config['notify']['email']['sender'],
                                    config['notify']['email']['user'],
                                    config['notify']['email']['password'],
                                    config['notify']['email']['receiver'],
                                    config['notify']['email']['subject'])
                    else:
                        # Drop test requests if a disk is already running a test.
                        # This avoids putting the disks under too much stress.
                        print('disk ' + d + ' not ready, checking the next...')
    
  6. create a configuration file

    /home/jobs/scripts/by-user/root/smartd_test.yaml#
    #
    # smartd_test.yaml
    #
    # Copyright (C) 2019-2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.

    devices:
        ata-disk1:
            enabled: true
            test: 'long'
            log: true
            busy_status: 249
        ata-disk2:
            enabled: true
            test: 'long'
            log: false
            busy_status: 249
        ata-diskn:
            enabled: true
            test: 'long'
            log: true
            busy_status: 249

    notify:
        gotify:
            enabled: true
            url: '<gotify url>'
            token: '<app token>'
            title: 'smart test'
            message: 'starting smart test on'
            priority: 5
        email:
            enabled: true
            smtp_server: 'smtp.gmail.com'
            port: 465
            sender: 'myusername@gmail.com'
            user: 'myusername'
            password: 'my awesome password'
            receiver: 'myusername@gmail.com'
            subject: 'smartd test'
    

    Importante

    • absent devices are ignored

    • devices must be explicitly enabled

    • do not prepend /dev/disk/by-id/ to drive names

    • run a short test to get the busy_status value.

      smartctl -t short /dev/disk/by-id/${drive_name}
      

      You should be able to capture the value while the test is running by looking at the Self-test execution status: line. In my case it is always 249, but this value is not hardcoded in smartmontools' source code:

      smartctl --all /dev/disk/by-id/${drive_name}
      
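If you prefer to capture the value programmatically instead of reading it off the terminal, the status line can be parsed from the smartctl output. This is a sketch: the function name and the regular expression are mine, not part of smartd_test.py.

```python
import re


def selftest_busy_status(smartctl_output: str):
    """Return the self-test execution status value from `smartctl --all`
    output, i.e. the number between parentheses in the
    'Self-test execution status:' line, or None if the line is absent."""
    match = re.search(r'Self-test execution status:\s*\(\s*(\d+)\s*\)',
                      smartctl_output)
    return int(match.group(1)) if match else None


# Fragment of output captured while a test is running.
sample = 'Self-test execution status:      ( 249)\tSelf-test routine in progress...'
print(selftest_busy_status(sample))  # 249
```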
  7. use this Systemd service unit file

    /home/jobs/services/by-user/root/smartd-test.ata_disk1.service#
    [Unit]
    Description=execute smartd on ata-disk1

    [Service]
    Type=simple
    ExecStart=/home/jobs/scripts/by-user/root/smartd_test.py /home/jobs/scripts/by-user/root/smartd_test.yaml ata-disk1
    User=root
    Group=root
    
  8. use this Systemd timer unit file

    /home/jobs/services/by-user/root/smartd-test.ata_disk1.timer#
    [Unit]
    Description=Once every two months smart test ata-disk1

    [Timer]
    OnCalendar=*-01,03,05,07,09,11-01 00:00:00
    Persistent=true

    [Install]
    WantedBy=timers.target
    
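The OnCalendar expression fires at midnight on the first day of every odd month, which gives the two-month cadence. The equivalent rule written out in Python (a throwaway illustration, not used by the units above):

```python
from datetime import datetime


def matches_schedule(dt: datetime) -> bool:
    """True when dt matches OnCalendar=*-01,03,05,07,09,11-01 00:00:00."""
    return (dt.month in (1, 3, 5, 7, 9, 11) and dt.day == 1
            and (dt.hour, dt.minute, dt.second) == (0, 0, 0))


print(matches_schedule(datetime(2024, 3, 1)))  # True
print(matches_schedule(datetime(2024, 2, 1)))  # False
```

On a real system you can verify such expressions with `systemd-analyze calendar`, which prints the next elapse times.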
  9. fix the permissions

    chmod 700 -R /home/jobs/scripts/by-user/root/smartd_test.*
    chmod 700 -R /home/jobs/services/by-user/root
    
  10. run the deploy script

Importante

To avoid tests being interrupted you must avoid putting the disks to sleep, therefore, programs like hd-idle must be stopped before running the tests.

Services#

Notify unit status#

This script sends a notification when a Systemd service fails.

Some time ago my Gitea instance could not start after an update. Had I used this script I would have known about the problem immediately instead of several days later.

Vedi anche

  • linux - get notification when systemd-monitored service enters failed state - Server Fault [11]

  1. install fpyutils. See reference

  2. create the script

    /home/jobs/scripts/by-user/root/notify_unit_status.py#
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    #
    # notify_unit_status.py
    #
    # Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
    # Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
    # Copyright (C) 2020-2021 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This script is licensed under a
    # Creative Commons Attribution-ShareAlike 4.0 International License.
    #
    # You should have received a copy of the license along with this
    # work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.
    r"""Send a notification when a Systemd unit fails."""

    import shlex
    import sys

    import fpyutils
    import yaml

    if __name__ == '__main__':
        configuration_file = shlex.quote(sys.argv[1])
        config = yaml.load(open(configuration_file, 'r'), Loader=yaml.SafeLoader)

        failed_unit = shlex.quote(sys.argv[2])

        message = 'service failure: ' + failed_unit
        if config['notify']['gotify']['enabled']:
            m = config['notify']['gotify']['message'] + '\n' + message
            fpyutils.notify.send_gotify_message(
                config['notify']['gotify']['url'],
                config['notify']['gotify']['token'], m,
                config['notify']['gotify']['title'],
                config['notify']['gotify']['priority'])
        if config['notify']['email']['enabled']:
            fpyutils.notify.send_email(message,
                                       config['notify']['email']['smtp server'],
                                       config['notify']['email']['port'],
                                       config['notify']['email']['sender'],
                                       config['notify']['email']['user'],
                                       config['notify']['email']['password'],
                                       config['notify']['email']['receiver'],
                                       config['notify']['email']['subject'])
    
  3. create a configuration file

    includes/home/jobs/scripts/by-user/root/notify_unit_status.yaml#
    #
    # notify_unit_status.yaml
    #
    # Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
    # Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
    # Copyright (C) 2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This script is licensed under a
    # Creative Commons Attribution-ShareAlike 4.0 International License.
    #
    # You should have received a copy of the license along with this
    # work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.

    notify:
        gotify:
            enabled: true
            url: '<gotify url>'
            token: '<app token>'
            title: 'service failure'
            message: 'service failure'
            priority: 5
        email:
            enabled: true
            smtp server: 'smtp.gmail.com'
            port: 465
            sender: 'myusername@gmail.com'
            user: 'myusername'
            password: 'my awesome password'
            receiver: 'myusername@gmail.com'
            subject: 'service failure'
    
  4. use this Systemd service unit file

    includes/home/jobs/services/by-user/root/notify-unit-status@.service#
    #
    # notify-unit-status@.service
    #
    # Copyright (C) 2015 Pablo Martinez @ Stack Exchange (https://serverfault.com/a/701100)
    # Copyright (C) 2018 Davy Landman @ Stack Exchange (https://serverfault.com/a/701100)
    # Copyright (C) 2020 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This script is licensed under a
    # Creative Commons Attribution-ShareAlike 4.0 International License.
    #
    # You should have received a copy of the license along with this
    # work. If not, see <http://creativecommons.org/licenses/by-sa/4.0/>.

    [Unit]
    Description=Unit Status Mailer Service
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=simple
    ExecStart=/home/jobs/scripts/by-user/root/notify_unit_status.py /home/jobs/scripts/by-user/root/notify_unit_status.yaml %I
    
  5. edit the Systemd service you want to monitor. In this example the service to be monitored is Gitea

    systemctl edit gitea.service
    
  6. add this content

    # [ ... ]

    [Unit]
    # [ ... ]
    OnFailure=notify-unit-status@%n.service
    # [ ... ]
    
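When the monitored unit fails, systemd expands %n to the full name of the failing unit and starts the matching template instance; inside the template, %I expands back to that name and is passed to the script as its second argument. For a simple unit name the mapping looks like this (a sketch that ignores systemd's instance-name escaping):

```python
def on_failure_target(failed_unit: str) -> str:
    """Name of the instance started by OnFailure=notify-unit-status@%n.service.
    Simple unit names only; systemd's escaping of special characters
    in instance names is not reproduced here."""
    return 'notify-unit-status@' + failed_unit + '.service'


print(on_failure_target('gitea.service'))
# notify-unit-status@gitea.service.service
```

Note that the instance name keeps the .service suffix of the failing unit, so the started unit really is named notify-unit-status@gitea.service.service.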

Updates#

Update action#

This script can be used to update software not supported by the package manager, for example Docker images.

Importante

Any arbitrary command can be configured.

Vedi anche

  • A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks [10]

  1. install the dependencies

    apt-get install python3-yaml python3-requests
    
  2. install fpyutils. See reference

  3. create the script

    /home/jobs/scripts/by-user/root/update_action.py#
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    #
    # update_action.py
    #
    # Copyright (C) 2021-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.
    r"""update_action.py."""

    import shlex
    import sys

    import fpyutils
    import yaml


    def send_notification(message: str, notify: dict):
        m = notify['gotify']['message'] + '\n' + message
        if notify['gotify']['enabled']:
            fpyutils.notify.send_gotify_message(notify['gotify']['url'],
                                                notify['gotify']['token'], m,
                                                notify['gotify']['title'],
                                                notify['gotify']['priority'])
        if notify['email']['enabled']:
            fpyutils.notify.send_email(
                message, notify['email']['smtp_server'], notify['email']['port'],
                notify['email']['sender'], notify['email']['user'],
                notify['email']['password'], notify['email']['receiver'],
                notify['email']['subject'])


    if __name__ == '__main__':

        def main():
            configuration_file = shlex.quote(sys.argv[1])
            config = yaml.load(open(configuration_file, 'r'),
                               Loader=yaml.SafeLoader)

            # Action types. Preserve this order.
            types = ['pre', 'update', 'post']
            services = config['services']

            for service in services:
                for type in types:
                    for cmd in services[service]['commands'][type]:
                        for name in cmd:
                            retval = fpyutils.shell.execute_command_live_output(
                                cmd[name]['command'], dry_run=False)
                            if cmd[name]['notify']['success'] and retval == cmd[
                                    name]['expected_retval']:
                                send_notification(
                                    'command "' + name + '" of service "' +
                                    service + '": OK', config['notify'])
                            elif cmd[name]['notify']['error'] and retval != cmd[
                                    name]['expected_retval']:
                                send_notification(
                                    'command "' + name + '" of service "' +
                                    service + '": ERROR', config['notify'])

        main()
    
  4. create a configuration file

    includes/home/jobs/scripts/by-user/root/update_action.mypurpose.yaml#
    #
    # update_action.mypurpose.yaml
    #
    # Copyright (C) 2021-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.

    notify:
        email:
            enabled: true
            smtp_server: 'smtp.gmail.com'
            port: 465
            sender: 'myusername@gmail.com'
            user: 'myusername'
            password: 'my awesome password'
            receiver: 'myusername@gmail.com'
            subject: 'update action'
        gotify:
            enabled: true
            url: '<gotify url>'
            token: '<app token>'
            title: 'update action'
            message: 'update action'
            priority: 5

    services:
        hello:
            commands:
                pre:
                    - stop_service:
                        # string
                        command: 'systemctl stop docker-compose.hello.service'
                        # integer
                        expected_retval: 0
                        # boolean: {true,false}
                        notify:
                            success: true
                            error: true
                update:
                    - pull:
                        command: 'pushd /home/jobs/scripts/by-user/root/docker/hello && docker-compose pull'
                        expected_retval: 0
                        notify:
                            success: true
                            error: true
                    - build:
                        command: 'pushd /home/jobs/scripts/by-user/root/docker/hello && docker-compose build --pull'
                        expected_retval: 0
                        notify:
                            success: true
                            error: true
                post:
                    - start_service:
                        command: 'systemctl start docker-compose.hello.service'
                        expected_retval: 0
                        notify:
                            success: true
                            error: true
        goodbye:
            commands:
                pre:
                    - stop_service:
                        command: 'systemctl stop docker-compose.goodbye.service'
                        expected_retval: 0
                        notify:
                            success: true
                            error: true
                update:
                    - pull_only:
                        command: 'pushd /home/jobs/scripts/by-user/root/docker/goodbye && docker-compose pull'
                        expected_retval: 0
                        notify:
                            success: true
                            error: true
                post:
                    - start_service:
                        command: 'systemctl start docker-compose.goodbye.service'
                        expected_retval: 0
                        notify:
                            success: true
                            error: true
    
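update_action.py assumes every service defines the pre, update and post sections and that each command entry carries command, expected_retval and both notify flags; a missing key raises a KeyError at run time. A small validator can catch such mistakes before the timer fires (a hypothetical helper, not part of the shipped script):

```python
def validate_update_action_config(config: dict) -> list:
    """Return a list of problems found in an update_action configuration,
    checking the keys that update_action.py reads unconditionally."""
    problems = []
    for service, body in config.get('services', {}).items():
        commands = body.get('commands', {})
        for phase in ('pre', 'update', 'post'):
            if phase not in commands:
                problems.append(service + ': missing commands section "' + phase + '"')
                continue
            for entry in commands[phase]:
                for name, cmd in entry.items():
                    where = service + '/' + phase + '/' + name
                    if 'command' not in cmd:
                        problems.append(where + ': missing "command"')
                    if not isinstance(cmd.get('expected_retval'), int):
                        problems.append(where + ': "expected_retval" must be an integer')
                    notify = cmd.get('notify', {})
                    for flag in ('success', 'error'):
                        if flag not in notify:
                            problems.append(where + ': missing notify flag "' + flag + '"')
    return problems


# A well-formed fragment mirroring the YAML structure above.
config = {
    'services': {
        'hello': {
            'commands': {
                'pre': [{'stop_service': {
                    'command': 'systemctl stop docker-compose.hello.service',
                    'expected_retval': 0,
                    'notify': {'success': True, 'error': True},
                }}],
                'update': [],
                'post': [],
            },
        },
    },
}
print(validate_update_action_config(config))  # []
```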
  5. use this Systemd service unit file

    /home/jobs/services/by-user/root/update-action.mypurpose.service#
    [Unit]
    Description=Update action mypurpose
    Wants=network-online.target
    After=network-online.target

    [Service]
    Type=simple
    ExecStart=/home/jobs/scripts/by-user/root/update_action.py /home/jobs/scripts/by-user/root/update_action.mypurpose.yaml
    User=root
    Group=root
    
  6. use this Systemd timer unit file

    /home/jobs/services/by-user/root/update-action.mypurpose.timer#
    [Unit]
    Description=Update action mypurpose monthly

    [Timer]
    OnCalendar=monthly
    Persistent=true

    [Install]
    WantedBy=timers.target
    
  7. fix the permissions

    chmod 700 -R /home/jobs/scripts/by-user/root/update_action.*
    chmod 700 -R /home/jobs/services/by-user/root
    
  8. run the deploy script

Footnotes