Maintenance
===========

Software
--------

Database
````````

PostgreSQL
~~~~~~~~~~

.. important:: All these instructions have been tested with PostgreSQL version 13 only!

Changing collate
................

Sometimes you might create a new database without specifying the collation. PostgreSQL then uses the default collation setting, and there is no way to change it once the database is created. The only solution is to dump the database, create a new one with the correct collation, import the dump and finally drop the original database.

.. seealso::

   - _`What is Collation in Databases?` [#f1]_

#. back up the old database

   .. code-block:: shell-session

      pg_dump -U myuser olddb > olddb.bak.sql

#. create a new database with the correct collation

   .. code-block:: postgresql-console

      CREATE DATABASE newdb
          WITH OWNER myuser
          TEMPLATE template0
          ENCODING UTF8
          LC_COLLATE 'en_US.UTF-8'
          LC_CTYPE 'en_US.UTF-8';

#. import the dump into the new database

   .. code-block:: shell-session

      psql -U myuser -d newdb < olddb.bak.sql

#. rename the old database

   .. code-block:: postgresql-console

      ALTER DATABASE olddb RENAME TO olddb_bak;

#. rename the new database to the original database name

   .. code-block:: postgresql-console

      ALTER DATABASE newdb RENAME TO olddb;

#. restart the services and check that everything works

#. drop the old database

   .. code-block:: postgresql-console

      DROP DATABASE olddb_bak;

Moving data directory
.....................

If the data directory in the root partition is getting too large, you can create a new partition (mounted on ``/postgresql`` in this example) and point PostgreSQL to it instead.

#. install the dependencies

   .. code-block:: shell-session

      apt-get install rsync

#. stop PostgreSQL

   .. code-block:: shell-session

      systemctl stop postgresql

#. copy the database directory

   .. code-block:: shell-session

      rsync -avAX /var/lib/postgresql/13/main /postgresql/13

#. change the ``data_directory`` setting in ``/etc/postgresql/13/main/postgresql.conf``

   .. code-block:: ini

      data_directory = '/postgresql/13/main'    # use data in another directory

#. start PostgreSQL

   .. code-block:: shell-session

      systemctl start postgresql

High availability
`````````````````

In this example we configure two nodes. When the master node goes offline, the backup node takes over the *floating IP* address. Services such as Apache and Unbound are replicated on the backup node.

Keepalived is a tool that provides high availability and load balancing at layers 3 and 4. Below you can find the configuration file for the master node (the backup node only needs minor edits).

The script below copies the content of web servers, DNS servers, etc. to the backup node and restarts those services automatically.

.. seealso::

   - _`Keepalived for Linux` [#f2]_
   - _`Is it possible to add a static mac address for a vrrp ip? · Issue #34 · osixia/docker-keepalived · GitHub` [#f3]_
   - _`Building Layer 3 High Availability | Documentation` [#f4]_

Key
~~~

============ ============== ====================== ======================
Node         IP             Hostname               Network interface name
============ ============== ====================== ======================
MASTER       192.168.0.10   mst                    eno1
BACKUP       192.168.0.11   bak                    eno1
floating IP  192.168.0.100  \-                     \-
============ ============== ====================== ======================

Basic setup
~~~~~~~~~~~

#. install the dependencies. Keepalived must be installed on all nodes

   .. code-block:: shell-session

      apt-get install keepalived rsync

#. create the :download:`configuration ` for the master node

   .. literalinclude:: includes/etc/keepalived/keepalived.conf
      :linenos:
      :caption: /etc/keepalived/keepalived.conf

   .. note:: Copy this file to the backup node as well and change:

      - ``state MASTER`` to ``state BACKUP``
      - ``unicast_peer { 192.168.0.11 }`` to ``unicast_peer { 192.168.0.10 }``
      - ``priority 110`` to ``priority 100``

#. restart Keepalived on both nodes

   .. code-block:: shell-session

      systemctl restart keepalived

#. ping the floating IP address

   .. code-block:: shell-session

      ping -c1 192.168.0.100

#. test the failover by stopping Keepalived on the master node only and pinging the *floating IP* address. Finally, start Keepalived again

   .. code-block:: shell-session

      systemctl stop keepalived
      ping -c1 192.168.0.100
      systemctl start keepalived

Service replication
~~~~~~~~~~~~~~~~~~~

Make sure you are on a trusted network, because root login via SSH is allowed to simplify operations.

In this example we copy files from

- Apache
- Unbound
- dnscrypt-proxy
- Certbot (Let's Encrypt)

The ``enabled_files`` directory on the master node contains files listing the files or directories that rsync copies to the backup server.

#. create the :download:`script `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh
      :language: shell
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.sh

#. create a :download:`configuration file `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/keepalived/keepalived_deploy.conf

#. create an SSH key. Do not set a passphrase for it

   .. code-block:: shell-session

      ssh-keygen -t rsa -b 16384 -f /root/.ssh/bak_root -C "$(whoami)@$(hostname)-$(date +%F)"

#. add the following to the SSH configuration

   .. code-block::
      :linenos:
      :caption: /root/.ssh/config

      Match host 192.168.0.11 user root
          IdentityFile=/root/.ssh/bak_root

#. go to the backup node and copy the newly created public key to ``/root/.ssh/authorized_keys``

#. edit the SSH server configuration

   .. code-block::
      :linenos:
      :caption: /etc/ssh/sshd_config

      # [ ... ]
      PermitRootLogin yes
      AllowUsers root
      # [ ... ]
      Match user root
          PasswordAuthentication no
      # [ ... ]

#. restart the SSH service on the backup node

   .. code-block:: shell-session

      systemctl restart ssh

#. go back to the master node and test if the key is working

   .. code-block:: shell-session

      ssh root@192.168.0.11

#. create a :download:`Systemd service unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/keepalived-deploy.service
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/keepalived-deploy.service

#. create a :download:`Systemd timer unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/keepalived-deploy.timer
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/keepalived-deploy.timer

Apache replication
~~~~~~~~~~~~~~~~~~

.. seealso::

   - _`How to redirect all pages to one page?` [#f5]_

#. on the master node, separate replicable services from non-replicable ones. You can do this by splitting the configuration into multiple files and then including those files in the main file (``/etc/apache2/apache2.conf``).

#. add these files on the master node. The first one copies the Apache configuration

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/enabled_files/apache2.txt
      :caption: /home/jobs/scripts/by-user/root/keepalived/enabled_files/apache2.txt

   .. important:: You must change the virtual host directive to use the floating IP like this: ````

   The second file copies the server data. You can replicate static data (Jekyll websites, HTML, etc.) but not programs that rely on databases, at least not without extra work

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/enabled_files/replicated_webservers_data.txt
      :caption: /home/jobs/scripts/by-user/root/keepalived/enabled_files/replicated_webservers_data.txt

   The third file copies all the HTTPS certificates

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/enabled_files/letsencrypt.txt
      :caption: /home/jobs/scripts/by-user/root/keepalived/enabled_files/letsencrypt.txt

#. on the backup node you must "patch" the non-replicable services. You can set up an error message for each server like this:

   .. code-block:: apache
      :linenos:
      :caption: /etc/apache2/standard-servers.conf

      UseCanonicalName on
      Keepalive On
      SSLCompression off
      ServerName software.franco.net.eu.org
      RewriteEngine On
      Include /etc/apache2/standard-servers-outage-text.conf
      Include /etc/letsencrypt/options-ssl-apache.conf
      SSLCertificateFile /etc/letsencrypt/live/software.franco.net.eu.org/fullchain.pem
      SSLCertificateKeyFile /etc/letsencrypt/live/software.franco.net.eu.org/privkey.pem

   .. code-block:: apache
      :linenos:
      :caption: /etc/apache2/standard-servers-outage-text.conf

      DocumentRoot "/var/www/standard-servers"
      Options -ExecCGI -Includes -FollowSymLinks -Indexes
      AllowOverride None
      Require all granted

      # Redirect all requests to the root directory of the virtual server.
      RewriteEngine On
      RewriteRule \/.+ / [L,R]

   Create the file ``/var/www/standard-servers/index.html`` containing your outage message.

DNS replication
~~~~~~~~~~~~~~~

I use dnscrypt-proxy as the DNS server and Unbound as the caching server. The systemd socket file is useful to set the listening port.

.. seealso::

   - _`Unbound DNS server behind a VIP - solving reply from unexpected source` [#f6]_

#. use these configurations to replicate the two services.

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/enabled_files/dnscrypt-proxy.txt
      :caption: /home/jobs/scripts/by-user/root/keepalived/enabled_files/dnscrypt-proxy.txt

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/keepalived/enabled_files/unbound.txt
      :caption: /home/jobs/scripts/by-user/root/keepalived/enabled_files/unbound.txt

   .. important:: Add ``interface-automatic: yes`` to the Unbound configuration in the ``server`` section.

Final steps
~~~~~~~~~~~

#. run the :ref:`deploy script `

Kernel
``````

.. seealso::

   - _`filesystem - Where does update-initramfs look for kernel versions? - Ask Ubuntu` [#f7]_

RAID
````

Run periodic RAID data scrubs on hard drives and SSDs.

.. seealso::

   - _`ubuntu - How to wipe md raid meta? - Unix & Linux Stack Exchange` [#f8]_
   - _`RAID data scrubbing` [#f9]_

#. install the dependencies

   .. code-block:: shell-session

      apt-get install mdadm python3-yaml python3-requests

#. install fpyutils. See :ref:`reference `

#. create the jobs directories. See :ref:`reference `

   .. code-block:: shell-session

      mkdir -p /home/jobs/{scripts,services}/by-user/root

#. create the :download:`script `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/mdadm_check.py
      :language: python
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/mdadm_check.py

#. create a :download:`configuration file `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/mdadm_check.yaml
      :language: yaml
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/mdadm_check.yaml

   .. important::

      - do not prepend ``/dev`` to RAID device names
      - possible values: ``check``, ``repair``, ``idle``, ``ignore``

        - ``ignore`` makes the script skip the device
        - use ``repair`` at your own risk

      - absent devices are ignored
      - run these commands to get the names of RAID arrays

        .. code-block:: shell-session

           lsblk
           cat /proc/mdstat

#. create a :download:`Systemd service unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/mdadm-check.service
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/mdadm-check.service

#. create a :download:`Systemd timer unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/mdadm-check.timer
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/mdadm-check.timer

#. fix the permissions

   .. code-block:: shell-session

      chmod 700 /home/jobs/{scripts,services}/by-user/root

#. run the :ref:`deploy script `

S.M.A.R.T.
``````````

Run periodic S.M.A.R.T. tests on hard drives and SSDs. The provided script supports only ``/dev/disk/by-id`` names.

.. seealso::

   - A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks [#f10]_

#. install the dependencies

   .. code-block:: shell-session

      apt-get install hdparm smartmontools python3-yaml python3-requests

#. install fpyutils. See :ref:`reference `

#. identify the drives whose S.M.A.R.T. values you want to check

   .. code-block:: shell-session

      ls /dev/disk/by-id

   See also the udev rule file ``/lib/udev/rules.d/60-persistent-storage.rules``. You can also use this command to get more details about specific drives

   .. code-block:: shell-session

      hdparm -I /dev/disk/by-id/${drive_name}
      # or
      hdparm -I /dev/sd${letter}

#. create the jobs directories. See :ref:`reference `

   .. code-block:: shell-session

      mkdir -p /home/jobs/{scripts,services}/by-user/root
      chmod 700 -R /home/jobs/{scripts,services}/by-user/root

#. create the :download:`script `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/smartd_test.py
      :language: python
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/smartd_test.py

#. create a :download:`configuration file `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/smartd_test.yaml
      :language: yaml
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/smartd_test.yaml

   .. important::

      - absent devices are ignored
      - devices must be explicitly enabled
      - do not prepend ``/dev/disk/by-id/`` to drive names
      - run a short test to get the ``busy_status`` value

        .. code-block:: shell-session

           smartctl -t short /dev/disk/by-id/${drive_name}

        You should be able to capture the value while the test is running by looking at the ``Self-test execution status:`` line. In my case it is always ``249``, but this value is not hardcoded in smartmontools' source code.

        .. code-block:: shell-session

           smartctl --all /dev/disk/by-id/${drive_name}

#. use this :download:`Systemd service unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/smartd-test.ata_disk1.service
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/smartd-test.ata_disk1.service

#. use this :download:`Systemd timer unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/smartd-test.ata_disk1.timer
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/smartd-test.ata_disk1.timer

#. fix the permissions

   .. code-block:: shell-session

      chmod 700 -R /home/jobs/scripts/by-user/root/smartd_test.*
      chmod 700 -R /home/jobs/services/by-user/root

#. run the :ref:`deploy script `

.. important:: To avoid interrupting the tests, the disks must not be put to sleep. Programs like `hd-idle `_ must therefore be stopped before running the tests.

Services
````````

Notify unit status
~~~~~~~~~~~~~~~~~~

This script is useful to get notified about failed Systemd services.

Some time ago my `Gitea `_ instance `could not start after an update `_. Had I been using this script, I would have known about the problem immediately instead of several days later.

.. seealso::

   - _`linux - get notification when systemd-monitored service enters failed state - Server Fault` [#f11]_

#. install fpyutils. See :ref:`reference `

#. create the :download:`script `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/notify_unit_status.py
      :language: python
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/notify_unit_status.py

#. create a :download:`configuration file `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/notify_unit_status.yaml
      :language: yaml
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/notify_unit_status.yaml

#. use this :download:`Systemd service unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/notify-unit-status@.service
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/notify-unit-status@.service

#. edit the Systemd service you want to monitor. In this example the service to be monitored is Gitea

   .. code-block:: shell-session

      systemctl edit gitea.service

#. add this content

   .. code-block:: ini

      # [ ... ]
      [Unit]
      # [ ... ]
      OnFailure=notify-unit-status@%n.service
      # [ ... ]

Updates
```````

Update action
~~~~~~~~~~~~~

This script can be used to update software not supported by the package manager, for example Docker images.

.. important:: Any arbitrary command can be configured.

.. seealso::

   - A collection of scripts I have written and/or adapted that I currently use on my systems as automated tasks [#f10]_

#. install the dependencies

   .. code-block:: shell-session

      apt-get install python3-yaml python3-requests

#. install fpyutils. See :ref:`reference `

#. create the :download:`script `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/update_action.py
      :language: python
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/update_action.py

#. create a :download:`configuration file `

   .. literalinclude:: includes/home/jobs/scripts/by-user/root/update_action.mypurpose.yaml
      :language: yaml
      :linenos:
      :caption: /home/jobs/scripts/by-user/root/update_action.mypurpose.yaml

#. use this :download:`Systemd service unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/update-action.mypurpose.service
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/update-action.mypurpose.service

#. use this :download:`Systemd timer unit file `

   .. literalinclude:: includes/home/jobs/services/by-user/root/update-action.mypurpose.timer
      :language: ini
      :linenos:
      :caption: /home/jobs/services/by-user/root/update-action.mypurpose.timer

#. fix the permissions

   .. code-block:: shell-session

      chmod 700 -R /home/jobs/scripts/by-user/root/update_action.*
      chmod 700 -R /home/jobs/services/by-user/root

#. run the :ref:`deploy script `

.. rubric:: Footnotes

.. [#f1] https://database.guide/what-is-collation-in-databases/ unknown license
.. [#f2] https://www.keepalived.org/index.html unknown license
.. [#f3] https://github.com/osixia/docker-keepalived/issues/34 unknown license
.. [#f4] https://docs.syseleven.de/syseleven-stack/en/howtos/l3-high-availability unknown license
.. [#f5] https://forums.digitalpoint.com/threads/how-to-redirect-all-pages-to-one-page.25353/ unknown license
.. [#f6] https://www.claudiokuenzler.com/blog/695/unbound-behind-a-virtual-ip-vip-reply-from-unexpected-source unknown license
.. [#f7] https://askubuntu.com/questions/759802/where-does-update-initramfs-look-for-kernel-versions CC BY-SA 3.0, copyright (c) 2016-2017, askubuntu contributors
.. [#f8] https://unix.stackexchange.com/questions/411206/how-to-wipe-md-raid-meta CC BY-SA 3.0, copyright (c) 2017, stackexchange contributors
.. [#f9] https://blog.franco.net.eu.org/notes/raid-data-scrubbing.html CC BY-SA 4.0, copyright (c) 2019-2021, Franco Masotti
.. [#f10] https://software.franco.net.eu.org/frnmst/automated-tasks GNU GPLv3+, copyright (c) 2019-2022, Franco Masotti
.. [#f11] https://serverfault.com/questions/694818/get-notification-when-systemd-monitored-service-enters-failed-state CC BY-SA 3.0, copyright (c) 2015-2018, serverfault contributors