Monitoring#

Command assert#

I use this script to check that the output of shell commands corresponds to an expected output. You can execute any arbitrary shell command.

If the output or the return value is not the expected one, a notification is sent.

The script also creates an RSS feed to complement the standard notifications. The RSS feed file should be accessible by an HTTP server such as Apache.
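
The basic check described above can be sketched in a few lines. This is only an illustration, not the script itself: the helper name `command_matches` and the sample command are hypothetical, and the sketch only looks at stdout.

```python
import shlex
import subprocess
import sys


def command_matches(command: str, expected_output: str,
                    expected_retval: int = 0, timeout: int = 60) -> bool:
    """Return True if stdout contains expected_output and the return value matches."""
    try:
        result = subprocess.run(shlex.split(command),
                                capture_output=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # A process that does not finish in time counts as a failure.
        return False
    output = result.stdout.decode('UTF-8')
    return expected_output in output and result.returncode == expected_retval


if __name__ == '__main__':
    # Hypothetical command: the Python interpreter stands in for a real check.
    command = shlex.join([sys.executable, '-c', 'print("Server: Apache")'])
    if not command_matches(command, 'Server: Apache'):
        print('ERROR: unexpected output, a notification would be sent here')
```

The real script adds per-command timeouts, a choice between stdout and stderr, notifications and the feed on top of this idea.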

Figure (command_assert_0.png): A Gotify notification showing a Gitea server error

Basic setup#

  1. install the dependencies

    apt-get install python3-yaml python3-requests python3-feedgenerator
    
  2. install fpyutils. See reference

  3. create a new user

    useradd --system -s /bin/bash -U command-assert
    passwd command-assert
    usermod -aG jobs command-assert
    
  4. create the jobs directories. See reference

    mkdir -p /home/jobs/{scripts,services}/by-user/command-assert
    
  5. create the script

    /home/jobs/scripts/by-user/command-assert/command_assert.py#
    #!/usr/bin/env python3
    #
    # command_assert.py
    #
    # Copyright (C) 2020-2022 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.
    r"""command_assert.py."""

    import datetime
    import pathlib
    import re
    import shlex
    import subprocess
    import sys
    import uuid

    import feedgenerator
    import fpyutils
    import yaml


    class InvalidCache(Exception):
        pass


    class InvalidConfiguration(Exception):
        pass


    def send_notification(message: str, notify: dict):
        if notify['gotify']['enabled']:
            m = notify['gotify']['message'] + '\n' + message
            fpyutils.notify.send_gotify_message(
                notify['gotify']['url'],
                notify['gotify']['token'], m,
                notify['gotify']['title'],
                notify['gotify']['priority'])
        if notify['email']['enabled']:
            fpyutils.notify.send_email(message,
                                       notify['email']['smtp_server'],
                                       notify['email']['port'],
                                       notify['email']['sender'],
                                       notify['email']['user'],
                                       notify['email']['password'],
                                       notify['email']['receiver'],
                                       notify['email']['subject'])


    def run_command(
        command: str,
        file_descriptor: str,
        process_timeout_interval: int = 60,
        process_in_timeout_retval: int = -131072,
        process_in_timeout_output: str = '<--##--##-->',
    ) -> tuple:
        r"""Run the command and capture the selected output and return value."""
        if file_descriptor not in ['stderr', 'stdout', 'both']:
            raise ValueError

        command = shlex.split(command)
        try:
            # No exception is raised unless the process goes in timeout.
            result = subprocess.run(command,
                                    capture_output=True,
                                    timeout=process_timeout_interval)
            if file_descriptor == 'stdout':
                output = result.stdout
            elif file_descriptor == 'stderr':
                output = result.stderr
            elif file_descriptor == 'both':
                output = result.stdout + result.stderr
            output = output.decode('UTF-8')
            retval = result.returncode
        except subprocess.TimeoutExpired:
            output = process_in_timeout_output
            retval = process_in_timeout_retval

        return output, retval


    def assert_output(output: str,
                      expected_output: str,
                      retval: int,
                      expected_retval: int,
                      strict_matching=False) -> bool:
        r"""Check that the output and the return value correspond to expected values."""
        # Escape special regex characters.
        expected_output = re.escape(expected_output)

        if strict_matching:
            assertion_passes = re.match(
                expected_output, output) is not None and retval == expected_retval
        else:
            # Similar to grep.
            assertion_passes = re.search(
                expected_output, output) is not None and retval == expected_retval

        return assertion_passes


    ########
    # Feed #
    ########
    def add_feed_element(feed,
                         id: int,
                         title: str,
                         content: str,
                         date: datetime.datetime,
                         description: str,
                         author_email: str,
                         author_name: str,
                         link: str):
        feed.add_item(
            unique_id=str(id),
            title=title,
            link=link,
            description=description,
            author_email=author_email,
            author_name=author_name,
            pubdate=date,
            updatedate=date,
            content=content,
        )


    #########
    # Files #
    #########
    def read_yaml_file(file: str) -> dict:
        data = dict()
        if pathlib.Path(file).is_file():
            # Use a context manager so that the file is always closed.
            with open(file, 'r') as f:
                # An empty file is loaded as None: fall back to an empty dict.
                data = yaml.load(f, Loader=yaml.SafeLoader) or dict()

        return data


    def read_cache_file(file: str) -> dict:
        cache = read_yaml_file(file)
        if not check_cache_structure(cache):
            raise InvalidCache

        return cache


    def write_cache(cache: dict, cache_file: str):
        with open(cache_file, 'w') as f:
            f.write(yaml.dump(cache))


    ##################################
    # Check configuration structure  #
    ##################################
    def check_configuration_structure(configuration: dict) -> bool:
        ok = True
        if ('message_status' in configuration
           and 'process_in_timeout' in configuration
           and 'feed' in configuration
           and 'commands' in configuration):
            ok = True
        else:
            ok = False

        if (ok
           and 'ok' in configuration['message_status']
           and 'error' in configuration['message_status']
           and 'retval' in configuration['process_in_timeout']
           and 'output' in configuration['process_in_timeout']
           and 'enabled' in configuration['feed']
           and 'feed' in configuration['feed']
           and 'cache' in configuration['feed']
           and 'total_last_feeds_to_keep' in configuration['feed']
           and 'title' in configuration['feed']
           and 'link' in configuration['feed']
           and 'author_name' in configuration['feed']
           and 'author_email' in configuration['feed']
           and 'description' in configuration['feed']
           and isinstance(configuration['message_status']['ok'], str)
           and isinstance(configuration['message_status']['error'], str)
           and isinstance(configuration['process_in_timeout']['retval'], int)
           and isinstance(configuration['process_in_timeout']['output'], str)
           and isinstance(configuration['feed']['enabled'], bool)
           and isinstance(configuration['feed']['feed'], str)
           and isinstance(configuration['feed']['cache'], str)
           and isinstance(configuration['feed']['total_last_feeds_to_keep'], int)
           and isinstance(configuration['feed']['title'], str)
           and isinstance(configuration['feed']['link'], str)
           and isinstance(configuration['feed']['author_name'], str)
           and isinstance(configuration['feed']['author_email'], str)
           and isinstance(configuration['feed']['description'], str)):
            ok = ok & True
        else:
            ok = ok & False

        # Check 'ok' first: if the 'commands' key is missing, accessing it
        # would raise a KeyError instead of reporting an invalid configuration.
        if ok and isinstance(configuration['commands'], dict):
            commands_keys = list(configuration['commands'].keys())
        else:
            ok = False
            commands_keys = list()

        i = 0
        while ok and i < len(commands_keys):
            cmd = configuration['commands'][commands_keys[i]]
            if ('command' in cmd
               and 'file_descriptor' in cmd
               and 'strict_matching' in cmd
               and 'expected_output' in cmd
               and 'expected_retval' in cmd
               and 'timeout_interval' in cmd
               and 'log_if_ok' in cmd
               and 'feed' in cmd
               and isinstance(cmd['command'], str)
               and isinstance(cmd['file_descriptor'], str)
               and isinstance(cmd['strict_matching'], bool)
               and isinstance(cmd['expected_output'], str)
               and isinstance(cmd['expected_retval'], int)
               and isinstance(cmd['timeout_interval'], int)
               and isinstance(cmd['log_if_ok'], bool)
               and isinstance(cmd['feed'], dict)):
                ok = ok & True
                feed = cmd['feed']
            else:
                ok = ok & False
            if (ok
               and 'enabled' in feed
               and 'title' in feed
               and 'content' in feed
               and 'description' in feed
               and 'no_repeat_timeout_seconds' in feed
               and isinstance(feed['enabled'], bool)
               and isinstance(feed['title'], str)
               and isinstance(feed['content'], str)
               and isinstance(feed['description'], str)
               and isinstance(feed['no_repeat_timeout_seconds'], int)):
                ok = ok & True
            else:
                ok = ok & False

            i += 1

        return ok


    #########################
    # Check cache structure #
    #########################
    def check_cache_structure(cache: dict) -> bool:
        i = 0
        ok = True
        elements = list(cache.keys())

        # 'minimum' is used instead of 'min' to avoid shadowing the builtin.
        if len(elements) > 0:
            minimum = elements[0]

        # Keys must be integers in non-decreasing order.
        while ok and i < len(elements):
            if not isinstance(elements[i], int):
                ok = ok & False
            if ok and elements[i] > 0:
                if elements[i] < minimum:
                    ok = ok & False
                else:
                    minimum = elements[i]
            i += 1

        i = 0
        while ok and i < len(cache):
            if (ok
               and 'command_id' in cache[elements[i]]
               and 'content' in cache[elements[i]]
               and 'description' in cache[elements[i]]
               and 'email' in cache[elements[i]]
               and 'link' in cache[elements[i]]
               and 'name' in cache[elements[i]]
               and 'pub_date' in cache[elements[i]]
               and 'title' in cache[elements[i]]
               and isinstance(cache[elements[i]]['command_id'], str)
               and isinstance(cache[elements[i]]['content'], str)
               and isinstance(cache[elements[i]]['description'], str)
               and isinstance(cache[elements[i]]['email'], str)
               and isinstance(cache[elements[i]]['link'], str)
               and isinstance(cache[elements[i]]['name'], str)
               and isinstance(cache[elements[i]]['pub_date'], datetime.datetime)
               and isinstance(cache[elements[i]]['title'], str)):
                ok = ok & True
            else:
                ok = ok & False

            i += 1

        return ok


    if __name__ == '__main__':
        def main():
            r"""Run the pipeline."""
            # Load the configuration.
            # The path is passed directly to open(), not to a shell, so it
            # must not be quoted with shlex.quote.
            configuration_file = sys.argv[1]
            with open(configuration_file, 'r') as f:
                config = yaml.load(f, Loader=yaml.SafeLoader)
            if not check_configuration_structure(config):
                raise InvalidConfiguration

            commands = config['commands']
            # Create a new feed.
            feed = feedgenerator.Atom1Feed(
                title=config['feed']['title'],
                link=config['feed']['link'],
                author_name=config['feed']['author_name'],
                author_email=config['feed']['author_email'],
                description=config['feed']['description'],
            )
            now = datetime.datetime.now(datetime.timezone.utc)

            # Load feed cache.
            cache = read_cache_file(config['feed']['cache'])
            if cache is None:
                cache = dict()

            # First and last key will be used as offsets.
            if len(cache) > 0:
                last_key = list(cache.keys())[-1]
                first_key = list(cache.keys())[0]
            else:
                last_key = 0
                first_key = 1

            # Keep only the last existing n elements.
            # Elements added to the running session will be purged on
            # the next run.
            old_cache_len = len(cache)
            cache = dict(list(cache.items())[-config['feed']['total_last_feeds_to_keep']:len(cache)])

            # Update the first key by skipping the removed elements.
            # Nothing is removed if there are not enough elements.
            first_key += max(0, old_cache_len - config['feed']['total_last_feeds_to_keep'])

            # i is the unique id of the feed, excluding the offset.
            i = 0
            for c in cache:
                # Replay existing cache.
                add_feed_element(
                    feed,
                    first_key + i,
                    cache[c]['title'],
                    cache[c]['content'],
                    cache[c]['pub_date'],
                    cache[c]['description'],
                    cache[c]['email'],
                    cache[c]['name'],
                    cache[c]['link'],
                )
                i += 1

            # Counter for the cache elements.
            k = 1
            for command in commands:
                output, retval = run_command(
                    commands[command]['command'],
                    commands[command]['file_descriptor'],
                    commands[command]['timeout_interval'],
                    config['process_in_timeout']['retval'],
                    config['process_in_timeout']['output'],
                )
                assertion_passes = assert_output(
                    output, commands[command]['expected_output'], retval,
                    commands[command]['expected_retval'],
                    commands[command]['strict_matching'])
                if assertion_passes:
                    result = config['message_status']['ok']
                else:
                    result = config['message_status']['error']

                # Log results.
                if not assertion_passes or commands[command]['log_if_ok']:

                    message = command + ' returned: ' + result
                    send_notification(message, config['notify'])

                    # Create new feed.
                    if commands[command]['feed']['enabled']:
                        command_id = str(uuid.uuid3(uuid.NAMESPACE_DNS, command))
                        found = False
                        idx = None
                        j = len(cache) - 1
                        cache_keys = list(cache.keys())
                        # Get the most recent item. Filter by uuid.
                        # See
                        # https://docs.python.org/3.8/library/stdtypes.html#dict.values
                        # about dict order iteration.
                        while not found and j >= 0:
                            if cache[cache_keys[j]]['command_id'] == command_id:
                                found = True
                                idx = cache_keys[j]
                            j -= 1

                        timeout = commands[command]['feed']['no_repeat_timeout_seconds']
                        # total_seconds() includes whole days; the 'seconds'
                        # attribute alone would ignore them.
                        if not found or (found and (now - cache[idx]['pub_date']).total_seconds() > timeout):
                            add_feed_element(
                                feed,
                                first_key + i,
                                commands[command]['feed']['title'],
                                commands[command]['feed']['content'],
                                now,
                                config['feed']['description'],
                                config['feed']['author_email'],
                                config['feed']['author_name'],
                                str(),
                            )

                            # Always append.
                            # last_key+k always > last_key
                            cache[last_key + k] = {
                                'title': commands[command]['feed']['title'],
                                'content': commands[command]['feed']['content'],
                                'pub_date': now,
                                'description': config['feed']['description'],
                                'email': config['feed']['author_email'],
                                'name': config['feed']['author_name'],
                                'link': str(),
                                'command_id': command_id,
                            }

                            k += 1
                            i += 1

            # if k > 1 means that new elements were added in the last run.
            if ((k > 1 or
               not pathlib.Path(config['feed']['feed']).is_file())
               and config['feed']['enabled']):
                write_cache(cache, config['feed']['cache'])
                with open(config['feed']['feed'], 'w') as fp:
                    feed.write(fp, 'utf-8')

        main()
    
  6. create a configuration file

    /home/jobs/scripts/by-user/command-assert/command_assert.mypurpose.yaml#
    #
    # command_assert.mypurpose.yaml
    #
    # Copyright (C) 2020-2021 Franco Masotti (franco \D\o\T masotti {-A-T-} tutanota \D\o\T com)
    #
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program.  If not, see <http://www.gnu.org/licenses/>.

    # The strings that are used for the notifications.
    # The script looks up the key 'message_status' (with an underscore).
    message_status:
        ok: 'OK'
        error: 'ERROR'

    # Default values if a process goes in timeout.
    # The script looks up the key 'process_in_timeout' (with underscores).
    process_in_timeout:
        retval: -131072
        output: '<--##--##-->'

    # XML feed header.
    feed:
        enabled: true

        # Path of the XML feed file.
        # This file is most useful if served with a web server.
        feed: '/home/command-assert/out/command_assert.mypurpose.xml'

        # Path of the cache file.
        cache: '/home/jobs/scripts/by-user/command-assert/.command_assert.mypurpose.yml'

        total_last_feeds_to_keep: 128

        # Feed metadata.
        title: 'Outages of mypurpose'
        link: 'https://outage.my.domain'
        author_name: 'bot'
        author_email: 'myusername@gmail.com'
        description: 'Updates on outages'

    commands:
        webserver SSL:
            # The command as you would execute in a shell.
            command: 'curl --head https://my-server.com'

            # {stdout,stderr,both}
            file_descriptor: 'stdout'

            # If set to true, the output must start with expected_output.
            # If set to false, expected_output may appear anywhere in the
            # output, similarly to grep.
            strict_matching: false

            # A string that needs to be matched in the output.
            # Regular expressions are NOT supported: the string is matched literally.
            expected_output: 'Server: Apache'

            # The return value is usually 0 for successful processes.
            expected_retval: 0

            # Force kill the process after this time interval in seconds.
            timeout_interval: 5

            # If set to true, send notifications even if the process completes correctly.
            log_if_ok: false

            feed:
                enabled: true
                title: 'outage mypurpose'

                # Use HTML.
                content: '<em>Sorry</em>, the webserver was down'

                description: 'outage mypurpose'

                # If an error already exists in cache for less than no_repeat_timeout_seconds,
                # then do not repeat the feed.
                no_repeat_timeout_seconds: 3600

        SSH server:
            # The -p option needs a port number: replace 22 with the port of your SSH server.
            command: 'ssh -p 22 nonexistent@my-server.com'
            file_descriptor: 'stderr'
            strict_matching: false
            expected_output: 'NOTICE'
            expected_retval: 255
            timeout_interval: 5
            log_if_ok: false
            feed:
                enabled: true
                title: 'outage mypurpose'
                content: '<em>Sorry</em>, the SSH server was down'
                description: 'outage mypurpose'
                no_repeat_timeout_seconds: 3600

    notify:
        email:
            enabled: true
            smtp_server: 'smtp.gmail.com'
            port: 465
            sender: 'myusername@gmail.com'
            user: 'myusername'
            password: 'my awesome password'
            receiver: 'myusername@gmail.com'
            subject: 'command assert'
        gotify:
            enabled: true
            url: '<gotify url>'
            token: '<app token>'
            title: 'command assert'
            message: 'command assert'
            priority: 5
    
  7. create a Systemd service unit file

    /home/jobs/services/by-user/command-assert/command-assert.mypurpose.service#
    [Unit]
    Description=Command assert mypurpose
    Requires=network-online.target
    After=network-online.target

    [Service]
    Type=simple
    ExecStart=/home/jobs/scripts/by-user/command-assert/command_assert.py /home/jobs/scripts/by-user/command-assert/command_assert.mypurpose.yaml
    User=command-assert
    Group=command-assert

    [Install]
    WantedBy=multi-user.target
    
  8. create a Systemd timer unit file

    /home/jobs/services/by-user/command-assert/command-assert.mypurpose.timer#
    [Unit]
    Description=Once every 30 minutes command assert mypurpose

    [Timer]
    OnCalendar=*:0/30
    Persistent=true

    [Install]
    WantedBy=timers.target
    
  9. fix owners and permissions

    chown -R command-assert:command-assert /home/jobs/{scripts,services}/by-user/command-assert
    chmod 700 -R /home/jobs/{scripts,services}/by-user/command-assert
    
  10. run the deploy script
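
The `strict_matching` option of the configuration file selects between two calls of the `re` module in the script. The sketch below isolates that logic on a made-up sample; the function name `output_matches` and the sample headers are illustrative assumptions, not part of the script.

```python
import re


def output_matches(output: str, expected_output: str,
                   strict_matching: bool = False) -> bool:
    # Escape the expected text so that it is always treated literally.
    pattern = re.escape(expected_output)
    if strict_matching:
        # re.match succeeds only if the output starts with the expected text.
        return re.match(pattern, output) is not None
    # re.search finds the expected text anywhere in the output, like grep.
    return re.search(pattern, output) is not None


# Made-up sample of what `curl --head` might print.
headers = 'HTTP/1.1 200 OK\r\nServer: Apache\r\n'
print(output_matches(headers, 'Server: Apache'))                        # True
print(output_matches(headers, 'Server: Apache', strict_matching=True))  # False
print(output_matches(headers, 'HTTP/1.1 200', strict_matching=True))    # True
```

With `strict_matching: true`, `expected_output` should therefore be the beginning of the command output rather than a string found somewhere in the middle.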

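The `no_repeat_timeout_seconds` option avoids repeating a feed entry when the same command already produced one recently. A possible way to express that comparison is sketched below; `should_publish` is a hypothetical helper and the dates are made up. The sketch uses `timedelta.total_seconds()` because the `seconds` attribute of a `timedelta` holds only the seconds component and ignores whole days.

```python
import datetime


def should_publish(last_pub_date, now, no_repeat_timeout_seconds: int) -> bool:
    """Return True if a new feed entry should be created."""
    if last_pub_date is None:
        # No previous entry exists for this command.
        return True
    elapsed = (now - last_pub_date).total_seconds()
    return elapsed > no_repeat_timeout_seconds


now = datetime.datetime(2022, 1, 3, 12, 0, tzinfo=datetime.timezone.utc)
recent = now - datetime.timedelta(minutes=10)
old = now - datetime.timedelta(days=1, minutes=10)

print(should_publish(recent, now, 3600))  # False
print(should_publish(old, now, 3600))     # True
# The 'seconds' attribute drops the full day and reports only 10 minutes.
print((now - old).seconds)                # 600
```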
Sharing RSS feeds#

We assume that the Apache HTTP webserver is up and running before following these steps.

  1. install the dependencies

    apt-get install bindfs
    
  2. add this entry to the fstab file. In this example the directory is mounted in /srv/http/command_assert

    /etc/fstab#
    /home/command-assert/out /srv/http/command_assert fuse.bindfs  auto,force-user=www-data,force-group=www-data,ro 0 0
    
  3. create a directory readable by Apache

    mkdir -p /srv/http/command_assert
    chown www-data:www-data /srv/http/command_assert
    chmod 700 /srv/http/command_assert
    
  4. serve the files via HTTP by creating a new Apache virtual host. Replace FQDN with the appropriate domain and include this file from the Apache configuration

    /etc/apache2/command_assert.apache.conf#
    <IfModule mod_ssl.c>
    <VirtualHost *:443>

        UseCanonicalName on

        Keepalive On
        RewriteEngine on

        ServerName ${FQDN}

        # Set the icons also to avoid 404 errors.
        Alias /icons/ "/usr/share/apache2/icons/"

        DocumentRoot "/srv/http/command_assert"
        <Directory "/srv/http/command_assert">
            Options -ExecCGI -Includes
            Options +Indexes +SymlinksIfOwnerMatch
            IndexOptions NameWidth=* +SuppressDescription FancyIndexing Charset=UTF-8 VersionSort FoldersFirst

            ReadmeName footer.html
            IndexIgnore header.html footer.html

            #
            # AllowOverride controls what directives may be placed in .htaccess files.
            # It can be "All", "None", or any combination of the keywords:
            #   AllowOverride FileInfo AuthConfig Limit
            #
            AllowOverride All

            #
            # Controls who can get stuff from this server.
            #
            Require all granted
        </Directory>

        SSLCompression      off

        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/${FQDN}/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/${FQDN}/privkey.pem
    </VirtualHost>
    </IfModule>
    
  5. create an HTML file to be served as an explanation for the RSS feeds

    /srv/http/command_assert/footer.html#
    <h1>Command assert</h1>

    <h2>My purpose</h2>
    
  6. restart the Apache webserver

    systemctl restart apache2
    
  7. run the bind-mount

    mount -a
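
Once the directory is mounted, one way to check the feed is to parse it and list its entries. The following is a sketch that uses only the Python standard library. It assumes the feed is in Atom format, which is what `feedgenerator.Atom1Feed` in the script produces; the helper name `list_entry_titles` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Atom feeds.
ATOM_NS = {'atom': 'http://www.w3.org/2005/Atom'}


def list_entry_titles(feed_xml: str) -> list:
    """Return the title of every entry in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext('atom:title', default='', namespaces=ATOM_NS)
            for entry in root.findall('atom:entry', ATOM_NS)]


# Made-up sample standing in for the generated feed file.
SAMPLE_FEED = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Outages of mypurpose</title>
  <entry><title>outage mypurpose</title><id>1</id></entry>
</feed>'''

print(list_entry_titles(SAMPLE_FEED))  # ['outage mypurpose']
```

To check the real feed, pass the content of the mounted file instead of the sample, for example `/srv/http/command_assert/command_assert.mypurpose.xml` with the paths used in this page.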