Ollama#

All configurations presented here use docker-compose. Read the Docker instructions first.

Warning

The contents of this page have not been tested with all Ollama versions.

Ollama on Docker setup (CPU only)#

Variable name    Description
---------------  -----------------------------------------------------------------
DATA_PATH        directory containing model files for Ollama
OLLAMA_BASE_URL  the base URL the Ollama Docker instance listens on, usually
                 http://${ADDR}:11434
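
Once the container described in the next section is running, you can quickly check that OLLAMA_BASE_URL is reachable: a plain GET on the base URL should answer with "Ollama is running":

    curl ${OLLAMA_BASE_URL}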

Basic setup#

These instructions will cover the basic docker-compose setup to get Ollama running.

See also

  • Ollama [1]

  • GitHub - ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. [2]

  • ollama/envconfig/config.go at v0.4.2 · ollama/ollama · GitHub [3]

  • Global Configuration Variables for Ollama · Issue #2941 · ollama/ollama · GitHub [4]

  • Ollama crashes with Deepseek-Coder-V2-Lite-Instruct · Issue #6199 · ollama/ollama · GitHub [5]

  1. follow the Docker instructions

  2. create the jobs directories. See reference

    mkdir -p /home/jobs/scripts/by-user/root/docker/ollama
    cd /home/jobs/scripts/by-user/root/docker/ollama
    
  3. create a Docker compose file

    /home/jobs/scripts/by-user/root/docker/ollama/docker-compose.yml#
    version: '3'
    
    services:
      ollama:
        image: ollama/ollama:0.1.32
        volumes:
          - ${DATA_PATH}:/root/.ollama
        container_name: ollama
        tty: true
        restart: always
        hostname: ollama
        ports:
          - 11434:11434
        environment:
          # Keep loaded models in RAM indefinitely: a negative value disables
          # the default unloading after 5 minutes. See the source code for the
          # meaning of negative values.
          # https://github.com/ollama/ollama/blob/v0.4.2/envconfig/config.go#L99
          - OLLAMA_KEEP_ALIVE=-1

          # Might improve inference speed.
          - OLLAMA_FLASH_ATTENTION=true

          # Verbose logging: you will see everything happening in the logs.
          - OLLAMA_DEBUG=true

          # Maximum number of models kept loaded in RAM at the same time.
          - OLLAMA_MAX_LOADED_MODELS=4
    

    Note

    Replace these variables with the appropriate values

    • DATA_PATH

    Note

    These settings have been tested on CPU-only setups

  4. create a Systemd unit file. See also the Docker compose services section

    /home/jobs/services/by-user/root/docker-compose.ollama.service#
    [Unit]
    Requires=docker.service
    Requires=network-online.target
    After=docker.service
    After=network-online.target
    
    [Service]
    Type=simple
    WorkingDirectory=/home/jobs/scripts/by-user/root/docker/ollama
    
    ExecStart=/usr/bin/docker-compose up --remove-orphans
    ExecStop=/usr/bin/docker-compose down --remove-orphans
    
    Restart=always
    
    [Install]
    WantedBy=multi-user.target
    
  5. run the deploy script
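
    If you don't use a deploy script, a minimal manual equivalent, assuming the unit file above has been installed where systemd can find it, is:

    systemctl daemon-reload
    systemctl enable --now docker-compose.ollama.service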

Improving performance#

To possibly improve performance you can change the CPU governor. I haven’t verified the impact of these settings.

See also

  • Set CPU governor to performance in 18.04 - Ask Ubuntu [6]

  • CPU frequency scaling - ArchWiki [7]

Warning

There might be some tools already managing the governor, such as cpupower, thermald, power-profiles-daemon, etc. Before following the steps below make sure none of them is active, as this might cause conflicts!
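
A quick way to check for such tools, assuming systemd, is to grep the running services (the pattern below only covers the tools named above):

    systemctl list-units --type=service | grep -iE 'cpupower|thermald|power-profiles'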

  1. login as root

    sudo -i
    
  2. check the current governor

    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
  3. to change to the performance governor simply echo it to the device

    echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
  4. if you have an Intel CPU there is another setting you can check. It works in a similar way

    cat /sys/devices/system/cpu/cpu*/power/energy_perf_bias
    
  5. set this one to maximum performance

    echo 0 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
    
  6. to make these changes persistent add them to the root crontab

    crontab -e
    

    A text editor will open. Add this to the first line after the comments

    @reboot /usr/bin/sleep 120 && echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor && echo 0 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
    

Extras#

Variable name    Description
---------------  -----------------------------------------------------------------
AUTH_TOKEN       the authorization token to be used by HTTPS clients to connect
                 to Ollama
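
Once the reverse proxy from the next section is in place, clients send this token in the Authorization header on every request. One way to keep it out of your shell history while testing is to read it from a file (the path and token below are just examples):

    printf '%s\n' 'db0d7378-f45a-432a-a795-ad48bd6da621' > ~/.config/ollama-token
    curl -H "Authorization: Bearer $(cat ~/.config/ollama-token)" https://${FQDN}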

HTTPS backend with Authorization#

To use Ollama remotely in a safe manner, without having to deal with SSH tunnelling, you can put it behind a reverse proxy.

At the current stage Ollama does not support any kind of authentication. A basic workaround is to check an authorization header in the reverse proxy. You’ll be limited to just one token.

In this example we’ll use Apache HTTPD.

See also

  • Requesting support for basic auth or API key authentication · Issue #1053 · ollama/ollama · GitHub [8]

  1. generate a random authorization token that you’ll later need to copy. Use a Python shell

    python3 -c 'import uuid; print(uuid.uuid4())'
    
  2. serve Ollama via HTTPS by creating a new Apache virtual host. Include this file from the Apache configuration

    /etc/apache2/ollama.apache.conf#
    ###########
    # Ollama  #
    ###########
    <IfModule mod_ssl.c>
    <VirtualHost *:80>
        ServerName ${FQDN}
    
        # Force https.
        UseCanonicalName on
        RewriteEngine on
        RewriteCond %{SERVER_NAME} =${FQDN}
    
        # Ignore rewrite rules for 127.0.0.1
        RewriteCond %{HTTP_HOST} !=127.0.0.1
        RewriteCond %{REMOTE_ADDR} !=127.0.0.1
    
        RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent]
    </VirtualHost>
    </IfModule>
    
    <IfModule mod_ssl.c>
    <VirtualHost *:443>
        UseCanonicalName on
    
        Keepalive On
        RewriteEngine on
    
        ServerName ${FQDN}
    
        # See https://github.com/ollama/ollama/issues/1053#issuecomment-2253445923
        RewriteCond %{HTTP:Authorization} !^Bearer\s+${AUTH_TOKEN}$
        RewriteRule ^ - [R=401,L]
    
        SSLCompression off
    
        ProxyRequests Off
        RequestHeader set X-Forwarded-Proto "https"
        RequestHeader set X-Forwarded-Port "443"
        RequestHeader set X-Forwarded-Host "${FQDN}"
        RequestHeader set X-Forwarded-For "%{REMOTE_ADDR}s"
        RequestHeader set Host "${FQDN}"
    
        ProxyPass / http://127.0.0.1:11434/ connectiontimeout=300 timeout=300 Keepalive=On max=50
        ProxyPassReverse / http://127.0.0.1:11434/
    
        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/${FQDN}/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/${FQDN}/privkey.pem
    </VirtualHost>
    </IfModule>
    

    Note

    Replace these variables with the appropriate values

    • FQDN

    • AUTH_TOKEN

    Note

    If you use Python’s uuid4 function, the RewriteCond line can be something like:

    RewriteCond %{HTTP:Authorization} !^Bearer\s+db0d7378-f45a-432a-a795-ad48bd6da621$
    

    or

    RewriteCond %{HTTP:Authorization} !^Bearer\s+bfe7e7be-45f0-44d2-891e-038df7b74738$
    
  3. test the connection

    1. with the correct header value

      curl -H 'Authorization: Bearer ${AUTH_TOKEN}' https://${FQDN}
      

      results in

      Ollama is running
      
    2. with a wrong header value

      curl -H 'Authorization: Bearer fake-000' https://${FQDN}
      

      results in

      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <html><head>
      <title>401 Unauthorized</title>
      </head><body>
      <h1>Unauthorized</h1>
      <p>This server could not verify that you
      are authorized to access the document
      requested.  Either you supplied the wrong
      credentials (e.g., bad password), or your
      browser doesn't understand how to supply
      the credentials required.</p>
      <p>Additionally, a 401 Unauthorized
      error was encountered while trying to use an ErrorDocument to handle the request.</p>
      </body></html>
      
    3. without the header altogether

      curl https://${FQDN}
      

      results in

      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <html><head>
      <title>401 Unauthorized</title>
      </head><body>
      <h1>Unauthorized</h1>
      <p>This server could not verify that you
      are authorized to access the document
      requested.  Either you supplied the wrong
      credentials (e.g., bad password), or your
      browser doesn't understand how to supply
      the credentials required.</p>
      <p>Additionally, a 401 Unauthorized
      error was encountered while trying to use an ErrorDocument to handle the request.</p>
      </body></html>
      

Android app#

There are a couple of Android apps on F-Droid capable of connecting to Ollama. For them to work you need to follow Basic setup and HTTPS backend with Authorization. You also need to pull some models: you can use Open WebUI to perform this operation with an admin user, or pull directly through the API as sketched below.
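
As an alternative to Open WebUI, you can pull a model and list the installed ones directly through the authenticated proxy. The model name is just an example, and the exact request body of /api/pull ("name" vs "model") may differ between Ollama versions:

    # Pull a model through the HTTPS proxy.
    curl -H 'Authorization: Bearer ${AUTH_TOKEN}' https://${FQDN}/api/pull -d '{"name": "mistral"}'

    # List the models that are now available locally.
    curl -H 'Authorization: Bearer ${AUTH_TOKEN}' https://${FQDN}/api/tags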

See also

  • maid | F-Droid - Free and Open Source Android App Repository [9]

  • GPTMobile | F-Droid - Free and Open Source Android App Repository [10]

  1. install GPTMobile

  2. open the settings and fill in the API URL, key (authorization token) and model name; see the example after this list

  3. start a new chat
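
As a concrete example for step 2, assuming the HTTPS setup above, the values would look something like this (the model name is just a placeholder for one you actually pulled):

    API URL: https://${FQDN}
    API key: ${AUTH_TOKEN}
    Model:   mistral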

Footnotes