Ollama#
All configurations presented here use docker-compose. Read the Docker instructions first.
Warning
The contents of this page have not been tested for all Ollama versions.
Ollama on Docker setup (CPU only)#
| Variable name | Description |
|---|---|
| `DATA_PATH` | directory containing model files for Ollama |
| | the base URL the Ollama Docker instance is listening on, usually `http://127.0.0.1:11434` |
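`DATA_PATH` is substituted by docker-compose at startup; it can be exported in the shell or kept in an `.env` file next to the compose file. A minimal sketch, with a hypothetical host path:

```bash
# .env (same directory as docker-compose.yml)
DATA_PATH=/home/jobs/scripts/by-user/root/docker/ollama/data
```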
Basic setup#
These instructions cover a basic docker-compose setup to get Ollama running.
See also
Ollama [1]
GitHub - ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. [2]
ollama/envconfig/config.go at v0.4.2 · ollama/ollama · GitHub [3]
Global Configuration Variables for Ollama · Issue #2941 · ollama/ollama · GitHub [4]
Ollama crashes with Deepseek-Coder-V2-Lite-Instruct · Issue #6199 · ollama/ollama · GitHub [5]
follow the Docker instructions
create the jobs directories. See reference

```bash
mkdir -p /home/jobs/scripts/by-user/root/docker/ollama
cd /home/jobs/scripts/by-user/root/docker/ollama
```
create a Docker compose file

```yaml
version: '3'

services:
  ollama:
    image: ollama/ollama:0.1.32
    volumes:
      - ${DATA_PATH}:/root/.ollama
    container_name: ollama
    tty: true
    restart: always
    hostname: ollama
    ports:
      - 11434:11434
    environment:
      # Keep all loaded models in RAM forever.
      # See the source code for the reason of the negative value:
      # https://github.com/ollama/ollama/blob/v0.4.2/envconfig/config.go#L99
      # No default unloading after 5 minutes.
      - OLLAMA_KEEP_ALIVE=-1
      # Might improve inference.
      - OLLAMA_FLASH_ATTENTION=true
      # You will see everything happening.
      - OLLAMA_DEBUG=true
      # In RAM.
      - OLLAMA_MAX_LOADED_MODELS=4
```
Note

Replace these variables with the appropriate values:

- `DATA_PATH`
Note
These settings have been tested on CPU-only setups
create a Systemd unit file. See also the Docker compose services section

```ini
[Unit]
Requires=docker.service
Requires=network-online.target
After=docker.service
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/home/jobs/scripts/by-user/root/docker/ollama
ExecStart=/usr/bin/docker-compose up --remove-orphans
ExecStop=/usr/bin/docker-compose down --remove-orphans
Restart=always

[Install]
WantedBy=multi-user.target
```
run the deploy script
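Once the deploy script has run you can verify the result. A quick smoke test, assuming the unit was installed as `ollama.service` (the actual name depends on your deploy script):

```bash
# The container should be up and set to restart on failure.
systemctl status ollama.service

# Ollama answers on the mapped port with a small HTTP API.
curl http://127.0.0.1:11434/api/version   # server version
curl http://127.0.0.1:11434/api/tags      # locally available models
```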
Improving performance#
To possibly improve performance you can change the CPU governor. I haven't verified the impact of these settings.
Warning
Some tools, such as cpupower, thermald or power-profiles-daemon, might already be managing the governor. Before following the steps below make sure none of them is active, as this might cause conflicts!
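You can ask systemd whether any of them is running; a quick check (unit names may differ between distributions):

```bash
# Prints "active" for every listed service that is currently running.
systemctl is-active cpupower.service thermald.service power-profiles-daemon.service
```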
login as root
```bash
sudo -i
```
check the current governor
```bash
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
to change to the performance governor simply echo it to the device
```bash
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
if you have an Intel CPU there is another setting you can check. It works in a similar way
```bash
cat /sys/devices/system/cpu/cpu*/power/energy_perf_bias
```
set this one to maximum performance
```bash
echo 0 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
```
to make these changes persistent add them to the root crontab
```bash
crontab -e
```
A text editor will open. Add this to the first line after the comments
```bash
@reboot /usr/bin/sleep 120 && echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor && echo 0 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
```
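After the next reboot you can confirm that both settings stuck; each command should print a single value, `performance` and `0` respectively:

```bash
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
cat /sys/devices/system/cpu/cpu*/power/energy_perf_bias | sort -u
```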
Extras#
| Variable name | Description |
|---|---|
| | the authorization token to be used by HTTPS clients to connect to Ollama |
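Clients present this token on every request. A sketch of what such a request could look like, assuming the reverse proxy from the HTTPS backend with Authorization section expects a `Bearer` scheme (the domain and header format depend on that configuration):

```bash
# ${OLLAMA_TOKEN} holds the authorization token; ollama.example.com is a placeholder.
curl -H "Authorization: Bearer ${OLLAMA_TOKEN}" https://ollama.example.com/api/tags
```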
Android app#
There are a couple of Android apps on F-Droid capable of connecting to Ollama. For them to work you need to follow the Basic setup and the HTTPS backend with Authorization sections. You also need to pull some models: you can use Open WebUI to perform this operation with an admin user.
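As an alternative to Open WebUI you can pull models directly inside the container. A sketch using the `ollama` container name from the compose file above (`llama3` is just an example model name):

```bash
docker exec -it ollama ollama pull llama3
```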
install GPTMobile
open the settings and fill in the API URL, key (authorization token) and model name
start a new chat
Footnotes

[1] https://ollama.com

[2] https://github.com/ollama/ollama

[3] https://github.com/ollama/ollama/blob/v0.4.2/envconfig/config.go

[4] https://github.com/ollama/ollama/issues/2941

[5] https://github.com/ollama/ollama/issues/6199