Ollama#
All configurations presented here use docker-compose. Read the Docker instructions first.
Warning
The contents of this page have not been tested for all Ollama versions.
Ollama on Docker setup (CPU only)#
| Variable name | Description |
|---|---|
| `DATA_PATH` | directory containing model files for Ollama |
| | the base URL the Ollama Docker instance is listening on, usually `http://127.0.0.1:11434` |
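`DATA_PATH` is substituted by docker-compose at startup; it can be exported in the shell or kept in an `.env` file next to the compose file. A minimal sketch, with a hypothetical host path:

```bash
# .env (same directory as docker-compose.yml)
DATA_PATH=/home/jobs/scripts/by-user/root/docker/ollama/data
```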
Basic setup#
These instructions cover a basic docker-compose setup to get Ollama running.
See also
Ollama [1]
GitHub - ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. [2]
ollama/envconfig/config.go at v0.4.2 · ollama/ollama · GitHub [3]
Global Configuration Variables for Ollama · Issue #2941 · ollama/ollama · GitHub [4]
Ollama crashes with Deepseek-Coder-V2-Lite-Instruct · Issue #6199 · ollama/ollama · GitHub [5]
follow the Docker instructions
create the jobs directories. See reference

```bash
mkdir -p /home/jobs/scripts/by-user/root/docker/ollama
cd /home/jobs/scripts/by-user/root/docker/ollama
```
create a Docker compose file

```yaml
version: '3'

services:
  ollama:
    image: ollama/ollama:0.1.32
    volumes:
      - ${DATA_PATH}:/root/.ollama
    container_name: ollama
    tty: true
    restart: always
    hostname: ollama
    ports:
      - 11434:11434
    environment:
      # Keep all loaded models in RAM forever.
      # See the source code for the reason of the negative value:
      # https://github.com/ollama/ollama/blob/v0.4.2/envconfig/config.go#L99
      # No default unloading after 5 minutes.
      - OLLAMA_KEEP_ALIVE=-1
      # Might improve inference.
      - OLLAMA_FLASH_ATTENTION=true
      # You will see everything happening.
      - OLLAMA_DEBUG=true
      # In RAM.
      - OLLAMA_MAX_LOADED_MODELS=4
```
Note

Replace these variables with the appropriate values:

- `DATA_PATH`
Note
These settings have been tested on CPU-only setups
create a Systemd unit file. See also the Docker compose services section

```ini
[Unit]
Requires=docker.service
Requires=network-online.target
After=docker.service
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/home/jobs/scripts/by-user/root/docker/ollama
ExecStart=/usr/bin/docker-compose up --remove-orphans
ExecStop=/usr/bin/docker-compose down --remove-orphans
Restart=always

[Install]
WantedBy=multi-user.target
```
run the deploy script
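Once the deploy script has run you can verify the result. A quick smoke test, assuming the unit was installed as `ollama.service` (the actual name depends on your deploy script):

```bash
# The container should be up and set to restart on failure.
systemctl status ollama.service

# Ollama answers on the mapped port with a small HTTP API.
curl http://127.0.0.1:11434/api/version   # server version
curl http://127.0.0.1:11434/api/tags      # locally available models
```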
Improving performance#
To possibly improve performance you can change the CPU governor. I haven't verified the impact of these settings.
Warning
Some tools, such as cpupower, thermald or power-profiles-daemon, might already be managing the governor. Before following the steps below make sure none of them is active, as this might cause conflicts!
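You can ask systemd whether any of them is running; a quick check (unit names may differ between distributions):

```bash
# Prints "active" for every listed service that is currently running.
systemctl is-active cpupower.service thermald.service power-profiles-daemon.service
```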
login as root
```bash
sudo -i
```
check the current governor
```bash
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
to change to the performance governor simply echo it to the device
```bash
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
if you have an Intel CPU there is another setting you can check. It works in a similar way
```bash
cat /sys/devices/system/cpu/cpu*/power/energy_perf_bias
```
set this one to maximum performance
```bash
echo 0 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
```
to make these changes persistent add them to the root crontab
```bash
crontab -e
```
A text editor will open. Add this to the first line after the comments
```bash
@reboot /usr/bin/sleep 120 && echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor && echo 0 | tee /sys/devices/system/cpu/cpu*/power/energy_perf_bias
```
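After the next reboot you can confirm that both settings stuck; each command should print a single value, `performance` and `0` respectively:

```bash
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort -u
cat /sys/devices/system/cpu/cpu*/power/energy_perf_bias | sort -u
```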
Extras#
| Variable name | Description |
|---|---|
| | the authorization token to be used by HTTPS clients to connect to Ollama |
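Clients present this token on every request. A sketch of what such a request could look like, assuming the reverse proxy from the HTTPS backend with Authorization section expects a `Bearer` scheme (the domain and header format depend on that configuration):

```bash
# ${OLLAMA_TOKEN} holds the authorization token; ollama.example.com is a placeholder.
curl -H "Authorization: Bearer ${OLLAMA_TOKEN}" https://ollama.example.com/api/tags
```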
Android app#
There are a couple of Android apps on F-Droid capable of connecting to Ollama. For them to work you need to follow the Basic setup and the HTTPS backend with Authorization sections. You also need to pull some models: you can use Open WebUI to perform this operation with an admin user.
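As an alternative to Open WebUI you can pull models directly inside the container. A sketch using the `ollama` container name from the compose file above (`llama3` is just an example model name):

```bash
docker exec -it ollama ollama pull llama3
```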
install GPTMobile
open the settings and fill in the API URL, key (authorization token) and model name
start a new chat
Footnotes

[1] https://ollama.com

[2] https://github.com/ollama/ollama

[3] https://github.com/ollama/ollama/blob/v0.4.2/envconfig/config.go

[4] https://github.com/ollama/ollama/issues/2941

[5] https://github.com/ollama/ollama/issues/6199