Ollama turns any Linux server into a private AI inference endpoint. Whether you're running a homelab, a cloud VM, or a bare-metal Ubuntu box, this guide walks you through everything — from a fresh Ubuntu install to serving LLM completions over your network.
Prerequisites
- A machine running Ubuntu 20.04, 22.04, or 24.04 (desktop or server edition)
- sudo access
- At least 8GB of RAM (16GB+ recommended for larger models)
- (Optional) An NVIDIA or AMD GPU for hardware-accelerated inference
Step 1: Update Your System
Always start with a fully updated system to avoid dependency conflicts:
sudo apt update && sudo apt upgrade -y
Reboot if a kernel update was applied:
sudo reboot
Step 2: Install the Ollama CLI
Ollama provides an official install script that handles everything — binary download, PATH setup, and systemd service registration.
Run it with:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
ollama --version
# ollama version is 0.17.7
Prefer a manual install? Download the binary directly from github.com/ollama/ollama/releases, move it to /usr/local/bin/ollama, and make it executable with chmod +x /usr/local/bin/ollama.
Step 3: (Optional) Install NVIDIA GPU Drivers
Skip this step if you're running CPU-only inference. If you have an NVIDIA GPU, install the drivers and CUDA toolkit before proceeding — Ollama will detect and use them automatically.
Check if a GPU is present:
lspci | grep -i nvidia
Install the recommended NVIDIA driver:
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, verify the driver is loaded:
nvidia-smi
You should see your GPU listed with its driver version and VRAM. Ollama will now offload model layers to the GPU automatically when you run a model.
Step 4: Start and Enable the Ollama Service
The installer registers Ollama as a systemd service that starts automatically on boot. Check its status:
sudo systemctl status ollama
If the service isn't running, start and enable it manually:
sudo systemctl start ollama
sudo systemctl enable ollama
Confirm the API is responding:
curl http://localhost:11434
# Ollama is running
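The check above can be scripted: the root endpoint returns a plain-text banner when the service is up. Below is a minimal sketch; the URL is the default bind address, and `--max-time` keeps curl from hanging if the service is down.

```shell
#!/usr/bin/env bash
# Health-check sketch: probe the root endpoint and report service state.
# Assumes the default port; adjust URL if you changed OLLAMA_HOST.
URL="${OLLAMA_URL:-http://localhost:11434}"
if curl -s --max-time 2 "$URL" | grep -q "Ollama is running"; then
  echo "service healthy"
else
  echo "service not responding at $URL"
fi
```

Drop this into a cron job or monitoring hook if you want to be alerted when the endpoint goes down.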
Step 5: Pull Your First Model
Now let's download an LLM. We'll use Llama 3.2 (3B), a capable model that runs well on modest hardware:
ollama pull llama3.2
Once it finishes, confirm the model is installed:
ollama list
# NAME              ID              SIZE      MODIFIED
# llama3.2:latest   91ab477bec9d    2.0 GB    5 seconds ago
Other popular models to try:
ollama pull mistral # Mistral 7B — great all-rounder
ollama pull gemma2:2b # Google Gemma 2 (2B) — fast and lightweight
ollama pull codellama # Code-focused Llama variant
ollama pull llama3.1:70b # Llama 3.1 70B — needs 40GB+ RAM or a large GPU
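If you want several of these at once, a simple loop works. This is a sketch; the tag list is just the examples above, so trim it to what your hardware can hold.

```shell
#!/usr/bin/env bash
# Batch-pull sketch: fetch a list of model tags in sequence.
# Pulls run one at a time because each download can saturate
# your bandwidth on its own.
for model in llama3.2 gemma2:2b mistral; do
  echo "Pulling $model..."
  ollama pull "$model" || echo "Failed to pull $model (is the service running?)"
done
```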
Step 6: Run the Model Interactively
Test your setup with an interactive chat session:
ollama run llama3.2
>>> Send a message (/? for help)
>>> Summarise what Mulesoft is in two sentences.
Type /bye to exit. The model stays cached locally for future use.
Step 7: Use the REST API
Ollama's built-in REST API lets you integrate LLMs into your own applications. By default it's only accessible on localhost; we'll open it up to the network in the next step.
Generate a completion:
curl http://localhost:11434/api/generate \
-d '{
"model": "llama3.2",
"prompt": "What is Mulesoft best used for?",
"stream": false
}'
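Hand-writing JSON inside curl commands gets error-prone once prompts contain quotes or newlines. A small wrapper that builds the payload with jq keeps the quoting safe; this sketch assumes jq is installed (sudo apt install -y jq) and uses the same endpoint and model as the example above.

```shell
#!/usr/bin/env bash
# Sketch: wrap the generate endpoint in a shell function.
# jq -n builds the JSON payload so special characters in the
# prompt are escaped correctly.
generate() {
  local model="$1" prompt="$2"
  curl -s http://localhost:11434/api/generate \
    -d "$(jq -n --arg m "$model" --arg p "$prompt" \
          '{model: $m, prompt: $p, stream: false}')" \
    | jq -r '.response'
}

generate llama3.2 "What is Mulesoft best used for?"
```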
Chat endpoint:
curl http://localhost:11434/api/chat \
-d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "Give me three reasons to self-host AI models." }
],
"stream": false
}'
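Note that the chat endpoint is stateless: the server doesn't remember earlier turns, so your client must resend the full messages array each request. A minimal sketch of keeping that history in shell (assumes jq is installed; the commented lines show where the API call from above would slot in):

```shell
#!/usr/bin/env bash
# Sketch: accumulate a multi-turn conversation as a JSON array,
# appending each role/content pair with jq.
history='[]'

add_turn() {
  history=$(printf '%s' "$history" \
    | jq -c --arg r "$1" --arg c "$2" '. + [{role: $r, content: $c}]')
}

add_turn user "Give me three reasons to self-host AI models."
# reply=$(curl -s http://localhost:11434/api/chat \
#   -d "$(jq -n --argjson msgs "$history" '{model: "llama3.2", messages: $msgs, stream: false}')" \
#   | jq -r '.message.content')
# add_turn assistant "$reply"
echo "$history"
```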
List loaded models:
curl http://localhost:11434/api/tags
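The tags response is JSON with a models array; piping it through jq (assumed installed) extracts just the names:

```shell
# List only the model names from the tags endpoint
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```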
Step 8: Expose Ollama to Your Network (Optional)
By default, Ollama only listens on 127.0.0.1. To make it accessible from other machines on your network, configure it to bind to all interfaces. Edit the systemd service override:
sudo systemctl edit ollama
This opens a blank override file. Add the following:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Save, then reload and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Verify it's listening on all interfaces:
ss -tlnp | grep 11434
# LISTEN 0 128 0.0.0.0:11434 0.0.0.0:*
Now you can call the API from any machine on your network:
curl http://<your-server-ip>:11434/api/generate \
-d '{"model": "llama3.2", "prompt": "Hello!", "stream": false}'
Security note: Ollama has no built-in authentication. If exposing it to a network, protect port 11434 with a firewall rule, a reverse proxy with auth (like Nginx + basic auth), or a VPN.
Step 9: (Optional) Set Up a Firewall Rule
If you're using ufw (Ubuntu's default firewall), restrict access to trusted IPs only:
# Allow only a specific IP to reach Ollama
sudo ufw allow from 192.168.1.50 to any port 11434
# Or allow the whole local subnet
sudo ufw allow from 192.168.1.0/24 to any port 11434
sudo ufw enable
sudo ufw status
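From a client machine, it's worth verifying the rule took effect before debugging anything else. A sketch, where the SERVER address is an assumption following the 192.168.1.x examples above:

```shell
#!/usr/bin/env bash
# Reachability sketch: probe port 11434 from a client machine.
# Replace SERVER with your server's actual LAN address.
SERVER="${1:-192.168.1.10}"
if curl -s --max-time 3 "http://$SERVER:11434" >/dev/null; then
  echo "Ollama reachable at $SERVER:11434"
else
  echo "Blocked: check the ufw rule and that OLLAMA_HOST=0.0.0.0 is set"
fi
```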
Step 10: (Optional) Add a Web UI with Open WebUI
Want a ChatGPT-style interface for your server? Open WebUI connects directly to your Ollama instance and just needs Docker:
# Install Docker if not already installed
sudo apt install -y docker.io
sudo systemctl enable --now docker
# Run Open WebUI connected to Ollama
docker run -d \
--name open-webui \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--add-host=host.docker.internal:host-gateway \
--restart always \
ghcr.io/open-webui/open-webui:main
Open your browser and navigate to http://<your-server-ip>:3000. You'll have a full chat UI backed by your local Ollama instance.
Managing the Ollama Service
| Task | Command |
|---|---|
| Check status | sudo systemctl status ollama |
| Start | sudo systemctl start ollama |
| Stop | sudo systemctl stop ollama |
| Restart | sudo systemctl restart ollama |
| View logs | journalctl -u ollama -f |
| List models | ollama list |
| Remove a model | ollama rm <model-name> |
Troubleshooting
Ollama service fails to start: check the most recent log entries for the error:
journalctl -u ollama -n 50 --no-pager
curl returns connection refused: Make sure the service is running: sudo systemctl status ollama
GPU not detected after driver install: Ensure you rebooted after the driver installation, then check nvidia-smi. Ollama reads GPU availability at startup.
Slow inference on CPU: This is expected — LLMs are computationally expensive. Try a smaller model like gemma2:2b or llama3.2 (3B) for faster responses without a GPU.
Port 11434 not reachable from another machine: Check that OLLAMA_HOST=0.0.0.0 is set in the systemd override and that your firewall allows the port.
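When several of these symptoms overlap, it can help to run the checks in one pass. A sketch combining the commands from this guide (the fallbacks keep it from aborting on machines missing a tool):

```shell
#!/usr/bin/env bash
# Diagnostic sketch: run the common checks from this section in order.
echo "--- service state ---"
systemctl is-active ollama || true
echo "--- listening sockets ---"
ss -tlnp 2>/dev/null | grep 11434 || echo "nothing listening on 11434"
echo "--- API probe ---"
curl -s --max-time 2 http://localhost:11434 || echo "API not responding"
echo "--- recent logs ---"
journalctl -u ollama -n 20 --no-pager || true
```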