Running local LLMs at home is easier than it looks. With the right mini PC, Ubuntu setup, BIOS tweaks, and Docker containers, you can build a powerful lab that runs models like Mistral 7B through llama.cpp and Ollama on AMD hardware. Here’s how I set up mine.
It started the way many of my tinkering projects do: with a cup of coffee and a quiet Saturday morning. For weeks I’d been thinking about building a small server — something quiet enough to sit under the desk, strong enough to handle 7B–8B (or even 12–14B!) models, and flexible enough to let me learn Docker properly without melting my laptop. This time, instead of just daydreaming, I actually pressed “Order”. A little box was on its way to become my home LLM LAB.
What Beans Do You Buy for the Brew? (Choosing Hardware for Local LLMs)
Like coffee beans, hardware choice makes all the difference. I fed my requirements into Perplexity: small form factor, strong iGPU, 64 GB RAM, at least 1 TB NVMe, Wi-Fi 7, USB4; must handle local models comfortably.
After trimming the shortlist, this landed as the Goldilocks choice:
GMKtec EVO-X2 AI Mini PC — AMD Ryzen AI Max+ 395 (up to 5.1 GHz), 64 GB LPDDR5X 8000 MHz, 1 TB PCIe 4.0 SSD, quad-screen 8K display, Wi-Fi 7 & USB4, SD Card Reader 4.0.
Compact, quiet, and powerful enough to be a real LLM playground.
Ubuntu or Fedora? (Best Linux Distro for LLMs)
Once the box arrived, the next question was: what flavour of Linux should I install?
Your choice of distro sets the tone for everything else — just like choosing beans sets the base flavour for your coffee.
I considered two main options for my home lab for LLMs:
- Fedora: super-fresh kernels/Mesa (great for brand-new AMD graphics), SELinux by default, fast cadence.
- Ubuntu 24.04 LTS: long support window, huge community resources, predictable packaging, and the HWE kernel keeps graphics reasonably fresh.
I know Ubuntu best right now, so: Ubuntu 24.04.3 LTS it is.
Grinding the Beans: Making the Bootable USB
Before we can install Ubuntu, we need a way to get it onto the box. That means creating a bootable USB stick.
This is the equivalent of grinding your beans before the brew: a small but essential step.
# 1) Identify the target device (careful!)
lsblk -o NAME,SIZE,MODEL,TRAN
# 2) Write the ISO (note the capital M in bs=4M)
sudo dd if=~/Downloads/ubuntu-24.04.3-desktop-amd64.iso of=/dev/sdX bs=4M status=progress oflag=sync
sync
Tip: it’s 4M, not 4m. Lowercase “m” throws dd: invalid number: '4m'.
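If you want to double-check the write before rebooting, you can compare checksums. A quick sketch (replace /dev/sdX and the ISO path with your own): hash the ISO, then hash the same number of bytes read back from the stick.
ISO=~/Downloads/ubuntu-24.04.3-desktop-amd64.iso
sha256sum "$ISO"
# Read back exactly as many bytes as the ISO is long; the hashes should match
sudo head -c "$(stat -c %s "$ISO")" /dev/sdX | sha256sum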
Dialling in the Recipe: BIOS Settings
Now comes the behind-the-scenes prep.
Before the operating system even loads, we need to make sure the BIOS is set up properly so the hardware can perform at its best. Think of this as checking your kettle, grinder, and scale before you brew.
- UEFI boot, NVMe first (USB first only for the install pass)
- SVM / Virtualization: Enabled
- Resizable BAR / Above 4G Decoding: Enabled (if present)
- Secure Boot: OK to leave On
- UMA/iGPU memory: Low (headless) or 8–16 GB if you’ll attach a monitor
- Fan profile: Performance
Save → reboot → boot from the USB stick.
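Once Ubuntu is installed (next sections), you can sanity-check a couple of these settings from inside Linux. A small sketch, using the cpu-checker package for kvm-ok:
# Virtualization: kvm-ok reports whether KVM can actually be used
sudo apt install -y cpu-checker && kvm-ok
lscpu | grep -i virtualization
# Resizable BAR: look for the capability on the GPU (only meaningful if the BIOS exposes it)
sudo lspci -vv | grep -iA3 'resizable bar'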
The First Pour-Over: Installer Woes
With the BIOS ready, it was time to actually install Ubuntu on my home lab for LLMs.
This is usually straightforward — but as anyone who’s built servers knows, sometimes you hit a bump.
In my case, the installer failed with an “unknown error.” The culprit was a flaky official mirror. Solution:
- Switch installer to offline mode
- Unplug Ethernet cable
- Run installer fully offline → success
That small hiccup gave me old-school sysadmin vibes.
First Boot: Updates, Mirrors, SSH
Once the OS was installed, the system was running — but not yet ready.
A fresh Linux installation always needs updates, a good mirror, and a secure way to connect remotely. Here’s how I set that up.
Use a fast HTTPS mirror (deb822 on 24.04)
sudo cp /etc/apt/sources.list.d/ubuntu.sources{,.bak} 2>/dev/null || true
sudo tee /etc/apt/sources.list.d/ubuntu.sources >/dev/null <<'EOF'
Types: deb
URIs: https://mirror.mythic-beasts.com/ubuntu
Suites: noble noble-updates noble-backports
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

Types: deb
URIs: https://mirror.mythic-beasts.com/ubuntu
Suites: noble-security
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
EOF
echo 'Acquire::ForceIPv4 "true";' | sudo tee /etc/apt/apt.conf.d/99force-ipv4
sudo apt clean && sudo apt update
Update, HWE kernel, firmware, reboot
sudo apt full-upgrade -y
sudo apt install -y linux-generic-hwe-24.04
sudo apt install --reinstall -y linux-firmware
sudo reboot
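After the reboot, a quick check that the HWE kernel actually took (the exact version string will vary):
uname -r                          # should report the HWE kernel, not the stock GA one
dpkg -l | grep linux-generic-hwe  # confirms the HWE metapackage is installed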
SSH server + basic hardening
sudo apt install -y openssh-server
sudo systemctl enable --now ssh
sudo ufw allow OpenSSH
# keys (from your client):
ssh-keygen -t ed25519 -C "$(whoami)@$(hostname)"
ssh-copy-id <user>@<server-ip>
# simple hardening
sudo bash -c 'cat >/etc/ssh/sshd_config.d/90-custom.conf' <<'EOF'
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
KbdInteractiveAuthentication no
X11Forwarding no
ClientAliveInterval 60
ClientAliveCountMax 3
MaxAuthTries 3
EOF
sudo sshd -t && sudo systemctl reload ssh
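One assumption in the snippet above: ufw is actually running. A fresh Ubuntu install ships with it inactive, so the allow rule does nothing until you enable the firewall (the OpenSSH rule keeps your current session from being locked out):
sudo ufw enable
sudo ufw status verbose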
Brewing With Vulkan: AMD iGPU Setup for llama.cpp
With the system online and secure, it was time to unlock GPU acceleration.
Running models efficiently on AMD iGPUs requires Vulkan drivers and proper group permissions. This step makes the difference between a server that runs LLMs and one that runs them well.
# Userspace + tools
sudo apt install -y git build-essential cmake pkg-config \
libvulkan-dev libvulkan1 mesa-vulkan-drivers vulkan-tools \
libcurl4-openssl-dev glslc spirv-tools radeontop
# GPU access (new login required to apply groups)
sudo usermod -aG render,video "$USER"
# Optional: silence headless warning in current shell
export XDG_RUNTIME_DIR=/run/user/$(id -u); mkdir -p "$XDG_RUNTIME_DIR"; chmod 700 "$XDG_RUNTIME_DIR"
# Quick check
vulkaninfo --summary | head -n 20
Build llama.cpp with Vulkan (+ server)
cd ~
git clone https://github.com/ggerganov/llama.cpp.git
cd ~/llama.cpp
cmake -S . -B build -DGGML_VULKAN=ON -DLLAMA_BUILD_SERVER=ON
cmake --build build -j"$(nproc)"
Smoke test with a tiny model:
mkdir -p ~/models/tinyllama && cd ~/models/tinyllama
curl -L -o TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf \
"https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf?download=true"
xxd -l 4 -p TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf # 47475546 = "GGUF"
AMD_VULKAN_ICD=RADV ~/llama.cpp/build/bin/llama-cli \
-m ~/models/tinyllama/TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf \
-ngl 999 -c 4096 -b 512 \
-p "Say hello in one short sentence."
From Espresso Shot to Full Brew: Running Mistral 7B Locally
Once the smoke test worked, it was time to move up to a real model: Mistral 7B.
This step transforms the box from “toy” to “serious LLM lab.”
export LLAMA_CACHE=~/models/.cache/llama
mkdir -p "$LLAMA_CACHE"
# Serve on all interfaces so Docker bridge containers can reach it
AMD_VULKAN_ICD=RADV ~/llama.cpp/build/bin/llama-server \
--hf-repo TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
--hf-file mistral-7b-instruct-v0.2.Q4_K_M.gguf \
--alias mistral-7b-q4km \
-ngl 999 -c 8192 -b 512 --threads "$(nproc)" \
--host 0.0.0.0 --port 8081
Sanity:
curl -s http://127.0.0.1:8081/v1/models | jq .
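Since llama-server speaks the OpenAI-compatible API, you can go a step further and ask for an actual completion; the model name below matches the --alias set above:
curl -s http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-q4km",
        "messages": [{"role": "user", "content": "Explain GGUF in one sentence."}],
        "max_tokens": 64
      }' | jq -r '.choices[0].message.content'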
Ollama: The Quick Filter Coffee
Not every interaction needs the GPU. Sometimes you just want something fast and simple.
That’s where Ollama fits in — easy pulls, OpenAI-compatible API, and CPU execution on AMD Linux today.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
curl -s http://127.0.0.1:11434/api/tags | jq .
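Pulling and chatting with a model is a one-liner each; the tag below is just an example, swap in whichever model you fancy:
ollama pull llama3.2:3b      # example tag; any model from the Ollama library works
ollama run llama3.2:3b "Say hello in five words."
# Or via the API:
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello in five words.", "stream": false}' | jq -r .response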
Is it using CPU or GPU? Watch both:
# CPU
htop # look for "ollama"
# GPU busy % (AMD)
watch -n0.5 'cat /sys/class/drm/card0/device/gpu_busy_percent'
sudo apt install -y radeontop && radeontop
# Who's using the render node?
sudo lsof -nP /dev/dri/renderD128
For guaranteed AMD GPU acceleration, run bigger jobs via llama.cpp (Vulkan) and keep Ollama for quick fiddling.
Docker: The Brewing Gear
At this point, my home lab for LLMs can run models directly. But to really learn and scale, Docker is essential.
Containers keep experiments clean and reproducible — like having different brewers for espresso, pour-over, and cold brew without mixing flavours.
# prerequisites
sudo apt-get remove -y docker docker-engine docker.io containerd runc 2>/dev/null || true
sudo apt update
sudo apt install -y ca-certificates curl gnupg
# repo + key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu noble stable" | \
sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"
newgrp docker   # starts a subshell with the docker group active; or just log out and back in
docker version
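A quick smoke test never hurts before building anything on top of it:
docker run --rm hello-world   # pulls a tiny image and prints a greeting if the daemon and permissions are fine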
Open WebUI: The Café Counter
With Ollama and llama.cpp running, Open WebUI becomes the friendly barista.
It ties everything together — one interface to manage models, prompts, and histories. It’s the café counter where all the tools meet.
Option 1: Host networking (simplest)
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    network_mode: "host"
    environment:
      - OLLAMA_BASE_URL=http://127.0.0.1:11434
      # We’ll add llama.cpp as a second "OpenAI-compatible" provider in the UI:
      - OPENAI_API_BASE=http://127.0.0.1:8081
      - OPENAI_API_KEY=local
    volumes:
      - /home/zagor/src/open-webui/data:/app/backend/data
    restart: unless-stopped
Start it:
docker compose up -d
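Before heading to the browser, it’s worth checking the container actually came up (names as in the compose file above):
docker compose ps
docker logs --tail 50 open-webui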
Open http://<server-ip>:8080, go to Settings → Providers → Add OpenAI-compatible:
- Base URL: http://127.0.0.1:8081 (no /v1)
- API key: local (anything)
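I went with host networking; if you’d prefer to keep the container on Docker’s default bridge network, a minimal variant might look like the untested sketch below. It relies on the host-gateway alias so the container can reach Ollama and llama-server on the host (which is also why llama-server binds to 0.0.0.0 earlier), and it publishes the UI on port 3000 instead:
# docker-compose.yml (bridge-network variant, untested sketch)
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"                               # UI then lives at http://<server-ip>:3000
    extra_hosts:
      - "host.docker.internal:host-gateway"       # lets the container reach services on the host
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - OPENAI_API_BASE=http://host.docker.internal:8081
      - OPENAI_API_KEY=local
    volumes:
      - /home/zagor/src/open-webui/data:/app/backend/data
    restart: unless-stopped
With this variant the provider Base URL inside the UI becomes http://host.docker.internal:8081 as well.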
Monitoring: Watching the Brew
Once models are running, it’s easy to forget the system itself.
But keeping an eye on GPU load, memory, and containers ensures smooth performance — like watching the kettle while the coffee drips.
# GPU
watch -n0.5 'cat /sys/class/drm/card0/device/gpu_busy_percent'
sudo apt install -y radeontop && radeontop
sudo mount -t debugfs none /sys/kernel/debug 2>/dev/null || true
watch -n0.5 'sudo egrep "GPU load|GPU clock|Memory clock" /sys/kernel/debug/dri/0/amdgpu_pm_info'
# RAM + swap
watch -n1 free -h
swapon --show
# Containers
docker stats
Useful links
| Category | Description | Link |
| --- | --- | --- |
| Official Site & Docs | Main LM Studio website | lmstudio.ai |
| System Requirements | Hardware and software requirements | System Requirements |
| Getting Started | Basic usage and setup instructions | Getting Started |
| AMD ROCm Support | Info on AMD GPU support via ROCm | ROCm & Ryzen AI |
| Docker Image (Unofficial) | Docker image for LM Studio with CUDA | Docker Hub – noneabove1182/lmstudio-cuda |
| Docker Compose Example | Example docker-compose for LM Studio | GitLab LM Studio docker-compose |
| MCP Toolkit Guide | Guide to running MCP Toolkit with LM Studio | DEV Guide |
| Running LLMs with LM Studio | General guide on running local LLMs | GPU Mart Guide |
| ROCm Supported GPUs | AMD ROCm compatible GPUs list | ROCm Supported GPUs |
| Unlock AMD GPU Support | Guide to enable LM Studio on any AMD GPU | ROCm Unlock Guide |
| AMD GPU LLM Guide | Beginner Guide: AMD GPU for Local LLMs | TechTeamGB Guide |
Lessons Learned From This Coffee Journey
- A server build is like brewing coffee — patience and small adjustments make the difference.
- BIOS settings matter more than you think; like water temperature, get them wrong and nothing tastes right.
- Expect hiccups: a flaky mirror or a mis-typed dd flag is part of the journey.
- Document as you go — future you will thank you when something breaks at 2am.
- Start small (TinyLlama) before going full espresso shot (Mistral 7B).
By the time the last commands finished running, my coffee was cold — but the LAB was alive. A capable server now sits under my desk, running Docker containers, Ollama for quick fiddling, and llama.cpp with Vulkan acceleration when I want to stretch the GPU.
And like brewing coffee, the fun isn’t in perfection. It’s in the practice.
PS. I used AI-generated picture because my own desk is too messy 🙂
👉 Over to you: Have you tried setting up a local LLM LAB yet? What “mirror trolls” or BIOS quirks did you run into? Share your stories — I’d love to compare notes on this home lab for LLMs saga 🙂
Thanks for reading!
If you enjoyed this article, feel free to share it on social media and spread some positivity, and join my newsletter.