Running local LLMs at home is easier than it looks. With the right mini PC, Ubuntu setup, BIOS tweaks, and Docker containers, you can build a powerful lab that runs models like Mistral 7B through llama.cpp and Ollama on AMD hardware. Here’s how I set up mine.
It started the way many of my tinkering projects do: with a cup of coffee and a quiet Saturday morning. For weeks I’d been thinking about building a small server — something quiet enough to sit under the desk, strong enough to handle 7B–8B (or even 12–14B!) models, and flexible enough to let me learn Docker properly without melting my laptop. This time, instead of just daydreaming, I actually pressed “Order”. A little box was on its way to become my home LLM LAB.
What Beans Do You Buy for the Brew? (Choosing Hardware for Local LLMs)
Like coffee beans, hardware choice makes all the difference. I fed my requirements into Perplexity: small form factor, strong iGPU, 64 GB RAM, at least 1 TB NVMe, Wi-Fi 7, USB4; must handle local models comfortably.
After trimming the shortlist, this landed as the Goldilocks choice:
GMKtec EVO-X2 AI Mini PC — AMD Ryzen AI Max+ 395 (up to 5.1 GHz), 64 GB LPDDR5X 8000 MHz, 1 TB PCIe 4.0 SSD, quad-screen 8K display, Wi-Fi 7 & USB4, SD Card Reader 4.0.
Compact, quiet, and powerful enough to be a real LLM playground.
Ubuntu or Fedora? (Best Linux Distro for LLMs)
Once the box arrived, the next question was: what flavour of Linux should I install?
Your choice of distro sets the tone for everything else — just like choosing beans sets the base flavour for your coffee.
I considered two main options for my home lab for LLMs:
- Fedora: super-fresh kernels/Mesa (great for brand-new AMD graphics), SELinux by default, fast cadence.
- Ubuntu 24.04 LTS: long support window, huge community resources, predictable packaging, and the HWE kernel keeps graphics reasonably fresh.
I know Ubuntu best right now, so: Ubuntu 24.04.3 LTS it is.
Grinding the Beans: Making the Bootable USB
Before we can install Ubuntu, we need a way to get it onto the box. That means creating a bootable USB stick.
This is the equivalent of grinding your beans before the brew: a small but essential step.
# 1) Identify the target device (careful!)
lsblk -o NAME,SIZE,MODEL,TRAN
# 2) Write the ISO (note the capital M in bs=4M)
sudo dd if=~/Downloads/ubuntu-24.04.3-desktop-amd64.iso of=/dev/sdX bs=4M status=progress oflag=sync
sync
Tip: it’s 4M, not 4m. Lowercase “m” throws dd: invalid number: '4m'.
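If you want to double-check the write before rebooting, you can compare checksums. A quick sketch (replace /dev/sdX and the ISO path with your own): hash the ISO, then hash the same number of bytes read back from the stick.
ISO=~/Downloads/ubuntu-24.04.3-desktop-amd64.iso
sha256sum "$ISO"
# Read back exactly as many bytes as the ISO is long; the hashes should match
sudo head -c "$(stat -c %s "$ISO")" /dev/sdX | sha256sum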
Dialling in the Recipe: BIOS Settings
Now comes the behind-the-scenes prep.
Before the operating system even loads, we need to make sure the BIOS is set up properly so the hardware can perform at its best. Think of this as checking your kettle, grinder, and scale before you brew.
- UEFI boot, NVMe first (USB first only for the install pass)
- SVM / Virtualization: Enabled
- Resizable BAR / Above 4G Decoding: Enabled (if present)
- Secure Boot: OK to leave On
- UMA/iGPU memory: Low (headless) or 8–16 GB if you’ll attach a monitor
- Fan profile: Performance
Save → reboot → boot from the USB stick.
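Once Ubuntu is installed (next sections), you can sanity-check a couple of these settings from inside Linux. A small sketch, using the cpu-checker package for kvm-ok:
# Virtualization: kvm-ok reports whether KVM can actually be used
sudo apt install -y cpu-checker && kvm-ok
lscpu | grep -i virtualization
# Resizable BAR: look for the capability on the GPU (only meaningful if the BIOS exposes it)
sudo lspci -vv | grep -iA3 'resizable bar'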
The First Pour-Over: Installer Woes
With the BIOS ready, it was time to actually install Ubuntu on my home lab for LLMs.
This is usually straightforward — but as anyone who’s built servers knows, sometimes you hit a bump.
In my case, the installer failed with an “unknown error.” The culprit was a flaky official mirror. Solution:
- Switch installer to offline mode
- Unplug Ethernet cable
- Run installer fully offline → success
That small hiccup gave me old-school sysadmin vibes.
First Boot: Updates, Mirrors, SSH
Once the OS was installed, the system was running — but not yet ready.
A fresh Linux installation always needs updates, a good mirror, and a secure way to connect remotely. Here’s how I set that up.
Use a fast HTTPS mirror (deb822 on 24.04)
sudo cp /etc/apt/sources.list.d/ubuntu.sources{,.bak} 2>/dev/null || true
sudo tee /etc/apt/sources.list.d/ubuntu.sources >/dev/null <<'EOF'
Types: deb
URIs: https://mirror.mythic-beasts.com/ubuntu
Suites: noble noble-updates noble-backports
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

Types: deb
URIs: https://mirror.mythic-beasts.com/ubuntu
Suites: noble-security
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
EOF
echo 'Acquire::ForceIPv4 "true";' | sudo tee /etc/apt/apt.conf.d/99force-ipv4
sudo apt clean && sudo apt update
Update, HWE kernel, firmware, reboot
sudo apt full-upgrade -y
sudo apt install -y linux-generic-hwe-24.04
sudo apt install --reinstall -y linux-firmware
sudo reboot
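After the reboot, a quick check that the HWE kernel actually took (the exact version string will vary):
uname -r                          # should report the HWE kernel, not the stock GA one
dpkg -l | grep linux-generic-hwe  # confirms the HWE metapackage is installed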
SSH server + basic hardening
sudo apt install -y openssh-server
sudo systemctl enable --now ssh
sudo ufw allow OpenSSH
# keys (from your client):
ssh-keygen -t ed25519 -C "$(whoami)@$(hostname)"
ssh-copy-id <user>@<server-ip>
# simple hardening
sudo bash -c 'cat >/etc/ssh/sshd_config.d/90-custom.conf' <<'EOF'
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
KbdInteractiveAuthentication no
X11Forwarding no
ClientAliveInterval 60
ClientAliveCountMax 3
MaxAuthTries 3
EOF
sudo sshd -t && sudo systemctl reload ssh
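One assumption in the snippet above: ufw is actually running. A fresh Ubuntu install ships with it inactive, so the allow rule does nothing until you enable the firewall (the OpenSSH rule keeps your current session from being locked out):
sudo ufw enable
sudo ufw status verbose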
Brewing With Vulkan: AMD iGPU Setup for llama.cpp
With the system online and secure, it was time to unlock GPU acceleration.
Running models efficiently on AMD iGPUs requires Vulkan drivers and proper group permissions. This step makes the difference between a server that runs LLMs and one that runs them well.
# Userspace + tools
sudo apt install -y git build-essential cmake pkg-config \
libvulkan-dev libvulkan1 mesa-vulkan-drivers vulkan-tools \
libcurl4-openssl-dev glslc spirv-tools radeontop
# GPU access (new login required to apply groups)
sudo usermod -aG render,video "$USER"
# Optional: silence headless warning in current shell
export XDG_RUNTIME_DIR=/run/user/$(id -u); mkdir -p "$XDG_RUNTIME_DIR"; chmod 700 "$XDG_RUNTIME_DIR"
# Quick check
vulkaninfo --summary | head -n 20
Build llama.cpp with Vulkan (+ server)
cd ~
git clone https://github.com/ggerganov/llama.cpp.git
cd ~/llama.cpp
cmake -S . -B build -DGGML_VULKAN=ON -DLLAMA_BUILD_SERVER=ON
cmake --build build -j"$(nproc)"
Smoke test with a tiny model:
mkdir -p ~/models/tinyllama && cd ~/models/tinyllama
curl -L -o TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf \
"https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf?download=true"
xxd -l 4 -p TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf # 47475546 = "GGUF"
AMD_VULKAN_ICD=RADV ~/llama.cpp/build/bin/llama-cli \
-m ~/models/tinyllama/TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf \
-ngl 999 -c 4096 -b 512 \
-p "Say hello in one short sentence."
From Espresso Shot to Full Brew: Running Mistral 7B Locally
Once the smoke test worked, it was time to move up to a real model: Mistral 7B.
This step transforms the box from “toy” to “serious LLM lab.”
export LLAMA_CACHE=~/models/.cache/llama
mkdir -p "$LLAMA_CACHE"
# Serve on all interfaces so Docker bridge containers can reach it
AMD_VULKAN_ICD=RADV ~/llama.cpp/build/bin/llama-server \
--hf-repo TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
--hf-file mistral-7b-instruct-v0.2.Q4_K_M.gguf \
--alias mistral-7b-q4km \
-ngl 999 -c 8192 -b 512 --threads "$(nproc)" \
--host 0.0.0.0 --port 8081
Sanity:
curl -s http://127.0.0.1:8081/v1/models | jq .
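Since llama-server speaks the OpenAI-compatible API, you can go a step further and ask for an actual completion; the model name below matches the --alias set above:
curl -s http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-q4km",
        "messages": [{"role": "user", "content": "Explain GGUF in one sentence."}],
        "max_tokens": 64
      }' | jq -r '.choices[0].message.content'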
Ollama: The Quick Filter Coffee
Not every interaction needs the GPU. Sometimes you just want something fast and simple.
That’s where Ollama fits in — easy pulls, OpenAI-compatible API, and CPU execution on AMD Linux today.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
curl -s http://127.0.0.1:11434/api/tags | jq .
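Pulling and chatting with a model is a one-liner each; the tag below is just an example, swap in whichever model you fancy:
ollama pull llama3.2:3b      # example tag; any model from the Ollama library works
ollama run llama3.2:3b "Say hello in five words."
# Or via the API:
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello in five words.", "stream": false}' | jq -r .response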
Is it using CPU or GPU? Watch both:
# CPU
htop # look for "ollama"
# GPU busy % (AMD)
watch -n0.5 'cat /sys/class/drm/card0/device/gpu_busy_percent'
sudo apt install -y radeontop && radeontop
# Who's using the render node?
sudo lsof -nP /dev/dri/renderD128
For guaranteed AMD GPU acceleration, run bigger jobs via llama.cpp (Vulkan) and keep Ollama for quick fiddling.
Docker: The Brewing Gear
At this point, my home lab for LLMs can run models directly. But to really learn and scale, Docker is essential.
Containers keep experiments clean and reproducible — like having different brewers for espresso, pour-over, and cold brew without mixing flavours.
# prerequisites
sudo apt-get remove -y docker docker-engine docker.io containerd runc 2>/dev/null || true
sudo apt update
sudo apt install -y ca-certificates curl gnupg
# repo + key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu noble stable" | \
sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"
newgrp docker   # starts a subshell with the docker group active; or just log out and back in
docker version
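A quick smoke test never hurts before building anything on top of it:
docker run --rm hello-world   # pulls a tiny image and prints a greeting if the daemon and permissions are fine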
Open WebUI: The Café Counter
With Ollama and llama.cpp running, Open WebUI becomes the friendly barista.
It ties everything together — one interface to manage models, prompts, and histories. It’s the café counter where all the tools meet.
Option 1: Host networking (simplest)
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    network_mode: "host"
    environment:
      - OLLAMA_BASE_URL=http://127.0.0.1:11434
      # We’ll add llama.cpp as a second "OpenAI-compatible" provider in the UI:
      - OPENAI_API_BASE=http://127.0.0.1:8081
      - OPENAI_API_KEY=local
    volumes:
      - /home/zagor/src/open-webui/data:/app/backend/data
    restart: unless-stopped
Start it:
docker compose up -d
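Before heading to the browser, it’s worth checking the container actually came up (names as in the compose file above):
docker compose ps
docker logs --tail 50 open-webui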
Open http://<server-ip>:8080, go to Settings → Providers → Add OpenAI-compatible:
- Base URL: http://127.0.0.1:8081 (no /v1)
- API key: local (anything)
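I went with host networking; if you’d prefer to keep the container on Docker’s default bridge network, a minimal variant might look like the untested sketch below. It relies on the host-gateway alias so the container can reach Ollama and llama-server on the host (which is also why llama-server binds to 0.0.0.0 earlier), and it publishes the UI on port 3000 instead:
# docker-compose.yml (bridge-network variant, untested sketch)
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"                               # UI then lives at http://<server-ip>:3000
    extra_hosts:
      - "host.docker.internal:host-gateway"       # lets the container reach services on the host
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - OPENAI_API_BASE=http://host.docker.internal:8081
      - OPENAI_API_KEY=local
    volumes:
      - /home/zagor/src/open-webui/data:/app/backend/data
    restart: unless-stopped
With this variant the provider Base URL inside the UI becomes http://host.docker.internal:8081 as well.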
Monitoring: Watching the Brew
Once models are running, it’s easy to forget the system itself.
But keeping an eye on GPU load, memory, and containers ensures smooth performance — like watching the kettle while the coffee drips.
# GPU
watch -n0.5 'cat /sys/class/drm/card0/device/gpu_busy_percent'
sudo apt install -y radeontop && radeontop
sudo mount -t debugfs none /sys/kernel/debug 2>/dev/null || true
watch -n0.5 'sudo egrep "GPU load|GPU clock|Memory clock" /sys/kernel/debug/dri/0/amdgpu_pm_info'
# RAM + swap
watch -n1 free -h
swapon --show
# Containers
docker stats
Useful links
| Category | Description | Link |
| --- | --- | --- |
| Official Site & Docs | Main LM Studio website | lmstudio.ai |
| System Requirements | Hardware and software requirements | System Requirements |
| Getting Started | Basic usage and setup instructions | Getting Started |
| AMD ROCm Support | Info on AMD GPU support via ROCm | ROCm & Ryzen AI |
| Docker Image (Unofficial) | Docker image for LM Studio with CUDA | Docker Hub – noneabove1182/lmstudio-cuda |
| Docker Compose Example | Example docker-compose for LM Studio | GitLab LM Studio docker-compose |
| MCP Toolkit Guide | Guide to running MCP Toolkit with LM Studio | DEV Guide |
| Running LLMs with LM Studio | General guide on running local LLMs | GPU Mart Guide |
| ROCm Supported GPUs | AMD ROCm compatible GPUs list | ROCm Supported GPUs |
| Unlock AMD GPU Support | Guide to enable LM Studio on any AMD GPU | ROCm Unlock Guide |
| AMD GPU LLM Guide | Beginner Guide: AMD GPU for Local LLMs | TechTeamGB Guide |
Lessons Learned From This Coffee Journey
- A server build is like brewing coffee — patience and small adjustments make the difference.
- BIOS settings matter more than you think; like water temperature, get them wrong and nothing tastes right.
- Expect hiccups: a flaky mirror or a mis-typed dd flag is part of the journey.
- Document as you go — future you will thank you when something breaks at 2am.
- Start small (TinyLlama) before going full espresso shot (Mistral 7B).
By the time the last commands finished running, my coffee was cold — but the LAB was alive. A capable server now sits under my desk, running Docker containers, Ollama for quick fiddling, and llama.cpp with Vulkan acceleration when I want to stretch the GPU.
And like brewing coffee, the fun isn’t in perfection. It’s in the practice.
PS. I used AI-generated picture because my own desk is too messy 🙂
👉 Over to you: Have you tried setting up a local LLM LAB yet? What “mirror trolls” or BIOS quirks did you run into? Share your stories — I’d love to compare notes on this home lab for LLMs saga 🙂
Thanks for reading!
If you enjoyed this article, feel free to share it on social media and spread some positivity, and join my newsletter.