## Why Ollama?
Ollama lets you run AI models locally. Your resume data never leaves your machine. No API keys, no usage fees, complete privacy.
## Prerequisites
- Docker Desktop installed
- 16GB+ RAM (32GB recommended for larger models)
- 10GB+ free disk space
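If you want to sanity-check these requirements from a terminal, something like the following works (a rough sketch; `df` output varies by platform, and for containers what matters is the memory limit given to Docker, not total system RAM):

```bash
# Docker installed and the daemon reachable
docker --version
docker info --format 'Memory available to Docker: {{.MemTotal}} bytes'

# Free disk space where models will live (Ollama stores them under ~/.ollama by default)
df -h ~
```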
## Setup Options
### Option 1: Ollama on Host (Recommended)
Install Ollama on your machine, then connect Resume Matcher to it.
- Install Ollama: [ollama.com](https://ollama.com)
- Pull a model: `ollama pull qwen3:8b`
- Start Resume Matcher with Docker
- Configure in Settings:
  - Provider: `Ollama`
  - Model: `qwen3:8b` (or your chosen model from the Ollama Library)
  - Server URL: see the table below (a connectivity check follows it)

**Ollama Server URL by Platform:**
| Platform | URL |
|---|---|
| Mac/Windows (Docker Desktop) | http://host.docker.internal:11434 |
| Linux (default) | http://172.17.0.1:11434 |
| Linux (host network) | http://localhost:11434 |
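Before starting Resume Matcher, it is worth confirming that Ollama is reachable, first from the host and then from inside a container. This is a minimal sketch assuming Docker Desktop on Mac/Windows; on Linux, substitute the URL from the table above (`host.docker.internal` is not resolvable inside containers there by default):

```bash
# From the host: should return a JSON list of installed models
curl http://localhost:11434/api/tags

# From inside a throwaway container, via the address containers use to reach the host
docker run --rm curlimages/curl -s http://host.docker.internal:11434/api/tags
```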
### Option 2: Ollama in Docker
Run both Resume Matcher and Ollama as containers:
```yaml
# docker-compose.yml
services:
  resume-matcher:
    build: .
    ports:
      - "3000:3000"
      - "8000:8000"
    environment:
      - LLM_PROVIDER=ollama
      - LLM_MODEL=qwen3:8b
      - LLM_API_BASE=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:
```
After starting the stack, pull a model into the Ollama container:

```bash
docker compose exec ollama ollama pull qwen3:8b
```
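Once the pull finishes, confirm the model is actually available to the service (service and model names match the compose file above):

```bash
# List the models installed inside the ollama container; qwen3:8b should appear
docker compose exec ollama ollama list

# The API also answers on the published port
curl http://localhost:11434/api/tags
```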
## Recommended Models
Resume Matcher requires models that reliably return structured JSON. These models are tested and optimized for JSON schema compliance:
| Model | Size | Speed | Best For |
|---|---|---|---|
| `qwen3:4b` | 4B | Fast | Quick iterations, lower RAM |
| `qwen3:8b` | 8B | Medium | Recommended: best balance of speed and quality |
| `granite4:3b` | 3B | Fast | Lightweight, built for structured output |
| `glm-4.7-flash` | 30B | Slower | Highest quality, needs 32GB+ RAM |
**Why Qwen3?** Alibaba’s Qwen3 models are specifically optimized for reasoning and structured responses. They consistently produce valid JSON even at smaller sizes.
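If you want to see how a candidate model handles structured output before pointing Resume Matcher at it, Ollama's generate endpoint accepts a `"format": "json"` option that constrains the response to valid JSON. A quick sketch (the model and prompt are only examples):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Return a JSON object with keys \"skills\" (array of strings) and \"summary\" (string) for a senior backend engineer resume.",
  "format": "json",
  "stream": false
}'
```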
## GPU Acceleration
**NVIDIA GPUs:** Install the NVIDIA Container Toolkit, then add the following to the `ollama` service in your `docker-compose.yml`:
```yaml
ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```
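After restarting the stack, you can check that the container actually sees the GPU; with the toolkit configured, the driver utilities are normally injected into the container, and `ollama ps` reports whether a loaded model runs on GPU or CPU (service name as in the compose file above):

```bash
# GPU visible from inside the container
docker compose exec ollama nvidia-smi

# While a model is loaded, the PROCESSOR column should read "100% GPU"
docker compose exec ollama ollama ps
```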
**Apple Silicon:** The native Ollama app uses the GPU (Metal) automatically; no extra configuration is needed. Docker containers on macOS cannot access the GPU, so use Option 1 if you want GPU acceleration on a Mac.
## Troubleshooting
**“Connection refused” error?**

- Check that Ollama is running: `curl http://localhost:11434/api/tags`
- Verify the Server URL matches your platform
**Slow responses?**

- Use a smaller model: `ollama pull qwen3:4b`
- Check available RAM
- The first request is always slower (model loading; see the preload sketch below)
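Related to the model-loading delay: by default Ollama unloads a model after a few minutes of inactivity, so the next request pays the load cost again. One way around this is to preload the model and keep it resident with a negative `keep_alive` (a sketch; adjust the model name to the one you use):

```bash
# An empty generate request loads the weights without producing output;
# keep_alive: -1 keeps the model in memory until Ollama restarts
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "keep_alive": -1
}'
```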
**Out of memory?**

- Increase Docker memory in Desktop settings
- Use a quantized model (`q4_0` suffix)
## Next Steps
- Features - Explore capabilities
- Contributing - Help improve the project