Docker + Ollama Setup

Run Resume Matcher with local AI using Ollama

Why Ollama?

Ollama lets you run AI models locally. Your resume data never leaves your machine. No API keys, no usage fees, complete privacy.

Prerequisites

  • Docker Desktop installed
  • 16GB+ RAM (32GB recommended for larger models)
  • 10GB+ free disk space

Setup Options

Option 1: Ollama on the Host

Install Ollama directly on your machine, then connect the Dockerized Resume Matcher to it (a combined command sketch follows the URL table below).

  1. Install Ollama: ollama.com
  2. Pull a model: ollama pull qwen3:8b
  3. Start Resume Matcher with Docker
  4. Configure in Settings:
    • Provider: Ollama
    • Model: qwen3:8b (or your chosen model from the Ollama Library)
    • Server URL: See table below

Ollama Server URL by Platform:

Platform                        URL
Mac/Windows (Docker Desktop)    http://host.docker.internal:11434
Linux (default)                 http://172.17.0.1:11434
Linux (host network)            http://localhost:11434
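
Putting steps 1 and 2 together, the host-side commands on Linux look like this (the install script is Ollama's documented Linux installer; on macOS and Windows, download the installer from ollama.com instead):

# Linux: install Ollama via its documented install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull the recommended model
ollama pull qwen3:8b

# Confirm Ollama is serving and the model is listed
curl http://localhost:11434/api/tags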

Option 2: Ollama in Docker

Run both Resume Matcher and Ollama as containers:

# docker-compose.yml
services:
  resume-matcher:
    build: .
    ports:
      - "3000:3000"
      - "8000:8000"
    environment:
      - LLM_PROVIDER=ollama
      - LLM_MODEL=qwen3:8b
      - LLM_API_BASE=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:
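
With the file above saved as docker-compose.yml in the project root, bring both services up:

docker compose up --build -d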

After starting, pull a model:

docker compose exec ollama ollama pull qwen3:8b
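
To confirm the download, list the models available inside the container:

docker compose exec ollama ollama list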

Recommended Models

Resume Matcher requires models that reliably return structured JSON. These models are tested and optimized for JSON schema compliance (a quick smoke test follows the table):

Model            Size   Speed    Best For
qwen3:4b         4B     Fast     Quick iterations, lower RAM
qwen3:8b         8B     Medium   Recommended: best balance of speed and quality
granite4:3b      3B     Fast     Lightweight, built for structured output
glm-4.7-flash    30B    Slower   Highest quality, needs 32GB+ RAM
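
To sanity-check that your chosen model emits valid JSON, you can call Ollama's /api/generate endpoint with format set to json (the prompt here is only an illustration):

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Return a JSON object with the fields name and score.",
  "format": "json",
  "stream": false
}'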

Why Qwen3? Alibaba’s Qwen3 models are specifically optimized for reasoning and structured responses. They consistently produce valid JSON even at smaller sizes.

GPU Acceleration

NVIDIA GPUs: Install the NVIDIA Container Toolkit, then add the following to the ollama service in your docker-compose.yml:

ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
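
After recreating the container, you can check that the GPU is visible inside it (the NVIDIA Container Toolkit mounts nvidia-smi into the container):

docker compose exec ollama nvidia-smi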

Apple Silicon: Ollama installed natively (Option 1) uses the GPU automatically, with no extra config. Note that Docker containers on macOS cannot access the GPU, so Option 2 runs on CPU only.

Troubleshooting

“Connection refused” error?

  • Check Ollama is running: curl http://localhost:11434/api/tags
  • Verify the Server URL matches your platform (see the table above); a container-side check is sketched below
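
If the host-side check passes but Resume Matcher still cannot reach Ollama, test from inside the app container (this assumes curl is available in the image; for Option 1, substitute your platform's Server URL):

docker compose exec resume-matcher curl http://ollama:11434/api/tags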

Slow responses?

  • Use a smaller model: ollama pull qwen3:4b
  • Check available RAM
  • First request is always slower (the model is loaded into memory on first use); see the check below
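
To see whether a model is currently loaded and whether it is running on CPU or GPU, ask Ollama for its process list:

ollama ps
# For the Docker setup:
docker compose exec ollama ollama ps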

Out of memory?

  • Increase Docker memory in Desktop settings
  • Use a quantized model (tags with a q4_0 or similar suffix), as sketched below
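
For example, pulling an explicitly quantized tag (the tag shown is illustrative; check the model's Tags page on ollama.com for what is actually published):

ollama pull qwen3:8b-q4_K_M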

Next Steps