Docker + Ollama Setup

Run Resume Matcher with local AI using Ollama

Why Ollama?

Ollama lets you run AI models locally. Your resume data never leaves your machine. No API keys, no usage fees, complete privacy.

Prerequisites

  • Docker Desktop installed
  • 16GB+ RAM (32GB recommended for larger models; see the quick check below)
  • 10GB+ free disk space
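
To check what Docker can actually use before pulling a model, these two commands are a quick sanity check (MemTotal is reported in bytes and, on Docker Desktop, reflects the memory allocated to the Docker VM):

# Memory available to the Docker engine, in bytes
docker info --format '{{.MemTotal}}'

# Disk space already used by images, containers, and volumes
docker system df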

Setup Options

Option 1: Ollama on Your Host

Install Ollama on your machine, then connect Resume Matcher to it.

  1. Install Ollama: ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start Resume Matcher with Docker
  4. Configure in Settings:
    • Provider: Ollama
    • Model: llama3.2 (or another model from the table below)
    • Server URL: See table below

Ollama Server URL by Platform:

Platform                        Server URL
Mac/Windows (Docker Desktop)    http://host.docker.internal:11434
Linux (default)                 http://172.17.0.1:11434
Linux (host network)            http://localhost:11434
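
Once the app is running, a quick way to confirm the Server URL works is to curl Ollama's tags endpoint from inside the container. This is a sketch: the container name resume-matcher and the host.docker.internal URL are assumptions, so substitute your actual container name (docker ps shows it) and the URL for your platform; it also assumes curl is present in the image.

# A JSON list of models confirms the container can reach Ollama on the host
docker exec -it resume-matcher curl http://host.docker.internal:11434/api/tags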

Option 2: Ollama in Docker

Run both Resume Matcher and Ollama as containers:

# docker-compose.yml
services:
  resume-matcher:
    build: .
    ports:
      - "3000:3000"
      - "8000:8000"
    environment:
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.2
      # Reach the Ollama container over the Compose network by its service name
      - LLM_API_BASE=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      # Persist downloaded models across container restarts
      - ollama_data:/root/.ollama

volumes:
  ollama_data:
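
With this file saved in the project root, bring both services up together (assumes Docker Compose v2, i.e. the docker compose subcommand):

# Build the Resume Matcher image and start both containers in the background
docker compose up -d --build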

After starting, pull a model:

docker compose exec ollama ollama pull llama3.2

Model options:

Model         Size   Speed    Quality
llama3.2      3B     Fast     Good for most tasks
llama3.1:8b   8B     Medium   Better quality
mistral       7B     Medium   Good balance
gemma2        9B     Medium   Google's model

Start with llama3.2 and upgrade if you need better output.
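
Switching later is a two-step change; a sketch, using mistral from the table as the example:

# Pull the new model into the running Ollama container
docker compose exec ollama ollama pull mistral

# Then set LLM_MODEL=mistral in docker-compose.yml and recreate the app container
docker compose up -d resume-matcher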

GPU Acceleration

NVIDIA GPUs: Install the NVIDIA Container Toolkit, then add the following to the ollama service in your docker-compose.yml:

ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
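
After recreating the container, you can check that the GPU is actually visible inside it; a sketch, assuming the NVIDIA runtime exposes nvidia-smi inside the container (the usual behavior when the toolkit is installed):

# Recreate the Ollama container with the GPU reservation applied
docker compose up -d ollama

# Should list your GPU; if it fails, the toolkit or the reservation is not set up
docker compose exec ollama nvidia-smi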

Apple Silicon: a native Ollama install (Option 1) uses the GPU automatically, with no extra config. Docker containers on macOS cannot access the GPU, so prefer Option 1 if you want acceleration on a Mac.

Troubleshooting

“Connection refused” error?

  • Check Ollama is running: curl http://localhost:11434/api/tags
  • Verify the Server URL matches your platform
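
On Linux, if http://172.17.0.1:11434 does not work, the bridge address on your machine may differ from the default; one way to find the address containers can reach (a sketch):

# Look for the "inet" line; that address (with port 11434) is the Server URL to use
ip addr show docker0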

Slow responses?

  • Use a smaller model: ollama pull llama3.2:1b
  • Check available RAM
  • First request is always slower (model loading)
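
To see whether the containers are starved for memory or CPU while a request runs, take a snapshot of their resource usage:

# One-off snapshot of CPU and memory usage per container
docker stats --no-stream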

Out of memory?

  • Increase Docker memory in Desktop settings
  • Use a quantized model (q4_0 suffix)
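
Quantized tags vary per model, so check the model's page on ollama.com for the exact name; as a hypothetical example of the pattern:

# Example tag only; confirm it exists on ollama.com/library/llama3.2 before pulling
docker compose exec ollama ollama pull llama3.2:3b-instruct-q4_0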

Next Steps