Troubleshoot Ollama local server, model load, cloud API, timeout, runner crash, memory, GPU, and connection errors for local and hosted inference.
This Ollama troubleshooting hub collects real error signatures, quick fixes, common causes, and
step-by-step debugging paths for developers who need a practical answer instead of a broad overview.
106 articles in this category, covering 1 technology. Date range: May 11, 2026 to May 17, 2026. Latest updates: 6.
Troubleshooting overview
Start by matching the exact error message, then check the technology, environment, credentials, network path,
and deployment context. The pages below are grouped so you can move from broad Ollama symptoms to
specific root-cause families without relying on client-side search.
Common error types
connection refused
500 internal server error
503 service unavailable
model failed to load
runner process terminated
read timeout
Common causes
Ollama is a widely used local LLM platform. Connection refused errors block all local AI inference, affecting developers running self-hosted models for cost savings or privacy. (1 page)
Sentry-reported production issue: ConnectError [Errno 111] Connection refused when connecting to the Ollama API, with multiple instances traced to an httpcore connection failure; see the connectivity-check sketch after this list. (1 page)
Stack Overflow report: the MCP server fails to connect in the VS Code Continue extension when Ollama runs in a Docker container on WSL2. Ollama is reachable at localhost:11434, but the Continue extension does not see the MCP tools; the root cause sits in WSL2/Docker networking and the stdio transport. Category: Ollama (local LLM serving + MCP integration). (1 page)
Ollama fails to load gemma4 models on Apple M5 chips with '500 Internal Server Error: model failed to load' and an 'exit status 2' crash, while gemma3:4b works fine on the same hardware. M5 is new hardware with a growing user base. (1 page)
Consistent crash on Apple M5 (Darwin 25.0.0) with gemma4:e2b and gemma4:e4b, at the same fault address regardless of GPU/CPU mode, while gemma3:4b works. Gemma 4 runs fine via the Google AI Studio API, so the failure is Ollama-specific. (1 page)
Ollama returns a 500 Internal Server Error with an 'unable to load model' message, blocking local LLM inference. (1 page)
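The first triage step for any of the connection refused reports above is confirming that the server process is actually listening where the client expects it. A minimal sketch, assuming the default port 11434 and the standard /api/tags listing endpoint:

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama bind address and port

try:
    # The root endpoint answers with a plain-text liveness message when the server is up.
    health = requests.get(OLLAMA_URL, timeout=5)
    print("server response:", health.status_code, health.text.strip())

    # /api/tags lists locally installed models; an empty list means nothing has been pulled yet.
    tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5).json()
    print("installed models:", [m["name"] for m in tags.get("models", [])])
except requests.ConnectionError as exc:
    # Errno 111 / connection refused usually means `ollama serve` is not running,
    # is bound to a different address (OLLAMA_HOST), or a container port is not published.
    print("connection refused:", exc)
```

If the root endpoint answers but the client still fails, the problem is usually the client's base URL or a container/WSL network boundary rather than Ollama itself.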
Fix the LangChain + Ollama integration where streaming completions incorrectly report stop_sequences as the finish_reason, causing early termination or garbled output in production LLM pipelines.
OllamaLLM streaming in LangChain returns stop sequences as finish_reason instead of proper completion signal, breaking streaming pipelines
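Until the underlying finish_reason handling is fixed, a defensive guard in the consuming pipeline is one way to keep leaked stop sequences out of the output. A sketch under assumptions: the langchain-ollama package, an example llama3 model, and hypothetical stop sequences; this is a workaround in the caller, not the upstream fix.

```python
from langchain_ollama import OllamaLLM

STOP_SEQUENCES = ["</s>", "Observation:"]  # hypothetical stop sequences for this pipeline

llm = OllamaLLM(model="llama3", stop=STOP_SEQUENCES)

buffer = ""
done = False
for chunk in llm.stream("Explain what a context window is in one paragraph."):
    buffer += chunk
    # Defensive guard: if a configured stop sequence leaks into the streamed text,
    # truncate at that point and stop consuming instead of emitting it downstream.
    for stop in STOP_SEQUENCES:
        if stop in buffer:
            buffer = buffer.split(stop, 1)[0]
            done = True
            break
    if done:
        break

print(buffer)
```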
Developers running local Ollama models via the Codex App see extreme slowdowns; the root cause is the Codex App not reading the model's num_ctx parameter and defaulting to the maximum context size.
Codex App sends requests with context_window=128000-262144 tokens instead of model's configured num_ctx (e.g., 32768) — causing severe generation slowdowns on local models
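When the client cannot be trusted to respect the model's configured context, the context window can be pinned per request through the options field of the generate API. A minimal sketch, assuming a locally installed example model tag and the num_ctx value from the report above:

```python
import requests

# Pin the context window per request instead of letting the client default to the
# model's maximum; 32768 mirrors the num_ctx the user actually configured.
payload = {
    "model": "qwen2.5-coder:7b",   # example model tag
    "prompt": "Summarise the last error in one sentence.",
    "stream": False,
    "options": {"num_ctx": 32768},
}

resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])
```

The same options object is accepted by /api/chat, and num_ctx can also be baked into a Modelfile with a PARAMETER line.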
Fix an Ollama model returning an empty string from the generate() API call despite the model being installed.
Ollama model returning empty string from ollama.generate() — model installed but generate() returns empty output
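A quick way to separate "wrong model tag" from "model genuinely produced nothing" is to list what the server has installed and then inspect the raw fields of the generate response. A sketch assuming the official ollama Python client and an example llama3:8b tag; field access may differ slightly between client versions:

```python
import ollama
import requests

# Confirm the exact tag is installed; a near-miss tag such as "llama3" vs "llama3:8b"
# is a common reason generate() appears to return nothing useful.
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print("installed:", [m["name"] for m in tags.get("models", [])])

# generate() returns the completion under "response"; an empty string alongside a
# normal done reason usually points at the prompt or template, not the transport.
resp = ollama.generate(model="llama3:8b", prompt="Say hello in five words.")
print("response text:", repr(resp["response"]))
print("done reason:", resp.get("done_reason"))
```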
Fix the MCP server connection failure between the VS Code Continue extension and Ollama running in Docker on Windows WSL2.
MCP server connection failed — Continue extension does not show or trigger MCP tools when Ollama runs in Docker on WSL2
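Because the failing hop in this setup is networking rather than Ollama itself, it helps to probe the candidate addresses a Windows-side client might use and see which one actually answers. A diagnostic sketch with assumed candidate URLs for a Docker-on-WSL2 layout:

```python
import requests

# Candidate base URLs for a Docker-on-WSL2 setup: the Windows/WSL loopback
# (works when the container port is published) and the hostname containers
# commonly use to reach the host.
CANDIDATES = [
    "http://localhost:11434",
    "http://127.0.0.1:11434",
    "http://host.docker.internal:11434",
]

for base in CANDIDATES:
    try:
        r = requests.get(f"{base}/api/tags", timeout=3)
        print(f"{base}: reachable, {len(r.json().get('models', []))} models")
    except requests.RequestException as exc:
        print(f"{base}: {type(exc).__name__}")
```

Whichever base URL responds is the one the Continue extension's MCP configuration should point at.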
Fix Ollama ignoring the http_proxy environment variable when downloading models whose manifests are served over plain HTTP, which causes connection refused errors behind proxies.
request failed: Get "http://registry.ollama.ai/v2/...": dial tcp: connect: connection refused
Fix the Ollama "model does not support tools" error and resolve invalid_request_error for tool calling with local models.
BadRequestError: Error code: 400 — registry.ollama.ai/library/llama3:latest does not support tools
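The 400 is returned by the server when the requested model tag was not built with tool support, so the practical fix is switching to a tool-capable tag rather than retrying the request. A sketch of catching the error with the ollama Python client; the tool definition and model name are illustrative:

```python
import ollama

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",          # illustrative tool definition
        "description": "Return the current time",
        "parameters": {"type": "object", "properties": {}},
    },
}]

try:
    resp = ollama.chat(
        model="llama3.1",            # example of a tag built with tool support
        messages=[{"role": "user", "content": "What time is it?"}],
        tools=TOOLS,
    )
    print(resp["message"].get("tool_calls"))
except ollama.ResponseError as exc:
    # A 400 "... does not support tools" means the model tag itself lacks tool support;
    # switch models instead of retrying the same request.
    print("server rejected tool call:", exc.status_code, exc.error)
```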
Fix the Ollama v0.24.0 crash on startup caused by an out-of-memory condition during GGUF metadata parsing when a model header is corrupted.
runtime: out of memory — readString/n_kv allocation in fs/gguf.readString tries to allocate 32GB+ RAM on corrupted model header
Fix the Ollama MLX runner hanging in an infinite repetition loop during large-context inference, with no detection or circuit breaker.
Ollama MLX runner enters repetition loop at 60K+ tokens context, no abort mechanism, streams same phrase hundreds of times indefinitely
Fix the Ollama embedding endpoint returning a 500 Internal Server Error with EOF when generating embeddings.
Server error: POST http://localhost:11434/api/embeddings resulted in a 500 Internal Server Error — EOF
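A minimal reproduction against the embeddings endpoint makes it easy to tell a dead runner from a client-side problem. A sketch assuming the legacy /api/embeddings route and an example nomic-embed-text model:

```python
import requests

payload = {"model": "nomic-embed-text", "prompt": "hello world"}  # example embedding model

resp = requests.post("http://localhost:11434/api/embeddings", json=payload, timeout=60)
if resp.status_code == 500:
    # A 500 with an EOF body typically means the runner died while loading the model
    # (often memory pressure); check `ollama ps` and the server log before retrying.
    print("server error:", resp.text)
else:
    vector = resp.json().get("embedding", [])
    print("embedding length:", len(vector))
```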