19 Oct 2025 - tsp
Last update 23 Oct 2025
4 mins
TL;DR
On a box with two NVIDIA GeForce RTX 3060 (12 GB) GPUs, Ollama auto-updated to 0.12.4 and model loads silently stopped working; upgrading to 0.12.5 did not solve the problem. Downgrading to Ollama 0.12.3 fixed it immediately. An NVIDIA driver upgrade from 572.82 to 581.29 did not change the behavior (other CUDA apps were fine).
When Ollama auto-updated on my small dual RTX 3060 headless setup, model loading suddenly stopped working - no errors, just silence and hanging clients. The GPUs were detected, yet nothing generated and tensor loading failed from one day to the next. Upgrading drivers and even moving from version 0.12.4 to 0.12.5 did not help, while other CUDA applications ran perfectly.
After a few hours of debugging, the fix turned out to be simple: rolling back to Ollama 0.12.3 instantly restored normal behavior. If you are seeing lines like llama_model_load: vocab only - skipping tensors and key with type not found ... general.alignment, this post walks through what happened and how to get your models running again without tearing your hair out.

The server was up and both GPUs were detected:
Listening on [::]:8182 (version 0.12.5)
inference compute ... CUDA0 / CUDA1 ... NVIDIA GeForce RTX 3060 ... total="12.0 GiB"
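If you want to verify this on your own box: on Windows, Ollama writes server.log and app.log below %LOCALAPPDATA%\Ollama; on Linux with the systemd service the same output lands in the journal. The paths below assume a default installation:

# Windows (cmd): open the directory containing server.log and app.log
explorer %LOCALAPPDATA%\Ollama

# Linux (systemd service): follow the server log live
journalctl -u ollama -f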
When a client hit /api/show (a metadata probe), the logs looked normal but gave the impression that models don't load:
POST "/api/show"
llama_model_load: vocab only - skipping tensors
This is expected for /api/show, but in my case real generations never kicked in either.
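To see whether tensors are ever actually loaded, force a real generation instead of a metadata probe. A minimal sketch - port 8182 is from my config (the default is 11434) and the model name stands in for whatever you have pulled locally:

curl http://localhost:8182/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello", "stream": false}'

On 0.12.4/0.12.5 this kind of request just hung; after the downgrade it returns a JSON response within seconds.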
Extra noise that wasn’t fatal but confused debugging:
key with type not found key=general.alignment default=32
and many lines like:
load: control token: 128... '<|reserved_special_token_...|>' is not marked as EOG
I tried a driver update from 572.82 to 581.29. This yielded no change (at least OpenCL and CUDA in other apps remained OK).
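nvidia-smi is a quick way to confirm that the new driver is active and both cards are visible outside of Ollama:

# Lists all detected GPUs plus the installed driver and CUDA version
nvidia-smi

Both 3060s showed up as expected, which pointed the finger back at Ollama itself.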
I didn’t dig deeper into whether it’s scheduling, multi-GPU runner selection, or a llama.cpp interface edge case in 0.12.4/0.12.5 - the point here is the quick fix.
Downgrade Ollama to 0.12.3. The models load and generate again.
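To get there: on Windows, grab OllamaSetup.exe from the v0.12.3 release at https://github.com/ollama/ollama/releases/tag/v0.12.3 and run it (and block the auto-updater first, see below). On Linux the official install script accepts a version override - a sketch, assuming the script still honours the OLLAMA_VERSION variable:

# Linux: install a specific Ollama version via the official script
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.3 sh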
Unfortunately, on some platforms like Windows the Ollama system tray application performs auto-updates. As it appears from various bug reports, even though the ability to disable them is desired by many people for different reasons (preventing breaking upgrades, avoiding huge downloads over metered connections, not unknowingly pulling new code into your network, etc.), the developers do not intend to allow disabling auto-updates. A workaround is to completely remove the system tray application ("ollama app.exe") manually or to suppress its Internet access. On Windows this can be done in PowerShell, executed as administrator, using the following command, also mentioned in the referenced bug report:
New-NetFirewallRule -DisplayName "Block Ollama App (disable autoupdate)" -Direction Outbound -Program "$env:LOCALAPPDATA\Programs\Ollama\ollama app.exe" -Action Block
On other platforms you can apply similar blocks, block at the DNS level - though this will also prevent fetching of models - or remove the tray application.
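For the DNS route, a hosts-file entry is the simplest sketch. The domain below is an assumption - check your own DNS or firewall logs for what the updater actually contacts, and keep the caveat above in mind (model pulls may break too):

# /etc/hosts on Linux, C:\Windows\System32\drivers\etc\hosts on Windows
# Domain is an assumption - verify against your own logs first
0.0.0.0 ollama.com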
Hitting /api/show only reads the GGUF header and the tokenizer. It prints:
llama_model_load: vocab only - skipping tensors
This is expected behaviour. You need /api/generate or /api/chat to actually allocate and load tensors into VRAM.
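Side by side, with the same assumptions as before (custom port 8182, a locally pulled model):

# Metadata probe only - this is what triggers the "vocab only - skipping tensors" line
curl http://localhost:8182/api/show -d '{"model": "llama3.1:8b"}'

# Actually allocates VRAM, loads tensors and generates
curl http://localhost:8182/api/chat -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Say hello"}], "stream": false}'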
The general.alignment default=32 line is a benign warning from newer llama.cpp when an optional GGUF key is absent.
If your dual-GPU RTX 3060 rig suddenly stops loading models after Ollama auto-updates and you see lines like:
llama_model_load: vocab only - skipping tensors
key with type not found key=general.alignment default=32
try rolling back to Ollama 0.12.3 first. That immediately restored normal model loading and inference for me. This may cure the immediate headache before you figure out how to upgrade without breaking everything again.
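After rolling back, verify the version before testing a generation:

ollama --version
# expected output along the lines of: ollama version is 0.12.3

If the Windows tray app is still allowed online, it may pull the broken version right back in - hence the firewall rule above.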