How To Run Aider Polyglot Benchmarks on Locally Hosted Models (LLMs)

The goal is to run Qwen 3.5 27B on our local Windows machine with our GPU and serve it to the Docker container via an API. The Docker container then runs the aider benchmarks, making its calls through that API. This way the benchmarks run in the environment that Aider expects.

Here’s how we’ll do that.

Grab the latest precompiled release of llama.cpp here:

https://github.com/ggml-org/llama.cpp/releases

I used this build because I’m on Windows x64 with an NVIDIA GPU:

https://github.com/ggml-org/llama.cpp/releases/download/b8407/llama-b8407-bin-win-cuda-13.1-x64.zip

Extract it to a folder.

You’ll also need the CUDA runtime DLLs; extract them into the same folder where you extracted llama.cpp:

https://github.com/ggml-org/llama.cpp/releases/download/b8407/cudart-llama-bin-win-cuda-13.1-x64.zip

Grab the model here:

https://huggingface.co/unsloth/Qwen3.5-27B-GGUF

I used the UD-Q5_K_XL quant:

https://huggingface.co/unsloth/Qwen3.5-27B-GGUF/blob/main/Qwen3.5-27B-UD-Q5_K_XL.gguf

The model doesn’t need to be in the same folder; you can supply the full path to it, as I do below.

Now open a PowerShell window and cd into the folder where you extracted llama-server and the CUDA DLLs.

cd C:\Users\user1\Downloads\llama-b8407-bin-win-cuda-13.1-x64

.\llama-server.exe -m E:\lm-models\unsloth\Qwen3.5-27B-GGUF\Qwen3.5-27B-UD-Q5_K_XL.gguf `
  --no-mmproj --no-mmap --jinja --threads 8 `
  --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on `
  --ctx-size 80000 -kvu `
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 `
  --presence-penalty 0.0 --repeat-penalty 1.0 `
  --host 0.0.0.0

Caveats:

  • I used .\ to run the command because this is PowerShell.
  • I set the context size to 80k; you may need to lower it if you run into VRAM issues. This is not a model you want to offload to system RAM/CPU.
  • I quantized the KV cache to Q8_0 because benchmarks show it’s very safe; I never go lower than that.
  • I used the sampling parameters (temp, top-p, etc.) that Qwen recommends for this model in thinking mode, as listed on the Hugging Face page.
  • I set the host to 0.0.0.0 so that Docker can reach the server.


Get Docker Desktop for Windows and install it:
https://docs.docker.com/desktop/setup/install/windows-install/

Here’s the direct download link for Windows x64:
https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe?utm_source=docker&utm_medium=webreferral&utm_campaign=docs-driven-download-win-amd64

Once you’ve installed Docker Desktop for Windows, move on to setting up the container and the benchmarks.

Note: I am working in a folder on my E: drive called llm-benchmark; you will need to change this to match your setup.

cd E:\llm-benchmark
git clone https://github.com/Aider-AI/aider.git
cd aider
mkdir tmp.benchmarks
git clone https://github.com/Aider-AI/polyglot-benchmark tmp.benchmarks/polyglot-benchmark

cd E:\llm-benchmark\aider
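The docker run command below references an image called aider-benchmark, which has to exist before the container can start. Assuming the Dockerfile ships at benchmark/Dockerfile in the aider checkout (as it does in the upstream repo), a one-time build looks like this:

```shell
# One-time build of the benchmark image used by the docker run command below.
# Assumes benchmark/Dockerfile exists in the aider checkout (run from the
# aider repo root). This can take a while the first time.
docker build --file benchmark/Dockerfile -t aider-benchmark .
```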
docker run --rm -it -e AIDER_DOCKER=1 -e OPENAI_API_BASE=http://host.docker.internal:8080/v1 -e OPENAI_API_KEY=dummy --add-host=host.docker.internal:host-gateway -v "${PWD}:/aider" -w /aider aider-benchmark bash

That will drop you into the container’s Linux shell.

Now, test that you can reach the llama-server API:

curl http://host.docker.internal:8080/v1/models

You should get back a JSON response listing the available models.
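llama-server’s /v1/models reply follows the OpenAI-style list shape, so you can pull out just the model IDs. Sample JSON is piped in below so the snippet runs standalone; in practice, replace the echo with `curl -s http://host.docker.internal:8080/v1/models`:

```shell
# /v1/models returns an OpenAI-style list object; extract just the model IDs.
# Sample JSON stands in for the curl output so this runs on its own.
echo '{"object":"list","data":[{"id":"Qwen3.5-27B-UD-Q5_K_XL.gguf"}]}' |
  python3 -c 'import json,sys; print("\n".join(m["id"] for m in json.load(sys.stdin)["data"]))'
```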
Next, test connecting to the model with Aider:

aider --model openai/Qwen3.5-27B-UD-Q5_K_XL.gguf

Type “test” or something similar and wait for a response.

If you got a response, press CTRL + C to exit.

Now run a smoke test of the benchmark inside the container.

./benchmark/benchmark.py smoke-test \
--model openai/Qwen3.5-27B-UD-Q5_K_XL.gguf \
--edit-format whole \
--threads 1 \
--num-tests 1 \
--exercises-dir polyglot-benchmark


The benchmark may take a while to run.

If that fails with the error “/usr/bin/env: ‘python3\r’: No such file or directory”, it’s because you cloned the aider files on Windows (with CRLF line endings) before creating the container, and you need this:

apt-get update && apt-get install -y dos2unix
dos2unix benchmark/benchmark.py
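If other scripts in the repo hit the same CRLF problem, you can convert them all in one pass; the sed one-liner below is an equivalent fallback if you’d rather not install dos2unix. The demo file is only created so the snippet runs standalone:

```shell
# Simulate a script saved with Windows CRLF line endings.
mkdir -p benchmark
printf '#!/usr/bin/env python3\r\nprint("ok")\r\n' > benchmark/demo.py

# Strip the trailing CR from every .py file under benchmark/
# (same effect as running dos2unix on each file).
find benchmark -name '*.py' -exec sed -i 's/\r$//' {} +
```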

Continue with the full benchmark. In this case I run 5 Rust tests.

./benchmark/benchmark.py my-local-run \
--model openai/Qwen3.5-27B-UD-Q5_K_XL.gguf \
--edit-format whole \
--threads 1 \
--keywords "rust" \
--num-tests 5 \
--exercises-dir polyglot-benchmark

This next part is optional.

Note that Aider runs with its own custom temperature setting, which you may not want for a local model. Here we want to use the sampling settings already configured in llama-server (llama.cpp).

Install nano so you can edit files:

apt install nano -y

Create a YAML config file named “.custom-model-settings.yml”:

nano .custom-model-settings.yml

Paste in the following:

- name: openai/Qwen3.5-27B-UD-Q5_K_XL.gguf
  edit_format: whole
  weak_model_name: openai/Qwen3.5-27B-UD-Q5_K_XL.gguf
  use_repo_map: true
  use_temperature: false

Then press CTRL + X, followed by Y and Enter, to save and exit.
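If you’d rather not use an editor at all, the same file can be written non-interactively with a heredoc:

```shell
# Write the model-settings file in one shot, no editor needed.
cat > .custom-model-settings.yml <<'EOF'
- name: openai/Qwen3.5-27B-UD-Q5_K_XL.gguf
  edit_format: whole
  weak_model_name: openai/Qwen3.5-27B-UD-Q5_K_XL.gguf
  use_repo_map: true
  use_temperature: false
EOF
```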

Now run the benchmark with the settings file specified.

./benchmark/benchmark.py UD-Q5_K_XL-KV-Q8-Q8 \
--model openai/Qwen3.5-27B-UD-Q5_K_XL.gguf \
--edit-format whole \
--threads 1 \
--keywords "rust" \
--num-tests 1 \
--read-model-settings .custom-model-settings.yml \
--exercises-dir polyglot-benchmark

Note: if you previously ran this benchmark, you may need to add --name at the end to overwrite it, or change the run name.
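Once a run finishes, aider’s benchmark script can re-print the summary for a past run directory via its --stats mode (as described in aider’s benchmark README). The dated directory name below is a placeholder; use whatever directory was actually created under tmp.benchmarks for your run:

```shell
# Re-print the summary for a completed run. The directory name is a
# placeholder -- substitute your actual dated run dir under tmp.benchmarks.
./benchmark/benchmark.py --stats tmp.benchmarks/YYYY-MM-DD-HH-MM-SS--UD-Q5_K_XL-KV-Q8-Q8
```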

That’s it, you’re done!