You're here because you need Docling working with OpenWeb-UI or you noticed that Docling is not really using the GPU you have. Docling consumes a decent amount of CPU cycles while the GPU remains at 0% utilization. The only OCR engine that supports GPU acceleration is RapidOCR, but that uses ONNX Runtime in the backend …
Telegram Windows App – Expand Chat Bubble Size Fix
Are you tired of the Telegram app in Windows wasting your screen space? Me too. It's ridiculous that this app keeps chat bubbles spanned as if we were using a small phone. The app is literally made for Windows. Here, I show how we can modify the variables in memory to fix this. Before the …
Continue reading "Telegram Windows App – Expand Chat Bubble Size Fix"
Qwen3.x and LLAMA.CPP – How To Extend Context Window Past 260k
Normally Qwen3.x (3.5 and 3.6) models have a limit of about 260k context. There are many scenarios where it would be advantageous to increase this to around 300 or 400k. One primary use case is having the model ingest a ton of files before working on a problem (usually source code documents). Here are the …
Continue reading "Qwen3.x and LLAMA.CPP – How To Extend Context Window Past 260k"
Qwen3.5 27B Q8 – KV Cache Benchmarks BF16 vs F16 vs Q8_0
If you're curious about how much KV Cache quantization affects Qwen3.5 27B, take a look at the table below. The model used in all of these benchmarks is Unsloth's Q8_K_XL. KV Cache BF 16 vs F16 vs Q8_0 KV Cache TypeMean PPL(Q)ΔPPL (Q - base)PPL Ratioln RatioMean KLDMax KLDRMS Δp (%)Same Top-p (%)BF166.8653 ± 0.04470———————F166.866214 …
Continue reading "Qwen3.5 27B Q8 – KV Cache Benchmarks BF16 vs F16 vs Q8_0"
Rose Online Asset File Format Technical Specification Part 2
This is part 2 of the Rose Online Asset File Format Technical Specification document. Part 1 can be found here. In part 1 we stopped right before getting into the IFO file format. 3.6 IFO Zone Object Files (.ifo) Object Type Enumeration: enum BlockType { DeprecatedMapInfo = 0, DecoObject = 1, Npc = 2, CnstObject …
Continue reading "Rose Online Asset File Format Technical Specification Part 2"
How To Run Aider Polygot Benchmarks on Locally Hosted Models (LLMs)
The goal here is that we want to run Qwen 3.5 27B on our local Windows machine with our GPU and serve it to the Docker container via the API. Then we want the Docker container to run the aider benchmarks while using the API to make the calls. This way the benchmarks run in …
Continue reading "How To Run Aider Polygot Benchmarks on Locally Hosted Models (LLMs)"