
Cold LLM inference on the GoS1 bench: TinyLlama (1.1B) at ~0.07 mWh/token, Mistral 7B at ~0.96 mWh/token, both on the pre-S30 ladder

🟡 Indicative · measured 2026-04-24 · refined 2026-05-08 · v1
T3 cold inference · TinyLlama 0.0718 mWh/token 🟡 · Mistral 7B 0.9639 mWh/token 🟢 · ratio ≈ 13× per token
SCOPE: Device layer only (GoS1: AMD Ryzen 9 7900 + Radeon RX 7800 XT, Ollama 0.20.2). Network and CPE excluded. No amortised training cost.
OWL Finding: Cold LLM inference on the GoS1 bench: TinyLlama (1.1B) at ~0.07 mWh/token, Mistral 7B at ~0.96 mWh/token, both on the pre-S30 ladder. Measured 2026-04-24, refined 2026-05-08. https://wattlab.greeningofstreaming.org/findings/llm-cold-inference-mwh-per-token (Greening of Streaming, wattlab.greeningofstreaming.org)
Source measurement
Measurement llm/2d79c99c…
Measurement llm/163c6442…

Caveats

What was measured

Two cold inferences on the same T3 task, one with TinyLlama 1.1B and one with Mistral 7B, executed via Ollama and measured at the wall through the Tapo P110.

Each model was unloaded before its measurement. A baseline was taken across 10 polls × 1 s, then the task was issued and wall power was polled at 1 Hz until completion.
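The procedure above can be sketched as a small calculation over the polled power samples. This is a minimal sketch, not the bench's actual code: the function name `mwh_per_token`, the rectangle-rule integration, and the subtraction of the baseline mean are assumptions made for illustration, and the sample values are invented rather than taken from the stored result files.

```python
from statistics import mean


def mwh_per_token(baseline_w, task_w, tokens, poll_interval_s=1.0):
    """Net energy per generated token in mWh, from wall-power samples.

    baseline_w: power readings (W) taken before the task, e.g. 10 polls at 1 s.
    task_w:     power readings (W) taken at a fixed interval during the task.
    tokens:     number of tokens generated by the task.
    """
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    if not baseline_w or not task_w:
        raise ValueError("both sample lists must be non-empty")

    w_base = mean(baseline_w)
    # Rectangle rule: each sample stands for one polling interval.
    # Assumption: idle (baseline) power is subtracted from every task sample.
    net_joules = sum(w - w_base for w in task_w) * poll_interval_s
    # 1 mWh = 3.6 J
    return net_joules / 3.6 / tokens


# Hypothetical samples, for illustration only.
baseline = [60.0] * 10   # 10 baseline polls at 1 s
task = [240.0] * 20      # 20 s of task polled at 1 Hz
example = mwh_per_token(baseline, task, tokens=1000)
print(f"{example:.4f} mWh/token")
```

With these invented inputs the net draw is 180 W for 20 s, that is 3600 J or 1000 mWh, spread over 1000 tokens.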

Numbers, from the stored result files

| Model | mWh per token | Tokens / s | W_base / W_task | Confidence |
|---|---|---|---|---|
| TinyLlama 1.1B | 0.0718 | (per file) | (per file) | 🟡 |
| Mistral 7B | 0.9639 | 49.7 | 61.8 W / 234.2 W | 🟢 |

Per-token ratio: Mistral 7B / TinyLlama ≈ 13.4×. The larger model uses an order of magnitude more energy per token in this measurement.
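The arithmetic behind the ratio, and a rough consistency check on the Mistral 7B row, can be reproduced from the published figures. The check assumes that energy per token equals net power (W_task minus W_base) divided by throughput; that relationship is an inference from the table, not a documented formula, and the variable names are illustrative.

```python
# Published per-token figures (mWh/token).
tinyllama_mwh = 0.0718
mistral_mwh = 0.9639

ratio = mistral_mwh / tinyllama_mwh
print(f"ratio: {ratio:.1f}x")

# Cross-check for Mistral 7B from its other table columns.
# Assumption: mWh/token = (W_task - W_base) / (tokens per second) / 3.6
w_base, w_task = 61.8, 234.2      # watts
tokens_per_s = 49.7

joules_per_token = (w_task - w_base) / tokens_per_s
derived_mwh = joules_per_token / 3.6   # 1 mWh = 3.6 J
print(f"derived: {derived_mwh:.4f} mWh/token vs published {mistral_mwh}")
```

Under that assumption the derived value lands within about 0.001 mWh/token of the published 0.9639, which suggests the figure is net of baseline. The small gap is plausibly due to rounding in the displayed columns, though the stored result files would be needed to confirm it.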

What this measurement does not establish

Read alongside

The RAG faithfulness finding (rag-faithfulness-rem-question) on the same TinyLlama version: given the same retrieval, the smaller model generated a hallucinated answer. Energy per token and answer correctness are independent axes.

Methodology → (docs/wattlab_traffic_light_confidence.md)
llm · cold-inference · tinyllama · mistral · mwh-per-token · pre-s30-panel