
OWL

Greening of Streaming · Live energy measurement · GoS1


OWL measures the real energy cost of video transcoding and AI inference — using a calibrated smart plug, not estimates. Every number on this page comes from a live measurement on GoS1, a server in our lab in France.

[Live reading: GoS1 current power draw in W · Tapo P110 · device layer only]

What's being measured?

GoS1 is an AMD Ryzen 9 workstation with an NVIDIA GeForce RTX 5080 GPU. Power is sampled at 1-second intervals via a Tapo P110 connected to the mains supply. We measure the delta between idle baseline and task power — not estimated TDP or nameplate figures.

Scope: device layer only. Network, CDN, and CPE are explicitly excluded. Amortised embodied carbon and training cost are not included in LLM measurements.

Why does this matter?

Streaming accounts for a significant and growing share of global internet traffic. Codec choice, inference model size, and hardware path all affect real energy use — but most published figures are estimates or averages. OWL produces primary measurement data that operators and researchers can reproduce and cite.

→ Read the full measurement methodology: protocol, confidence framework, scope statements, calibration

Video Transcode

What this shows

Whether transcoding to the same quality target uses more energy on CPU or GPU — and whether the faster path is also the more efficient one.

What we're doing

Encoding a 4K clip (Meridian, Netflix Open Content, CC BY 4.0) to 1080p H.264 — once in software (libx264, CPU only) and once as a full GPU pipeline (hardware decode + encode via h264_nvenc). Same source. Same quality target. P110 sampled every second throughout.
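
As a rough illustration of the two paths described above, the following Python sketch builds one ffmpeg command line per path. It is not OWL's actual invocation: the function names, the file names, the fixed 1920:1080 scale and the use of a bitrate as the shared quality target are all assumptions made for the example. Only libx264, h264_nvenc and the decode-plus-encode GPU pipeline come from the description.

```python
import shlex

def cpu_command(src, dst, bitrate="4M"):
    """Software path: CPU decode, CPU scale, libx264 encode."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", "scale=1920:1080",      # assumed target size for 1080p
        "-c:v", "libx264",
        "-b:v", bitrate,               # assumed stand-in for the quality target
        "-an", dst,                    # drop audio so only video work is measured
    ]

def gpu_command(src, dst, bitrate="4M"):
    """Full GPU path: hardware decode, on-GPU scale, h264_nvenc encode."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",  # keep frames on the GPU
        "-i", src,
        "-vf", "scale_cuda=1920:1080",
        "-c:v", "h264_nvenc",
        "-b:v", bitrate,
        "-an", dst,
    ]

if __name__ == "__main__":
    # Hypothetical file names; printing the commands rather than running them.
    print(shlex.join(cpu_command("meridian_4k.mp4", "out_cpu.mp4")))
    print(shlex.join(gpu_command("meridian_4k.mp4", "out_gpu.mp4")))
```

Keeping both commands identical apart from the decode, scale and encode stages is what makes the energy comparison about the hardware path rather than about differing settings.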

How this is measured

5s idle baseline before each run. 10s thermal cooldown between CPU and GPU. Energy = ΔW × duration / 3600. Confidence 🟢 = ΔW > 5× noise and ≥ 9 polls.

Source: 812 MB, 4K. Encode time ~2–3 min CPU, ~90s GPU (full pipeline). Previous runs (partial pipeline): CPU 174s / 4.06 Wh · GPU 114s / 4.42 Wh. Full pipeline results pending first run.
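
The arithmetic behind these figures can be sketched as follows. The function names and the sample readings are hypothetical; only the formula (Energy = ΔW × duration / 3600) and the previous-run figures (174 s / 4.06 Wh and 114 s / 4.42 Wh) come from the text above.

```python
from statistics import mean

def delta_watts(baseline_w, task_w):
    """Mean task power minus mean idle baseline, in watts."""
    return mean(task_w) - mean(baseline_w)

def energy_wh(delta_w, duration_s):
    """Energy = delta W x duration / 3600 (watt-seconds to watt-hours)."""
    return delta_w * duration_s / 3600.0

def implied_delta_w(energy_wh_value, duration_s):
    """Invert the formula: average delta W implied by an energy and a duration."""
    return energy_wh_value * 3600.0 / duration_s

if __name__ == "__main__":
    # Hypothetical 1-second samples, for illustration only.
    baseline = [60.0, 61.0, 59.0, 60.0, 60.0]
    task = [140.0, 150.0, 145.0, 145.0]
    dw = delta_watts(baseline, task)            # 145 - 60 = 85 W
    print(round(energy_wh(dw, len(task)), 4))   # 85 W over 4 s

    # Average delta implied by the previous partial-pipeline runs quoted above.
    print(round(implied_delta_w(4.06, 174), 1))  # CPU run
    print(round(implied_delta_w(4.42, 114), 1))  # GPU run
```

Inverting the formula on the previous runs gives an average delta of about 84 W for the CPU run and about 140 W for the GPU run. The GPU path finished sooner but drew more power while running, which is how it ended up with the higher total.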

Result

Scope: device layer only (GoS1). Network, CDN, and CPE not included. A faster encode does not automatically mean less energy — this measures total Wh, not rate.

LLM Inference BETA

What this shows

How much energy each generated token costs — and how model size translates into energy use per unit of output.

What we're doing

Running a fixed prompt (T3 Long — network energy attribution briefing) through Mistral 7B cold: model unloaded before baseline so we capture the true first-request cost. GPU inference via Ollama ROCm.

How this is measured

Model unloaded from VRAM. 3s settle. 10s idle baseline. Single inference run. P110 at 1s intervals. Primary metric: mWh per output token.

Model: Mistral 7B (4.4 GB). Previous result: 0.94 mWh/tok, ~47 tok/s.

Why mWh per token?

Token count varies between models and prompts, so raw Wh figures aren't comparable. Energy per token lets us place TinyLlama (0.06 mWh/tok) and Mistral 7B (0.94 mWh/tok) on the same axis — a ~15× difference.
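
A minimal sketch of the per-token normalisation, assuming the same ΔW × duration / 3600 energy formula used elsewhere on this page. The function names and the example inputs are hypothetical; the 0.94 and 0.06 mWh/tok figures are the ones quoted above.

```python
def mwh_per_token(delta_w, duration_s, output_tokens):
    """Energy above idle, in milliwatt-hours, divided by tokens generated."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    energy_mwh = delta_w * duration_s / 3600.0 * 1000.0
    return energy_mwh / output_tokens

def ratio(a_mwh_per_tok, b_mwh_per_tok):
    """How many times more energy per token model A uses than model B."""
    return a_mwh_per_tok / b_mwh_per_tok

if __name__ == "__main__":
    # Hypothetical run: 90 W above idle for 40 s, producing 1000 tokens.
    print(mwh_per_token(90.0, 40.0, 1000))

    # Ratio of the two published figures (both already rounded).
    print(round(ratio(0.94, 0.06), 1))
```

Because both published figures are rounded, the ratio computed from them is only approximate; it lands in the same range as the ~15× quoted above.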

Result

Scope: device layer only (GoS1). No amortised training cost included. mWh/token measures inference energy only — not the energy cost of training the model.

Image Generation BETA

What this shows

How much energy one AI-generated image costs — measured end to end on real hardware, not estimated from TDP or cloud benchmarks.

What we're doing

Running SD-Turbo (stabilityai/sd-turbo, CPU, 8 steps, 512×512) with a randomly modified prompt — the colour modifier changes each run to prove the image is generated live, not replayed from cache.

How this is measured

10s idle baseline. CPU diffusion run. P110 at 1s intervals. Metric: Wh per image = ΔW × generation_time / 3600.

Previous result: 0.21 Wh/image, 12s, ~30W delta above idle.

Result

Scope: device layer only (GoS1). Network and storage excluded. This measures one image on one machine — not the energy cost of a hosted API call.

RAG Energy Cost BETA

What this shows

Whether retrieval-augmented generation (RAG), which searches a local corpus before answering, costs meaningfully more energy than plain inference, and how much larger the context the model must process becomes.

What we're doing

Running three modes back-to-back on Mistral 7B: baseline (no retrieval), RAG (small corpus), and RAG Large (with re-ranking). Same question, same model, same hardware — only the retrieval pipeline changes.

How this is measured

Each mode: 10s idle baseline, inference with P110 at 1s intervals. Metric: mWh per output token. ChromaDB embeddings via sentence-transformers. Corpus: academic papers on streaming energy.

Result

Scope: device layer only (GoS1). Network excluded. RAG retrieval adds overhead but the dominant cost remains token generation.

How We Flag Confidence

The problem

Not every measurement we take is equally trustworthy. System noise — P110 quantisation, OS jitter, Wi-Fi polling variance — is real. A task that adds a small delta above baseline might be signal or artefact. We need a principled way to say which.

The system

Every result carries a traffic light. As of CR-028 Phase 2 it is a per-run confidence interval ("can this run be told apart from idle?"), not a fixed watt rule.

confidence = Φ(ΔW / SE), where SE comes from this run's noise and the calibrated idle floor

🟢 Repeatable
≥95% confident above idle and ≥ 9 task polls. Reliable enough to cite.
🟡 Early insight
≥80% confident above idle and ≥ 4 task polls. Directional, but needs a longer run before we'd stake a public claim on it.
🔴 Need more data
Not yet distinguishable from idle. We publish it anyway — but we won't cite it yet.
Why a confidence interval?

Fixed thresholds (e.g. "5W = green") don't adapt to the machine's actual noise level. Instead we take this run's own baseline + task power samples, form a standard error on ΔW (worst case of the run's observed noise and the calibrated idle floor, plus a drift term), and turn ΔW into a one-sided confidence that the task draws above idle. A short run can't go green on a couple of lucky readings — it also needs enough task polls.
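
The scheme described above can be sketched roughly as follows. The page gives confidence = Φ(ΔW / SE) and the badge thresholds, but not the exact SE formula, so the way the standard error is assembled here (standard error of the difference of means, floored by a calibrated idle value, plus an additive drift term) is an assumption, as are all function and parameter names.

```python
import math
from statistics import mean, stdev

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence(baseline_w, task_w, idle_floor_w, drift_w=0.0):
    """One-sided confidence that the task draws more power than idle."""
    delta = mean(task_w) - mean(baseline_w)
    # Assumed form: standard error of the difference of the two sample means.
    run_se = math.sqrt(stdev(baseline_w) ** 2 / len(baseline_w)
                       + stdev(task_w) ** 2 / len(task_w))
    # Worst case of observed noise and calibrated floor, plus drift (assumed additive).
    se = max(run_se, idle_floor_w) + drift_w
    if se <= 0:
        raise ValueError("standard error must be positive")
    return phi(delta / se)

def badge(conf, task_polls):
    """Map confidence and poll count to the traffic-light labels."""
    if conf >= 0.95 and task_polls >= 9:
        return "Repeatable"
    if conf >= 0.80 and task_polls >= 4:
        return "Early insight"
    return "Need more data"

if __name__ == "__main__":
    # Hypothetical 1-second samples and idle floor, for illustration only.
    baseline = [60.0, 61.0, 59.0, 60.0, 60.0, 61.0, 59.0, 60.0, 60.0, 60.0]
    task = [62.0, 63.0, 61.0, 64.0, 62.0]
    c = confidence(baseline, task, idle_floor_w=1.5, drift_w=0.2)
    print(round(c, 3), badge(c, len(task)))
```

The poll-count check is separate from the confidence value on purpose: a very short run can show a high Φ on a few readings, yet it still cannot be labelled "Repeatable" until it has at least 9 task polls.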

On any result page, click a 🟢 🟡 🔴 badge for a quick reminder of the formula.

Findings

Greening of Streaming · OWL · GoS1

From OWL's body of evidence — citable findings backed by stored measurements:

🟢 Input-master bitrate has no measurable effect on H.265 re-encode energy (CPU 1.7 %, GPU 4.9 % spread); input-codec has a small effect carried by the AV1-as-source case (CPU 3.4 %, GPU 10.3 % spread)
v1 · 2026-05-26
Re-encode `h265_both` on 2-min 1080p siblings · bitrate axis (1.3 → 14.6 Mbps): flat · codec-of-origin axis (H.264 5.1 / H.265 3.4 / AV1 2.3 Mbps): AV1 source raises GPU energy by ~10 %

🟢 On Meridian-120s at the ABR ladder, GPU encodes are 2.0× to 4.4× more energy-efficient than CPU encodes; H.265 GPU produces the lowest-energy file, AV1 GPU the fastest
v1 · 2026-05-22
Per-codec ABR · H.264 4 Mbps · H.265 2 Mbps · AV1 1.5 Mbps
Most efficient: H.265 GPU (0.30 Wh, 28.4 MB, VMAF 92.0) · fastest: AV1 GPU (15.1 s, 0.32 Wh, VMAF 90.8)

🟢 AV1 hardware uses ~55% less energy than software at 1500 kbps, but loses ~2 VMAF points and produces ~40% larger files
v1 · 2026-05-22
1500 kbps ABR
SVT-AV1: 0.71 Wh · VMAF 92.74 · 14.5 MB · 34 s
av1_vaapi: 0.32 Wh · VMAF 90.79 · 20.3 MB · 15 s

See all findings (6) →

Want to dig deeper?

OWL has three access tiers. The numbers and methodology you've just seen are identical for all three — what changes is who can shape the inputs (custom prompts, custom ffmpeg, all-codecs sweeps, your own corpus, full settings access).

Tiers: Public · GoS member · Lab (operator)
Pre-baked workloads, live wall-power & CO2e
Guided tour, methodology, recent-run history
Custom video upload (GoS member: ≤ 1024 MB · Lab: no cap)
Custom prompts & custom ffmpeg commands
All-codecs sweeps, batch / compare-modes
RAG corpus upload (your own PDFs)
CSV / JSON export of your runs
Edit settings, run variance calibration, full results view

Lab tier is granted automatically on the GoS1 LAN (loopback / 192.168.x). There's no public sign-up for Lab — it's the operator surface for the bench itself.


Same measurement quality on every tier. Members shape the inputs; everyone sees the results.


greeningofstreaming.org ↗

