DiffusionGemma · AIME 2026 · problem 1 · the actual inference process

Watching DiffusionGemma think

Not one token at a time. This is the real decode of a single AIME 2026 problem, captured step‑by‑step from vLLM's diffusion sampler: 8 blocks of 256 tokens, each denoised in parallel over a handful of steps, then committed at once — arriving at the correct answer 277. Every token and number below is byte‑exact from the captured trajectory.

DiffusionGemma writes in blocks, refined in parallel — not left‑to‑right

One AIME answer, 1884 tokens. An autoregressive model would emit it in 1884 sequential forward passes (one per token). DiffusionGemma used 69 decode forward passes in total, ~27.3× fewer, because each pass refines a whole 256‑token block at once. Each diffusion pass is heavier (256 positions, bidirectional attention), so the net gain is smaller than the pass count suggests: Google reports a ~4× wall‑clock speedup.
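The pass-count ratio can be checked directly from the figures quoted here. A minimal sketch in Python; the function and variable names are illustrative and are not taken from the capture tooling:

```python
import math

TOKENS = 1884          # length of the committed answer, in tokens
DIFFUSION_PASSES = 69  # decode forward passes reported for this trace
BLOCK_SIZE = 256       # tokens per block

def pass_ratio(tokens: int, passes: int) -> float:
    """How many times fewer passes than one-pass-per-token decoding."""
    return tokens / passes

def blocks_needed(tokens: int, block_size: int) -> int:
    """Smallest number of fixed-size blocks that can hold `tokens` tokens."""
    return math.ceil(tokens / block_size)

ratio = pass_ratio(TOKENS, DIFFUSION_PASSES)
n_blocks = blocks_needed(TOKENS, BLOCK_SIZE)
print(f"{ratio:.1f}x fewer passes, {n_blocks} blocks")
# prints "27.3x fewer passes, 8 blocks"
```

This compares pass counts only; it says nothing about wall-clock time, since a diffusion pass over 256 positions costs more than a single-token autoregressive pass.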

8 × 256‑token blocks · 69 decode forward passes · 4–9 denoise steps / block (adaptive) · answer 277 (gold 277) ✓ · 12s total generation

Watch a block condense from noise into reasoning

Pick a block and scrub the denoising steps. Each token is coloured by its predictive entropy — green = confident/settled, red = still noisy. Block 1 starts from pure noise (newlines & <eos>) and resolves into "Let $v_P$ be the speed of Patrick…" within a step or two.

entropy: 0 (settled) → high (noise)

what the scrubber is showing

Each frame is one call to vLLM's DiffusionSampler on this block's 256‑token canvas. The text is argmax_canvas — the model's current best guess at every position. Colour is the per‑position predictive entropy H = −Σ p·log p over the vocabulary (nats), computed from that step's logits. The block commits when its mean entropy falls under the model's entropy_bound and the converged flag flips.
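To make the colouring and the commit rule concrete, here is a minimal sketch of the entropy computation in plain Python. It assumes the rule is exactly "mean per-position entropy below a threshold"; the helper names and the threshold value used in the example are hypothetical, and the real sampler's internals are not shown in this trace.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_nats(logits):
    """Predictive entropy H = -sum(p * log p), in nats."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

def block_converged(canvas_logits, entropy_bound):
    """Return (converged, mean_entropy) for a canvas of per-position logits.

    Assumption: a block counts as converged once its mean per-position
    entropy drops below `entropy_bound`.
    """
    mean_h = sum(entropy_nats(l) for l in canvas_logits) / len(canvas_logits)
    return mean_h < entropy_bound, mean_h

# Toy 4-token vocabulary: one undecided position, one confident position.
uniform = [0.0, 0.0, 0.0, 0.0]   # entropy = ln 4
peaked = [10.0, 0.0, 0.0, 0.0]   # almost all mass on one token

print(block_converged([uniform, peaked], entropy_bound=0.1)[0])  # False
print(block_converged([peaked, peaked], entropy_bound=0.1)[0])   # True
```

A position with a flat distribution sits at the maximum ln(V) for a vocabulary of size V, while a position whose mass is on one token sits near 0, which is what the green end of the scale represents.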

Confidence collapses in a few steps

Mean per‑position entropy per denoise step, for the selected block (bright) against all others (faint). It falls from a high, noisy start (~7 nats) to ~0, and that collapse is what triggers the stopping rule.

Tokens don't settle left‑to‑right

The selected block's final tokens, coloured by which denoise step they locked in. If this were autoregressive, colour would increase strictly left‑to‑right. Instead late‑settling tokens are scattered — the model revises the whole canvas in parallel.
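One way to derive a per-token "settle step" from a sequence of canvases is sketched below. It assumes a token is settled at the first step from which it matches its final value and never changes again; the page's own definition may differ, and the toy canvases are invented for illustration, not taken from the trace.

```python
def settle_steps(canvases):
    """Per position, the earliest step from which the token stays final.

    `canvases` is a list of per-step token lists, all the same length,
    ordered from the first denoise step to the last.
    """
    final = canvases[-1]
    result = []
    for pos, tok in enumerate(final):
        step = len(canvases) - 1
        # Walk backwards while the earlier step already shows the final token.
        while step > 0 and canvases[step - 1][pos] == tok:
            step -= 1
        result.append(step)
    return result

def is_left_to_right(steps):
    """True if settle steps never decrease with position (a lenient test)."""
    return all(a <= b for a, b in zip(steps, steps[1:]))

# Made-up 4-position canvas over 4 steps.
toy = [
    ["\n", "\n", "\n", "\n"],
    ["Let", "\n", "be", "\n"],
    ["Let", "x", "be", "the"],
    ["Let", "v", "be", "the"],
]
steps = settle_steps(toy)
print(steps)                    # [1, 3, 1, 2]
print(is_left_to_right(steps))  # False
```

In the toy case the second position is revised last even though positions to its right have already settled, which is the scattered pattern the colouring is meant to show.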

settles at step: early → late

Harder blocks get more steps

Denoise steps per block. The model spends adaptive compute — some blocks converge in 4, others need 9.
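The adaptive behaviour can be illustrated with a toy loop that keeps denoising until an entropy threshold is met or a step cap is reached. This is a sketch, not the sampler's code: the geometric decay model, the starting entropy, the threshold of 0.05 and the cap of 9 are all assumptions chosen for illustration.

```python
def denoise_until_converged(initial_entropy, decay, entropy_bound, max_steps):
    """Toy stand-in for a block's denoise loop.

    Each step multiplies the block's mean entropy by `decay` (a made-up
    model of progress); the loop stops once the entropy is under
    `entropy_bound` or `max_steps` is reached.
    Returns (steps_taken, final_mean_entropy).
    """
    h = initial_entropy
    steps = 0
    while steps < max_steps:
        h *= decay
        steps += 1
        if h < entropy_bound:
            break
    return steps, h

# An "easy" block sheds entropy quickly; a "hard" block sheds it slowly.
easy_steps, _ = denoise_until_converged(7.0, decay=0.25, entropy_bound=0.05, max_steps=9)
hard_steps, _ = denoise_until_converged(7.0, decay=0.5, entropy_bound=0.05, max_steps=9)
print(easy_steps, hard_steps)  # 4 8
```

The point of the sketch is only that a fixed stopping threshold produces a variable step count: blocks whose entropy drops slowly take more passes.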

…and it lands the answer

The committed output of all 8 blocks, reproducing the pilot's problem‑1 result: 277 = gold 277 ✓ (temperature 0, int8 weight‑only).


Captured from 20260701-115452_trace_p1/trajectory.json · 69 decode steps · source sha256 89c40b9c417a