Not one token at a time. This is the real decode of a single AIME 2026 problem, captured step‑by‑step from vLLM's diffusion sampler: 8 blocks of 256 tokens, each denoised in parallel over a handful of steps and then committed at once. The decode arrives at the correct answer, 277. Every token and number below is taken byte‑exact from the captured trajectory.
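The control flow described above (denoise a whole block in parallel, commit it, move on to the next block) can be sketched as a block-wise loop. This is a toy illustration, not vLLM's sampler: `decode_blockwise`, `denoise_step`, `is_converged` and `toy_demo` are hypothetical names, the "model" is a fake that fills in a fixed target two positions per pass, and its stopping test ("a pass changed nothing") is a stand-in for the entropy-based rule the real sampler uses.

```python
from typing import Callable, List, Tuple

NOISE = "<eos>"  # placeholder noise token for the toy demo

def decode_blockwise(
    num_blocks: int,
    init_block: Callable[[], List[str]],
    denoise_step: Callable[[List[str], List[str]], List[str]],
    is_converged: Callable[[List[str], List[str]], bool],
    max_steps: int = 32,
) -> Tuple[List[str], List[int]]:
    """Denoise one block at a time; all positions in a block are refined together."""
    committed: List[str] = []       # finished blocks form the fixed left context
    steps_per_block: List[int] = []
    for _ in range(num_blocks):
        block = init_block()        # start each block from a noise canvas
        steps = 0
        while steps < max_steps:
            new_block = denoise_step(committed, block)  # one pass over the whole block
            steps += 1
            done = is_converged(block, new_block)
            block = new_block
            if done:
                break
        committed.extend(block)     # commit the whole block at once
        steps_per_block.append(steps)
    return committed, steps_per_block

def toy_demo() -> Tuple[List[str], List[int]]:
    """Hypothetical fake 'model' with 4-token blocks instead of the real 256-token ones."""
    targets = [["Let", "$v_P$", "be", "the"], ["speed", "of", "Patrick", "."]]
    size = len(targets[0])

    def init_block() -> List[str]:
        return [NOISE] * size

    def denoise_step(committed: List[str], block: List[str]) -> List[str]:
        target = targets[len(committed) // size]  # which block we are on
        out = list(block)
        wrong = [i for i, tok in enumerate(out) if tok != target[i]]
        for i in wrong[-2:]:        # fix the two right-most wrong positions, not left to right
            out[i] = target[i]
        return out

    def is_converged(old: List[str], new: List[str]) -> bool:
        return old == new           # toy rule: stop when a pass changes nothing

    return decode_blockwise(len(targets), init_block, denoise_step, is_converged)

tokens, steps = toy_demo()
print(" ".join(tokens), steps)
```

In the toy, each block needs two passes to fill in and a third to confirm that nothing changed, so both blocks report 3 steps. The design point carried over from the text is that the unit of work is the block, not the token.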
One AIME answer, 1884 tokens. An autoregressive model emits it in 1884 sequential forward passes, one per token. DiffusionGemma used 69 decode forward passes in total (~27.3× fewer), because each pass refines a whole 256‑token block at once. Each diffusion pass is heavier, though: it processes 256 positions with bidirectional attention. Net of that extra cost, Google reports a ~4× wall‑clock speedup.
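The pass-count arithmetic can be checked directly. This sketch uses only the figures quoted above; it counts decode passes and query positions only, ignoring prompt prefill and the extra cost of bidirectional attention, so it is an illustration of the trade-off rather than a latency model.

```python
import math

answer_tokens = 1884    # tokens in the AIME answer
block_size = 256        # positions refined together in one diffusion pass
diffusion_passes = 69   # decode forward passes DiffusionGemma used

ar_passes = answer_tokens                             # autoregressive: one pass per token
num_blocks = math.ceil(answer_tokens / block_size)    # 1884 / 256 ≈ 7.36, rounded up
pass_ratio = ar_passes / diffusion_passes             # how many fewer sequential passes
positions_processed = diffusion_passes * block_size   # query positions across all passes

print(f"blocks: {num_blocks}")
print(f"sequential passes: {ar_passes} vs {diffusion_passes} ({pass_ratio:.1f}x fewer)")
print(f"positions processed: {ar_passes} vs {positions_processed} "
      f"({positions_processed / ar_passes:.1f}x more)")
```

The diffusion decode takes roughly 27× fewer sequential steps but touches roughly 9× more positions in total, which is one reason the reported wall-clock gain is much smaller than the pass ratio.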
Pick a block and scrub the denoising steps. Each token is coloured by its predictive entropy: green = confident/settled, red = still noisy. Block 1 starts from pure noise (newlines and <eos>) and resolves into "Let $v_P$ be the speed of Patrick…" within a step or two.
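Per-token predictive entropy is the Shannon entropy of the softmax over the vocabulary at that position. A minimal sketch, assuming raw logits per position; `entropy_nats` and `colour` are hypothetical helpers, and the linear green-to-red ramp (with `h_max` as the red end) is an assumed palette, not necessarily the one the page uses.

```python
import math
from typing import List

def entropy_nats(logits: List[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), computed with the max-shift trick."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def colour(h: float, h_max: float) -> str:
    """Map entropy to a hex colour: pure green at 0 nats, pure red at h_max or above."""
    t = min(max(h / h_max, 0.0), 1.0)
    red, green = round(255 * t), round(255 * (1 - t))
    return f"#{red:02x}{green:02x}00"

print(colour(entropy_nats([0.0, 0.0, 0.0, 0.0]), 7.0))  # spread out: leans red
print(colour(entropy_nats([50.0, 0.0, 0.0]), 7.0))      # one dominant token: green
```

A settled position has one dominant token and entropy near 0; a noisy one spreads probability over many tokens and its entropy grows toward ln V for a vocabulary of V tokens.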
Mean per‑position entropy per denoise step for the selected block (bright) against all other blocks (faint). It falls from ~7 nats (about as spread out as a uniform choice among e^7 ≈ 1,100 tokens) to ~0, and that collapse is what ends denoising for the block.
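An entropy-collapse stopping rule can be sketched as: average the per-position entropies of a block after each step and stop once the mean falls below a threshold. The `threshold=0.05` nats value, the helper names, and the sample trace are all illustrative assumptions; the text does not state vLLM's exact criterion.

```python
import math
from typing import List, Sequence

def mean_entropy(prob_rows: Sequence[Sequence[float]]) -> float:
    """Mean per-position entropy (nats) over a block; each row is one position's distribution."""
    total = 0.0
    for row in prob_rows:
        total -= sum(p * math.log(p) for p in row if p > 0)
    return total / len(prob_rows)

def should_stop(prob_rows: Sequence[Sequence[float]], threshold: float = 0.05) -> bool:
    """Hypothetical rule: commit the block once its mean entropy drops below threshold."""
    return mean_entropy(prob_rows) < threshold

def steps_until_stop(entropy_trace: List[float], threshold: float = 0.05) -> int:
    """First 1-based step whose mean entropy is below threshold; else the full step budget."""
    for step, h in enumerate(entropy_trace, start=1):
        if h < threshold:
            return step
    return len(entropy_trace)

# Made-up trace shaped like the plot: high at first, collapsing toward 0.
print(steps_until_stop([6.9, 3.1, 0.4, 0.01]))
```

Because the rule looks at the block's own uncertainty, easy blocks stop early and hard blocks keep refining, which is how the number of denoise steps can vary from block to block.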
The selected block's final tokens, coloured by the denoise step at which each one locked in. Under autoregressive decoding, that step would increase strictly from left to right. Instead, late‑settling tokens are scattered across the block, because the model revises the whole canvas in parallel.
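The lock-in step of a position can be read off a trajectory of per-step snapshots: it is the first step after which that position's token never changes again. A sketch, assuming the trajectory is a list of token lists (one per denoise step); `lock_in_steps`, `is_left_to_right` and the sample trajectory are hypothetical.

```python
from typing import List

def lock_in_steps(trajectory: List[List[str]]) -> List[int]:
    """trajectory[s][i] is position i's token after step s+1. Returns 1-based lock-in steps."""
    final = trajectory[-1]
    locked = []
    for i, tok in enumerate(final):
        s = len(trajectory)  # start at the last step
        # step back while the previous snapshot already held the final token
        while s > 1 and trajectory[s - 2][i] == tok:
            s -= 1
        locked.append(s)
    return locked

def is_left_to_right(steps: List[int]) -> bool:
    """True if lock-in steps strictly increase left to right, as autoregressive decoding would."""
    return all(a < b for a, b in zip(steps, steps[1:]))

traj = [
    ["\n", "the", "\n", "of"],      # after step 1
    ["Let", "the", "\n", "of"],     # after step 2
    ["Let", "the", "speed", "of"],  # after step 3
]
steps = lock_in_steps(traj)
print(steps, is_left_to_right(steps))
```

Walking backwards from the final snapshot (rather than forwards from the first) correctly handles a token that appears, gets revised away, and later returns: only its last uninterrupted run counts.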
Denoise steps per block. The model spends adaptive compute — some blocks converge in 4, others need 9.
The committed output of all 8 blocks, reproducing the pilot's problem‑1 result: 277 = gold 277 ✓ (temperature 0, int8 weight‑only quantization).