Highlights
Joint token & position diffusion
DDOT simultaneously denoises discrete token values and continuous positions so infilled spans can slide, stretch, or shrink while preserving parallel generation.
Sample-level optimal transport
Order-preserving optimal transport couplings align noisy and ground-truth positions within prompt and response sets, collapsing the permutation space and stabilizing training.
Flexible infilling accuracy
The uniform variant DDOT-U reaches 58.4 BLEU-4 with a 100% success rate on One-Billion-Word block infilling and matches that reliability on Yelp block spans. The paper was accepted to EMNLP 2025.
Abstract
Discrete diffusion models offer bidirectional context, parallel denoising, and controllable prompting, yet they have struggled to support flexible-length or flexible-position infilling without oracle token locations. We introduce DDOT (Discrete Diffusion with Optimal Transport Position Coupling), which jointly denoises token values and token positions using a sample-level optimal transport coupling. The coupling preserves prompt order while dynamically adjusting the lengths and placements of infilled spans. DDOT is orthogonal to existing discrete diffusion methods and compatible with pretrained text denoisers. On One-Billion-Word and Yelp infilling benchmarks, DDOT outperforms diffusion baselines and matches the quality of leading non-autoregressive systems while retaining parallel generation efficiency.
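To make the joint update concrete, here is a minimal sketch of one reverse step that touches both modalities at once: masked tokens are re-sampled from predicted logits while every position slides along a predicted velocity. This is an illustration under stated assumptions, not the paper's implementation; the function name `reverse_step`, the greedy argmax token update (a stand-in for the actual discrete-diffusion sampler), and the toy mask/velocity values are all hypothetical.

```python
import numpy as np

def reverse_step(tokens, positions, token_logits, pos_velocity, dt, mask):
    """One joint Euler step: re-sample masked tokens, slide all positions.

    Illustrative only: the greedy argmax stands in for the real
    discrete-diffusion token sampler used in the paper.
    """
    # Token update happens only on masked slots; prompt tokens are untouched.
    new_tokens = np.where(mask, token_logits.argmax(-1), tokens)
    # Positions move along the predicted velocity field (continuous part).
    new_positions = positions + dt * pos_velocity
    return new_tokens, new_positions

# Toy call: 4 slots over a vocab of 10; slots 1 and 2 are masked.
tokens = np.array([3, 0, 0, 8])
mask = np.array([False, True, True, False])
positions = np.array([0.0, 0.5, 0.5, 1.0])
logits = np.zeros((4, 10)); logits[1, 4] = 1.0; logits[2, 6] = 1.0
velocity = np.array([0.0, -0.2, 0.2, 0.0])  # masked span stretches apart
tokens, positions = reverse_step(tokens, positions, logits, velocity,
                                 dt=0.5, mask=mask)
# tokens -> [3, 4, 6, 8]; positions -> [0.0, 0.4, 0.6, 1.0]
```

Because both updates happen in a single pass, infilled spans can stretch or shrink (positions move) while token denoising stays fully parallel.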
Key Findings
- DDOT maintains near-perfect success as prompts grow, while LC and PoP quickly generate invalid outputs.
- Longer reverse schedules improve BLEU and METEOR, showing that DDOT benefits from additional denoising steps.
- DDOT respects prompt keywords while generating coherent sentences on the One-Billion-Word and Yelp benchmarks.
- Optimal transport produces straight, non-crossing position paths that keep prompt order intact during denoising.
Method Overview
- Joint denoising: A Diffusion Transformer predicts token scores and position velocities in one forward pass, keeping generation parallel while learning to move masked spans.
- Optimal transport guidance: Balanced prompt matching and injective response matching form order-preserving trajectories that keep prompt tokens' relative order fixed while letting response tokens weave between them.
- DDOT-R vs. DDOT-U: Random terminal sampling provides diverse placements, while uniform grids (DDOT-U) curb pad clustering and consistently deliver the strongest metrics.
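The order-preserving coupling above can be sketched in one dimension, where the optimal transport plan between two equal-size position sets is simply the monotone (sorted) matching: it minimizes transport cost and yields straight, non-crossing paths. This is a simplified illustration, not the paper's code; `ot_position_coupling` and the toy position values are assumptions.

```python
import numpy as np

def ot_position_coupling(noisy_pos, target_pos):
    """Match noisy to target positions by sorting (1-D optimal transport).

    In 1-D, the optimal plan between equal-size point sets is the monotone
    matching: the k-th smallest noisy point pairs with the k-th smallest
    target, so paths never cross and relative order is preserved.
    """
    order_noisy = np.argsort(noisy_pos)
    order_target = np.argsort(target_pos)
    coupling = np.empty_like(order_noisy)
    coupling[order_noisy] = order_target  # noisy index i -> target index
    return coupling

# Toy example: 5 response positions.
rng = np.random.default_rng(0)
x0 = rng.uniform(0, 1, size=5)             # noisy terminal positions
x1 = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # ground-truth positions
match = ot_position_coupling(x0, x1)
velocity = x1[match] - x0                  # straight-line flow targets
```

Collapsing the matching to a sort is what removes the permutation ambiguity during training: the network only ever regresses toward one canonical, non-crossing set of position targets.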
Results at a Glance
One-Billion-Word (Block)
DDOT-U: BLEU-4 58.4 · NIST-4 8.79 · METEOR 42.1 · Success 100%
+16.4 success points over PoP while improving BLEU-4 by 8.4.
Yelp (Block)
DDOT-U: BLEU-4 59.5 · NIST-4 8.86 · METEOR 42.9 · Success 100%
Matches the best baselines while keeping every prompt token in place.
CodeParrot (Random)
DDOT-U: BLEU-4 40.4 · CodeBLEU 45.4 · Success 11.2%
Improves CodeBLEU by 25.9 points over PoP for Python infilling.
BibTeX
@inproceedings{zhang2025ddot,
  title     = {Flexible-length Text Infilling for Discrete Diffusion Models},
  author    = {Zhang, Andrew and Sivakumar, Anushka and Tang, Chiawei and Thomas, Chris},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  publisher = {Association for Computational Linguistics},
  year      = {2025},
  url       = {https://arxiv.org/abs/2506.13579}
}