Flexible-length Text Infilling for Discrete Diffusion Models

Department of Computer Science, Virginia Tech
EMNLP 2025 · To appear
DDOT diffuses token positions with optimal transport to enable flexible infilling

DDOT learns to move masked spans within a prompt by pairing noisy and ground-truth token positions through sample-level optimal transport.

Highlights

Joint token & position diffusion

DDOT simultaneously denoises discrete token values and continuous positions so infilled spans can slide, stretch, or shrink while preserving parallel generation.

Sample-level optimal transport

Order-preserving optimal transport couplings align noisy and ground-truth positions within prompt and response sets, collapsing the permutation space and stabilizing training.
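For equal-weight points on a line, the optimal transport plan under a convex cost is the monotone rearrangement: sort both sets and pair by rank, which is exactly an order-preserving coupling. A minimal sketch of that rank-matching step (function name and toy values are illustrative, not from the paper):

```python
import numpy as np

def ot_couple_1d(noisy_pos, target_pos):
    """Pair noisy and ground-truth positions with the monotone
    (order-preserving) 1-D optimal transport plan: sort both sets
    and match by rank, minimizing total squared movement."""
    order_n = np.argsort(noisy_pos)
    order_t = np.argsort(target_pos)
    pairing = np.empty_like(order_n)
    pairing[order_n] = order_t  # noisy index -> matched target index
    return pairing

# Toy example: 3 response tokens with scrambled noisy positions.
noisy = np.array([0.9, 0.1, 0.5])
target = np.array([0.2, 0.4, 0.8])
match = ot_couple_1d(noisy, target)
# Pairs 0.1 -> 0.2, 0.5 -> 0.4, 0.9 -> 0.8: ranks are preserved.
```

Because the matching is computed per sample and per set (prompt positions and response positions separately), training never has to search over all permutations of the response tokens.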

Flexible infilling accuracy

The uniform variant DDOT-U reaches 58.4 BLEU-4 with a 100% success rate on One-Billion-Word block infilling, and matches that reliability on Yelp block spans.

Abstract

Discrete diffusion models offer bidirectional context, parallel denoising, and controllable prompting, yet they have struggled to support flexible-length or flexible-position infilling without oracle token locations. We introduce DDOT (Discrete Diffusion with Optimal Transport Position Coupling), which jointly denoises token values and token positions using a sample-level optimal transport coupling. The coupling preserves prompt order while dynamically adjusting the lengths and placements of infilled spans. DDOT is orthogonal to existing discrete diffusion methods and compatible with pretrained text denoisers. On One-Billion-Word and Yelp infilling benchmarks, DDOT outperforms diffusion baselines and matches the quality of leading non-autoregressive systems while retaining parallel generation efficiency.

Method Overview

DDOT architecture with joint token and position diffusion
  • Joint denoising: A Diffusion Transformer predicts token scores and position velocities in one forward pass, keeping generation parallel while learning to move masked spans.
  • Optimal transport guidance: Balanced prompt matching and injective response matching form order-preserving trajectories that keep prompt tokens in order while letting response tokens weave between them.
  • DDOT-R vs. DDOT-U: Random terminal sampling (DDOT-R) provides diverse placements, while uniform terminal grids (DDOT-U) curb pad clustering and consistently deliver the strongest metrics.
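The joint denoising loop above can be sketched as a single reverse step that re-decodes token values in parallel and Euler-integrates the predicted position velocities, with prompt slots clamped. Everything here is a hedged stand-in: `fake_model`, the shapes, and the greedy decode are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L, P = 16, 6, 2  # vocab size, sequence length, number of prompt tokens

def fake_model(tokens, positions, t):
    # Stand-in for the Diffusion Transformer: a real model would condition
    # on (tokens, positions, t); here we emit random values to show shapes.
    return rng.normal(size=(L, V)), rng.normal(size=L)

def reverse_step(tokens, positions, prompt_mask, t, dt=0.1):
    logits, velocity = fake_model(tokens, positions, t)
    # Token update: greedy re-decode of non-prompt slots (parallel denoising).
    new_tokens = np.where(prompt_mask, tokens, logits.argmax(-1))
    # Position update: Euler step on the velocity field; prompt positions
    # are clamped so prompt order never changes.
    new_positions = np.where(prompt_mask, positions, positions + dt * velocity)
    return new_tokens, new_positions

tokens = rng.integers(0, V, size=L)
positions = np.linspace(0.0, 1.0, L)
prompt_mask = np.arange(L) < P
tok1, pos1 = reverse_step(tokens, positions, prompt_mask, t=1.0)
```

Because token and position updates come from one forward pass, generation stays parallel: response positions can slide, stretch, or shrink between steps while prompt tokens and their positions remain fixed.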

Results at a Glance

One-Billion-Word (Block)

DDOT-U: BLEU-4 58.4 · NIST-4 8.79 · METEOR 42.1 · Success 100%

+16.4 success points over PoP while improving BLEU-4 by 8.4.

Yelp (Block)

DDOT-U: BLEU-4 59.5 · NIST-4 8.86 · METEOR 42.9 · Success 100%

Matches the best baselines while keeping every prompt token in place.

CodeParrot (Random)

DDOT-U: BLEU-4 40.4 · CodeBLEU 45.4 · Success 11.2%

Improves CodeBLEU by 25.9 points over PoP for Python infilling.

BibTeX

@inproceedings{zhang2025ddot,
  title={Flexible-length Text Infilling for Discrete Diffusion Models},
  author={Zhang, Andrew and Sivakumar, Anushka and Tang, Chiawei and Thomas, Chris},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  publisher={Association for Computational Linguistics},
  year={2025},
  url={https://arxiv.org/abs/2506.13579}
}