An alternative LLM architecture: Diffusion Large Language Models

This project explores Diffusion Large Language Models (dLLMs) as an alternative to traditional autoregressive language models. Conventional LLMs generate text one token at a time, left to right. dLLMs instead produce text through an iterative denoising process that can update many positions in each step, which allows the whole sequence to be refined globally and may improve coherence and flexibility in generation. The research investigates fundamental challenges in training and controlling dLLMs, including reinforcement learning for reasoning tasks, token-level revision during generation, and hybrid architectures that combine diffusion-based reasoning with autoregressive decoding. It also examines practical applications such as AI agents, controllable writing systems, and interactive text editing, and explores how dLLMs can accelerate language model inference through speculative decoding. The project aims to advance both the theoretical understanding and the real-world deployment of diffusion-based language models, helping to establish them as a scalable and versatile foundation for future AI systems.
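To make the iterative denoising process concrete, here is a minimal toy sketch in Python. It assumes one common formulation, masked (absorbing-state) discrete diffusion, in which generation starts from a fully masked sequence and each step commits the most confident predictions. The project description does not say which formulation it uses, so this is an illustration only. The names `toy_denoiser`, `denoise`, and `MASK` are hypothetical, and the "model" is a stub that simply looks up a fixed target sentence instead of running a trained network.

```python
import math

MASK = "<mask>"  # hypothetical placeholder token for not-yet-generated positions


def toy_denoiser(tokens, target):
    """Hypothetical stand-in for a trained dLLM.

    For every masked position, return (predicted_token, confidence).
    The prediction is read from a fixed target sentence, and the confidence
    grows with the number of already-unmasked neighbours.
    """
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok != MASK:
            continue
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
        known = sum(tokens[j] != MASK for j in neighbours)
        proposals[i] = (target[i], 0.5 + 0.25 * known)
    return proposals


def denoise(target, num_steps):
    """Start fully masked and unmask the most confident positions each step."""
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(num_steps):
        proposals = toy_denoiser(tokens, target)
        if not proposals:
            break  # nothing left to unmask
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(proposals) / (num_steps - step))
        # Most confident first; ties are broken by position.
        best = sorted(proposals, key=lambda i: (-proposals[i][1], i))[:k]
        for i in best:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    sentence = "diffusion models refine text in parallel".split()
    _, steps = denoise(sentence, num_steps=3)
    for state in steps:
        print(" ".join(state))
```

Unlike left-to-right decoding, each step here fills in several positions at once, and the order in which positions are committed depends on confidence, not on position. A real dLLM would replace the stub with a learned network, and could also re-mask and revise tokens it has already committed, which this sketch does not do.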

Faculty Mentors

NYU Shanghai

NYU Abu Dhabi