AI Summary • Published on Apr 14, 2026
Business process modeling is a fundamental activity in Business Process Management (BPM), but it is a complex and time-consuming task requiring specialized expertise, which frequently leads to outdated, incomplete, or inaccurate process models. Recent advances in Large Language Models (LLMs) have sparked interest in automating or assisting modeling tasks from natural language descriptions. It remains unclear, however, how effectively LLMs can support complex process modeling in real organizational environments, which are characterized by fragmented documentation, implicit domain knowledge, and evolving requirements. Existing "naive" LLM approaches often struggle with semantic interpretation, reasoning, and the structural rigor demanded by notations like BPMN, and cope poorly with the inherent ambiguity of natural language and with process knowledge distributed beyond explicit textual input. This review addresses the core question of how LLMs can genuinely support BPM in these complex, real-world organizational contexts.
The authors conducted a structured literature review that adheres to methodological principles of rigor and transparency, while stopping short of claiming a fully systematic review given the nascent stage of the field. The review proceeded in three phases: planning (defining the motivation and research questions), conducting, and reporting (synthesizing and analyzing the findings). The conducting phase comprised a structured search across Web of Science, IEEE Xplore, and the ACM Digital Library, using a comprehensive search string that combined terms related to "translation," "AI methods" (including LLMs), "business process domain," and "BPMN," followed by a rigorous selection based on exclusion criteria. The identified approaches were classified into Generative AI (GenAI) and non-Generative AI (NoGenAI), with GenAI encompassing transformer-based models. The study then examined in detail how LLMs are integrated into text-to-BPMN pipelines, focusing on four instruction mechanisms: prompt engineering (in-context learning, few-shot prompting, and knowledge injection); fine-tuning for specialized domains or strict output representations; intermediate representations (formal modeling languages, DSLs, or structured schemas) used for model generation; and iterative refinement (automated feedback loops, self-correction, candidate generation, and human-in-the-loop interaction).
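To make these instruction mechanisms concrete, the following Python sketch combines a few-shot prompt, a structured intermediate representation, and an automated feedback loop that re-prompts on invalid output. The JSON schema, the example process, and the function names are inventions for this illustration, not taken from any surveyed approach, and a stub stands in for a real LLM call:

```python
import json

# Illustrative few-shot prompt; the example process and the JSON schema
# are inventions for this sketch, not drawn from the surveyed papers.
FEW_SHOT_PROMPT = """\
Translate the process description into JSON with "tasks" (a list of
task names) and "flows" (a list of [source, target] pairs).

Description: The clerk receives the order, then checks the stock.
JSON: {"tasks": ["Receive order", "Check stock"],
       "flows": [["Receive order", "Check stock"]]}

Description: """


def validate_intermediate(raw: str) -> dict:
    """Parse the LLM output and check that every flow endpoint refers to a
    declared task -- a toy stand-in for real BPMN well-formedness checks."""
    model = json.loads(raw)  # raises ValueError on malformed JSON
    tasks = set(model["tasks"])
    for src, dst in model["flows"]:
        if src not in tasks or dst not in tasks:
            raise ValueError(f"flow references unknown task: {src} -> {dst}")
    return model


def text_to_model(description: str, call_llm, max_rounds: int = 3) -> dict:
    """Automated feedback loop: re-prompt with the validation error until
    the intermediate representation validates or the round budget runs out."""
    prompt = FEW_SHOT_PROMPT + description + "\nJSON:"
    for _ in range(max_rounds):
        raw = call_llm(prompt)
        try:
            return validate_intermediate(raw)
        except (ValueError, KeyError) as err:
            prompt += f"\n\nThat output was invalid ({err}). Emit corrected JSON:"
    raise RuntimeError("no valid model within the round budget")
```

In a real pipeline `call_llm` would wrap a model API and the validated dictionary would be serialized to BPMN XML; here the sketch only shows how few-shot prompting, an intermediate representation, and self-correction compose.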
The review reveals a marked shift from traditional rule-based and NLP pipelines toward LLM-based architectures for business process modeling, with GenAI approaches overtaking NoGenAI in 2024-2025. LLMs offer stronger contextual understanding, reasoning over event sequences, and generation of structured outputs directly from natural language. Key advantages include rapid adaptation through prompt engineering, multi-source information integration, process variant detection, support for iterative model refinement, and a lower technical barrier for users thanks to conversational interfaces. Persistent limitations remain, however: LLM-specific issues such as hallucinations, limited reasoning, and prompt sensitivity, alongside linguistic challenges around ambiguity and complex control flows. Data constraints, including the scarcity of high-quality datasets and privacy concerns around commercial APIs, are significant hurdles. Methodological gaps include the need for better human-in-the-loop integration, unexplored modalities, and a lack of real-world validation. Evaluation practice is fragmented across benchmark-oriented studies, expert assessments of practical tools, and usability studies of human-centered systems, which makes direct comparisons difficult. Reproducibility suffers from evolving LLM APIs and enterprise privacy constraints, while training data often lacks the complexity of real-world scenarios. Finally, BPMN itself poses challenges: its complexity and lack of formal semantics invite disparate interpretations.
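Because BPMN's missing formal semantics leaves even "correctness" underspecified, many pipelines fall back on structural checks of the generated model. A minimal sketch of such a check over a hypothetical task-graph representation (the representation and the specific rules are assumptions for illustration, far weaker than full BPMN well-formedness):

```python
from collections import deque

def structural_issues(tasks, flows):
    """Report simple structural problems in a generated process graph:
    not exactly one start node, no end node, or nodes that cannot
    reach any end node. Illustrative only."""
    incoming = {t: 0 for t in tasks}
    outgoing = {t: [] for t in tasks}
    for src, dst in flows:
        outgoing[src].append(dst)
        incoming[dst] += 1

    issues = []
    starts = [t for t in tasks if incoming[t] == 0]
    ends = {t for t in tasks if not outgoing[t]}
    if len(starts) != 1:
        issues.append(f"expected one start node, found {len(starts)}")
    if not ends:
        issues.append("no end node (every node has outgoing flow)")

    # Breadth-first search from each node: it should reach some end node.
    for t in tasks:
        seen, queue = {t}, deque([t])
        while queue:
            for nxt in outgoing[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if ends and not seen & ends:
            issues.append(f"node {t!r} cannot reach an end node")
    return issues
```

Checks of this kind are one ingredient of the benchmark-oriented evaluations the review describes, but they cannot settle semantic questions such as whether the model matches the intended process.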
The paper concludes that advancing LLM-assisted business process modeling requires more than improving generation accuracy: these systems must integrate contextual awareness, iterative refinement, methodological rigor, and reproducibility. Among the future research directions identified, the authors particularly emphasize Retrieval-Augmented Generation (RAG) for incorporating external contextual knowledge and organizational best practices. Mechanisms that preserve contextual continuity across modeling iterations are crucial to support the inherently iterative nature of human process modeling. More comprehensive, standardized, and transparent evaluation protocols, such as the BEF4LLM framework, are also needed to address the current fragmentation and enable meaningful comparisons across approaches. The authors hypothesize that fine-tuning Small Language Models (SLMs) and LLMs on complex, well-formed, machine-interpretable business processes, especially when combined with RAG, could significantly outperform current methods. This calls for a shift from one-shot generation to architectures that foster human-AI collaboration and support the progressive construction of shared process understanding in dynamic organizational settings.
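The RAG direction can be sketched in a few lines. In this deliberately minimal version, retrieval is reduced to word overlap (a real system would use embeddings and a vector store), and the prompt wording and function names are assumptions for illustration:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank candidate context snippets by word overlap with the query --
    a toy stand-in for the embedding-based retrieval a real RAG system uses."""
    q = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]


def rag_prompt(description: str, knowledge_base: list[str]) -> str:
    """Inject retrieved organizational context ahead of the modeling request."""
    context = "\n".join(retrieve(description, knowledge_base))
    return ("Use the organizational context below when modeling the process.\n"
            f"Context:\n{context}\n\n"
            f"Process description: {description}\n"
            "Produce a BPMN model:")
```

The point of the sketch is the architecture, not the scoring function: organizational best practices live outside the process description, so the pipeline retrieves them and grounds the generation step, rather than relying on one-shot prompting alone.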