Abstract
Failure-inducing inputs play a crucial role in diagnosing and analyzing software bugs. Bug reports typically contain these inputs, which developers extract to facilitate debugging. Since bug reports are written in natural language, prior research has leveraged various Natural Language Processing (NLP) techniques for automated input extraction. With the advent of Large Language Models (LLMs), an important research question arises: how effectively can generative LLMs extract failure-inducing inputs from bug reports? In this paper, we propose LLPut, a technique to empirically evaluate the performance of three open-source generative LLMs (LLaMA, Qwen, and Qwen-Coder) in extracting relevant inputs from bug reports. We conduct an experimental evaluation on a dataset of 206 bug reports to assess the accuracy and effectiveness of these models. Our findings provide insights into the capabilities and limitations of generative LLMs in automated bug diagnosis.
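To illustrate the extraction task the abstract describes, the following is a minimal sketch of prompting an open-source generative LLM to pull a failure-inducing input out of a bug report. This is not the authors' LLPut pipeline: the model checkpoint, the prompt wording, and the bug report text are illustrative assumptions, and the sketch assumes the Hugging Face transformers text-generation pipeline with chat-style messages.

```python
from transformers import pipeline

# Illustrative checkpoint; the paper evaluates LLaMA, Qwen, and Qwen-Coder,
# but does not prescribe a specific model version.
extractor = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")

# Hypothetical bug report excerpt, for illustration only.
bug_report = (
    "Title: parser crashes on empty attribute value\n"
    "Steps to reproduce: run `xmlparse '<a b=>'`; the tool segfaults."
)

messages = [
    {"role": "system",
     "content": ("Extract the exact failure-inducing input (command line or "
                 "file content) from the following bug report. "
                 "Reply with the input only.")},
    {"role": "user", "content": bug_report},
]

# Recent transformers text-generation pipelines accept chat-style messages;
# the last message of the returned conversation holds the model's reply.
result = extractor(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```

Evaluating such outputs against developer-extracted ground truth is the kind of comparison the paper's empirical study performs on its 206 bug reports.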
Recommended Citation
A. A. Hasan et al., "LLPut: Investigating Large Language Models for Bug Report-Based Input Generation," Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 1652–1659, Association for Computing Machinery, Jul 2025.
The definitive version is available at https://doi.org/10.1145/3696630.3728701
Department(s)
Computer Science
Publication Status
Open Access
Keywords and Phrases
Bug Report; Empirical Analysis
International Standard Serial Number (ISSN)
1539-7521
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2025 Association for Computing Machinery. All rights reserved.
Publication Date
28 Jul 2025

Comments
National Science Foundation, Grant CCF-2348277