Abstract
Figurative language, such as metaphors and idioms, is common in everyday communication, and it also appears in Software Engineering (SE) channels, such as comments on GitHub. Automatically interpreting figurative language is a challenging task, even with modern Large Language Models (LLMs), as it often involves subtle nuances. This is particularly true in the SE domain, where figurative language is frequently used to convey technical concepts, often bearing developer affect (e.g., 'spaghetti code'). Surprisingly, there is a lack of studies on how figurative language in SE communications impacts the performance of automatic tools that focus on understanding developer communications, such as bug prioritization and incivility detection. Furthermore, it is an open question to what extent state-of-the-art LLMs can interpret figurative expressions in domain-specific communication such as software engineering. To address this gap, we study the prevalence and impact of figurative language in SE communication channels. This study contributes to understanding the role of figurative language in SE, the potential of LLMs in interpreting it, and its impact on automated SE communication analysis. Our results demonstrate the effectiveness of fine-tuning LLMs with figurative language in SE and its potential impact on automated tasks that involve affect. We found that, among three state-of-the-art LLMs, the best fine-tuned versions achieve an average improvement of 6.66% on a GitHub emotion classification dataset, 7.07% on a GitHub incivility classification dataset, and 3.71% on a Bugzilla bug report prioritization dataset.
Recommended Citation
M. M. Imran et al., "Shedding Light on Software Engineering-Specific Metaphors and Idioms," Proceedings - International Conference on Software Engineering, pp. 2555-2567, Association for Computing Machinery, Jan 2024.
The definitive version is available at https://doi.org/10.1145/3597503.3639585
Department(s)
Computer Science
Publication Status
Open Access
Keywords and Phrases
Affect Analysis; Bug Prioritization; Emotion Classification; Figurative Language; Incivility Classification; Large Language Models; Repository Mining
International Standard Serial Number (ISSN)
0270-5257
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2024 The Authors, All rights reserved.
Publication Date
01 Jan 2024