Abstract

Online hatred has become an increasingly pervasive issue, affecting individuals and communities across various digital platforms. To combat hate speech in such platforms, counter narratives (CNs) are regarded as an effective method. In recent years, there has been growing interest in using generative AI tools to construct CNs. However, most of the generative models produce generic responses to hate speech and can hallucinate, reducing their effectiveness. To address the above limitations, we propose a counter narrative generation method that enhances CNs by providing non-aggressive, fact-based narratives with relevant background knowledge from two distinct sources, including a web search module. Furthermore, we conduct a comprehensive evaluation using multiple metrics, including LLM-based measures for persuasion, factuality, and informativeness, along with human and traditional NLP evaluations. Our method significantly outperforms baselines, achieving an average factuality score of 0.915, compared to 0.741, 0.701, and 0.69 for competitive baselines, and performs well in human evaluations.

Department(s)

Computer Science

Publication Status

Open Access

Comments

University of Illinois at Urbana-Champaign, Grant ELE230014

Keywords and Phrases

Counter narrative; Fact-based narrative; Hate speech; Large language model

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2025 Association for Computing Machinery, All rights reserved.

Publication Date

28 Apr 2025

Share

 
COinS