Providing Context for Hate Speech Classifiers using Post-hoc Explanations
Abstract
Hate speech classifiers trained on imbalanced datasets often struggle to determine whether group identifiers such as “gay” or “black” are being used in offensive or prejudiced contexts. This bias produces false positives whenever these terms appear, because models fail to grasp the contextual cues that distinguish hateful usage from benign mention. To address this, we extract Sampling and Occlusion (SOC; Jin et al., 2020) post-hoc explanations from fine-tuned BERT classifiers to efficiently identify bias against identity terms. Building on these insights, we introduce a novel regularization technique that leverages these explanations to encourage models to learn from the surrounding context of group identifiers rather than relying on the identifiers themselves. Our approach outperforms baseline methods by reducing false positives on out-of-domain data while maintaining or improving performance on in-domain data.
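The regularization idea described above can be illustrated with a toy sketch. The code below is ours, not the paper's: it uses a bag-of-words logistic classifier and approximates a term's importance by simple occlusion (the logit change when the term is masked), whereas SOC additionally samples and marginalizes over the surrounding context. Names such as `IDENTITY_TERMS`, `occlusion_importance`, and the `alpha` coefficient are illustrative assumptions; the penalty term added to the cross-entropy loss mirrors the intent of suppressing importance attributed to group identifiers.

```python
import math

# Hypothetical identity-term lexicon (assumption for illustration).
IDENTITY_TERMS = {"gay", "black"}

def logit(tokens, weights, bias):
    """Hate-class logit of a toy bag-of-words linear model."""
    return sum(weights.get(t, 0.0) for t in tokens) + bias

def occlusion_importance(tokens, term, weights, bias):
    """Change in the hate-class logit when `term` is masked out.
    A crude stand-in for SOC, which also marginalizes over context."""
    full = logit(tokens, weights, bias)
    masked = logit([t for t in tokens if t != term], weights, bias)
    return full - masked

def regularized_loss(tokens, label, weights, bias, alpha=0.1):
    """Binary cross-entropy plus a squared penalty on the importance
    the model assigns to group identifiers, nudging it to rely on
    the surrounding context instead of the identifiers themselves."""
    z = logit(tokens, weights, bias)
    p = 1.0 / (1.0 + math.exp(-z))
    ce = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    penalty = sum(
        occlusion_importance(tokens, t, weights, bias) ** 2
        for t in set(tokens) & IDENTITY_TERMS
    )
    return ce + alpha * penalty
```

Minimizing this loss during training drives the occlusion importance of identity terms toward zero, so a non-hateful sentence containing “gay” is no longer pushed toward the hate class by that token alone.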