In this post, we explain the basics behind our paper “VERITE: a robust benchmark for multimodal misinformation detection accounting for unimodal bias”, by Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos and Panagiotis C. Petrantonakis, which has been published in the International Journal of Multimedia Information Retrieval (IJMIR).
As misinformation spreads, the role of fact-checkers becomes increasingly important, and their task grows harder with the sheer volume of content generated and shared daily across social media platforms. Studies show that multimedia content, with its visual appeal, captures attention more effectively and spreads more widely than plain text (Li & Xie, 2019), and that including images can notably increase the persuasiveness of false statements (Newman et al., 2012). This makes Multimodal Misinformation (MM) particularly concerning.
MM is false or misleading information disseminated through multiple “modalities” of communication, including text, images, audio, and video. A common scenario is that an image is removed from its authentic context, or that its accompanying caption distorts critical elements such as the event, date, location, or the identity of the depicted person. For instance, in the above figure (a) we see an image in which the grounds of a music festival are covered in garbage, accompanied by the claim that it was taken in June 2022 “after Greta Thunberg’s environmentalist speech”. However, the image has been removed from its authentic context: it was actually taken in 2015, not 2022.