AI: A Lever for ‘Decolonizing’ Archives? Web Archives as a Datafield for Critical and Inclusive Uses of AI in History

Presented by:

International conference: RESAW 2025 - The Datafied Web

Date: 6 June 2025

Location: Siegen, Germany

Organizers: Marcus Burkhardt, CRC, Siegen University, GE; Carolin Gerlitz, Media Studies, Speaker CRC Media of Cooperation (DFG), Siegen University, GE; Sebastian Gießmann, CRC, Siegen University, GE; Valérie Schafer, C²DH, University of Luxembourg; Inga Schuppener, CRC,


About the presentation

Concluded in 2024, the European program Polyvocal Interpretation Of Contested Colonial Heritage (PICCH) aimed to explore how archival documents created from a colonial perspective could be reappropriated and reinterpreted to become an effective source for constructing an inclusive future society. In France, the term ‘decolonization’ has been heavily instrumentalized, losing the profound meaning attributed to it by historical thinkers like Achille Mbembe. In this project, decolonizing French television and web archives means making these materials from former colonial powers more inclusive and respectful towards populations still facing discrimination today, a challenge that has been driving archivists worldwide for years (Ghaddar & Caswell, 2019). One of the project’s objectives was to refine the metadata of television archives and of web data concerning narratives of events related to the colonial past or post-colonial issues. We scrutinized the media coverage of the 1983 March for Equality and Against Racism from a transmedia perspective, based on web video corpora and archived web pages from the INA. One of the goals was to examine the visibility accorded by the media to the marchers themselves: in 1983, they were young people born to immigrant parents in French urban suburbs and perceived as Maghrebi or Black, which led to an essentialization of the media discourse on this event. The marchers were relegated to the periphery of the journalistic narrative from the 1980s until more recent commemorations, and they now use the web to reclaim the narrative of this event. Given the volume of data (archived web pages, voice-over text from videos, video metadata), we employed AI programs to automate the identification of the marchers, whether through text (names, nicknames) or through their faces in the videos.

Based on this case study, this paper will set aside the interpretation of the online media coverage of the march and concentrate on the methodological and hermeneutical questions raised by cultural biases when employing deep learning AI programs to analyze web data. It seeks to investigate under what conditions applying AI programs to archived web data can enhance the consideration of marginalized historical actors in the analysis of contemporary transmedia narratives.

First, we will present the corpus and methodology used to study the media treatment of the marchers in the television and web archives of the Institut national de l’audiovisuel (INA). Second, we will review the application of AI to these corpora, focusing on the significance of cultural biases in data processing through two examples: the thematization of text from HTML pages archived by the INA in 2013 and automated visual recognition in videos. Finally, we will consider the lessons learned from this experience and propose hermeneutic and ethical reflections for web historians confronted with hegemonic biases in the processing of web data.


More information: https://www.mediacoop.uni-siegen.de/datafiedweb/