Transparency for whom? Designing data documentation with data workers
The lack of transparency in datasets poses a significant challenge to creating inclusive and intelligible machine learning (ML) systems. Various AI ethics initiatives have addressed this issue by proposing standardized dataset documentation frameworks grounded in the value of transparency. In this talk, I propose a shift of perspective: from documenting for transparency to documenting for reflexivity. Drawing on a long-term project with outsourced data workers in Argentina, Bulgaria, and Syria, I argue for the need to design documentation starting from the needs and experiences of the workers who collect, sort, and label the data that trains ML models. This requires considering the historical inequalities, working conditions, and epistemological standpoints that shape both data work and datasets.