Assessing Identity Disclosure Risk in the Absence of Identified Datasets in the Public Domain
dc.contributor.author | Peter N. Muturi | |
dc.contributor.author | Andrew M. Kahonge | |
dc.contributor.author | Christopher K. Chepken | |
dc.date.accessioned | 2024-11-17T17:51:53Z | |
dc.date.available | 2024-11-17T17:51:53Z | |
dc.date.issued | 2024-07-17 | |
dc.description.abstract | Data release is essential in supporting data analytics and secondary data analyses. However, data curators need to ensure that released datasets preserve data subjects’ privacy and retain analytical utility. Data privacy is achieved through the anonymisation of datasets before release. The risk of disclosure posed to the dataset should inform the level of anonymisation to be undertaken. While anonymisation achieves data privacy, it reduces the analytical utility of the dataset by introducing alterations to the original data values. Therefore, data curators require an appropriate estimate of the dataset’s identity disclosure risk to inform the level of anonymisation that balances privacy and utility. The disclosure risk varies from one geographical region to another because the enabling factors vary. This paper assesses the disclosure risk and the enabling factors in an environment lacking identified datasets in the public domain. The study used a quasi-experimental design to carry out an empirical identity disclosure test, in which respondents were given an anonymised dataset and were required to disclose the identity of any of the records. The findings were that background knowledge of the released datasets was the primary enabler in the absence of identified datasets. Respondents could only disclose records in the dataset that they were familiar with. However, the disclosure risk was within an acceptable threshold. Therefore, the study concluded that in an environment lacking identified datasets in the public domain, reasonable anonymisation could achieve a balance of privacy and utility in datasets. The findings justify private data release that can support data analytics and secondary data analyses in environments lacking identified datasets in the public domain. | |
dc.identifier.uri | https://erepository.ouk.ac.ke/handle/123456789/1484 | |
dc.publisher | East African Journal of Information Technology | |
dc.title | Assessing Identity Disclosure Risk in the Absence of Identified Datasets in the Public Domain |