Precision medicine relies on molecular and systems biology methods as well as bidirectional association studies of phenotypes and (high-throughput) genomic data. However, the integrated use of such data often faces obstacles, especially in regards to data protection. An important prerequisite for research data processing is usually informed consent. But collecting consent is not always feasible, in particular when data are to be analyzed retrospectively. For phenotype data, anonymization, i.e. the altering of data in such a way that individuals cannot be identified, can provide an alternative. Several re-identification attacks have shown that this is a complex task and that simply removing directly identifying attributes such as names is usually not enough. More formal approaches are needed that use mathematical models to quantify risks and guide their reduction. Due to the complexity of these techniques, it is challenging and not advisable to implement them from scratch. Open software libraries and tools can provide a robust alternative. However, also the range of available anonymization tools is heterogeneous and obtaining an overview of their strengths and weaknesses is difficult due to the complexity of the problem space. We therefore performed a systematic review of open anonymization tools for structured phenotype data described in the literature between 1990 and 2021. Through a two-step eligibility assessment process, we selected 13 tools for an in-depth analysis. By comparing the supported anonymization techniques and further aspects, such as maturity, we derive recommendations for tools to use for anonymizing phenotype datasets with different properties.
Read the full articel: https://doi.org/10.1093/bib/bbac440