Tools for data protection and secure data processing
In health research, datasets often contain personal information that requires special protection during analysis. Anonymisation is an important method for this: data are modified in such a way that the risk of identification is minimised. NFDI4Health brings together tools, provides support for data anonymisation, and develops approaches for assessing re-identification risks.

Background
Anonymisation can be an important component of data-sharing processes to protect personal information and privacy. It involves complex procedures in which it is usually not sufficient simply to remove directly identifying features such as names or dates of birth. Instead, statistical models are used to anonymise datasets according to a defined level of protection, as well as to measure and reduce re-identification risks. NFDI4Health recommends the use of existing, robust implementations and supports researchers in integrating suitable anonymisation methods into their research practice. To this end, NFDI4Health provides an overview of open-source anonymisation tools and offers support for the ARX Data Anonymisation tool and other applications. In addition, a method for analysing re-identification risk has been developed.
Open-source Anonymisation Tools
Anonymisation methods are complex, and the landscape of available open-source tools is often difficult to navigate due to differences in functionality and maturity. To provide researchers with guidance, NFDI4Health has created a comprehensive overview of open-source anonymisation tools for tabular data, helping users select the most appropriate tool and method depending on data type and context.
ARX Data Anonymisation Tool
The ARX Data Anonymisation tool was developed by medical informatics researchers at the BIH @ Charité, and NFDI4Health supports researchers in using it via a helpdesk. ARX is internationally established open-source software that offers a wide range of anonymisation methods, including masking (replacing real values with fictional ones) and generalisation (replacing precise information such as dates of birth with broader categories). ARX supports various privacy models, such as k-anonymity, t-closeness, and differential privacy, to reduce the risk of re-identification.
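To make generalisation and k-anonymity more concrete, the following minimal Python sketch coarsens two quasi-identifiers and then checks the size of the smallest group of records that share the same values. It does not use ARX or its API; the toy records, the choice of quasi-identifiers, and the helper names `generalise` and `k_of` are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy records: (year_of_birth, postcode, diagnosis).
# Year of birth and postcode are treated as quasi-identifiers.
records = [
    (1984, "10115", "asthma"),
    (1986, "10117", "diabetes"),
    (1981, "10119", "asthma"),
    (1992, "10243", "migraine"),
    (1995, "10245", "asthma"),
    (1985, "10115", "diabetes"),
]

def generalise(record):
    """Coarsen quasi-identifiers: birth year to decade, postcode to 3-digit prefix."""
    year, postcode, diagnosis = record
    decade = f"{year // 10 * 10}s"
    return (decade, postcode[:3] + "**", diagnosis)

def k_of(rows):
    """Return k: the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter((row[0], row[1]) for row in rows)
    return min(groups.values())

generalised = [generalise(r) for r in records]

print("k before generalisation:", k_of(records))
print("k after generalisation: ", k_of(generalised))
```

In this toy example every raw record is unique on its quasi-identifiers (k = 1), whereas after generalisation each record shares its quasi-identifier values with at least one other record (k = 2). Real tools such as ARX search over many possible generalisation levels and weigh privacy against data utility, which this sketch does not attempt.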
Method for re-identification risk analysis
One approach to protecting sensitive health data is to quantify residual privacy risks in anonymised datasets. This allows the anonymisation method to be validated and, where necessary, optimised. To this end, NFDI4Health has developed a method for re-identification risk analysis for tabular data.
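As an illustration of what quantifying residual risk can look like, the sketch below computes a simple and widely used baseline: each record's re-identification risk is taken as one divided by the number of records sharing its quasi-identifier values. This is not the method developed by NFDI4Health; the function name `reidentification_risks` and the sample data are hypothetical.

```python
from collections import Counter

def reidentification_risks(quasi_identifiers):
    """Per-record risk = 1 / size of the group with identical quasi-identifiers.

    Returns the highest and the average risk over all records.
    """
    counts = Counter(quasi_identifiers)
    per_record = [1 / counts[q] for q in quasi_identifiers]
    return {
        "highest": max(per_record),
        "average": sum(per_record) / len(per_record),
    }

# Hypothetical quasi-identifier tuples of an already generalised dataset:
# four records in one group, two records in another.
rows = [("1980s", "101**")] * 4 + [("1990s", "102**")] * 2

risks = reidentification_risks(rows)
print(f"highest risk: {risks['highest']:.2f}")
print(f"average risk: {risks['average']:.2f}")
```

Here the records in the smaller group carry the highest risk (1/2), and the average risk equals the number of groups divided by the number of records (2/6). Such figures can indicate whether further generalisation is needed, although they only cover an attacker who matches on exactly these quasi-identifiers.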
Relevant publications
Haber AC, Sax U, Prasser F; NFDI4Health Consortium. Open tools for quantitative anonymization of tabular phenotype data: literature review. Brief Bioinform. 2022 Nov 19;23(6):bbac440. https://doi.org/10.1093/bib/bbac440.
Meurers T, Halilovic M, Otte K, Despraz J, Kaabachi B, Kulynych B, Raisaro JL, Prasser F. Phantom Anonymization: Adversarial testing for membership inference risks in anonymized health data. Comput Biol Med. 2025 Sep;196(Pt A):110738. https://doi.org/10.1016/j.compbiomed.2025.110738.
Kühnel L, Schneider J, Perrar I, Adams T, Moazemi S, Prasser F, Nöthlings U, Fröhlich H, Fluck J. Synthetic data generation for a longitudinal cohort study - evaluation, method extension and reproduction of published data analysis results. Sci Rep. 2024 Jun 22;14(1):14412. https://doi.org/10.1038/s41598-024-62102-2.
Adams T, Birkenbihl C, Otte K, Ng HG, Rieling JA, Näher AF, Sax U, Prasser F, Fröhlich H; Alzheimer’s Disease Neuroimaging Initiative. On the fidelity versus privacy and utility trade-off of synthetic patient data. iScience. 2025 Apr 14;28(5):112382. https://doi.org/10.1016/j.isci.2025.112382.
Francis P, Jurak G, Leskošek B, Otte K, Prasser F. Comparison of Three Anonymization Tools for a Health Fitness Study. Sci Data. 2025 Sep 18;12(1):1548. https://doi.org/10.1038/s41597-025-05823-x.