Data Harmonisation and Publication - bridging across the health domains through standards
To make health studies and their data FAIR (Findable, Accessible, Interoperable and Reusable), we have developed a publication policy and a metadata schema that provide standards for describing and sharing data.
The publication policy describes recommendations and requirements for publishing research data from health-related studies, with a focus on the services developed by NFDI4Health. It classifies the resource types to be published (such as study descriptions, types of study documents, or data collections), outlines the licensing of such resources and the use of universal identifiers, and defines formatting and metadata description requirements for the published resources and their data.
For the metadata describing these resources, the publication policy refers to the NFDI4Health metadata schema, which is tailored to this purpose and implemented in our services. The schema combines common elements, together with their controlled vocabularies, that are relevant to all domains and use cases covered by NFDI4Health. It is modular and also comprises modules that are specific to certain sub-domains.
Publication policy
One of the purposes of NFDI4Health is to make data collected in health-related studies FAIR. The first step towards this is publishing comprehensive information about the context and accessibility of the data and metadata. In addition, study documents contain further detailed information needed to interpret the collected data correctly and should therefore be published as well. The NFDI4Health publication policy describes the requirements for publishing research data from health-related studies in the German Central Health Study Hub. Researchers who want to make their studies discoverable and the corresponding research data FAIR are strongly encouraged to adhere to the NFDI4Health standards.
Metadata schema
To make clinical, epidemiological, and public health research data FAIR (Findable, Accessible, Interoperable and Reusable), the NFDI4Health metadata schema enables the standardised publication of health resources’ metadata on the German Central Health Study Hub. Although it was originally developed by the NFDI4Health Task Force COVID-19 and tailored to COVID-19 studies, the generic nature of the schema allows further types of resources to be registered, such as registries, secondary data sources, and various study documents. The schema also extends to other health domains through its modular structure, which comprises core and domain-specific metadata items in generic and use-case-dedicated modules. Most items were adapted from established standards and models, including DataCite, ClinicalTrials.gov, DRKS, Maelstrom, and MIABIS.
The schema’s core module captures information that is commonly collected for any type of health resource, while further resource-type-specific and/or use-case-specific modules describe resources of certain types or from certain health domains. The core module contains bibliographic information, such as the resource’s title, description, and acronyms, along with information about contributors and the identifiers of related resources registered on the Health Study Hub or elsewhere. It also includes items that trigger other modules, as well as provenance details about the publication of the resource.
For design and data access information, the schema provides a design module comprising characteristics that pertain to certain resource types. For studies and substudies, the module distinguishes between interventional and non-interventional study designs and provides dedicated sections for each design type. The module also describes the study conditions and population, including the recruitment area and sample size. An administrative section covers ethics committee approval and the status and dates of the study, and further dedicated sections cover eligibility criteria and outcome measures/time points. Information about data sharing is also included and, where applicable, triggers the record linkage module. Because many characteristics overlap, most sections also apply to registries and secondary data sources.
The nutritional epidemiology module provides domain-specific information, mainly about the dietary assessment instruments applied in relevant studies. The chronic diseases module specifies whether prevalent or incident disease data were collected and indicates the sources from which the data were generated. A third dedicated module, the record linkage module, provides the legal, consent, and budget information required to enable record linkage. Modules for clinical trial and imaging/radiomics metadata have yet to be implemented. All modules contain mandatory, conditional, and optional items.
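The triggering logic described above, where core and design items switch further modules on, can be pictured as a small function that decides which modules a record needs. This is a minimal Python sketch under stated assumptions: the `Record` fields, the resource-type labels, `DESIGN_TYPES`, and the `active_modules` helper are all hypothetical and do not reproduce the item names or trigger rules of the actual NFDI4Health schema.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified record; field names are illustrative and are NOT
# the item names of the NFDI4Health metadata schema.
@dataclass
class Record:
    resource_type: str                       # e.g. "study", "registry", "study document"
    health_domains: list[str] = field(default_factory=list)
    record_linkage_applicable: bool = False  # stand-in for the data-sharing items

# Resource types assumed (for this sketch) to be described by the design module.
DESIGN_TYPES = {"study", "substudy", "registry", "secondary data source"}

def active_modules(record: Record) -> list[str]:
    """Return the modules a record would be described with (illustrative rules)."""
    modules = ["core"]  # the core module applies to every resource
    if record.resource_type in DESIGN_TYPES:
        modules.append("design")
    # Domain-specific modules are switched on by the health domains the record declares.
    for domain in ("nutritional epidemiology", "chronic diseases"):
        if domain in record.health_domains:
            modules.append(domain)
    # Data-sharing information triggers the record linkage module when applicable.
    if record.record_linkage_applicable:
        modules.append("record linkage")
    return modules

if __name__ == "__main__":
    study = Record("study", ["nutritional epidemiology"], record_linkage_applicable=True)
    print(active_modules(study))
    # ['core', 'design', 'nutritional epidemiology', 'record linkage']
```

Keeping the trigger rules in one function makes it easy to see why a given record is asked for a given module; in the real schema these rules live in the item definitions themselves.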
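To illustrate how mandatory, conditional, and optional items differ, here is a minimal Python sketch of a completeness check. The item names, the condition, `ITEMS`, and the `missing_items` function are hypothetical; the sketch assumes that a conditional item becomes mandatory only when a predicate on other items holds (here, an intervention description is required only for interventional studies).

```python
from typing import Callable, Optional

# Each hypothetical item maps to (obligation, condition). The condition is only
# used for "conditional" items: the item becomes required when it returns True.
Condition = Optional[Callable[[dict], bool]]
ITEMS: dict[str, tuple[str, Condition]] = {
    "title": ("mandatory", None),
    "acronym": ("optional", None),
    "study_type": ("mandatory", None),
    "intervention_description": (
        "conditional",
        lambda record: record.get("study_type") == "interventional",
    ),
}

def missing_items(record: dict) -> list[str]:
    """List required items that are absent or empty in `record`."""
    missing = []
    for name, (obligation, condition) in ITEMS.items():
        required = obligation == "mandatory" or (
            obligation == "conditional" and condition is not None and condition(record)
        )
        if required and record.get(name) in (None, ""):
            missing.append(name)
    return missing

if __name__ == "__main__":
    print(missing_items({"title": "Cohort X", "study_type": "interventional"}))
    # ['intervention_description']
```

Optional items such as `acronym` never appear in the result, which mirrors the idea that they enrich a record without blocking its publication.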
The schema is currently available as a human-readable Excel file. Towards a machine-readable version, it has been represented in ART-DECOR and mapped to HL7 FHIR, and the corresponding FHIR profiles have been created and published on Simplifier. To make it easier for data-holding organisations to share metadata, the schema is also being implemented in Local Data Hubs at several NFDI4Health partner locations.
The current version of the NFDI4Health metadata schema is V3_3.
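As a rough illustration of what a FHIR mapping involves, the sketch below turns a few hypothetical schema items into a JSON object shaped like the base HL7 FHIR R4 `ResearchStudy` resource. It assumes that a study description maps onto `ResearchStudy`; the NFDI4Health profiles published on Simplifier add their own constraints and extensions, which are not reproduced here. The input keys, the placeholder identifier system, and the `to_fhir_research_study` function are illustrative only.

```python
import json

def to_fhir_research_study(record: dict) -> dict:
    """Map a few hypothetical schema items onto a base FHIR R4 ResearchStudy."""
    resource = {
        "resourceType": "ResearchStudy",
        # `status` is a required (1..1) element of ResearchStudy in FHIR R4.
        "status": record["status"],
    }
    if record.get("title"):
        resource["title"] = record["title"]
    if record.get("description"):
        resource["description"] = record["description"]
    # Identifiers are assumed to arrive as (system, value) pairs.
    if record.get("identifiers"):
        resource["identifier"] = [
            {"system": system, "value": value} for system, value in record["identifiers"]
        ]
    return resource

if __name__ == "__main__":
    example = {
        "status": "active",
        "title": "Example cohort study",
        "identifiers": [("https://example.org/study-ids", "S-001")],  # placeholder system URI
    }
    print(json.dumps(to_fhir_research_study(example), indent=2))
```

In practice, the output would be validated against the published profiles rather than only the base resource, since profiles can make further elements mandatory.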
Our Services
Health Study Hub
The German Central Health Study Hub allows researchers to publish the characteristics, documents, and data of their research projects in a FAIR manner, and to find information about past and ongoing studies.
Data Train
The Data Train cross-disciplinary graduate training programme, a core element of the NFDI4Health training approach, aims at building the next generation of data-savvy researchers in the biomedical sciences.
Personal Health Train
Local Data Hub
Data publication
Health data collected in clinical trials and in epidemiological and public health studies cannot be freely published, yet they are valuable datasets whose reuse is highly important for health research. NFDI4Health has established a metadata standard and a publication process for health studies to make health data FAIR.