Planning a clinical trial (including its design and power calculation) requires information on the disease, the target population and the biometrical properties of the endpoints used. Clinical trial planning should be based on the best available evidence. In many cases, however, published results of prior or similar trials are available but lack relevant details. Examples of such information are the statistical properties of certain variables in the target patient population, the time course of response variables, the statistical properties of before-after differences, the effect sizes of treatment differences in trial endpoints, the distribution of prognostic factors in the target population, the frequency of conditions considered as exclusion criteria in the intended trial, and many more.
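To illustrate why such details matter for the power calculation, the following minimal sketch uses the standard normal-approximation formula for comparing two means with equal allocation, n = 2 (z_{1-α/2} + z_{1-β})² σ² / δ² per group. The function name and the numbers are hypothetical. The point is that the required sample size depends directly on the standard deviation σ and the effect size δ, which are exactly the quantities often missing from published reports.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group n for a two-sided two-sample comparison of means
    (normal approximation, equal group sizes; hypothetical helper)."""
    z = NormalDist().inv_cdf                 # standard normal quantile function
    z_alpha = z(1 - alpha / 2)               # two-sided significance level
    z_beta = z(power)                        # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                      # round up to whole patients


print(sample_size_per_group(delta=5, sigma=10))  # 63
print(sample_size_per_group(delta=5, sigma=12))  # 91
```

In this example, a standard deviation that is 20% larger than assumed raises the required sample size from 63 to 91 patients per group. This is why reliable estimates of variability from prior trials are needed at the planning stage.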
Clinical trials prospectively collect data based on a predefined documentation concept derived from the study protocol, which often comprises several hundred variables in structured case report forms (CRFs). Great effort goes into specifying and enforcing quality metrics for each data element (e.g., mandatory fields, formatting requirements, units of measurement, reference intervals, query actions). Local investigators know the exact meaning of variables and measurement methods from the study protocol, specific training or standard operating procedures (SOPs). External researchers, however, find it difficult to understand this exact meaning because descriptions are missing or limited. A detailed description, including concepts from medical terminologies, plausibility checks and information on the origin of the data (inquired/measured/calculated/transferred from a secondary system), would increase the quality of documentation concepts and support the reuse of the data. In the long run, harmonization and standardization of clinical trial data elements would help both in creating new study concepts (since one could refer to gold-standard data elements) and in meta-analyses comparing studies on the same topic.
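A machine-readable data element description of the kind outlined above might bundle the terminology annotation, unit, data origin and plausibility checks in one record. The sketch below is hypothetical: the class and field names are illustrative, the plausibility limits are invented, and only the LOINC code 8480-6 (systolic blood pressure) and the UCUM unit "mm[Hg]" are real identifiers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    """Origin of a value, following the categories named in the text."""
    INQUIRED = "inquired"
    MEASURED = "measured"
    CALCULATED = "calculated"
    TRANSFERRED = "transferred"  # transferred from a secondary system


@dataclass
class DataElementSpec:
    name: str
    concept_code: str              # terminology annotation, e.g. "LOINC:8480-6"
    unit: str                      # unit of measurement (UCUM code)
    mandatory: bool
    origin: Origin
    low: Optional[float] = None    # hypothetical plausibility limits
    high: Optional[float] = None

    def check(self, value: Optional[float]) -> list[str]:
        """Return query messages for one value; an empty list means it passes."""
        if value is None:
            return [f"{self.name}: mandatory field is missing"] if self.mandatory else []
        issues = []
        if self.low is not None and value < self.low:
            issues.append(f"{self.name}: {value} {self.unit} below plausibility limit {self.low}")
        if self.high is not None and value > self.high:
            issues.append(f"{self.name}: {value} {self.unit} above plausibility limit {self.high}")
        return issues


# Hypothetical definition of a systolic blood pressure field
sbp = DataElementSpec("SYSBP", "LOINC:8480-6", "mm[Hg]", mandatory=True,
                      origin=Origin.MEASURED, low=60, high=250)
print(sbp.check(120))   # []
print(sbp.check(None))  # ['SYSBP: mandatory field is missing']
```

Keeping the checks next to the semantic annotation means that an external researcher reading the data dictionary sees the same rules that generated queries during the trial.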
Clinical trial centers (CTCs) at German medical faculties support a wide range of academic, publicly funded clinical trials. About 400 academic clinical trials are activated in Germany each year. Once a clinical trial is completed and its results have been published, the data should be made available for scientific reuse. Public funders of clinical trials such as the DFG and the BMBF strongly encourage this. Currently, no infrastructure exists that supports the sharing of clinical trial data with regard to findability, accessibility and reusability in compliance with the legislative and regulatory requirements for personal medical data.
To overcome these limitations, this use case will first develop, in collaboration with T2.1 and T2.2, a catalogue of typical characteristics of clinical trial data sets (e.g., disease area, type of clinical trial, unique trial identifier, trial design characteristics, type of intervention, target population, outcome variable, type of therapy), complemented by full-text search in the trial synopsis. These characteristics will be implemented as searchable and filterable facets in T3.1. Second, together with TA6, concepts for different use and access mechanisms will be developed and then implemented in T3.4 or, as distributed analysis, in T3.7. As part of this activity, the implemented services and their orchestration will be evaluated in real-life scenarios in a CTC.
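Faceted search over such a catalogue combines exact filters on characteristic values with free-text search in the synopsis, and reports how many hits carry each facet value. The following sketch is illustrative only. The record layout, facet names and trial IDs are hypothetical, and a simple case-insensitive substring match stands in for the full-text index that a production service in T3.1 would use.

```python
from collections import Counter

# Hypothetical catalogue entries; facet names follow the examples in the text.
TRIALS = [
    {"id": "T-001", "disease_area": "oncology", "trial_type": "interventional",
     "intervention": "drug", "synopsis": "Randomised phase III trial of adjuvant chemotherapy."},
    {"id": "T-002", "disease_area": "oncology", "trial_type": "interventional",
     "intervention": "surgery", "synopsis": "Randomised comparison of two surgical techniques."},
    {"id": "T-003", "disease_area": "cardiology", "trial_type": "interventional",
     "intervention": "drug", "synopsis": "Randomised trial of a beta blocker after myocardial infarction."},
]


def search(trials, text="", **facets):
    """Return trials whose synopsis contains `text` (case-insensitive substring)
    and whose facet values match every given filter exactly."""
    needle = text.lower()
    return [
        t for t in trials
        if needle in t["synopsis"].lower()
        and all(t.get(name) == value for name, value in facets.items())
    ]


def facet_counts(trials, facet):
    """Count hits per value of one facet, e.g. to show next to filter options."""
    return Counter(t.get(facet) for t in trials)


hits = search(TRIALS, text="randomised", disease_area="oncology")
print([t["id"] for t in hits])             # ['T-001', 'T-002']
print(facet_counts(hits, "intervention"))  # Counter({'drug': 1, 'surgery': 1})
```

Computing facet counts on the current hit list, rather than on the whole catalogue, lets users see how each additional filter would narrow the result before they apply it.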
Third, we will define and reach consensus on a metadata catalogue of the 750 most frequently used data elements in typical academic clinical trials. For this purpose, at least two CTCs will each provide the annotated CRFs of ten different trials, preferably in a machine-readable format such as CDISC ODM. The metadata definitions will be extracted in collaboration with T2.2 as recommendations for the creation of data dictionaries. The metadata definitions will then be curated and completed according to the best practices specified by T2.2, using T3.2. To support semantic interoperability, data elements will be annotated, based on the standards defined in T2.2, with concepts from medical terminologies (LOINC, SNOMED, CDASH, UMLS, MeSH), and local metadata will be mapped to a central catalogue organized into domains (as defined in CDISC CDASH). Metrics for computing the similarity of data elements beyond a trivial comparison of lexical labels will be implemented. Finally, the central metadata will be harmonized (in collaboration with T2.2). This activity will be aligned with the interoperability working group of the MII.
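The extraction and similarity steps can be sketched for CRFs delivered as CDISC ODM 1.3. The sketch reads each ItemDef together with its question text and any Alias annotations, which ODM uses to attach external codes. It then scores pairs of items by combining the overlap of terminology codes (Jaccard index), the overlap of question-text tokens, and data type agreement, so that items with different variable names can still be matched. The weights, helper names and trimmed ODM fragments (not schema-valid files) are hypothetical. The weighted Jaccard combination is a simple stand-in for illustration, not the metric the task will define.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}


@dataclass
class DataElement:
    oid: str
    name: str
    data_type: str
    question: str
    codes: set = field(default_factory=set)  # e.g. {"LOINC:8480-6"}


def extract_items(odm_xml: str) -> list[DataElement]:
    """Collect ItemDefs with question text and Alias-based code annotations."""
    root = ET.fromstring(odm_xml)
    items = []
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        text = item.find("odm:Question/odm:TranslatedText", ODM_NS)
        question = text.text.strip() if text is not None and text.text else ""
        codes = {f"{a.get('Context')}:{a.get('Name')}"
                 for a in item.iterfind("odm:Alias", ODM_NS)}
        items.append(DataElement(item.get("OID"), item.get("Name"),
                                 item.get("DataType"), question, codes))
    return items


def tokens(s: str) -> set:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity(x: DataElement, y: DataElement,
               w_codes=0.6, w_text=0.3, w_type=0.1) -> float:
    """Hypothetical weighted score in [0, 1]; ignores the variable names."""
    return (w_codes * jaccard(x.codes, y.codes)
            + w_text * jaccard(tokens(x.question), tokens(y.question))
            + w_type * (x.data_type == y.data_type))


TRIAL_A = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"><Study OID="S.A">
  <MetaDataVersion OID="MDV.A" Name="A">
    <ItemDef OID="I.SYSBP" Name="SYSBP" DataType="integer">
      <Question><TranslatedText xml:lang="en">Systolic blood pressure</TranslatedText></Question>
      <Alias Context="LOINC" Name="8480-6"/>
    </ItemDef>
    <ItemDef OID="I.WEIGHT" Name="WEIGHT" DataType="float">
      <Question><TranslatedText xml:lang="en">Body weight</TranslatedText></Question>
    </ItemDef>
  </MetaDataVersion></Study></ODM>"""

TRIAL_B = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"><Study OID="S.B">
  <MetaDataVersion OID="MDV.B" Name="B">
    <ItemDef OID="I.BP_SYS" Name="BP_SYS" DataType="integer">
      <Question><TranslatedText xml:lang="en">Blood pressure, systolic (sitting)</TranslatedText></Question>
      <Alias Context="LOINC" Name="8480-6"/>
    </ItemDef>
  </MetaDataVersion></Study></ODM>"""

a, b = extract_items(TRIAL_A), extract_items(TRIAL_B)
print(round(similarity(a[0], b[0]), 3))  # 0.925 (SYSBP vs BP_SYS)
print(similarity(a[1], b[0]))            # 0.0 (WEIGHT vs BP_SYS)
```

The variable names SYSBP and BP_SYS share no lexical label, yet the shared LOINC annotation and overlapping question wording identify them as the same concept. This is the kind of match a purely label-based comparison would miss.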
Deliverables of Task 5.4