Transfer of clinical trial data
Instead of the conventional approach of receiving data for pharmacometric analysis after trial completion, an iterative data transfer and reproducible data handling workflow was developed by consensus between the clinical, data management, and pharmacometric teams collaborating in this clinical trial. Data QC and review responsibilities were shared between these teams. Figure 2 shows participation in the trial from the start of enrolment to the final visit. The first data transfer took place once data management procedures had been developed, as early as 16% enrolment. In total, 41 data transfers occurred, on average every 1.8 weeks. The screening and enrolment database was locked within three weeks after enrolment was completed. Trial participation showed a slight decrease between the last participant in and the first end-of-study visit (first participant out, April 2021) due to withdrawal or loss to follow-up (n = 11) and death (n = 2) in that interval. The full database was locked within 5 working days after trial completion, after which unblinding took place.
Overview of the data management throughout the clinical trial. The number of participants on trial over time is shown as a solid purple line, the database locks (n = 2) as black dashed lines, the scheduled data reviews (n = 41) as a magenta rug plot along the top axis, and the first reported diagnosed COVID-19 case in South Africa as a grey mark on the bottom axis for reference.
The frequent interim data QC by the clinical, data management, and pharmacometric teams was a time-saving investment. All records were subject to checks after entry into the eCRF, and the clinical QC and data management reviews combined found that a correction was needed in 20.9% of records when the eCRF was compared to the paper source document. A total of 201 queries, accounting for 10.7% of total records, were raised by the pharmacometrics team and resolved while the trial was ongoing. The last data check after trial completion resulted in only 4 additional queries, which were resolved in two days, after which the data could be locked. In addition to saving time after study completion, addressing queries while the study is still ongoing proved advantageous because an incorrect measurement (e.g. weight) can still be re-measured and recorded. Pharmacometric analysis (magenta hexagon in Fig. 1) could commence practically immediately after trial completion because of this streamlined review process (other magenta, purple, and black boxes in Fig. 1). Best practices and examples of the data review are described below.
Data review was challenging because of the large size of the database. The full database consisted of 20,457 records. Figure 3 shows the database architecture, including the number of records per master database. The four master databases (Screening/enrolment, Events, Lab results, and Follow-up) contained 24, 13, 12, and 10 datasets in .dat format, respectively (Supplementary Table I). Each dataset came with a metadata file in .stsd format reporting on each variable, its possible values, and units. All records were linked through the participant ID (n = 1000) in the integrated database.
Number of eCRFs submitted per master database. The dataset architecture consisted of four master databases (Screening/enrolment, Events, Lab results, and Follow-up), for which the number of records is shown.
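The assembly of the integrated database might look like the following minimal sketch; the file and column names are hypothetical and the tab-delimited .dat format is an assumption (in practice, the .stsd metadata would inform the column types):

```r
library(dplyr)

# Read example datasets from two master databases (hypothetical paths;
# delimiter is an assumption)
screening <- read.table("screening_enrolment/demographics.dat",
                        header = TRUE, sep = "\t")
events    <- read.table("events/events.dat",
                        header = TRUE, sep = "\t")

# All records are linked through the participant ID in the integrated database
integrated_db <- screening %>%
  left_join(events, by = "participant_id")
```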
Most important in the pharmacometric data review was understanding the clinical meaning of the data entries. For example, COVID-19 was defined as symptomatic disease with confirmed SARS-CoV-2 infection. As such, a COVID-19 event with a health status score of 0, or a polymerase chain reaction (PCR)-confirmed asymptomatic SARS-CoV-2 infection with a health status score of 1 or higher, would result in a query directed to the clinical team on how to interpret these results. The records would subsequently be corrected in the next data transfer so that the health status score reflected the event definition. Another example was post-viral syndrome, i.e. long COVID: a record without a preceding COVID-19 event would also result in a query.
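Such definition checks can be expressed directly in the processing script. A minimal sketch, with hypothetical dataset and column names, of flagging records that contradict the event definitions:

```r
library(dplyr)

# Flag definition-inconsistent records for a query to the clinical team:
# COVID-19 is symptomatic by definition, so a health status score of 0 is
# inconsistent, as is a score of 1 or higher for an asymptomatic infection
definition_queries <- events %>%
  filter((event_term == "COVID-19" & health_status == 0) |
           (event_term == "Asymptomatic SARS-CoV-2 infection" &
              health_status >= 1)) %>%
  select(participant_id, event_number, event_term, health_status)
```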
The consistency review between the different databases and datasets mostly focussed on the Events master database. The weekly health status score was captured in two different datasets: in the original Events dataset for the first observation(s) and thereafter in the Follow-up dataset. In the integrated datasets, these weekly health status scores were merged and checked for consistency. Where different health status scores were reported for a single week, or where the number of weekly scores did not equal the number of weeks an event was ongoing, a query was opened. Each event had a unique event number, so duplicate event numbers were flagged to the data management team. Consistency between the Follow-up and Events master databases was important because participants self-reported COVID-19 events during the follow-up contact, which would result in a record in the Events master database when symptomatic. Consistency of dates between the Lab, Follow-up, and Events master databases was checked to prevent events remaining ongoing after trial completion.
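A sketch of two of these consistency checks, again with assumed object and column names:

```r
library(dplyr)

# Duplicate event numbers within a participant are flagged to data management
duplicate_events <- events %>%
  count(participant_id, event_number) %>%
  filter(n > 1)

# Weekly health status scores from both sources are merged; weeks with
# conflicting scores for the same event are flagged for a query
score_conflicts <- bind_rows(events_scores, followup_scores) %>%
  group_by(participant_id, event_number, week) %>%
  filter(n_distinct(health_status) > 1) %>%
  ungroup()
```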
Records were checked for missing or not applicable (NA) values. Additionally, dates (negative timepoints, the same record with different dates), MedDRA event descriptions, and spelling were checked. Spelling was a noteworthy issue: COVID-19 was recorded with 63 different spelling alternatives, including COVID-19, COVID 19, COVID, COVID-19 infection, COVID-19 pneumonia, and COVID-19 respiratory tract infections, while post-COVID viral syndrome was recorded with 10 different alternatives. Therefore, the MedDRA term was used initially, but unfortunately this field also contained two alternative spellings for both. From this insight, the MedDRA numerical codes were included in the data processing.
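Matching on the numerical codes makes the processing robust to free-text spelling variants. A minimal sketch; the code value below is illustrative only, and the actual codes should come from the MedDRA version used in the trial:

```r
library(dplyr)

# Illustrative MedDRA preferred-term code for COVID-19 (not necessarily
# the code used in this trial's MedDRA version)
covid_code <- 10084268

# Selecting by numerical code is immune to the 63 spelling variants
# observed in the free-text event term
covid_events <- events %>%
  filter(meddra_code == covid_code)
```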
The initial analysis workflow evolved over time as new information and methods arose during the pandemic that were unknown at database setup. Post-viral syndrome after COVID-19, also coined long COVID [34,35,36], was one example, first reported on trial in August 2020. Discussions on long COVID developed around two points. First, the link between COVID-19 and long COVID needed to be established, by assigning those events the same event number. Second, long COVID could well last longer than the maximum of 12 weeks for which the eCRF was equipped, so an additional data field was incorporated to record the health status scores needed after week 12. Measurement of SARS-CoV-2 antibodies was approved by South African regulators in August 2020, and the first results were reported to the site in October 2020. This led to discussions around participants who were SARS-CoV-2 antibody positive at baseline, participants who were SARS-CoV-2 antibody negative after confirmed COVID-19, and how to interpret reversal of seroconversion from positive to negative. Globally, SARS-CoV-2-specific vaccinations were first approved in December 2020 but only became available in South Africa in February 2021. Understandably, health care workers were among the first to be vaccinated with specific COVID-19 vaccines, which needed to be recorded in the database for appropriate censoring in the pharmacometric analyses. Regarding the handling of events ongoing after the final (week 52) study visit, consensus was reached to allow ongoing events after the final visit if the event was an important endpoint of the trial, for example COVID-19 events or respiratory tract infections in general that were symptomatic at the final visit. Acute events would be followed up until resolution of symptoms, while chronic events like post-viral syndrome would not.
Interoperability between members of the pharmacometric team was essential to divide the work within the short timelines. The pharmacometric processing script was stored in a private GitHub repository where multiple coders could work simultaneously. Through GitHub, changes to parts of the script by team members could be reviewed and incorporated into an updated version, all while tracking these changes and retaining the ability to revert to an earlier version in case of debugging. Additionally, the file structure was standardized between pharmacometricians, so only the path to the working directory needed to be changed, relative to which all other files were read and written. The path to the working directory was set automatically at the start of the script based on an if-statement using the system info of the user's machine (Fig. 4). Interoperability was also improved by using clear, transparent, and well-commented code. The Tidyverse packages, including the magrittr pipe operator (%>%), allowed for more readable and interpretable code [25,27]. Interoperability between the data management and pharmacometric teams was ensured by naming standards for the four master databases.
Interoperability through a standardized file structure and automatic extraction of the working directory using the system info. The ifelse() statement can be expanded with nested ifelse() statements for more collaborators.
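A minimal sketch of this pattern, with hypothetical user names and paths:

```r
# Select the working directory from the machine's system info, so the
# same script runs unchanged for every coder (user names and paths are
# hypothetical)
user <- Sys.info()[["user"]]
wd <- ifelse(user == "coder1", "/Users/coder1/trial/",
      ifelse(user == "coder2", "C:/Users/coder2/trial/",
             NA_character_))  # expand with nested ifelse() per collaborator
setwd(wd)

# All other files are read and written relative to the working directory
integrated_db_path <- file.path(wd, "data", "integrated_database.csv")
```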
The pharmacometric team prepared the data reports for the DSMB to review the safety and efficacy of the ongoing trial. Because of the time-sensitive nature of the vaccination trial, biweekly reporting was initially proposed, which was later amended to a lower frequency at the request of the DSMB and the clinical team because of reduced clinical urgency. Two types of reports were prepared: the open report showed the data in aggregate and was open to review by the whole clinical trial team, while the closed report showed the blinded data per study arm for the closed session of the DSMB. The pharmacometric processing script was developed to automatically generate a report based on the integrated database, to prevent repetitive manual report drafting at the suggested frequency. Using this method, a transparent and reproducible workflow was established from the raw eCRF input through to the DSMB report. RMarkdown was used to integrate the R-based processing of the integrated database with the Markdown and LaTeX text compilers to create a report in PDF format in which the numerical, graphical, and tabular elements were automatically updated with each compilation (Fig. 5A).
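With such a setup, each report can be regenerated with a single call. A minimal sketch, assuming a hypothetical dsmb_report.Rmd that contains the processing and reporting code:

```r
library(rmarkdown)

# Each compilation re-runs the processing on the latest integrated
# database and produces a dated PDF (file names are hypothetical)
render("dsmb_report.Rmd",
       output_format = "pdf_document",
       output_file   = paste0("dsmb_report_", Sys.Date(), ".pdf"))
```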
RMarkdown was used to combine text and R variables in the automatically generated report. (A) In-line calling of R variables to include them in a written sentence; (B) the R variable CLOSED was used to switch between open and closed reporting using if-statements for tables and graphs called in R chunks, or (C) called in in-line R calls.
To create the two versions of the report in a consistent manner, an R variable was integrated into the relevant numerical, graphical, and tabular elements where aggregated or per-arm data was reported. This had the advantage of not having to maintain two RMarkdown scripts at the same time, with the risk of inconsistencies and code conflicts that arise even when working as diligently as possible. As a result, the open and closed reports showed exactly the same data, the only difference being the presentation of these data. The switch variable (CLOSED) was used in if-statements throughout the report to show figures and tables either aggregated or per arm (Fig. 5B), as well as in R code called in-line in the RMarkdown file (Fig. 5C).
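A minimal sketch of this switch pattern, with assumed data object and column names:

```r
library(dplyr)

CLOSED <- TRUE  # TRUE compiles the closed report; FALSE the open report

# One code path per presentation, driven by the single switch variable
if (CLOSED) {
  summary_tab <- events %>% count(arm, event_term)  # per study arm (closed)
} else {
  summary_tab <- events %>% count(event_term)       # aggregated (open)
}
```

In-line R calls (Fig. 5C) can follow the same pattern within a sentence, e.g. `r if (CLOSED) "per study arm" else "aggregated over arms"`.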
When the DSMB meeting schedule was set, a corresponding data transfer schedule was set. On average, the DSMB received the compiled and reviewed report within 3 days after the data cut-off date, including for the final unblinded report. The DSMB repeatedly expressed their appreciation for these excellent turnaround times.
The pharmacometric processing script was also developed to include the creation of the pharmacometric analysis datasets. This resulted in a transparent, traceable, and version-controlled workflow from the raw eCRF input data to the analysis dataset in NONMEM format. Moreover, because the same script and integrated database were used to that aim, the datasets were consistent with the figures and tables in the DSMB reports.
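A sketch of such dataset creation for a time-to-event endpoint, with assumed column names and a simplified NONMEM-style layout:

```r
library(dplyr)

# TIME is days from randomisation to the COVID-19 event, or to the last
# visit for censored participants (column names are assumptions)
nm_tte <- integrated_db %>%
  mutate(event = !is.na(covid_event_date),
         TIME  = as.numeric(coalesce(covid_event_date, last_visit_date) -
                              randomisation_date)) %>%
  transmute(ID = participant_id, TIME, DV = as.integer(event), MDV = 0) %>%
  arrange(ID, TIME)

# Plain comma-separated file as expected by NONMEM $DATA
write.csv(nm_tte, "nonmem/tte_covid19.csv", row.names = FALSE, quote = FALSE)
```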
The reproducible workflow and the resulting confidence in the handling of the data allowed for preparation of the pharmacometric analysis of the primary and secondary endpoints while the trial was still ongoing. Based on interim graphical exploration of the data, modelling strategies were developed per endpoint, including which functions to test. Model scripts were written, tested, and code-reviewed before the data lock. Analysis of the primary endpoint had the highest priority. Because of the reproducible workflow and the preparations made before the data lock, the primary endpoint analysis was completed and reviewed within three days after data lock and unblinding, and shared with the DSMB and the clinical team. Analysis of the secondary endpoints, including a total of 7 time-to-event analyses for COVID-19, RTI, and hospitalization due to all causes in both the intention-to-treat and per-protocol datasets, as well as an exploratory time-to-SARS-CoV-2-specific-vaccination analysis, was completed and reviewed within two weeks after data lock and unblinding, and presented to the DSMB and the clinical team. As we focus here on the reproducible pharmacometrics workflow, the results of these analyses are out of scope and reported separately.