Tackling reproducibility and driving machine learning with digitisation

Dr Birthe Nielsen discusses the role of the Methods Database in supporting life sciences research by digitising methods data across different life science functions.

Reproducibility of experimental findings and data interoperability are two of the major barriers facing life sciences R&D today. Independently verifying findings by re-creating experiments and generating the same results is fundamental to progressing research to the next stage in its lifecycle, be it advancing a drug to clinical development or a product to market. Yet, in the field of biology alone, one study found that 70 per cent of researchers are unable to reproduce the findings of other scientists, and 60 per cent are unable to reproduce their own findings.

This causes delays throughout the life sciences R&D ecosystem. For example, biopharmaceutical companies often use external Contract Research Organisations (CROs) to conduct clinical studies. Without a centralised repository to provide consistent access, analytical methods are often shared with CROs via email or even physical documents, not in a standard format and frequently with inconsistent terminology. The result is unnecessary variability and several versions of the same analytical protocol, which makes it very challenging for a CRO to re-establish and revalidate methods without a labour-intensive process that is open to human interpretation and thus error.

To tackle issues like this, the Pistoia Alliance launched the Methods Hub project. The project aims to overcome the issue of reproducibility by digitising methods data across different life science functions, and ensuring data is FAIR (Findable, Accessible, Interoperable, Reusable) from the point of creation. This will enable seamless and secure sharing within the R&D ecosystem, reduce experiment duplication, standardise formatting to make data machine-readable, and increase reproducibility and efficiency. Robust data management is also a building block for machine learning and a stepping stone to realising the benefits of AI.

Digitisation of paper-based processes increases the efficiency and quality of methods data management. But it goes beyond manually keying method parameters into a computer or using an Electronic Lab Notebook (ELN): a digital, automated workflow increases efficiency, instrument usage and productivity. Applying shared data standards ensures consistency and interoperability, in addition to fast and secure transfer of information between stakeholders.

One area that organisations need to address to comply with FAIR principles, and a key area in which the Methods Hub project helps, is how analytical methods are shared. This includes replacing free-text data capture with a common data model and standardised ontologies. For example, in a High-Performance Liquid Chromatography (HPLC) experiment, rather than manually typing out the analytical parameters (pump flow, injection volume, column temperature, etc.), the scientist simply downloads a method that automatically populates the execution parameters in any given Chromatography Data System (CDS). This not only saves time during data entry; the common format also eliminates room for human interpretation or error.
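To make the idea concrete, the sketch below shows in Python how a single vendor-neutral method record could populate the execution parameters of two different chromatography data systems without re-typing. The field names, JSON layout and vendor mappings are purely illustrative assumptions for this article; they are not the Methods Hub data model or any real CDS interface.

```python
# Minimal sketch: one vendor-neutral HPLC method record is mapped onto the
# parameter names each CDS expects, instead of being re-keyed by hand.
# All names below are hypothetical, for illustration only.
import json

# Hypothetical vendor-neutral method record (illustrative fields and units)
method_json = """
{
  "method_id": "HPLC-ASSAY-001",
  "technique": "HPLC-UV",
  "pump_flow_ml_min": 1.0,
  "injection_volume_ul": 10.0,
  "column_temperature_c": 30.0,
  "detection_wavelength_nm": 254
}
"""

# Illustrative mappings from the shared model to two made-up CDS parameter sets
CDS_FIELD_MAPS = {
    "vendor_a": {
        "pump_flow_ml_min": "Pump.Flow",
        "injection_volume_ul": "Injector.Volume",
        "column_temperature_c": "Oven.Temperature",
        "detection_wavelength_nm": "Detector.Wavelength",
    },
    "vendor_b": {
        "pump_flow_ml_min": "flow_rate",
        "injection_volume_ul": "inj_vol",
        "column_temperature_c": "col_temp",
        "detection_wavelength_nm": "uv_wavelength",
    },
}

def populate_cds_parameters(method: dict, vendor: str) -> dict:
    """Translate a vendor-neutral method into the parameter keys one CDS expects."""
    field_map = CDS_FIELD_MAPS[vendor]
    return {cds_key: method[shared_key] for shared_key, cds_key in field_map.items()}

if __name__ == "__main__":
    method = json.loads(method_json)
    # The same digital method populates two different systems with no manual entry
    print(populate_cds_parameters(method, "vendor_a"))
    print(populate_cds_parameters(method, "vendor_b"))
```

The point of the design is that the shared record, not each vendor's parameter screen, becomes the single source of truth; any interpretation happens once, in the mapping, rather than every time a method is re-keyed.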

Additionally, creating a centralised repository like the Methods Hub in a vendor-neutral format is a step towards greater cyber-resiliency in the industry. When information is stored locally on a PC or an ELN and is not backed up, a single cyberattack can wipe it out instantly. Creating shared spaces for this data in the cloud protects it and ensures it can be easily restored.

A proof of concept (PoC) under the Methods Hub project was recently completed to demonstrate the value of methods digitisation. The PoC involved the digital transfer of analytical HPLC methods via the cloud, proving that analytical methods can be moved securely and easily between two different companies and CDS vendors. It was successfully tested in labs at Merck and GSK, where HPLC-UV information was transferred effectively between different systems. The PoC delivered a series of critical improvements to methods transfer: it eliminated manual keying of data and reduced risk, steps and errors, while increasing overall flexibility and interoperability.

The Alliance project team is now working to extend the platform's functionality to connect analytical methods with results data, which would be an industry first. The team will also add support for columns and additional hardware, as well as other analytical techniques such as mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy. It also plans to identify new use cases and further develop the cloud platform that enables secure methods transfer.

If industry-wide data standards and approaches to data management are to be agreed on and implemented successfully, organisations must collaborate. The Alliance recognises methods data management is a big challenge for the industry, and the aim is to make Methods Hub an integral part of the system infrastructure in every analytical lab.

Tackling issues such as the digitisation of methods data doesn't just benefit individual companies; it will have a knock-on effect for the whole life sciences industry. Introducing shared standards accelerates R&D, improves quality, and reduces the cost and time burden on scientists and organisations. Ultimately this ensures that new therapies and breakthroughs reach patients sooner. We are keen to welcome new contributors to the project, so we can continue discussing common barriers to successful data management and work together to develop new solutions.

Dr Birthe Nielsen is the Pistoia Alliance Methods Database project manager
