Building Speed and Smarts Into Discovery and Innovation
The challenge: derive insights from Big Data. The goal is a global digital platform that researchers use to ingest, validate, store, organize, share, and analyze vast quantities of highly heterogeneous phenotypic, clinical, and genomic research data. The exponential growth in the size and variety of this data is well known, and data platform architectures must be designed for the interoperability and scalability that this growth demands. Meeting the specific requirements of this platform, and enabling the next generation of data-driven discovery, are priorities. The platform is central to a mission of discovering biomarkers, diagnostics, therapeutics, and cures. It must serve as a vehicle for collecting, documenting, sharing, and processing data collaboratively, to quality standards that ensure results are repeatable. It would almost certainly build on today’s cloud technologies: containerized applications, Kubernetes, and data lakes.
Big Data Ecosystems
As noted, central to the discovery mission is the collection and analysis of research data. This data is mainly unstructured and varies significantly in file format, raw and curated data type, data description, lexical semantics, and size. Researchers will need to store and analyze this data at every stage of the computational pipeline as it matures. This retention and reuse presents several significant technical challenges, some of which are still evolving.
This platform must also support multiple prospective dispersed longitudinal studies with cohort sizes in the tens of thousands.
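One way to picture the retention requirement is a small metadata record that follows each data artifact through the pipeline. The sketch below is illustrative only: the stage names, the `DatasetRecord` fields, and the helper functions are hypothetical and do not come from any particular platform. It assumes a SHA-256 checksum is an acceptable way to detect altered data and to link a curated artifact back to the raw artifact it was derived from.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage names; a real platform would define its own.
STAGES = ("raw", "validated", "curated", "analyzed")


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata for one data artifact at one pipeline stage (illustrative)."""
    name: str
    file_format: str
    stage: str
    size_bytes: int
    sha256: str
    parent_sha256: Optional[str] = None  # checksum of the source artifact, if any


def make_record(name: str, file_format: str, stage: str, payload: bytes,
                parent: Optional[DatasetRecord] = None) -> DatasetRecord:
    """Build a record for `payload`, optionally linked to the artifact it came from."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return DatasetRecord(
        name=name,
        file_format=file_format,
        stage=stage,
        size_bytes=len(payload),
        sha256=hashlib.sha256(payload).hexdigest(),
        parent_sha256=parent.sha256 if parent else None,
    )


def verify(record: DatasetRecord, payload: bytes) -> bool:
    """Return True if `payload` still matches the size and checksum on record."""
    return (len(payload) == record.size_bytes
            and hashlib.sha256(payload).hexdigest() == record.sha256)
```

For example, calling `make_record("cohort_a", "csv", "raw", data)` and later `verify(record, data)` would confirm that a stored file is unchanged, which is one small ingredient of repeatable results.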
Classical Machine Learning Techniques
Deep learning (DL), now the dominant approach within machine learning (ML), uses trainable computational models to learn data representations. DL methods are built on artificial neural network architectures and are widely used in medical imaging. Current network models for image processing include:
- Multilayer perceptron - The most basic neural network is the multilayer perceptron (MLP), a stack of fully connected layers loosely inspired by biological neurons.
- Convolutional neural network - A convolutional neural network (CNN) reflects four key ideas: local connections, shared weights, pooling, and the integration of multiple layers.
- Fully convolutional network - A fully convolutional network (FCN) is composed of convolutional layers without fully connected layers.
- Generative adversarial network - A generative adversarial network (GAN) is a powerful way to model complicated data distributions. A GAN consists of two sub-networks: a generator and a discriminator.
- Recurrent neural network - A recurrent neural network (RNN) is a network where the output from a previous step is fed into the current step.
- Deep reinforcement learning - Deep reinforcement learning (DRL) combines DL (function approximation) with reinforcement learning (target optimization), which drives agents to learn the best actions
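
Two of the ideas in the list above can be sketched in a few lines of plain Python. The first pair of functions shows the CNN notions of local connections, shared weights, and pooling on a 1-D signal; the last function shows the RNN recurrence in which the previous step's output feeds the current step. This is a minimal sketch with untrained, hand-picked weights. The function names are hypothetical, and the scalar RNN is a deliberate simplification of the vector-valued case.

```python
import math


def conv1d(signal, kernel):
    """Slide one small kernel over the signal ("valid" cross-correlation).

    Local connections: each output depends on only len(kernel) inputs.
    Shared weights: the same kernel values are reused at every position.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def rnn_states(inputs, w_in, w_rec, h0=0.0):
    """Scalar recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # previous state feeds this step
        states.append(h)
    return states


signal = [0, 0, 1, 1, 0, 0, 1, 0]
edges = conv1d(signal, [-1, 1])   # [0, 1, 0, -1, 0, 1, -1]
pooled = max_pool1d(edges, 2)     # [1, 0, 1]
states = rnn_states([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5)
```

In the RNN call, only the first input is nonzero, yet the later states remain positive because each step carries forward a decayed copy of the earlier state.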