Creating Systems for Data-Driven Discoveries

Researcher, Enterprise Architecture, and Chief Technology Officer

Building Speed and Smarts Into Discovery and Innovation

The challenge: Derive insights from Big Data. Create a global digital platform that researchers will use to ingest, validate, store, organize, share, and analyze vast quantities of today’s hugely heterogeneous phenotypic, clinical, and genomic research data. The exponential growth in the size and variety of this data is well known, and data platform architectures must support the interoperability and scalability that this growth dictates. Meeting the specific requirements for this future digital platform, and enabling this functionality for the next generation of data-driven discovery, are priorities. This next-generation platform is central to supporting a mission of discovering biomarkers, diagnostics, therapeutics, and cures. It must serve as a vehicle for collecting, documenting, sharing, and processing data, and for collaborating on it, to quality standards that ensure the repeatability of test results. It would most certainly leverage today’s containerized cloud applications, Kubernetes, and data lakes.

ADVANCED STRATEGIC TECHNOLOGY ®

A “platform” is any combination and configuration of servers, networks, and high-performance data centers, tightly coupled with multivendor commercial public and private clouds that support data commons. This next-generation platform would be deployed globally. With global reach and cloud-centric data, it would logically appear as a single, seamless, and transparent address space and namespace supporting multiple scientific missions and research into multiple diseases.

The crushing disappointment of spending many years and billions of dollars in pursuit of cures for specific diseases, such as Alzheimer’s, has led to increasingly loud calls for more data-dense solutions and less rigid platform designs. As a result, this platform must store clinical research data alongside other biomedical research and clinical data sources, and it must compute across these different data types. Common Data Models, Common Data Elements, and Ontologies will increase data reuse.
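As a rough illustration of how Common Data Elements can support reuse, the sketch below maps records from two sources with different field names and units onto one shared set of elements. The site names, field names, mappings, and the `harmonize` helper are all hypothetical and are not drawn from any particular standard or from this platform's design.

```python
# Hypothetical sketch: harmonizing records from two sources onto
# shared "common data elements". All names here are illustrative.

def lb_to_kg(value):
    """Convert pounds to kilograms, rounded to two decimals."""
    return round(float(value) * 0.45359237, 2)

# Per-source mapping: source field -> (common element name, converter)
SITE_MAPPINGS = {
    "site_a": {"pt_age": ("age_years", int), "wt_kg": ("weight_kg", float)},
    "site_b": {"Age": ("age_years", int), "WeightLb": ("weight_kg", lb_to_kg)},
}

def harmonize(site, record):
    """Return a record keyed by common element names.

    Fields with no mapping are dropped in this sketch; a real system
    would more likely log or quarantine them for curation.
    """
    mapping = SITE_MAPPINGS[site]
    harmonized = {}
    for field, value in record.items():
        if field in mapping:
            element, convert = mapping[field]
            harmonized[element] = convert(value)
    return harmonized

record_a = harmonize("site_a", {"pt_age": "71", "wt_kg": "68.0"})
record_b = harmonize("site_b", {"Age": 69, "WeightLb": 150, "Notes": "n/a"})
print(record_a)
print(record_b)
```

Once both records share the same element names and units, they can be pooled and analyzed together without source-specific handling.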

Big Data Ecosystems

As noted, the collection and analysis of research data is central to the discovery mission. This data is mainly unstructured and will vary significantly by file format, raw or curated data type, data description, lexical semantics, and data size. Researchers will need to save, store, and analyze this data at every stage of the computational pipeline as the data matures. This retention and reuse will present several significant technical challenges, some of which are still evolving.

This platform must also support multiple prospective dispersed longitudinal studies with cohort sizes in the tens of thousands.

Classical Machine Learning Techniques

Deep learning (DL), now the mainstream approach within machine learning (ML), uses trainable computational models to learn data representations. DL methods are based on artificial neural network architectures and are widely used in medical imaging. The current network models for image processing include:

  • Multilayer perceptron - The simplest neural network is a multilayer perceptron (MLP), whose layers of connected units are loosely inspired by neurons in the human brain.
  • Convolutional neural network - A convolutional neural network (CNN) reflects four key ideas: local connections, shared weights, pooling, and the integration of multiple layers.
  • Fully convolutional network - A fully convolutional network (FCN) is composed of convolutional layers without fully connected layers.
  • Generative adversarial network - A generative adversarial network (GAN) is a powerful way to model complicated data distributions. A GAN consists of two sub-networks: a generator and a discriminator.
  • Recurrent neural network - A recurrent neural network (RNN) is a network where the output from a previous step is fed into the current step.
  • Deep reinforcement learning - Deep reinforcement learning (DRL) combines DL (function approximation) with reinforcement learning (target optimization), which drives agents to learn the best actions
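
Two of the ideas in the list above can be made concrete with a minimal, dependency-free sketch: the local connections, shared weights, and pooling of a CNN, and the recurrence of an RNN. It uses a one-dimensional signal in place of an image and scalar weights chosen by hand, so it is a toy illustration under those assumptions, not a trained model or any specific library's implementation.

```python
import math

def conv1d(signal, kernel):
    """Valid 1-D cross-correlation (what most DL libraries call convolution).

    Each output depends only on a local window (local connections), and the
    same kernel is reused at every position (shared weights).
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def rnn_step(x, h_prev, w_x, w_h, b):
    """One scalar RNN step: the previous hidden state feeds the current one."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def rnn_run(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run the RNN over a sequence, returning the hidden state at each step."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# CNN ideas: a hand-picked kernel that responds to rising edges.
signal = [0, 0, 1, 1, 1, 0, 0, 0]
edge_kernel = [-1, 1]
features = conv1d(signal, edge_kernel)
pooled = max_pool1d(features, 2)

# RNN idea: a single input pulse still influences later steps.
states = rnn_run([1.0, 0.0, 0.0])

print(features)
print(pooled)
print(states)
```

In the RNN run, the input is zero after the first step, yet the hidden state stays positive because each step receives the previous step's output.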