Research and Applications

LLNL applies data science to a variety of application domains including:

  • Laser and photon sciences
  • Large-scale physics simulations
  • Large-scale data mining for predictive medicine
  • Functional data analysis for uncertainty quantification
  • Expert finding in social media
  • Knowledge extraction from text
  • Drug and vaccine discovery using HPC simulations and physical experiments
  • Reinforcement learning with real-world simulations
  • Apache Spark machine learning tools
  • Video data summarization and classification
  • Energy efficiency analysis using HPC
  • Dynamic network structure inference
  • Automated video tracking algorithms
  • Deep learning applications
  • Classification and forward modeling of hyperspectral data


Featured Research


Machine Learning for Semi-Automating Mesh Management—Ming Jiang

Simulation workflows for Arbitrary Lagrangian-Eulerian (ALE) methods are highly complex and often require manual tuning, a significant pain point for simulation users. Developing ALE workflows is often a trial-and-error process that can be disruptive and time-consuming: a few hours of simulation can require many days of manual tuning. There is an urgent need to semi-automate this process to reduce user burden and improve productivity. To address this need, we are developing novel predictive analytics for simulations and an in situ infrastructure for integrating those analytics. Our goal is to predict simulation failures ahead of time and proactively avoid them as much as possible.


Machine Learning for Cancer—Ana Paula Sales

Why do some cancer patients respond well to a given treatment while others don’t? How can we leverage all available information about a cancer patient to predict his or her response to a treatment and, eventually, suggest optimal courses of action? As more data, and more types of data, are collected about individuals and about large groups of people, we have the opportunity to address these questions. In collaboration with the National Cancer Institute and other DOE laboratories, we are developing a statistical model that captures the relationships across the many different data types collected about cancer patients in the US, and we are using that model to make outcome predictions. In collaboration with the Cancer Registry of Norway, we focus on a different question: how can we combine existing knowledge about different cancer types, such as breast and lung cancers, to improve overall outcome prediction? This is particularly appealing for improving predictions in rare cancers, for which we have little data. Here we are developing transfer and multitask learning approaches in the context of both statistical models and machine learning.


Large-Scale Self-Supervised Multimodal Deep Learning Research—Barry Chen

We live in a multimodal, data-rich world where images, video, and text abound, but finding interesting and relevant samples in unannotated data remains a challenge. For example, how could one quickly find all the instances of “toddlers learning to catch a ball” among millions of hours of untagged video clips? In this project, we are developing new deep learning algorithms that map images, video, and text into a joint semantic feature space where conceptually related items are close together. In such a feature space, one can quickly find multimodal data that is conceptually related to a query. Our approach involves developing self-supervised unimodal (i.e., individual image, video, or text modality) feature learning algorithms that allow us to learn high-quality transferable representations from the virtually limitless supply of unlabeled data. We are also developing the multimodal learning algorithms that merge the unimodal representations into a shared multimodal semantic feature space. To make this vision a reality, we are developing new scalable training algorithms that take advantage of LLNL’s world-class supercomputers to rapidly train large neural networks on massive datasets.
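
As a rough illustration of querying a joint feature space, the sketch below ranks items from different modalities by cosine similarity to a query vector. It is not the project's implementation: the embeddings are hand-made three-dimensional placeholders rather than outputs of a trained network, and the item names, the `INDEX` structure, and the `retrieve` helper are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the top_k (item_id, modality) pairs closest to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), item_id, modality)
        for item_id, modality, vec in index
    ]
    # Highest similarity first; modality plays no role in the ranking.
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return [(item_id, modality) for _, item_id, modality in scored[:top_k]]


# Hypothetical embeddings: in practice each vector would come from a
# modality-specific encoder that maps into the shared feature space.
INDEX = [
    ("clip_017", "video", [0.9, 0.1, 0.0]),
    ("photo_204", "image", [0.8, 0.2, 0.1]),
    ("caption_55", "text", [0.0, 0.1, 0.9]),
]

# Placeholder for an embedded text query such as the example above.
query = [1.0, 0.0, 0.0]
results = retrieve(query, INDEX, top_k=2)
print(results)
```

Because every modality shares one vector space, a single similarity function serves text-to-video, text-to-image, and any other pairing. At the scale of millions of items, an approximate nearest-neighbor index would likely replace the exhaustive scan used here.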


Molecular Markers for Diagnostic and Countermeasure Design—Jonathan Allen

The volume of experimental molecular data, such as genomes and gene expression measurements, is beginning to grow rapidly. Access to this data presents new opportunities to learn more about the molecular drivers of disease and to develop improved options for treatment. We are exploring both data-driven predictive models built from experimental data and more efficient genome search and retrieval algorithms to design a suite of tools for pathogen detection and countermeasure design. Examples of questions we are pursuing include finding unique genetic markers indicative of antibiotic resistance and modeling and predicting response to drug treatments for cancer. We use HPC as a tool to push the limits on the size of datasets used to identify important functional molecular features.


Deep Neural Networks—Brian Van Essen

Deep neural networks (DNNs) require massive models and even larger data sets. We are applying parallel programming techniques to leverage the unique characteristics of existing and upcoming HPC systems, namely low-latency interconnects, node-local NVRAM, and GPUs. DNNs provide us with a new tool that we can apply to a range of applications in national security, analysis of scientific instruments, and scientific data sets. Furthermore, we are exploring the use of advanced neuromorphic architectures to execute deep neural networks. Specifically, we are working with the IBM TrueNorth Neurosynaptic architecture, developing new applications for pattern recognition in imagery, applying machine learning to HPC simulations, and embedding optimization problems directly in the TrueNorth fabric.