Research Spotlight: Reliable and Interpretable Machine Learning for Science

“New opportunities are emerging in data science as machine learning evolves alongside algorithmic innovations and hardware advances. But these developments also present critical problems that require research from the ground-up.”

– Jayaraman Thiagarajan

As supercomputers’ data collection and analysis capabilities outpace human ability, machine learning (ML) has emerged as an important tool for sifting through mountains of data at billions of operations per second and identifying patterns and connections that would take an army of humans decades, if not centuries, to find. But ML is not a perfect science, at least not yet.

Currently, ML is in an exploratory phase where computer scientists build models, cross their fingers, and hope for the best, but they can’t always discern why or how ML produces its results. Simply put: ML’s capacity to learn and make predictions is only as good as the models humans develop.

With that caveat, how can users trust ML-produced predictions if the model does not hold up under real-world variations, or if there is no “interpretable” explanation of how it arrived at its results? The question is analogous to trying to figure out how a car’s engine runs without looking under the hood. Interpretability in ML means that humans can “pop the hood” on a model and trace, identify, and understand how and why it yields its results. With reliable and interpretable models, we can trust that a self-driving car won’t run over a pedestrian or that an ML-based diagnostic screen will consistently identify a melanoma before it’s too late.

The growing use of deep learning models (algorithms designed to identify, cluster, and classify patterns) in critical applications such as healthcare, autonomous driving, and scientific discovery underscores the need for ML models to be reliable, interpretable, and accurate. To meet that need, computer scientists at LLNL’s Data Science Institute (DSI) are pioneering groundbreaking ML techniques and systems that adhere to the rigors of the scientific method.

Reliability as a Design Objective

The success of ML algorithms hinges on whether the environment in which a model is deployed is similar to the conditions under which it was trained. For ML to gain lasting acceptance in scientific applications such as those in LLNL’s mission space, models need rigorous guarantees. Ensuring that models remain reliable in unknown test environments has therefore become a crucial design criterion. LLNL researchers are developing ML systems that are provably robust and generalizable to real-world shifts, such as noise, unknown transformations, or adversarial corruptions, as well as novel evaluation mechanisms based on uncertainty quantification (UQ) that can characterize the reliability of predictive models.

In a recent study, a DSI team led by Jayaraman Thiagarajan proposed a new approach, called “Learn-by-Calibrating,” to improve the reliability and generalization of classification and regression models. The technique consistently produces ML models that are resilient to outlying or anomalous data, reliable even with small amounts of labeled data, and robust to noisy data. The team found that it yields highly effective models in applications ranging from surrogate modeling of inertial confinement fusion experiments to predicting skin lesion types from dermoscopy images.
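The article does not describe how Learn-by-Calibrating works internally. As a rough, hypothetical illustration of what “calibration” means for a regression model, the Python sketch below checks whether predicted intervals contain the true values as often as they claim to. It is not the team’s training objective: the Gaussian-style intervals (mean ± z·sigma), the function name `coverage_gap`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def coverage_gap(y_true, mean, sigma, levels=(0.5, 0.8, 0.9, 0.95)):
    """Average |empirical coverage - nominal coverage| over several interval levels.

    Hypothetical setup: the model predicts a mean and a scale `sigma` per sample,
    and the p-level interval is mean +/- z_p * sigma (Gaussian-style).
    """
    y_true, mean, sigma = (np.asarray(a, dtype=float) for a in (y_true, mean, sigma))
    gaps = []
    for p in levels:
        z = NormalDist().inv_cdf(0.5 + p / 2)  # two-sided multiplier, about 1.96 for p=0.95
        inside = np.abs(y_true - mean) <= z * sigma
        gaps.append(abs(inside.mean() - p))
    return float(np.mean(gaps))


# Synthetic check: targets drawn from N(0, 1), model always predicts mean 0.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
mean = np.zeros_like(y)
print(coverage_gap(y, mean, np.ones_like(y)))        # honest sigma: small gap
print(coverage_gap(y, mean, 0.3 * np.ones_like(y)))  # overconfident sigma: large gap
```

A well-calibrated model scores close to zero, while shrinking the predicted sigma (an overconfident model) drives the gap up sharply. An approach that uses calibration during training, as the technique’s name suggests, would aim to keep such a gap small rather than only measure it afterward.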

To assess how far a trained model’s confidence (or uncertainty) estimates can be trusted, the team also introduced the concept of a “reliability plot,” which brings experts into the inference loop to reveal the trade-off between a model’s autonomy and its accuracy. By allowing a model to defer a prediction to an expert when its confidence is low, the approach enables a holistic evaluation of the model’s reliability. Through several empirical studies in science and healthcare, ML researchers at LLNL have demonstrated the value of this new class of learning and evaluation methodologies.
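The deferral idea can be sketched as an accuracy-versus-coverage curve, a standard way to evaluate selective prediction. In this minimal Python sketch, which is an assumption rather than the team’s exact reliability plot, the model answers only its most confident cases and hands the rest to an expert; the function name `accuracy_coverage_curve` and the toy confidence scores are hypothetical.

```python
import numpy as np


def accuracy_coverage_curve(confidence, correct, coverages=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)):
    """Accuracy on the cases the model keeps when it defers its least-confident ones.

    confidence: per-sample confidence scores (higher means more confident)
    correct:    per-sample booleans, True where the model's prediction is right
    Returns (coverage, accuracy) pairs; 1 - coverage is the fraction deferred to an expert.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidence)  # indices, most confident first
    curve = []
    for c in coverages:
        k = max(1, int(round(c * len(order))))  # cases the model answers on its own
        curve.append((c, float(correct[order[:k]].mean())))
    return curve


# Toy scores: the model is mostly right when confident, mostly wrong when not.
conf = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.52, 0.51, 0.50]
hits = [True, True, True, True, True, False, True, False, False, False]
for coverage, acc in accuracy_coverage_curve(conf, hits):
    print(f"coverage={coverage:.1f}  accuracy={acc:.2f}")
```

Reading the curve from full coverage downward shows how much accuracy is gained for each additional fraction of cases handed to an expert, which is the autonomy-versus-accuracy trade-off the reliability plot is meant to expose.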

Hypothesis-Driven Model Analysis

While artificial intelligence (AI) methods have scaled new heights in automating highly complex inference tasks, an equally important goal is enabling users to gain insight into both the trained model and the data itself. This analysis requires interpretability tools that support hypothesis-driven inquiry of trained models.

In this spirit, the Livermore team recently developed an introspection approach for healthcare problems in which the user inputs a hypothesis about the patient (such as the onset of a certain disease or the progression of a symptom) and the model returns counterfactual evidence that maximally agrees with the hypothesis. Using this “what if” analysis, the researchers identified complex relationships between disparate disease conditions or stratified sub-populations of patients and gained new insights into the model’s strengths and weaknesses that would not otherwise be apparent.
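The article does not say how the counterfactual evidence is generated. One generic way to pose a “what if” query, shown here purely as an illustration, is to search for the smallest change to an input that makes a model agree with a hypothesis. The sketch below does this with gradient descent on a toy logistic-regression model; the weights, the penalty `lam`, and the function names are hypothetical, and introspection on images or patient records would need far richer models.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def counterfactual(x, w, b, target=1.0, lam=0.1, lr=0.1, steps=500):
    """Gradient search for an input near x whose prediction moves toward `target`.

    Minimizes  BCE(target, sigmoid(w @ x' + b)) + lam * ||x' - x||^2  over x'.
    """
    x0 = np.asarray(x, dtype=float)
    xc = x0.copy()
    for _ in range(steps):
        p = sigmoid(w @ xc + b)
        # d(BCE)/dx' = (p - target) * w ; d(penalty)/dx' = 2 * lam * (x' - x)
        grad = (p - target) * w + 2.0 * lam * (xc - x0)
        xc -= lr * grad
    return xc


# Hypothetical model: feature 0 raises risk, feature 1 lowers it, feature 2 is ignored.
w = np.array([2.0, -1.0, 0.0])
b = -1.0
x = np.array([0.0, 1.0, 0.5])
x_cf = counterfactual(x, w, b)  # hypothesis: "this patient is positive"
print("original p:", sigmoid(w @ x + b))
print("counterfactual p:", sigmoid(w @ x_cf + b))
print("change per feature:", x_cf - x)
```

The L2 penalty keeps the counterfactual close to the original input, so the features that change most hint at what the model relies on to support the hypothesis. Here the third feature, which carries zero weight, is left untouched.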

Interpretability and introspection techniques not only enable diagnosis of ML models but could also provide an entirely new way to develop models for healthcare applications, enabling physicians to form new hypotheses about diseases and aiding policymakers in decisions that affect public health. Thiagarajan and his team recently applied these ML methods to chest x-ray images of patients diagnosed with COVID-19, the disease caused by the novel SARS-CoV-2 coronavirus, to understand the role of factors such as demography, smoking habits, and medical interventions. The team is currently working with clinicians to study the effect of different interventions on the trajectory and outcome of COVID-19 patients.

Accurately Emulating Complex Scientific Processes

As scientists adopt AI methods in their workflows, researchers at the DSI are identifying best practices for using these techniques most effectively. This effort includes investigating representation learning approaches that preserve the underlying physical laws, as well as surrogate model design, UQ, and design optimization.

In a recent paper in the Proceedings of the National Academy of Sciences (PNAS), DSI researchers developed a deep learning-driven Manifold & Cyclically Consistent (MaCC) surrogate model, built on a multi-modal neural network, that can quickly and accurately emulate complex scientific processes, including the high-energy-density physics involved in inertial confinement fusion (ICF).

The research team applied the model to ICF implosions at LLNL’s National Ignition Facility, for which a computationally expensive numerical simulator predicts the energy yield of a target imploded by shock waves produced by the facility’s high-energy laser. Comparing the surrogate’s predictions with those of the simulator typically used for ICF experiments, the researchers found that the MaCC surrogate was nearly indistinguishable from the simulator in its errors and in the expected quantities of energy yield, and more accurate than other types of surrogate models.
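A surrogate is a cheap model fitted to the outputs of an expensive one. The toy Python sketch below shows only that general idea and the kind of held-out comparison against the simulator described above; it is not MaCC, which is a multi-modal neural network. The “simulator” here is a hypothetical one-dimensional function, and the polynomial surrogate is a deliberately simple stand-in.

```python
import numpy as np


def expensive_simulator(x):
    # Hypothetical stand-in for a costly physics code: a smooth 1-D response.
    return np.sin(3.0 * x) + 0.5 * x**2


rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, 40)    # a small budget of "simulator runs"
y_train = expensive_simulator(x_train)

surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=7))  # cheap polynomial surrogate

x_test = np.linspace(-1.0, 1.0, 200)    # held-out inputs, evaluated with both models
err = surrogate(x_test) - expensive_simulator(x_test)
rmse = float(np.sqrt(np.mean(err**2)))
print(f"surrogate RMSE on held-out inputs: {rmse:.2e}")
```

Once fitted, the surrogate is far cheaper to evaluate than the code it imitates, which is what makes large parameter studies affordable when a close enough approximation is available.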

With an ever-emerging suite of cutting-edge ML tools and strategies, computer scientists at the DSI have set the stage for a new generation of interpretable, reliable, and accurate ML models. As passionate leaders in scientific exploration and discovery, they strive to understand how this critical technology can be expanded and applied to high-impact, real-world solutions.

Team Acknowledgments

Thiagarajan's team includes researchers Rushil Anirudh and Bhavya Kailkhura.
