Welcome to the DSI Newsletter

Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more.
WiDS Livermore logo of green and black silhouettes of women’s faces in profile

Register for Hybrid WiDS Livermore on March 8

The annual Women in Data Science (WiDS) conference returns on Wednesday, March 8. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event is free and will be presented in a hybrid format at the Livermore Valley Open Campus (LVOC) and via WebEx. Everyone is welcome to attend. Register by February 27.

Along with plenty of food and networking opportunities, WiDS Livermore will include a livestream of the Stanford conference where LLNL WiDS Ambassador Marisa Torres has been invited to speak to the global audience. Returning this year...

Read Volume 25
the word IGNITION on a black background with agency logos

LLNL Achieves Fusion Ignition…with Help from Data Science

On December 13, the Department of Energy (DOE) and National Nuclear Security Administration (NNSA) announced the achievement of fusion ignition at LLNL—a major scientific breakthrough decades in the making that will pave the way for advancements in national defense and the future of clean power. In the early hours of December 5, a team at LLNL’s National Ignition Facility (NIF) conducted the first controlled fusion experiment in history to reach this milestone, also known as scientific energy breakeven, meaning it produced more energy from fusion than the laser energy used to drive it. This...

Read Volume 24
graphic of a medal with the words “director’s science and technology excellence publication award” on a background of blue hexagons

Award-Winning Papers

LLNL’s data science community continues to receive accolades for ground-breaking research and techniques. PDFs or full-text web pages are linked where available.

The 2022 IEEE VIS Test of Time Awards recognize papers that are “still vibrant and useful today and have had a major impact and influence within and beyond the visualization community” (read more at LLNL News). The conference is the premier forum for advances in visualization and visual analytics.

Read Volume 23
round portraits of eight people

Leadership Changes with New Fiscal Year

Coinciding with LLNL’s new fiscal year (FY23) beginning on October 1, a few personnel changes took effect for the DSI and Data Science Summer Institute (DSSI). Dan Merl, who leads the Machine Intelligence Group in LLNL’s Center for Applied Scientific Computing, joined the DSI Council to advise on computing and data initiatives. Goran Konjevod, from LLNL’s Computational Engineering Division, moved from his DSSI directorship to the Council to further promote education and workforce initiatives. Statistician Amanda Muyskens joined Nisha Mulakken in co-directing the DSSI. (Read more about Muyskens...

Read Volume 22
the letters DSO rendered as 3D blocks

Top AI Award at International Symbolic Regression Competition

An LLNL team claimed a top prize at an inaugural international symbolic regression competition for an artificial intelligence (AI) framework they developed capable of explaining and interpreting real-life COVID-19 data. Hosted by the open-source SRBench project at the 2022 Genetic and Evolutionary Computation Conference, the competition invited teams to submit their best symbolic regression algorithms. Organizers trained the models on datasets, assigned “trust ratings,” and evaluated them for accuracy and simplicity.

The team’s “Unified Deep Symbolic Regression” (uDSR) algorithm beat 12...

Read Volume 21
high-resolution simulation of the electrical activation map in a human’s heart

Open Data Initiative Adds Simulated Cardiac Signals Dataset

Building off LLNL’s Cardioid code, which simulates the electrophysiology of the human heart, a research team has conducted a computational study to generate a dataset of cardiac simulations at high spatiotemporal resolutions. The dataset—which is publicly available for further cardiac machine learning (ML) research via the DSI’s Open Data Initiative—was built using real cardiac bi-ventricular geometries and clinically inspired endocardial activation patterns under different physiological and pathophysiological conditions.

The dataset consists of pairs of computationally simulated...

Read Volume 20
cube-shaped lattice structure with inset images showing close details as well as types of manually inserted defects

Open Data Initiative Adds X-Ray CT Dataset for Additive Manufacturing

The DSI’s Open Data Initiative (ODI) recently added a new project to its catalog: X-Ray CT Data of Additively Manufactured Octet Lattice Structures. Computed Tomography (CT) is a common imaging modality used at LLNL for non-destructive evaluation in a wide range of applications. For example, CT imaging can highlight defects in additively manufactured (AM) structures, which aids in fine-tuning subsequent iterations of development. This new addition to the ODI catalog consists of seven datasets: simulations containing models of x-ray CT simulations showing AM lattice structures with common...

Read Volume 19
three sponge-like simulated shapes resulting from data reduction, with the middle shape representing the original dataset and the left and right shapes for comparison

LLNL Wins PacificVis Best Paper Award

Three LLNL computer scientists and University of Utah colleagues have won the 2022 PacificVis Best Paper award. Harsh Bhatia, Peer-Timo Bremer, and Peter Lindstrom co-authored “AMM: Adaptive Multilinear Meshes” (see the PDF and GitHub repository). AMM provides users with a resolution-precision-adaptive representation technique that reduces mesh sizes, thereby reducing the memory and storage footprints of large scientific datasets. The approach combines two solutions into one—reducing data precision and adapting data resolution—to improve the performance and efficiency of data processing and...

Read Volume 18
multicolored grid showing hemispherical brain region activity with fMRI

Open Data Initiative Adds Neuroimaging Dataset

The DSI’s Open Data Initiative recently added a new project to its catalog: Derived Products from HCP-YA fMRI. The Human Connectome Project–Young Adult (HCP-YA) dataset includes multiple neuroimaging modalities from 1,200 healthy young adults. These modalities include functional magnetic resonance imaging (fMRI), which measures the blood oxygenation fluctuations that occur with brain activity.

The fMRI data were recorded in multiple sessions per subject, both at rest and during a set of tasks designed to evoke specific brain activity. Each fMRI run is a sequence of 3D volumes, and processing these...

Read Volume 17
3x5 video chat screens on a green background

Livermore WiDS Provides Forum for Women in Data Science

LLNL celebrated the 2022 global Women in Data Science (WiDS) conference on March 7 with its fifth annual regional event, featuring workshops, mentoring sessions, and a discussion with LLNL Director Kim Budil, the first woman to hold that role. The all-day event attracted women data scientists and students inside and outside the Lab, who gathered to share coding tips and swap stories of their experiences in growing their careers. Attendees tuned in to view presentations by LLNL women data scientists, engage in breakout sessions, and view a livestream of the global WiDS conference hosted by...

Read Volume 16
WiDS Livermore logo of green ones and zeros overlaid on silhouettes of faces in profile

Register for Virtual WiDS Livermore on March 7

The annual Women in Data Science (WiDS) conference returns on Monday, March 7, the day before International Women’s Day. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event will be entirely virtual and free. Everyone is welcome to attend. Registration is open until February 27.

Sponsored by the DSI and LLNL’s Office of Strategic Diversity and Inclusion Programs, WiDS Livermore will include a livestream of the Stanford conference and networking opportunities. Returning this year is the popular “speed mentoring” session, where mentees...

Read Volume 15
3x2 grid of circular portraits of the 6 council members

Happy New Year from the DSI Council

The start of a new year is an exciting time because of the opportunity to appraise our data science community’s myriad accomplishments as well as preview upcoming projects and events. Like other areas of LLNL, the DSI has adapted to evolving pandemic restrictions and workplace policies to prioritize safety.

We were pleased to sponsor and contribute to multiple activities in modified or virtual formats: our monthly seminar series, the fourth annual Women in Data Science (WiDS) Livermore event, a new career panel series inspired by WiDS, the Machine Learning for Industry Forum (ML4I), and a...

Read Volume 14
semi-cyclical diagram showing training of a recurrent neural network leading to a sample, Gaussian processes, extraction, and combination of results

Five Papers Accepted to NeurIPS 2021

The annual Conference on Neural Information Processing Systems (NeurIPS) returns December 6–14. LLNL work has been accepted at the prestigious machine learning conference in past years; in 2021 researchers have five accepted papers. Preprints are linked here.

Read Volume 13
Four images of celebrities’ faces with differences between bald and not bald, smiling and not smiling, which show results of the DISC inversion method

Counterfactual Generators for Deep Models

LLNL’s research into machine learning (ML) interpretability continues with an investigation of counterfactual explanations—those that synthesize a hypothetical result based on small, interpretable changes to a given query image. Existing approaches rely extensively on pre-trained generative models or access to training data to create plausible counterfactuals that support users’ hypotheses.

LLNL’s Jayaraman Thiagarajan and colleagues from Arizona State University, Stanford University, and IBM Research have developed a technique called DISC—Deep Inversion for Synthesizing Counterfactuals...

Read Volume 12
three rows of thoracic CT images with slight variations

4D Computed Tomography Reconstructions

Computed tomography (CT) is a type of x-ray imaging technology with a range of applications for clinical diagnosis, non-destructive evaluation in industry, baggage inspection, and cargo screening. CT scanners capture a sequence of projection images at angles around an object. Reconstruction algorithms then estimate the scene from these measured images. 2D and 3D CT imaging of static objects are well-studied problems with theoretical and practical algorithms. However, reconstruction of scene changes and measurements over time, known as dynamic 4D CT, can yield spatiotemporal ambiguities. (Image at left: 4D CT of...
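As a toy illustration of estimating a scene from projection measurements, the sketch below recovers a 2x2 image from a handful of ray sums using the Kaczmarz method, the basis of algebraic reconstruction techniques. This is a generic textbook method chosen for brevity, not the approach covered in this newsletter item; the image values, ray layout, and function name are hypothetical.

```python
def kaczmarz(rays, measurements, n_pixels, sweeps=100):
    """Estimate pixel values so that each ray sum matches its measurement.

    rays[i] lists the pixel indices crossed by ray i (unit weights assumed).
    """
    x = [0.0] * n_pixels
    for _ in range(sweeps):
        for ray, measured in zip(rays, measurements):
            residual = measured - sum(x[j] for j in ray)
            # Project onto this ray's equation: spread the residual
            # evenly over the pixels the ray crosses.
            for j in ray:
                x[j] += residual / len(ray)
    return x


# Hypothetical 2x2 image stored row-major: pixels 0 1 / 2 3.
true_image = [1.0, 2.0, 3.0, 4.0]
rays = [
    [0, 1], [2, 3],   # horizontal rays (row sums)
    [0, 2], [1, 3],   # vertical rays (column sums)
    [0, 3],           # one diagonal ray
]
measurements = [sum(true_image[j] for j in ray) for ray in rays]
estimate = kaczmarz(rays, measurements, n_pixels=4)
```

Without the diagonal ray, the row and column sums alone are consistent with more than one image, so the measurements would not pin down a unique reconstruction.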

Read Volume 11
4x2 images showing point spread functions

New Bayesian ML Code Released

A new Bayesian machine learning (ML) code, MuyGPyS (pronounced my-jee-pies), has been developed as part of a Laboratory Directed Research and Development Strategic Initiative to address needs for native uncertainty quantification in ML predictions, learning with bounded training times, support for combining ML and model-based Bayesian inference frameworks, and extending the data sizes allowable in Gaussian process (GP) models.

The MuyGPyS code and algorithm offer best-in-class performance on community GP regression benchmarks, as well as image classification performance competitive with or...

Read Volume 10
diagram of workflow including synthesis, analysis, classification, and performance property

Deep Learning for Materials Discovery

Deep learning (DL) models are proving useful for a number of materials science applications including materials discovery, microstructure analysis, and property predictions. In a recent paper in ACS Omega, LLNL researchers propose a unified framework that leverages the predictive uncertainty from deep neural networks to answer challenging questions materials scientists usually encounter in machine learning (ML)–based material application workflows.

Specifically, the team demonstrates that predictive uncertainty from uncertainty-aware DL approaches (particularly deep ensembles) can be used...

Read Volume 9
abstract art showing molecules and crystalline density

Research in Feedstock Optimization

A long-held goal of chemists across many industries, including energy, pharmaceutics, energetics, food additives, and organic semiconductors, is to imagine the chemical structure of a new molecule and predict how it will function for a desired application. In practice, this vision is difficult to realize, often requiring extensive laboratory work to synthesize, isolate, purify, and characterize newly designed molecules to obtain the desired information.

A team of LLNL materials and computer scientists has brought this vision to fruition for energetic molecules by creating...

Read Volume 8
diagram of pruned neural network

New Research in Machine Learning Robustness

LLNL postdoctoral researcher James Diffenderfer and computer scientist Bhavya Kailkhura are co-authors on a paper that offers a novel and unconventional way to train deep neural networks (DNNs). The LLNL team shows both empirically and theoretically that it is possible to learn highly accurate NNs simply by compressing (i.e., pruning and binarizing) randomized NNs without ever updating the weights. This is in sharp contrast to the prevailing weight-training paradigm, i.e., iteratively learning the values of the weights by stochastic gradient descent. In this process, Diffenderfer and Kailkhura...
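The compression idea described above can be sketched on a single random layer: the weights are drawn once and never updated, and the only thing that changes is a binary mask deciding which sign-binarized weights are kept. A naive random search over masks stands in for the paper's actual procedure, which is not described here; `binarize`, `prune`, `search_mask`, and the toy classification task are hypothetical names and data.

```python
import random


def binarize(weights):
    """Replace every weight by its sign (+1.0 or -1.0)."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def prune(weights, mask):
    """Zero out the weights whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]


def accuracy(weights, data):
    """Fraction of (x, label) pairs classified correctly by sign(w . x)."""
    hits = 0
    for x, label in data:
        score = sum(w * xi for w, xi in zip(weights, x))
        if (1 if score >= 0 else -1) == label:
            hits += 1
    return hits / len(data)


def search_mask(weights, data, trials, rng):
    """Naive random search for a pruning mask; the weights stay fixed."""
    signs = binarize(weights)
    best_mask = [1] * len(weights)  # start from "keep everything"
    best_acc = accuracy(prune(signs, best_mask), data)
    for _ in range(trials):
        mask = [rng.randint(0, 1) for _ in weights]
        acc = accuracy(prune(signs, mask), data)
        if acc > best_acc:
            best_mask, best_acc = mask, acc
    return best_mask, best_acc


rng = random.Random(0)
random_weights = [rng.uniform(-1, 1) for _ in range(6)]  # drawn once, never updated
frozen_copy = list(random_weights)

# Hypothetical toy task: label is +1 when x[0] >= x[1], else -1.
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1) for _ in range(6)]
    data.append((x, 1 if x[0] >= x[1] else -1))

mask, acc = search_mask(random_weights, data, trials=200, rng=rng)
compressed = prune(binarize(random_weights), mask)
baseline = accuracy(binarize(random_weights), data)
```

Here `compressed` contains only the values -1, 0, and +1, and `random_weights` is identical before and after the search; only the mask is selected.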

Read Volume 7
Example of sampling an expression from the team’s recurrent neural network, which is used to emit a distribution over tractable mathematical expressions.

Spotlight: New Research Ranked Among Top AI Papers

Symbolic regression is the ML task of discovering tractable mathematical expressions to fit a dataset, yet the AI community has not fully explored deep learning approaches to this challenging space. In a paper accepted as an Oral Presentation at the upcoming International Conference on Learning Representations (ICLR), an LLNL research team proposes a framework that leverages deep reinforcement learning for symbolic regression via a simple idea: use a large model (neural network) to search the space of small models (expressions). With an Oral acceptance rate of only 1.5%, the team’s...
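To make the symbolic regression task concrete, the sketch below exhaustively enumerates a tiny space of expressions and keeps the one with the lowest squared error on a dataset. Brute-force enumeration stands in for the team's neural-network-guided search, which is not reproduced here; the operator set, the helper names (`enumerate_expressions`, `fit`), and the sample data are all hypothetical.

```python
import itertools
import math

# Hypothetical building blocks for a tiny expression space.
UNARY = {"sin": math.sin, "cos": math.cos, "id": lambda v: v}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def enumerate_expressions():
    """Yield (name, function) pairs of the form u1(x) op u2(x)."""
    for (n1, f1), (n2, f2) in itertools.product(UNARY.items(), repeat=2):
        for op_name, op in BINARY.items():
            name = f"{n1}(x) {op_name} {n2}(x)"
            # Bind loop variables as defaults so each lambda keeps its own.
            yield name, (lambda x, f1=f1, f2=f2, op=op: op(f1(x), f2(x)))


def fit(xs, ys):
    """Return (name, error) of the expression with the lowest squared error."""
    best_name, best_err = None, float("inf")
    for name, fn in enumerate_expressions():
        err = sum((fn(x) - y) ** 2 for x, y in zip(xs, ys))
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err


# Toy data generated from x * sin(x); a real dataset would be noisy.
xs = [0.5 * i for i in range(1, 11)]
ys = [x * math.sin(x) for x in xs]
best_name, best_err = fit(xs, ys)
```

Even this small grammar yields 18 candidate expressions, and the count grows rapidly as operators and nesting depth are added, which is why a guided search becomes attractive.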

Read Volume 6