Welcome to the DSI Newsletter
Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more.
Data Science Challenge Tackles ML-Assisted Heart Modeling
For the first time, students from the University of California’s Merced and Riverside campuses joined forces for the two-week Data Science Challenge at LLNL, tackling a real-world problem in machine learning (ML)-assisted heart modeling. Held July 10–21, the event brought together 35 UC students, ranging from undergraduates to graduate students across a variety of majors, to work in groups on four key tasks, using actual electrocardiogram data to predict heart health. According to organizers, the purpose of the challenge was to give students a taste of the broad scope of work...

Open Data Initiative Adds CFD Simulation Dataset
The DSI’s Open Data Initiative (ODI) recently added a new project to the catalog: Computational Fluid Dynamics Simulation Data of Spatial Deposition. In fluid mechanics problems, computational fluid dynamics (CFD) uses data structures and numerical analysis to investigate the flow of liquids and gases. For instance, CFD models can simulate atmospheric transport and dispersion, as in this dataset’s simulations of wind-driven pollutant dispersion and deposition. The data is then used to train machine learning models that, in turn, can predict spatial patterns with high accuracy.
This new...

The DSI Turns Five!
Since the DSI’s founding in 2018, the Lab has seen tremendous growth in its data science community and has invested heavily in related research. Five years later, the DSI has found its stride with a multipronged strategy of raising awareness about the field, encouraging partnerships across the community, supporting researchers, and nurturing the next generation of data scientists. A new article chronicles the DSI’s successes and influence both at the Lab and in the external data science community.
Michael Goldman, who directed the DSI until this April, says, “The accomplishments of DSI are...

Brian Giera Named New DSI Director
After five years as the DSI’s director, Michael Goldman is passing the baton to Brian Giera, a materials and manufacturing researcher in LLNL’s Engineering Directorate. “The DSI is a thriving organization, so I am excited for the impact we will have given all the positive momentum,” says Giera. Goldman adds, “Brian will lead the DSI into a promising new phase, given how tremendously the Lab’s workforce and capabilities have grown since we established the DSI in 2018.”
Giera joined LLNL in 2014 as a postdoctoral researcher and currently leads the Analytics for Advanced Manufacturing group in...

Register for Hybrid WiDS Livermore on March 8
The annual Women in Data Science (WiDS) conference returns on Wednesday, March 8. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event is free and will be presented in a hybrid format at the Livermore Valley Open Campus (LVOC) and via WebEx. Everyone is welcome to attend. Register by February 27.
Along with plenty of food and networking opportunities, WiDS Livermore will include a livestream of the Stanford conference where LLNL WiDS Ambassador Marisa Torres has been invited to speak to the global audience. Returning this year...

LLNL Achieves Fusion Ignition…with Help from Data Science
On December 13, the Department of Energy (DOE) and National Nuclear Security Administration (NNSA) announced the achievement of fusion ignition at LLNL—a major scientific breakthrough decades in the making that will pave the way for advancements in national defense and the future of clean power. In the early hours of December 5, a team at LLNL’s National Ignition Facility (NIF) conducted the first controlled fusion experiment in history to reach this milestone, also known as scientific energy breakeven, meaning it produced more energy from fusion than the laser energy used to drive it. This...
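The breakeven condition described above can be written as a simple ratio. The symbols below are notation introduced here for illustration, not taken from the announcement: $E_{\text{fusion}}$ is the fusion energy released and $E_{\text{laser}}$ is the laser energy delivered to the target.

```latex
G = \frac{E_{\text{fusion}}}{E_{\text{laser}}}, \qquad \text{scientific energy breakeven} \iff G > 1
```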

Award-Winning Papers
LLNL’s data science community continues to receive accolades for ground-breaking research and techniques. PDFs or full-text web pages are linked where available.
The 2022 IEEE VIS Test of Time Awards recognize papers that are “still vibrant and useful today and have had a major impact and influence within and beyond the visualization community” (read more at LLNL News). The conference is the premier forum for advances in visualization and visual analytics.
- 25-year award (published in 1997): ROAMing Terrain: Real-Time Optimally Adapting Meshes – Mark Miller and collaborators
- 14-year award...

Leadership Changes with New Fiscal Year
Coinciding with LLNL’s new fiscal year (FY23) beginning on October 1, a few personnel changes took effect for the DSI and Data Science Summer Institute (DSSI). Dan Merl, who leads the Machine Intelligence Group in LLNL’s Center for Applied Scientific Computing, joined the DSI Council to advise on computing and data initiatives. Goran Konjevod, from LLNL’s Computational Engineering Division, moved from his DSSI directorship to the Council to further promote education and workforce initiatives. Statistician Amanda Muyskens joined Nisha Mulakken in co-directing the DSSI. (Read more about Muyskens...

Top AI Award at International Symbolic Regression Competition
An LLNL team claimed a top prize at an inaugural international symbolic regression competition for an artificial intelligence (AI) framework they developed capable of explaining and interpreting real-life COVID-19 data. Hosted by the open-source SRBench project at the 2022 Genetic and Evolutionary Computation Conference, the competition invited teams to submit their best symbolic regression algorithms. Organizers trained the models on datasets, assigned “trust ratings,” and evaluated them for accuracy and simplicity.
The team’s “Unified Deep Symbolic Regression” (uDSR) algorithm beat 12...

Open Data Initiative Adds Simulated Cardiac Signals Dataset
Building off LLNL’s Cardioid code, which simulates the electrophysiology of the human heart, a research team has conducted a computational study to generate a dataset of cardiac simulations at high spatiotemporal resolutions. The dataset—which is publicly available for further cardiac machine learning (ML) research via the DSI’s Open Data Initiative—was built using real cardiac bi-ventricular geometries and clinically inspired endocardial activation patterns under different physiological and pathophysiological conditions.
The dataset consists of pairs of computationally simulated...

Open Data Initiative Adds X-Ray CT Dataset for Additive Manufacturing
The DSI’s Open Data Initiative (ODI) recently added a new project to its catalog: X-Ray CT Data of Additively Manufactured Octet Lattice Structures. Computed Tomography (CT) is a common imaging modality used at LLNL for non-destructive evaluation in a wide range of applications. For example, CT imaging can highlight defects in additively manufactured (AM) structures, which aids in fine-tuning subsequent iterations of development. This new addition to the ODI catalog consists of seven datasets: simulations containing models of x-ray CT simulations showing AM lattice structures with common...

LLNL Wins PacificVis Best Paper Award
Three LLNL computer scientists and University of Utah colleagues have won the 2022 PacificVis Best Paper award. Harsh Bhatia, Peer-Timo Bremer, and Peter Lindstrom co-authored “AMM: Adaptive Multilinear Meshes” (see the PDF and GitHub repository). AMM provides users with a resolution-precision-adaptive representation technique that reduces mesh sizes, thereby reducing the memory and storage footprints of large scientific datasets. The approach combines two solutions into one—reducing data precision and adapting data resolution—to improve the performance and efficiency of data processing and...

Open Data Initiative Adds Neuroimaging Dataset
The DSI’s Open Data Initiative recently added a new project to its catalog: Derived Products from HCP-YA fMRI. The Human Connectome Project–Young Adult (HCP-YA) dataset includes multiple neuroimaging modalities from 1,200 healthy young adults. These modalities include functional magnetic resonance imaging (fMRI), which measures the blood oxygenation fluctuations that occur with brain activity.
The fMRI data were recorded in multiple sessions per subject, both at rest and during a set of tasks designed to evoke specific brain activity. Each fMRI run is a sequence of 3D volumes, and processing these...

Livermore WiDS Provides Forum for Women in Data Science
LLNL celebrated the 2022 global Women in Data Science (WiDS) conference on March 7 with its fifth annual regional event, featuring workshops, mentoring sessions, and a discussion with LLNL Director Kim Budil, the first woman to hold that role. The all-day event attracted women data scientists and students inside and outside the Lab, who gathered to share coding tips and swap stories of their experiences in growing their careers. Attendees tuned in to view presentations by LLNL women data scientists, engage in breakout sessions, and view a livestream of the global WiDS conference hosted by...

Register for Virtual WiDS Livermore on March 7
The annual Women in Data Science (WiDS) conference returns on Monday, March 7, the day before International Women’s Day. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event will be entirely virtual and free. Everyone is welcome to attend. Registration is open until February 27.
Sponsored by the DSI and LLNL’s Office of Strategic Diversity and Inclusion Programs, WiDS Livermore will include a livestream of the Stanford conference and networking opportunities. Returning this year is the popular “speed mentoring” session, where mentees...

Happy New Year from the DSI Council
The start of a new year is an exciting time because of the opportunity to appraise our data science community’s myriad accomplishments as well as preview upcoming projects and events. Like other areas of LLNL, the DSI has adapted to evolving pandemic restrictions and workplace policies to prioritize safety.
We were pleased to sponsor and contribute to multiple activities in modified or virtual formats: our monthly seminar series, the fourth annual Women in Data Science (WiDS) Livermore event, a new career panel series inspired by WiDS, the Machine Learning for Industry Forum (ML4I), and a...

Five Papers Accepted to NeurIPS 2021
The annual Conference on Neural Information Processing Systems (NeurIPS) returns December 6–14. LLNL work has been accepted at the prestigious machine learning conference in past years; in 2021 researchers have five accepted papers. Preprints are linked here.
- A Winning Hand: Compressing Deep Networks Can Improve Out-of-Distribution Robustness – James Diffenderfer, Brian Bartoldson, Shreya Chaganti, Jize Zhang, and Bhavya Kailkhura
- Designing Counterfactual Generators using Deep Model Inversion – Jayaraman Thiagarajan and colleagues from Arizona State University, IBM Research, and Stanford...

Counterfactual Generators for Deep Models
LLNL’s research into machine learning (ML) interpretability continues with an investigation of counterfactual explanations—those that synthesize a hypothetical result based on small, interpretable changes to a given query image. Existing approaches rely extensively on pre-trained generative models or access to training data to create plausible counterfactuals that support users’ hypotheses.
LLNL’s Jayaraman Thiagarajan and colleagues from Arizona State University, Stanford University, and IBM Research have developed a technique called DISC—Deep Inversion for Synthesizing Counterfactuals...

4D Computed Tomography Reconstructions
Computed tomography (CT) is a type of x-ray imaging technology with a range of applications for clinical diagnosis, non-destructive evaluation in industry, baggage inspection, and cargo screening. CT scanners capture a sequence of x-ray projection images from angles around an object. Reconstruction algorithms then estimate the scene from these measured images. 2D and 3D CT imaging of static objects are well-studied problems with theoretical and practical algorithms. However, reconstruction of scene changes and measurements over time, known as dynamic 4D CT, can yield spatiotemporal ambiguities. (Image at left: 4D CT of...

New Bayesian ML Code Released
A new Bayesian machine learning (ML) code, MuyGPyS (pronounced my-jee-pies), has been developed as part of a Laboratory Directed Research and Development Strategic Initiative to address needs for native uncertainty quantification in ML predictions, learning with bounded training times, support for combining ML and model-based Bayesian inference frameworks, and extending the data sizes allowable in Gaussian process (GP) models.
The MuyGPyS code and algorithm offer best-in-class performance on community GP regression benchmarks, as well as image classification performance competitive with or...
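For readers new to GP models, the following is a minimal sketch of textbook GP regression in plain NumPy, showing how a GP returns a variance alongside each prediction. It does not use the MuyGPyS API or its algorithm. The function names, the squared-exponential kernel, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1D inputs (illustrative choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-6):
    """Return the GP posterior mean and variance at x_test."""
    k_train = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale)
    # Solve linear systems rather than forming an explicit matrix inverse.
    mean = k_cross @ np.linalg.solve(k_train, y_train)
    # The prior variance of this kernel is 1 at every point.
    var = 1.0 - np.sum(k_cross * np.linalg.solve(k_train, k_cross.T).T, axis=1)
    return mean, np.maximum(var, 0.0)


# Toy data (hypothetical): four samples of a sine curve.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)

# One test point on top of the training data, one far away from it.
x_test = np.array([1.0, 10.0])
mean, var = gp_predict(x_train, y_train, x_test)
```

The predicted variance is close to zero at the test point that coincides with a training sample and close to the prior variance of 1 at the distant point, which is the kind of built-in uncertainty estimate the paragraph above refers to. Solving against the full training kernel matrix, as this sketch does, scales cubically with the number of training points, which is one reason data size is a limiting factor for GP models.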