Welcome to the DSI Newsletter

Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more.
students and mentors strike casual poses in the UCLCC meeting room

Data Science Challenge Tackles ML-Assisted Heart Modeling

For the first time, students from the University of California’s Merced and Riverside campuses joined forces for the two-week Data Science Challenge at LLNL, tackling a real-world problem in machine learning (ML)-assisted heart modeling. Held July 10–21, the event brought together 35 UC students, ranging from undergraduates to graduate students across a variety of majors, to work in groups on four key tasks, using actual electrocardiogram data to predict heart health. According to organizers, the purpose of the challenge was to give students a taste of the broad scope of work...

Read Volume 29
rainbow-colored plume of pollution marked with wind speed and direction as well as colormap deposition pattern

Open Data Initiative Adds CFD Simulation Dataset

The DSI’s Open Data Initiative (ODI) recently added a new project to the catalog: Computational Fluid Dynamics Simulation Data of Spatial Deposition. In fluid mechanics problems, computational fluid dynamics (CFD) uses data structures and numerical analysis to investigate the flow of liquids and gases. For instance, CFD models can simulate atmospheric transport and dispersion, as in this dataset’s simulations of wind-driven pollutant dispersion and deposition. The data is then used to train machine learning models that, in turn, can predict spatial patterns with high accuracy.

This new...

Read Volume 28
fireworks exploding over a city at night

The DSI Turns Five!

Since the DSI’s founding in 2018, the Lab has seen tremendous growth in its data science community and has invested heavily in related research. Five years later, the DSI has found its stride with a multipronged strategy of raising awareness about the field, encouraging partnerships across the community, supporting researchers, and nurturing the next generation of data scientists. A new article chronicles the DSI’s successes and influence both at the Lab and in the external data science community.

Michael Goldman, who directed the DSI until this April, says, “The accomplishments of DSI are...

Read Volume 27
screen shot from a video showing Brian in his lab speaking to the camera

Brian Giera Named New DSI Director

After five years as the DSI’s director, Michael Goldman is passing the baton to Brian Giera, a materials and manufacturing researcher in LLNL’s Engineering Directorate. “The DSI is a thriving organization, so I am excited for the impact we will have given all the positive momentum,” says Giera. Goldman adds, “Brian will lead the DSI into a promising new phase, given how tremendously the Lab’s workforce and capabilities have grown since we established the DSI in 2018.”

Giera joined LLNL in 2014 as a postdoctoral researcher and currently leads the Analytics for Advanced Manufacturing group in...

Read Volume 26
WiDS Livermore logo of green and black silhouettes of women’s faces in profile

Register for Hybrid WiDS Livermore on March 8

The annual Women in Data Science (WiDS) conference returns on Wednesday, March 8. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event is free and will be presented in a hybrid format at the Livermore Valley Open Campus (LVOC) and via WebEx. Everyone is welcome to attend. Register by February 27.

Along with plenty of food and networking opportunities, WiDS Livermore will include a livestream of the Stanford conference where LLNL WiDS Ambassador Marisa Torres has been invited to speak to the global audience. Returning this year...

Read Volume 25
the word IGNITION on a black background with agency logos

LLNL Achieves Fusion Ignition…with Help from Data Science

On December 13, the Department of Energy (DOE) and National Nuclear Security Administration (NNSA) announced the achievement of fusion ignition at LLNL—a major scientific breakthrough decades in the making that will pave the way for advancements in national defense and the future of clean power. In the early hours of December 5, a team at LLNL’s National Ignition Facility (NIF) conducted the first controlled fusion experiment in history to reach this milestone, also known as scientific energy breakeven, meaning it produced more energy from fusion than the laser energy used to drive it. This...

Read Volume 24
graphic of a medal with the words “director’s science and technology excellence publication award” on a background of blue hexagons

Award-Winning Papers

LLNL’s data science community continues to receive accolades for ground-breaking research and techniques. PDFs or full-text web pages are linked where available.

The 2022 IEEE VIS Test of Time Awards recognize papers that are “still vibrant and useful today and have had a major impact and influence within and beyond the visualization community” (read more at LLNL News). The conference is the premier forum for advances in visualization and visual analytics.

Read Volume 23
round portraits of eight people

Leadership Changes with New Fiscal Year

Coinciding with LLNL’s new fiscal year (FY23) beginning on October 1, a few personnel changes took effect for the DSI and Data Science Summer Institute (DSSI). Dan Merl, who leads the Machine Intelligence Group in LLNL’s Center for Applied Scientific Computing, joined the DSI Council to advise on computing and data initiatives. Goran Konjevod, from LLNL’s Computational Engineering Division, moved from his DSSI directorship to the Council to further promote education and workforce initiatives. Statistician Amanda Muyskens joined Nisha Mulakken in co-directing the DSSI. (Read more about Muyskens...

Read Volume 22
the letters DSO rendered as 3D blocks

Top AI Award at International Symbolic Regression Competition

An LLNL team claimed a top prize at an inaugural international symbolic regression competition for an artificial intelligence (AI) framework they developed that is capable of explaining and interpreting real-life COVID-19 data. Hosted by the open-source SRBench project at the 2022 Genetic and Evolutionary Computation Conference, the competition invited teams to submit their best symbolic regression algorithms. Organizers trained the models on datasets, assigned “trust ratings,” and evaluated them for accuracy and simplicity.

The team’s “Unified Deep Symbolic Regression” (uDSR) algorithm beat 12...

Read Volume 21
high-resolution simulation of the electrical activation map in a human’s heart

Open Data Initiative Adds Simulated Cardiac Signals Dataset

Building off LLNL’s Cardioid code, which simulates the electrophysiology of the human heart, a research team has conducted a computational study to generate a dataset of cardiac simulations at high spatiotemporal resolutions. The dataset—which is publicly available for further cardiac machine learning (ML) research via the DSI’s Open Data Initiative—was built using real cardiac bi-ventricular geometries and clinically inspired endocardial activation patterns under different physiological and pathophysiological conditions.

The dataset consists of pairs of computationally simulated...

Read Volume 20
cube-shaped lattice structure with inset images showing close details as well as types of manually inserted defects

Open Data Initiative Adds X-Ray CT Dataset for Additive Manufacturing

The DSI’s Open Data Initiative (ODI) recently added a new project to its catalog: X-Ray CT Data of Additively Manufactured Octet Lattice Structures. Computed Tomography (CT) is a common imaging modality used at LLNL for non-destructive evaluation in a wide range of applications. For example, CT imaging can highlight defects in additively manufactured (AM) structures, which aids in fine-tuning subsequent iterations of development. This new addition to the ODI catalog consists of seven datasets: simulations containing models of x-ray CT simulations showing AM lattice structures with common...

Read Volume 19
three sponge-like simulated shapes resulting from data reduction, with the middle shape representing the original dataset and the left and right shapes for comparison

LLNL Wins PacificVis Best Paper Award

Three LLNL computer scientists and University of Utah colleagues have won the 2022 PacificVis Best Paper award. Harsh Bhatia, Peer-Timo Bremer, and Peter Lindstrom co-authored “AMM: Adaptive Multilinear Meshes” (see the PDF and GitHub repository). AMM provides users with a resolution-precision-adaptive representation technique that reduces mesh sizes, thereby reducing the memory and storage footprints of large scientific datasets. The approach combines two solutions into one—reducing data precision and adapting data resolution—to improve the performance and efficiency of data processing and...

Read Volume 18
multicolored grid showing hemispherical brain region activity with fMRI

Open Data Initiative Adds Neuroimaging Dataset

The DSI’s Open Data Initiative recently added a new project to its catalog: Derived Products from HCP-YA fMRI. The Human Connectome Project–Young Adult (HCP-YA) dataset includes multiple neuroimaging modalities from 1,200 healthy young adults. These modalities include functional magnetic resonance imaging (fMRI), which measures the blood oxygenation fluctuations that occur with brain activity.

The fMRI data were recorded in multiple sessions per subject: during rest and during a set of tasks designed to evoke specific brain activity. Each fMRI run is a sequence of 3D volumes, and processing these...

Read Volume 17
3x5 video chat screens on a green background

Livermore WiDS Provides Forum for Women in Data Science

LLNL celebrated the 2022 global Women in Data Science (WiDS) conference on March 7 with its fifth annual regional event, featuring workshops, mentoring sessions, and a discussion with LLNL Director Kim Budil, the first woman to hold that role. The all-day event attracted women data scientists and students inside and outside the Lab, who gathered to share coding tips and swap stories of their experiences in growing their careers. Attendees tuned in to view presentations by LLNL women data scientists, engage in breakout sessions, and view a livestream of the global WiDS conference hosted by...

Read Volume 16
WiDS Livermore logo of green ones and zeros overlaid on silhouettes of faces in profile

Register for Virtual WiDS Livermore on March 7

The annual Women in Data Science (WiDS) conference returns on Monday, March 7, the day before International Women’s Day. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event will be entirely virtual and free. Everyone is welcome to attend. Registration is open until February 27.

Sponsored by the DSI and LLNL’s Office of Strategic Diversity and Inclusion Programs, WiDS Livermore will include a livestream of the Stanford conference and networking opportunities. Returning this year is the popular “speed mentoring” session, where mentees...

Read Volume 15
3x2 grid of circular portraits of the 6 council members

Happy New Year from the DSI Council

The start of a new year is an exciting time because of the opportunity to appraise our data science community’s myriad accomplishments as well as preview upcoming projects and events. Like other areas of LLNL, the DSI has adapted to evolving pandemic restrictions and workplace policies to prioritize safety.

We were pleased to sponsor and contribute to multiple activities in modified or virtual formats: our monthly seminar series, the fourth annual Women in Data Science (WiDS) Livermore event, a new career panel series inspired by WiDS, the Machine Learning for Industry Forum (ML4I), and a...

Read Volume 14
semi-cyclical diagram showing training of a recurrent neural network leading to a sample, Gaussian processes, extraction, and combination of results

Five Papers Accepted to NeurIPS 2021

The annual Conference on Neural Information Processing Systems (NeurIPS) returns December 6–14. LLNL work has been accepted at the prestigious machine learning conference in past years; in 2021, LLNL researchers have five accepted papers. Preprints are linked here.

Read Volume 13
Four images of celebrities’ faces with differences between bald and not bald, smiling and not smiling, which show results of the DISC inversion method

Counterfactual Generators for Deep Models

LLNL’s research into machine learning (ML) interpretability continues with an investigation of counterfactual explanations—those that synthesize a hypothetical result based on small, interpretable changes to a given query image. Existing approaches rely extensively on pre-trained generative models or access to training data to create plausible counterfactuals that support users’ hypotheses.

LLNL’s Jayaraman Thiagarajan and colleagues from Arizona State University, Stanford University, and IBM Research have developed a technique called DISC—Deep Inversion for Synthesizing Counterfactuals...

Read Volume 12
three rows of thoracic CT images with slight variations

4D Computed Tomography Reconstructions

Computed tomography (CT) is a type of x-ray imaging technology with a range of applications for clinical diagnosis, non-destructive evaluation in industry, baggage inspection, and cargo screening. CT scanners capture a sequence of x-ray projection images from angles around an object. Reconstruction algorithms then estimate the scene from these measured images. 2D and 3D CT imaging of static objects are well-studied problems with theoretical and practical algorithms. However, reconstruction of scene changes and measurements over time, known as dynamic 4D CT, can yield spatiotemporal ambiguities. (Image at left: 4D CT of...

Read Volume 11
4x2 images showing point spread functions

New Bayesian ML Code Released

A new Bayesian machine learning (ML) code, MuyGPyS (pronounced my-jee-pies), has been developed as part of a Laboratory Directed Research and Development Strategic Initiative to address needs for native uncertainty quantification in ML predictions, learning with bounded training times, support for combining ML and model-based Bayesian inference frameworks, and extending the data sizes allowable in Gaussian process (GP) models.

The MuyGPyS code and algorithm offer best-in-class performance on community GP regression benchmarks, as well as image classification performance competitive with or...

Read Volume 10