Volume 23

Nov. 21, 2022

DSI logo cropped FY22

Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

graphic of a medal with the words “director’s science and technology excellence publication award” on a background of blue hexagons

Award-Winning Papers

LLNL’s data science community continues to receive accolades for ground-breaking research and techniques. PDFs or full-text web pages are linked where available.

The 2022 IEEE VIS Test of Time Awards recognize papers that are “still vibrant and useful today and have had a major impact and influence within and beyond the visualization community” (read more at LLNL News). The conference is a premier forum for advances in visualization and visual analytics.

The 2022 Director’s S&T Excellence in Publication Awards honor outstanding scientific and technical publications by LLNL staff. These papers are noted as having an especially significant impact on the Lab’s missions and/or external research community.

Brian Spears holds the award with Brian Van Essen on his left and Timo Bremer on his right

HPCwire Award for Cognitive Simulation Application

The high-performance computing (HPC) publication HPCwire announced LLNL as the winner of its Editor’s Choice award for Best Use of HPC in Energy for applying cognitive simulation (CogSim) methods to inertial confinement fusion (ICF) research. The award was presented on November 14 at SC22, the world’s largest supercomputing conference, and recognizes the team’s progress in ML-based modeling of ICF experiments performed at the National Ignition Facility and elsewhere. Emerging at LLNL over the past several years, the CogSim technique uses the Lab’s cutting-edge HPC machines to combine deep neural networks with massive databases of historical ICF experiments to calibrate simulation models. According to researchers, this approach yields faster, better-performing models that predict experimental outcomes more accurately than simulations alone, and with fewer experiments.
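For readers curious about the core idea, the calibration step can be caricatured in a few lines. The sketch below is purely illustrative: a polynomial surrogate and a linear discrepancy correction stand in for the deep neural networks and experimental databases the CogSim team actually uses, and the `simulator`/`experiment` functions are invented toy physics.

```python
import numpy as np

# Stand-in physics: the "experiment" differs from the "simulation" by a
# systematic discrepancy that the calibration step must learn.
def simulator(x):      # cheap and plentiful, but slightly wrong
    return np.sin(x)

def experiment(x):     # expensive, scarce ground truth
    return np.sin(x) + 0.3 * x

# Step 1: fit a surrogate model to many simulation runs (a polynomial
# here stands in for a deep neural network).
x_sim = np.linspace(0.0, 3.0, 200)
surrogate = np.polynomial.Polynomial.fit(x_sim, simulator(x_sim), deg=7)

# Step 2: calibrate with a handful of experiments by fitting a simple
# correction to the simulation-vs-experiment residuals.
x_exp = np.array([0.5, 1.5, 2.5])
slope, intercept = np.polyfit(x_exp, experiment(x_exp) - surrogate(x_exp), deg=1)

def calibrated(x):
    """Surrogate prediction shifted toward the experimental data."""
    return surrogate(x) + slope * x + intercept
```

At a held-out point, the calibrated model tracks the experiment far more closely than the raw surrogate does, mirroring the team’s reported gains in accuracy from combining simulation databases with sparse experimental data.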

Members of the CogSim team include LLNL researchers Brian Spears, Timo Bremer, Luc Peterson, Kelli Humbird, Rushil Anirudh, Brian Van Essen, Shusen Liu, Jim Gaffney, Bogdan Kustowski, Gemma Anderson, Francisco Beltran, Michael Kruse, Sam Ade Jacobs, David Hysom, Jae-Sung Yeom, Peter Robinson, Jessica Semler, Ben Bay, Scott Brandon, Vic Castillo, David Domyancic, Richard Klein, John Field, Steve Langer, Joe Koning, Dave Munro, and Robert Hatarik.

screen shot from video showing Amanda speaking in front of a starry night sky background, overlaid with text of her name and job title

Video: Understanding the Universe with Applied Statistics

In a new video posted to the Lab’s YouTube channel, statistician Amanda Muyskens describes MuyGPs, her team’s innovative and computationally efficient Gaussian Process hyperparameter estimation method for large data. The method has been applied to space-based image classification and released for open-source use in the Python package MuyGPyS. MuyGPs will help astronomers and astrophysicists working with the massive amounts of data gathered from the Vera C. Rubin Observatory Legacy Survey of Space and Time (also known as LSST), as well as numerous other laboratory and science applications.
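The gist of the method, conditioning each prediction only on a point’s nearest neighbors rather than on the full dataset, can be sketched in plain NumPy. This is an illustrative caricature under simplified assumptions (fixed hyperparameters, brute-force neighbor search), not the MuyGPyS API, which additionally estimates hyperparameters efficiently from the data.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """RBF (squared-exponential) kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def local_gp_predict(x_train, y_train, x_query, k=50, length_scale=1.0, nugget=1e-6):
    """Predict at each query point from a GP conditioned on its k nearest neighbors."""
    preds = []
    for xq in x_query:
        # Brute-force search for the k training points closest to the query.
        idx = np.argsort(((x_train - xq) ** 2).sum(-1))[:k]
        xn, yn = x_train[idx], y_train[idx]
        # Condition on only these neighbors: a k-by-k solve instead of N-by-N.
        K = rbf_kernel(xn, xn, length_scale) + nugget * np.eye(k)
        k_star = rbf_kernel(xq[None, :], xn, length_scale)
        preds.append(float(k_star @ np.linalg.solve(K, yn)))
    return np.array(preds)

# Demo: reconstruct sin(x) from scattered samples.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2 * np.pi, (500, 1))
y_train = np.sin(x_train[:, 0])
pred = local_gp_predict(x_train, y_train, np.array([[1.0]]), k=30, length_scale=0.5)
```

The payoff is the same one Muyskens describes: each prediction costs a small k-by-k linear solve rather than a solve against the entire dataset, which is what makes the approach tractable for survey-scale data.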

rainbow-colored heatmap overlaying the image of a collapsed building indicates where the model pays attention when identifying damaged buildings

Using Social Media Data to Inform Seismology

Researchers often mine crowdsourced data, such as images of damage posted after an earthquake, from social media platforms to better understand natural disasters and guide rescue efforts. In a new Scientific Reports paper, LLNL seismologist Qingkai Kong and UC Berkeley collaborators introduce a transfer learning method that detects damaged buildings in earthquake-aftermath images. The team manually labeled 6,500 images from social media platforms and used transfer learning to train a deep learning model to recognize damaged buildings. They also visualized which image features drive the model’s decisions; for example, the damaged building shown at left is overlaid with a heatmap of the regions most important to the classification.

The team’s model achieved good performance when tested on newly acquired images of earthquakes at different locations, and when run in near real-time on a Twitter feed after the 2020 Aegean Sea earthquake (magnitude 7.0). A future goal is for users to upload images after earthquakes to the MyShake app and for the model to identify the images containing damaged buildings, helping to keep the regional community informed about damage location and severity. The method described in the paper could also be expanded to extract social media images after other types of disasters. “Machine learning models like this will provide us with more information about natural hazards so we can prepare for the next one,” states Kong.
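The transfer-learning recipe itself, reusing a frozen pretrained feature extractor and training only a small classifier on top, is simple to sketch. Everything below is a toy stand-in: a fixed random projection plays the pretrained backbone and synthetic vectors play the images, whereas the team’s actual model is a deep network trained on labeled social media photos.

```python
import numpy as np

rng = np.random.default_rng(42)

def frozen_features(images, proj):
    """Stand-in for a frozen pretrained backbone: a fixed nonlinear projection."""
    return np.tanh(images @ proj)

def train_head(feats, labels, lr=0.5, epochs=300):
    """Train only a small logistic-regression head on top of frozen features."""
    w, b = np.zeros(feats.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # predicted P(damaged)
        grad = p - labels                            # cross-entropy gradient
        w -= lr * (feats.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b

# Toy data: 200 flattened "images" whose damage label is tied to a latent
# direction the (random) backbone happens to capture.
X = rng.normal(size=(200, 64))
proj = rng.normal(size=(64, 16))            # frozen backbone weights, never updated
y = (X @ proj[:, 0] > 0).astype(float)      # label encoded in backbone feature 0
w, b = train_head(frozen_features(X, proj), y)
accuracy = (((frozen_features(X, proj) @ w + b) > 0) == (y > 0)).mean()
```

Because only the small head is trained, a few thousand labeled images can suffice, which is why transfer learning fits the 6,500-image regime described in the paper.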

bar graph showing 10 components on the x-axis and probability percentages on the y-axis; bars are colored in various shares of peach and red

Recent Research

left: grid with circles at intersections, red background, and a blue shape left of center; right: neural representation shown as a blue oval on a red background with an arrow pointing to a neural network

Improving Visualization of Large-Scale Datasets

Researchers are starting work on a three-year project aimed at improving methods for visual analysis of large heterogeneous datasets as part of a recent Department of Energy (DOE) funding opportunity. The joint project, titled “Neural Field Processing for Visual Analysis,” will be led at LLNL by co-PI Andrew Gillette, with colleagues from Vanderbilt University and the University of Arizona. The project will explore methods for processing implicit neural representations (INRs), which use coordinate-based neural networks to represent scientific datasets efficiently and compactly. Currently, traditional processing algorithms and visual analysis techniques cannot be applied to INRs directly.

“It’s an honor to have been selected to carry out this research for the DOE,” Gillette said. “Fast and accurate visualization is essential for a wide variety of activities underway at DOE laboratories. My goal over the next three years is to partner closely with application domain specialists and demonstrate how advances in visualization methodologies can directly benefit scientific inquiry.”

DeepRacer car positioned on the blue track with a dashed yellow center line and white lines marking the outside of the lane

Hackathon Puts Machine Learning in the Driver’s Seat

After 10 years, the “try something new” spirit of LLNL’s seasonal hackathon is alive and well. The fall 2022 event featured an Amazon Web Services (AWS) DeepRacer machine learning competition, in which participants used a cloud-based racing simulator to train an autonomous race car with reinforcement learning algorithms. Sponsored by LLNL’s Office of the Chief Information Officer and the Computing Directorate, the hackathon provided a unique opportunity to combine cloud, data science, and computing technologies.

Working in teams or individually, drivers trained their cars for a time trial—the fastest car wins—and submitted their models ahead of race day. The AWS team set up the physical track in the parking lot of the Livermore Valley Open Campus, where drivers took turns running their models with the DeepRacer car. Data scientist Mary Silva recalled, “We were cheering for each other like at a sporting event. Everyone was kind of holding their breath to see who would win.” The world record run for the event’s specific track layout is 7 seconds. The Lab’s winning time was 8.873 seconds.
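Behind the simulator, the training loop is classic reinforcement learning: an agent acts, collects rewards, and updates its value estimates. The toy tabular Q-learning example below, a six-state “track” rather than a 3D racing environment and nothing like AWS’s actual training stack, shows the shape of that loop.

```python
import random

random.seed(0)

# Toy one-dimensional "track": states 0..5, start at 0, finish line at 5.
# Actions: 0 = coast (stay put), 1 = accelerate (advance one state).
N_STATES, GOAL = 6, 5
Q = [[0.0, 0.0] for _ in range(N_STATES)]        # Q[state][action] value table
alpha, gamma, epsilon = 0.5, 0.9, 0.1            # learning rate, discount, exploration

for _ in range(500):                             # training episodes
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] >= Q[state][1] else 1
        nxt = min(state + action, GOAL)
        reward = 1.0 if nxt == GOAL else -0.1    # step penalty rewards finishing fast
        # Q-learning update: nudge Q toward reward plus discounted best future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned policy should accelerate toward the finish line from every state.
policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(N_STATES)]
```

In DeepRacer, the value table becomes a neural network and the reward function encodes staying on the track at speed, but the act-reward-update loop the hackathon participants tuned is the same idea.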

portrait of Alexander next to the seminar series icon

Virtual Seminar Explores Data Dimensionality

In the DSI’s November virtual seminar, Alexander Cloninger of UC San Diego presented “Networks that Adapt to Intrinsic Dimensionality Beyond the Domain.” His talk focused on central questions in deep learning: the minimum size of the network needed to approximate a certain class of functions, and how the dimensionality of the data affects the number of points needed to learn such a network. He discussed his work in the context of two-sample testing, manifold autoencoders, and data generation.

Cloninger is an associate professor in the Department of Mathematical Sciences and the Halıcıoğlu Data Science Institute at UC San Diego. He received his PhD in Applied Mathematics and Scientific Computation from the University of Maryland. He researches problems in the area of geometric data analysis and applied harmonic analysis. A recording of the seminar will be posted to the YouTube playlist. The next seminar, scheduled for December 1, will be the DSI’s first in a hybrid format.

DSSI logo

Student Internship Application Deadline

The Data Science Summer Institute (DSSI) application window is now open through January 31. The 2023 program will run for 12 weeks and is open to both undergraduate and graduate students. Visit the DSSI website for information about how to apply, including a list of FAQs—or share this link with students who may be interested in an internship.

Class of 2022 intern Jonathan Anzules said of the program, “New technological advances and the cheapening of data acquisition have vastly expanded what is possible in bioinformatics. Things like predicting protein folding and interactions, which I previously believed impossible, are not anymore. My experience at LLNL has changed what I think is possible.”