Volume 26

May 2, 2023

Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

screen shot from a video showing Brian in his lab speaking to the camera

Brian Giera Named New DSI Director

After five years as the DSI’s director, Michael Goldman is passing the baton to Brian Giera, a materials and manufacturing researcher in LLNL’s Engineering Directorate. “The DSI is a thriving organization, so I am excited for the impact we will have given all the positive momentum,” says Giera. Goldman adds, “Brian will lead the DSI into a promising new phase, given how tremendously the Lab’s workforce and capabilities have grown since we established the DSI in 2018.”

Giera joined LLNL in 2014 as a postdoctoral researcher and currently leads the Analytics for Advanced Manufacturing group in the Lab’s Materials Engineering Division. He has worked on a variety of machine learning (ML) projects such as optimizing the production of carbon capture technology, quality detection of metal additive manufacturing, and speeding up the time to deployment in the advanced manufacturing development cycle. A recipient of multiple LLNL Diversity & Inclusion Director’s Awards, Giera holds a PhD in Chemical Engineering from UC Santa Barbara.

As director, Giera will oversee the DSI’s efforts to strengthen the Lab’s data science workforce pipeline, research directions, and community outreach. He notes, “What sets data science at the Lab apart from everywhere else is the scale, high consequence of error, and national/global relevance of the problems we are tackling. We have rich, sometimes sparse, but always precious datasets originating from a broad range of national security challenges.” Read more about his background and thoughts on data science at LLNL.

Nisha and Dona converse in front of a fireplace video

WiDS Livermore Celebrates Achievements and Supports Women

Marking International Women’s Day on March 8, LLNL women data scientists, Lab employees, and other attendees interested in the field gathered for the annual Livermore Women in Data Science (WiDS) regional event hosted in conjunction with the global WiDS conference. Held for the sixth year, but for the first time in a hybrid format, WiDS Livermore provided a space for women to share their career paths and experiences, hear advice on navigating a male-dominated field, learn keys to achieving work-life balance, discuss the importance of mentorship, and demonstrate to women in the data science field that they’re not alone. Videos of the speakers and other materials are posted to the WiDS Livermore web page.

Attendees met online and in person for the forum, which highlights women in computing and the data sciences. They networked, listened to technical talks by Lab scientists and others, and engaged in a fireside chat (pictured at left) with former LLNL Computation Associate Director Dona Crawford. They also watched live as LLNL Bioinformatics Group Leader Marisa Torres spoke about her work applying ML tools to drug discovery (video begins at 3:05:11) from the WiDS worldwide conference at Stanford University.

Jacqueline Alvarez, a PhD student who participated in the 2021 LLNL/UC Merced Data Science Challenge and will return this summer to intern at the Lab, said she enjoyed listening to women speak about their journeys. “We all share this common experience of being one of the only females in the room. It’s reassuring that if they can do it, I could do it and I shouldn’t be doubting myself. I could keep going and probably be in the same positions as they are in the future,” she said.

cutaway of a 3D red ovoid shape with simulated movement inside

Codes and Simulations Behind Ignition

Since the genesis of LLNL’s inertial confinement fusion (ICF) program, codes have played an essential role in simulating the complex physical processes that take place in an ICF target and the facets of each experiment that must be nearly perfect. Many of these processes are too complicated, expensive, or even impossible to predict through experiments alone. With only a few National Ignition Facility laser shots per year to test target and experimental designs, computer modeling provides designers with valuable insights into which ideas are more likely to work. Because the road to ignition was beset with multiple hurdles, a host of codes were brought to bear on the problem. Codes are used to help design the experiment, to model the target capsule and iterate on designs, to answer questions from experimentalists, and to understand what occurred post-experiment. The knowledge gained from each shot goes back into the codes to improve them.

When the El Capitan supercomputer comes online, researchers will be able to run high-fidelity 3D ensemble simulations to answer multiple scientific questions at once and perform unprecedented uncertainty quantification and ML studies. The capability opens the door to ML-backed design optimization, giving researchers an expanded design space exploration tool to create more robust targets. For the first time, scientists could also create 3D ML surrogate models, trained on thousands of ICF simulations, to perform “inverse design” of target capsules, where artificial intelligence (AI) techniques are used to back-engineer optimal target initial conditions and drives based on the desired yield output.
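The “inverse design” idea can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: toy_simulation is a made-up one-parameter function rather than an ICF code, the surrogate is simple linear interpolation rather than a trained ML model, and the search is plain bisection. It shows only the workflow: sample the expensive model, build a cheap surrogate, then work backward from a desired yield to a suggested input.

```python
from bisect import bisect_right


def toy_simulation(x):
    """Hypothetical stand-in for an expensive simulation: design parameter -> yield."""
    return x * x + x


def build_surrogate(n_samples=21):
    """Sample the 'simulation' on a grid and return a cheap piecewise-linear surrogate."""
    xs = [i / (n_samples - 1) for i in range(n_samples)]
    ys = [toy_simulation(x) for x in xs]

    def surrogate(x):
        # Clamp to the sampled range, then interpolate between the two neighbors.
        x = min(max(x, xs[0]), xs[-1])
        i = min(bisect_right(xs, x), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return surrogate


def inverse_design(surrogate, target_yield, lo=0.0, hi=1.0, tol=1e-6):
    """Bisection search for the input whose surrogate yield matches the target.

    Assumes the surrogate is monotonically increasing on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surrogate(mid) < target_yield:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


surrogate = build_surrogate()
x_best = inverse_design(surrogate, target_yield=1.0)
print(f"suggested design parameter: {x_best:.3f}")
print(f"simulated yield at that design: {toy_simulation(x_best):.3f}")
```

The final line reruns the toy simulation at the suggested input, mirroring the practice of checking a surrogate-derived design against the full model.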

screen shot of a simulated cylinder cutaway

Nondestructive Evaluation Meets Data Science Techniques

Led by Joe Tringe, LLNL’s Nondestructive Evaluation (NDE) group has an array of techniques at its disposal for inspecting objects’ interiors without disturbing them: computed tomography, optical laser interferometry, and ultrasound, which can be used alone or in combination to gauge whether a component’s physical and material properties fall within allowed tolerances. In one project, the team of NDE specialists and university collaborators tackled integration of data from multiple modalities into composite images, a task more complex than merely overlaying multiple images. The research group has also transformed multimodal data into true-to-life digital twins, ultra-precise computed replicas of a component’s structural and material properties. Using digital twins, scientists can simulate properties including heat transfer and physical deformation to ascertain a component’s performance and longevity.

In a project with the University of Utah, the team developed a library of algorithms to combine data from various imaging methods and visualize 3D component reconstructions. Using the numerical technique of forward projection to estimate unseen features, preliminary reconstructions were iteratively refined and rebuilt through an ML model that adjusted geometry and reduced modality-specific artifacts. The team first tested their method on exemplars to further tune the computing process before applying it to additively manufactured components. The resulting model is more than the sum of its parts, reflecting the most confident readings obtained from each modality individually. They then transformed statistics into a comprehensible, user-friendly form using the open-source OpenViSUS software to organize, analyze, and visualize massive datasets (image at right). Read more about these NDE projects in Science & Technology Review.

rendering of red and gray molecules connecting

Machine Learning to Understand Hydrogen Production

Through ML, an LLNL scientist has gained a better understanding of the materials used to produce hydrogen fuel. The interaction of water with TiO2 (titanium dioxide) surfaces is especially important in various scientific fields and applications, from photocatalysis for hydrogen production to photooxidation of organic pollutants to self-cleaning surfaces and biomedical devices. However, the surface chemistry of TiO2 in contact with water has been the subject of several debates among scientists. The challenge is this: to understand how water is converted to H2 and O2 by TiO2, one must first understand the chemistry of the region where this reaction takes place.

In a paper appearing in the Proceedings of the National Academy of Sciences, LLNL researchers used molecular simulations that accurately reproduce the ab initio results for water interacting with a prototypical TiO2(110) surface. The results have implications for improved production of hydrogen fuel. “Our fundamental understanding of the TiO2 interface with water will help us to find more efficient and affordable materials to produce hydrogen in a clean and renewable fashion,” said Marcos Calegari Andrade, materials scientist in the Quantum Simulations Group and co-author of the paper. The work highlights how powerful ML is for studying chemical reactions at solid-liquid interfaces (such as the TiO2-water interface). ML allows for long, large-scale atomistic simulations with the accuracy of quantum mechanics but at computational costs that are orders of magnitude lower.

three panels showing progressive granularity of protein simulation

Efficient Simulations of Protein Interactions Linked to Cancer

LLNL scientists have developed a theoretical model for more efficient molecular-level simulations of cell membranes and their lipid-protein interactions. Developed under an ongoing collaboration by the Department of Energy (DOE) and the National Cancer Institute (NCI) aimed at modeling cell membrane interactions with RAS—a protein whose mutations are tied to about 30% of human cancers—the new model addresses a problem in simulating RAS behavior, where conventional methods come up short of reaching the time- and length-scales needed to observe biological processes of RAS-related cancers. The work appears in the latest issue of the journal Physical Review Research.

The new model, based on dynamic density functional theory, enables simulations that can access micron-level length-scales and timescales on the order of seconds, while maintaining resolution close to the current gold standard of molecular dynamics models. Development of the framework was part of the NCI/DOE Joint Design of Advanced Computing Solutions for Cancer Pilot 2 project focused on developing a greater understanding of RAS-RAF-driven cancer initiation and growth by combining ML, high-performance computing, and state-of-the-art experimental capabilities. The follow-on project, ADMIRRAL (AI-Driven Multiscale Investigation of the RAS/RAF Activation Lifecycle), is extending the capability to model RAS biology developed under Pilot 2 to explore a much longer timescale and to address signal-activation pathways.

a stack of printed documents

Understanding Long Documents with DRC (Detect, Retrieve, Comprehend)

A multi-laboratory team is developing prototype AI tools and system architectures to extract and interpret key phrases and statements in long technical documents. The project uses a novel three-step deep learning natural language processing (NLP) framework that relies on text detection, context retrieval, and statement-context comprehension using an encoded guide. The resulting recommendations can be interpreted through syntactic highlighting that is mapped back to the document’s original text. LLNL’s role in the project focuses on hybrid ML models that combine NLP deep learning models with an emulation of a human expert’s mental process for reviewing documents. The team received a Department of Energy Secretary’s Honor Award in 2022 and presented their work at the AAAI 2023 Workshop on Scientific Document Understanding in February. Future work includes automatic encoding of guides and the incorporation of large language models.

portraits of Amanda, Philip, and Katiana

Seminar Roundup

Dr. Amanda Randles from Duke University spoke at the DSI’s March 14 seminar, entitled “Using Data Science to Advance the Impact of Vascular Digital Twins in Medicine.” Building a detailed, realistic model of human blood flow is a formidable mathematical and computational challenge. Combining physics-based modeling with data science approaches is critical to addressing open questions in personalized medicine. Randles discussed building and using high-resolution digital twins of patients’ vascular anatomy to inform the treatment of a range of human diseases. She presented the associated data challenges and identified key areas where data science can play a role in advancing the work.

The April 6 seminar featured Sandia National Labs’ Dr. Philip Kegelmeyer, who was the inaugural speaker when the series launched in 2018. In his talk, “Adversarial Machine Learning: Categories, Concepts, and Current Landscape,” Kegelmeyer provided an overview of the three main categories of ML vulnerabilities, speaking to how an adversary might subvert the original training data to manipulate the resulting model, change the test data in order to evade the correct outcome from the model, or cause the model to reveal details of its training data or its structure that it did not intend to reveal. The presentation also included a brief survey of recent work, focusing on edge cases that don’t smoothly fit into the subvert/evade/reveal categorization.
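As a rough illustration of the “evade” category only, the sketch below nudges a test input until a toy linear classifier changes its answer. It is not drawn from Kegelmeyer’s talk: the weights, bias, input, step size, and the helper names predict and evade are all hypothetical values chosen for the example.

```python
def sign(value):
    """Return -1.0, 0.0, or 1.0 according to the sign of value."""
    return (value > 0) - (value < 0)


def predict(weights, bias, x):
    """Toy linear classifier: label 1 if the weighted score is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def evade(weights, bias, x, step=0.05, max_steps=100):
    """Shift each feature in small steps against the model until the label flips.

    Assumes the attacker knows the model's weights (a white-box setting).
    """
    original = predict(weights, bias, x)
    # To flip label 1 we lower the score; to flip label 0 we raise it.
    direction = -1.0 if original == 1 else 1.0
    x_adv = list(x)
    for _ in range(max_steps):
        if predict(weights, bias, x_adv) != original:
            break
        x_adv = [xi + direction * step * sign(w) for xi, w in zip(x_adv, weights)]
    return x_adv


# Hypothetical model and test input, chosen only for illustration.
weights, bias = [2.0, -1.0], -0.5
x = [1.0, 0.5]
x_adv = evade(weights, bias, x)

print("original label:", predict(weights, bias, x))
print("label after perturbation:", predict(weights, bias, x_adv))
print("largest change to any feature:", max(abs(a - b) for a, b in zip(x_adv, x)))
```

The training data and the model itself are untouched here; only the test input changes, which is what separates evasion from the subvert and reveal categories.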

The next seminar is scheduled for Thursday, May 18 and will feature Katiana Kontolati of Johns Hopkins University speaking about “Leveraging Latent Representations for Predictive Physics-Based Modeling and Uncertainty Quantification.” Speakers’ biographies and abstracts are available on the seminar series web page, and many recordings are posted to the YouTube playlist. To become or recommend a speaker for a future seminar, or to request a WebEx link for an upcoming seminar if you’re outside LLNL, contact datascience [at] llnl.gov. (Pictured left to right: Randles, Kegelmeyer, and Kontolati.)

screen shot of Priyadip standing outside and speaking to the camera

Meet an LLNL Data Scientist

Priyadip Ray came to LLNL out of a desire to do impactful work. Growing up in India, Ray was inspired by his father, a physicist, to pursue science and discovered the potential that engineering had to change lives. After finishing his undergraduate and graduate work in India, Ray obtained his PhD in electrical and computer engineering from Syracuse University and completed a stint as a postdoc at Duke University before joining the Lab. At LLNL, Ray applies ML and AI to clinical data and electronic health records to create predictive models of diseases including amyotrophic lateral sclerosis, sepsis, and COVID-19. Through these improved models, clinicians could potentially uncover novel therapeutics or detect signatures and diagnose diseases much earlier than they are currently able to, providing more lead time to develop countermeasures and prepare for future pandemics.

Ray enjoys engineering because it allows him to work on “Big Science” and multidisciplinary projects, and he said he has “never found a better set of colleagues anywhere else.” He encourages young engineers to seek out a specialized niche and get involved in research projects through internships at companies or universities to find the best fit. “Every research group needs diverse people, because everybody brings some unique strengths, and that contributes a lot. My advice would be to reach out and get involved in projects, even if they’re not directly in your area, but that give you that bigger picture view that will really help you succeed.” Watch him discuss his background and projects (2:15).