Volume 40

Sept. 24, 2024


Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

Stuart Russell outside

Join Us on October 3 for Stuart Russell Colloquium

On Thursday, October 3, Dr. Stuart Russell will present a colloquium on general AI safety at LLNL. Co-sponsored by the Office of the Deputy Director for Science and Technology and the DSI, the seminar will begin at 10:05am Pacific, and external audiences are welcome to join remotely. Contact DSI-Seminars [at] llnl.gov to request a WebEx link.

Abstract: The media are agog with claims that recent advances in AI put artificial general intelligence (AGI) within reach. Is this true? If so, is that a good thing? Alan Turing predicted that AGI would result in the machines taking control. I will argue that Turing was right to express concern but wrong to think that doom is inevitable. Instead, we need to develop a new kind of AI that is provably beneficial to humans. Unfortunately, we are heading in the opposite direction and we need to take steps to correct this.

Dr. Stuart Russell is the Michael H. Smith and Lotfi A. Zadeh Chair in Engineering and a professor in UC Berkeley’s Division of Computer Science. His book Artificial Intelligence: A Modern Approach (with Peter Norvig) is the standard text in AI, with translations in 14 languages and use in 1,500 universities in 135 countries. His research covers a wide range of topics in AI including machine learning (ML), probabilistic reasoning, knowledge representation, planning, real-time decision making, multitarget tracking, computer vision, computational physiology, and philosophical foundations. In 2021, he was appointed by Her Majesty The Queen as an Officer of the Most Excellent Order of the British Empire. Read his full biography on the UC Berkeley website.


7x3 grid of maps showing central California dotted with a varying range of colors from dark blue to yellow corresponding to a scale bar of R2 score

Machine Learning Illuminates California Historical Streamflow Changes

Streamflow predictability is crucial in California, where climate variability has huge impacts on agriculture, hydropower, and urban development. Simulations from General Circulation Models (GCMs) can inform the regional climate but are often too coarse to adequately resolve local basins and streams, so the climate modeling community is turning to ML models to help answer water resource and hydroclimate questions.

LLNL researchers Shiheng Duan, Giuliana Pallotta, and Céline Bonfils recently investigated the role of climate variability drivers in California’s historical streamflow patterns. Funded by the Laboratory Directed Research and Development Strategic Initiative focusing on Climate Resilience and published in Communications Earth & Environment, their study trained multiple ML models on GCMs and observation/reanalysis datasets, then applied those models to six GCMs to derive the influence of different sequences of internal climate variability, such as the Pacific Decadal Oscillation (PDO), on streamflow changes. “Here, we used an ensemble of ML models within a ‘science-centric’ framework—that is, deriving the common scientific findings from the different ML models to find out where most of the investigated models agree,” explains Duan.

The team quantified the climate variability impacts by scoring the predictability of variables—such as PDO and carbon dioxide concentration—both separately and cumulatively. They found that the most important feature predictors were specific modes of the Pacific/North American (PNA) pattern and the PDO. Higher order modes like PNA-5 and PDO-5 revealed strong correlations with local water circulation patterns, including streamflow variability. Duan adds, “Normally, people tend to focus on the first order mode of these climate indices as they are well-defined variability indices, whereas in our study we showed the importance of higher order modes to capture climate variability patterns in local hydroclimate systems.”
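The separate-versus-cumulative scoring idea can be illustrated with a toy sketch (this is not the study’s code; the synthetic “streamflow” series, the linear fit, and the feature names are illustrative assumptions): fit a simple regression on each climate index alone, then on the indices together, and compare R² scores.

```python
# Illustrative sketch: score predictability of a synthetic "streamflow"
# series from climate indices, first separately, then cumulatively.
# The data-generating process below is made up for demonstration.
import random

random.seed(0)
n = 200
pdo = [random.gauss(0, 1) for _ in range(n)]    # stand-in PDO index
pna = [random.gauss(0, 1) for _ in range(n)]    # stand-in PNA index
noise = [random.gauss(0, 0.5) for _ in range(n)]
flow = [0.8 * a + 0.4 * b + e for a, b, e in zip(pdo, pna, noise)]

def r2(y, yhat):
    # Coefficient of determination (the score used in the figure).
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def fit_predict(X, y):
    # Tiny ordinary-least-squares fit via the normal equations.
    k = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    for i in range(k):                       # forward elimination
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    w = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]

# Score each index separately...
for name, f in [("PDO", pdo), ("PNA", pna)]:
    print(f"{name} alone: R2 = {r2(flow, fit_predict([[v] for v in f], flow)):.2f}")
# ...then cumulatively: the joint fit should score at least as high.
X = [list(row) for row in zip(pdo, pna)]
print(f"PDO + PNA: R2 = {r2(flow, fit_predict(X, flow)):.2f}")
```

In this toy setting the cumulative score exceeds either single-feature score, which is the signal the study uses to attribute predictability to individual variability drivers.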

The image at left shows the performance of ML models (each in its own row) for different rivers in central California. The first six columns indicate the performance of ML models on the ensemble GCM dataset, scored according to a cross-validation of each GCM’s predictions. The last column depicts the best performance of each ML model with the reanalysis dataset. Triangles and circles represent stations with peak streamflow occurring in summer and winter, respectively.


shield icon with a lock icon on an abstract blue background with rays of lines fanning out from the bottom of the shield

Measuring Attack Vulnerability in AI/ML Models

LLNL is advancing the safety of AI/ML models in materials design, bioresilience, cyber security, stockpile surveillance, and many other areas. A key line of inquiry is model robustness, or how well a model defends against adversarial attacks. A paper accepted to the renowned 2024 International Conference on Machine Learning explores this issue in detail.

In “Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies,” Brian Bartoldson, James Diffenderfer, Bhavya Kailkhura, and Konstantinos Parasyris studied the effect of scaling robust image classifiers—using a method called adversarial training—to develop the first scaling laws for robustness. The team’s adversarial training approach alters pixels where the model seems most vulnerable, thus providing the model with a more continuous view of the data distribution. Among the findings is that better data quality significantly improves the robustness produced by adversarial training. The team improved the state of the art to 74% adversarial robustness, and also outperformed the previous state of the art with a model three times smaller that saw three times more data.
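The core idea of adversarial training—perturbing each input in the direction that most increases the loss, then training on the perturbed input—can be sketched on a toy 1-D logistic classifier (this is an illustrative stand-in, not the paper’s image-classification setup; the data, model, and hyperparameters are invented):

```python
# Toy sketch of adversarial training: a 1-D logistic classifier trained
# on inputs shifted toward higher loss (a sign-gradient step of size EPS).
import math
import random

random.seed(1)
EPS = 0.3   # attack budget: inputs may shift by at most +/- EPS
LR = 0.1
w, b = 0.0, 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: label 1 if x > 0, so the clean decision boundary is x = 0.
data = [(x, 1 if x > 0 else 0)
        for x in (random.uniform(-2, 2) for _ in range(400))]

for epoch in range(50):
    for x, y in data:
        # Gradient of the logistic loss w.r.t. the input; step toward
        # higher loss, then run the SGD update on the perturbed point.
        p = sigmoid(w * x + b)
        grad_x = (p - y) * w
        x_adv = x + EPS * (1 if grad_x > 0 else -1 if grad_x < 0 else 0)
        p_adv = sigmoid(w * x_adv + b)
        w -= LR * (p_adv - y) * x_adv
        b -= LR * (p_adv - y)

# Robust accuracy: the prediction must be correct even under the
# worst-case shift of size EPS toward the wrong class.
def robust_correct(x, y):
    worst = x - EPS if y == 1 else x + EPS
    return (sigmoid(w * worst + b) > 0.5) == (y == 1)

acc = sum(robust_correct(x, y) for x, y in data) / len(data)
print(f"robust accuracy within eps={EPS}: {acc:.2f}")
```

Points closer than EPS to the boundary can never be robustly classified, so robust accuracy saturates below 100%—a toy analogue of the robustness limits the paper studies at scale.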

AI/ML model robustness and resilience will always be relevant in mission-critical settings. Visit LLNL Computing to learn more about this paper, and take a quiz to see how well you can identify adversarially perturbed images.


two researchers in lab coats in front of lab equipment

Researchers Unleash Machine Learning in Designing Advanced Lattice Structures

Characterized by their intricate patterns and hierarchical designs, lattice structures hold immense potential for revolutionizing industries ranging from aerospace to biomedical engineering, due to their versatility and customizability. However, the complexity of these structures and the vast design space they encompass have posed significant hurdles for engineers and scientists. Traditional methods of design exploration and optimization often fall short when faced with the sheer magnitude of possibilities in the lattice-design landscape.

LLNL scientists and engineers are looking to address these longstanding challenges by incorporating ML and AI to accelerate the design of lattice structures with properties, such as low weight and high strength, that can be optimized with unprecedented speed and efficiency. In a recent study published in Scientific Reports, LLNL researchers fused ML-based approaches with traditional computational techniques in hopes of ushering in a new era in lattice design. By harnessing the power of ML algorithms, researchers are unlocking the ability to predict mechanical performance, optimize design variables, and speed up the computational design process for lattices that possess millions of potential design options.
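The flavor of such an iterative, ML-in-the-loop design workflow can be sketched as follows (an illustrative toy, not the study’s actual models: the `simulate_stiffness` response surface, the nearest-neighbor surrogate, and the design parameters are all made up). A cheap surrogate screens a large pool of candidate designs so the expensive solver runs only on the most promising ones:

```python
# Hypothetical surrogate-driven design loop for lattice structures.
import random

random.seed(2)

def simulate_stiffness(strut_d, cell_n):
    # Stand-in for an expensive physics simulation (made-up response:
    # stiffness rises with strut diameter, peaks at moderate cell count).
    return strut_d * 10 - (cell_n - 6) ** 2 * 0.5

def surrogate(x, evaluated, k=3):
    # Cheap ML surrogate: inverse-distance mean of the k nearest
    # already-evaluated designs.
    near = sorted(evaluated,
                  key=lambda e: (e[0][0] - x[0]) ** 2 + (e[0][1] - x[1]) ** 2)[:k]
    wts = [1.0 / (1e-9 + (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2)
           for p, _ in near]
    return sum(w * y for w, (_, y) in zip(wts, near)) / sum(wts)

# A large pool of candidate designs: (strut diameter, cells per side).
pool = [(random.uniform(0.1, 1.0), random.randint(2, 12)) for _ in range(5000)]

# Seed with a handful of random expensive evaluations.
seeds = random.sample(pool, 5)
for x in seeds:
    pool.remove(x)
evaluated = [(x, simulate_stiffness(*x)) for x in seeds]

for _ in range(10):
    # Rank the whole pool with the cheap surrogate, then spend one
    # expensive simulation on the best predicted design.
    best = max(pool, key=lambda x: surrogate(x, evaluated))
    pool.remove(best)
    evaluated.append((best, simulate_stiffness(*best)))

best_design, best_stiff = max(evaluated, key=lambda e: e[1])
print(f"best found: strut_d={best_design[0]:.2f}, "
      f"cells={best_design[1]}, stiffness={best_stiff:.2f}")
```

The key economy is in the counts: the pool holds thousands of designs, but the expensive simulator runs only 15 times, with the surrogate deciding where those runs are spent.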

“By leveraging ML-based approaches in the design workflow, we can accelerate the design process to truly leverage the design freedom afforded by lattice structures and take advantage of their diverse mechanical properties,” said lead author and engineer Aldair Gongora. “This work advances the field of design because it demonstrates a viable way of integrating iterative ML-based approaches in the design workflow and underscores the critical role ML and AI can play in accelerating design processes.”


a scientist in PPE performs an experiment with an apparatus labeled as Hudson Robotics

Rapid Response Lab and Supercomputing System Combine to Accelerate Biodefense

LLNL recently welcomed officials from the Department of Defense (DOD) and National Nuclear Security Administration (NNSA) to dedicate a new supercomputing system and Rapid Response Laboratory (RRL). DOD is working with NNSA to significantly increase the computing capability available to the national biodefense programs. The collaboration has enabled expanding systems of the same architecture as LLNL’s upcoming exascale supercomputer, El Capitan, featuring AMD’s cutting-edge MI300A processors. These systems will provide unique capabilities for large-scale simulation and AI-based modeling for a variety of biodefense activities, including biosurveillance, threat characterization, advanced materials development, and accelerated medical countermeasures. DOD and NNSA intend to make the supercomputing capability accessible to U.S. government interagency partners, international allies, academia, and industry.

The RRL will leverage the recently dedicated supercomputing rack to enable researchers to rapidly design, test, and evaluate computationally derived protein designs, in hopes of accelerating the discovery and development of medical countermeasures for emerging or unknown biological threats. A short walk away from the computing facility, the RRL complements the DOD Chemical and Biological Defense Program’s Generative Unconstrained Intelligent Drug Engineering (GUIDE) program. GUIDE accelerates medical countermeasure design by leveraging ML-backed antibody design, experimental data, structural biology, bioinformatic modeling, and molecular simulations. The program includes dozens of LLNL researchers and collaborators from government and academia, including Los Alamos and Sandia.


five panelists sit in chairs in front of an audience

Register Now for DOE Data Days

The 2024 DOE Data Days (D3) Workshop returns to LLNL in person on October 22–24. Registration is open through October 14. Contact Lisa Felker (felker7 [at] llnl.gov) for a personalized invitation, and visit the D3 website for additional details.

D3 provides a forum to collaborate on data management ideas and strategies and build an actionable plan forward to drive innovation and progress across DOE and NNSA. The workshop brings together data management practitioners, researchers, and project managers from DOE and the national labs to promote data management as a means to higher quality and more efficient research and analysis. Presentations and posters from the DOE data management community will cover these themes:

  • Cloud and Hybrid Data Management
  • Data Intensive Computing
  • Data Curation and Governance

Breakout sessions will provide the opportunity to delve further into high-interest topics in DOE data management and facilitate collaboration across institutions. Keynote speakers from DOE and other government agencies will highlight policy and implementation strategy successes in their work.


3x5 grid of phases of shock waves, progressing from left to right and labeled according to pressure and time

Recent Research

Preprints:


Pawan Tripathi’s portrait next to the seminars icon

Seminar Explores Ontologies, Deep Learning, and AI

The DSI’s August 21 seminar was “Ontologies, Graph Deep Learning, & AI,” presented by Dr. Pawan Tripathi of Case Western Reserve University (CWRU). The integration of ontologies, semantic reasoning, and graph-based deep learning and AI signifies a paradigm shift in studying high-dimensional multimodal problems, particularly within advanced manufacturing, synchrotron science, and photovoltaics. Ontologies provide structured frameworks for knowledge representation, while graphs model complex relationships and interactions, enhancing AI’s reasoning and predictive capabilities. The talk illustrated this shift with data-driven digital twins for advanced manufacturing and photovoltaics.

Tripathi is a research assistant professor in the Department of Materials Science and Engineering at CWRU in Ohio. He leads projects related to materials data science at the DOE/NNSA-funded Center of Excellence for Materials Data Science for Stockpile Stewardship. His expertise lies in interface structural simulations and developing automated analysis pipelines for large multimodal datasets from diverse experiments.

Speakers’ biographies and abstracts are available on the seminar series web page, and many recordings are posted to the YouTube playlist. To become or recommend a speaker for a future seminar, or to request a WebEx link for an upcoming seminar if you’re outside LLNL, contact DSI-Seminars [at] llnl.gov.


Paige’s portrait next to the highlights icon

Meet an LLNL Data Scientist

Paige Jones has been a software developer in LLNL’s Enterprise Application Services division for three years. She is responsible for the integration of commercial off-the-shelf tools and software into LLNL’s internal systems, the development and enhancement of web applications, and the exploration of cutting-edge technologies for potential use at Livermore. With a B.S. in Computer Information Systems from California State University, Chico, Jones is currently advancing her expertise with an M.S. in computer science at Georgia Tech. She is an avid advocate for outreach and STEM education and participates in recruitment, Girls Who Code, and Science Accelerating Girls Engagement. She strives to inspire the next generation in the diverse realms of science, technology, engineering, and mathematics. In pursuit of this goal, Jones recently served on the organizing committee for the Lab’s 2024 Women in Data Science (WiDS) datathon. “WiDS plays a critical role in building a supportive data science community, helps ensure that resources reach underrepresented groups, and empowers women in their technical endeavors,” Jones says. “I am grateful for the opportunity to participate in WiDS and plan the WiDS datathon, and I am excited for what the future holds for women in data science!”