Volume 32

Jan. 29, 2024

Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

Brian and Cindy pose on the large LLNL letters outside on the Lab’s main campus

New Year, New Directions

Happy new year from the DSI! Although a new calendar year has begun, LLNL is already a quarter of the way into Fiscal Year 2024. Each FY brings new goals, challenges, and opportunities to all areas of the Lab, including the DSI. With increased emphasis on and prioritization of data science, particularly artificial intelligence (AI) and machine learning (ML), at the national level, the U.S. Department of Energy and LLNL recognize the importance of investing in projects and programs that harness these technologies safely and securely.

Our community has grown inside and outside the Lab—this newsletter now reaches more than 550 subscribers!—and we are excited about the new directions the DSI is moving in. Our student programs are reorganizing for better scalability and alignment with LLNL workforce strategies, which necessitates expanding the leadership team (see next story). We’re also adding Data Science Ambassador roles to the DSI’s core team to ensure representation in more areas of the Lab and to provide new career opportunities for staff. (An upcoming news article will recap these personnel changes.)

Outreach is always a priority: This year we’re sponsoring a new ML training program for employees who are novices in the field; co-hosting LLNL’s annual Women in Data Science regional event (see “Save the Dates” story below); continuing our long-running monthly seminar series; and working with the Lab’s AI Innovation Incubator (AI3) to articulate a national security–focused vision for AI/ML safety, data sharing, and other crucial topics.

These are just some of the plans in progress for 2024. Thank you for participating in our data science community, and please don’t hesitate to contact us at datascience [at] llnl.gov with any questions or ideas. We wish everyone a safe and happy new year.

—Brian Giera (DSI director) and Cindy Gonzales (DSI deputy director)

Save the Dates for WiDS Events

Women in Data Science (WiDS) Livermore is back! The Lab is hosting two related events: First is a datathon—a hackathon with a dataset and challenge problem—on February 28. Then the annual regional conference will be held on March 13. These hybrid events are free and open to everyone. Look for registration links, agendas, and other details at data-science.llnl.gov/wids.

LLNL Datathon on Wednesday, February 28

  • Register by February 21
  • Hosted at the Livermore Valley Open Campus (LVOC) just outside the LLNL gates and virtually
  • Sponsored by the DSI and LLNL Computing
  • The worldwide WiDS conference hosts a competitive datathon in which participants can learn more about data science and hone their skills. This one-day event at LLNL provides an opportunity to collaborate, innovate, and investigate a challenging data science problem. The datathon is designed for data science enthusiasts who are discovering or building their data skills (beginner and intermediate levels). An LLNL data scientist will guide participants through the dataset and notebook.

WiDS Livermore conference on Wednesday, March 13

  • Register by March 1
  • Hosted at the University of California Livermore Collaboration Center (UCLCC) adjacent to the LLNL campus and virtually
  • Sponsored by the DSI and LLNL’s Office of Strategic Diversity and Inclusion Programs
  • This regional conference will include a tie-in with the datathon as well as keynote speakers, technical talks, career-focused panel discussions, speed mentoring, and a poster session.

This is the seventh year for WiDS Livermore, which is independently organized by LLNL as part of the mission to increase participation of women in data science and to feature outstanding women doing outstanding work. Contact WiDS-Committee [at] llnl.gov with any questions.

the DSSI class of 2023 stands outside on the LLNL campus

Student Programs Synergy

The DSI’s two student programs—Data Science Summer Institute (DSSI) and Data Science Challenge (DSC)—have enjoyed many years of success and in 2024 are reorganizing with an expanded leadership team. Under a new banner of Data Science Student Internships, this strategy will help the programs scale and more closely coordinate with each other, while also providing students and mentors with the flexibility of year-round opportunities.

LLNL computer scientist Brian Gallagher, who has led the DSC since 2021, will oversee the combined programs with assistance from Omar DeGuchy, Amanda Muyskens, Kerianne Pruett, and Mary Silva—all of whom bring mentoring experience and data science expertise to the team. Gallagher states, “I’m really excited about working with this leadership team. We are all passionate about growing these programs, and this new organization allows us to devote more attention to developing challenge problems, evaluating the huge number of applications we receive, and opening more pathways to employment at LLNL. As always, providing students with the best possible experience remains a priority.”

two 3x4 grids of multicolored rectangles (spectral heatmaps), each a different combination of blues, greens, yellows, and reds

NeurIPS Paper Illuminates Neural Image Compression

An enduring question in ML concerns performance: How do we know if a model produces reliable results? The best models have explainable logic and can withstand data perturbations, but performance analysis tools and datasets that will help researchers meaningfully evaluate these models are scarce.

A team from LLNL’s Center for Applied Scientific Computing (CASC) is teasing apart performance measurements of ML-based neural image compression (NIC) models to inform real-world adoption. NIC models use ML algorithms to convert image data into numerical representations, providing lightweight data transmission in low-bandwidth scenarios. For example, a drone with limited battery power and storage must send compressed data to the operator without destroying or losing any important information.

CASC researchers James Diffenderfer and Bhavya Kailkhura, with collaborators from Duke University and Thomson Reuters Labs, co-authored a paper investigating the robustness of NIC methods. The research was accepted to the 2023 Conference on Neural Information Processing Systems (NeurIPS). Founded in 1987, the conference is one of the top annual events in ML and computer vision research.

“To the best of our knowledge, no one has studied NIC methods in this way before,” Diffenderfer points out. “It’s important to understand where these models are vulnerable or make mistakes in risky or dangerous scenarios. This type of analysis matters not just in image compression, but even with large language and computer vision models. Robustness is a concern everywhere.”

blue wavy shapes with four plots showing test results as blue and yellow dots and purple waves

Reinforcement Learning Optimizes Metamaterials

LLNL staff scientist Xiaoxing Xia collaborated with the Technical University of Denmark to integrate ML and 3D printing techniques. In a paper published in the Journal of Materials Chemistry A, the team detailed their ML approach for devising new shape-changing metamaterials by optimizing their storage capability. Metamaterials generally refer to artificial structures as opposed to those naturally occurring; incorporating these structures could allow lithium-ion batteries to operate more effectively.

Next-generation anodes composed of silicon could theoretically store up to ten times as much lithium as the best graphite anodes available today. However, silicon anodes must be able to withstand lithiation-induced expansion to roughly three times their original volume, which often causes silicon particles to fracture and thus jeopardizes battery function and longevity.

Xia’s team sought to pin down the optimal deformation of the silicon structure to maximize its volume and, by extension, battery performance. Their reinforcement learning model found node-and-beam configurations conducive to maximized storage volume. “What makes this approach so widely applicable is that a user could choose any reward and optimize for different properties. Storage capacity was simply the most relevant parameter for us to focus on this time,” Xia says.

bar graph showing number of newly diagnosed veterans on the y-axis and years on the x-axis

LLNL-Led Team Uses ML to Impact Amyotrophic Lateral Sclerosis Treatment

LLNL researcher Priyadip Ray received a Department of Defense Amyotrophic Lateral Sclerosis Research Program (ALSRP) Therapeutic Idea Award to identify drugs that could be repurposed to treat amyotrophic lateral sclerosis (ALS). ALS is a progressive neurodegenerative disease that leads to the loss of upper and lower motor neurons in the motor cortex, brain stem, and spinal cord. For unknown reasons, veterans are 1.5 times more likely than the general population to develop ALS. The Veterans Affairs (VA) historical database is the largest comprehensive, longitudinal ALS dataset, covering more than 21,000 veterans diagnosed with ALS. The pathophysiological mechanisms underlying ALS onset and progression are still largely unknown. Currently, there are no effective treatment strategies for ALS.

Ray’s team is testing the hypothesis that drugs prescribed for other indications can alter an individual’s risk for ALS and/or its progression. Throughout this project, the team is using causal ML—a method of analyzing large amounts of data to identify cause-and-effect relationships underlying the information within a dataset. Along with Ray, the LLNL team includes Andrew Goncalves, Braden Soper, and Jose Cadena, who are joined by clinicians from VA Palo Alto and pharmacology researchers from Stanford University and UC Los Angeles. They have curated a dataset of longitudinal electronic health records from veterans with ALS and recently published a paper in Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration. (Image at left: Number of newly diagnosed ALS cases in VA electronic health records by year.)

collage of ones and zeros, lines forming a face in profile, and an image of the Earth

LLNL Contributes to AI for Climate Change Mitigation Roadmap

The Innovation for Cool Earth Forum (ICEF) annually publishes roadmaps that chart a course for clean energy transition. These documents bring together industrial, academic, and governmental insights to provide a realistic, fact-based pathway for stakeholders. Unveiled at the United Nations climate summit, the 2023 roadmap delves into the high-impact opportunities AI presents in combating climate change. It also addresses barriers, risks, and policy considerations for effective implementation.

LLNL senior staff researcher Ruben Glatt co-wrote the document’s “Road Transportation” chapter, which discusses ways AI can be used to reduce carbon dioxide emissions through battery design, biofuel development, and smart infrastructure components. Glatt states, “LLNL has a long history contributing to the annual roadmaps that help shape policy decisions on a global scale. I am honored I was invited to contribute this chapter by David Sandalow, former U.S. Under Secretary of Energy, based on my recent work on the growing symbiosis between the energy and transportation sector due to increasing vehicle electrification.”

San Diego skyline and bay as seen from above the Coronado Island bridge

Join LLNL at SPIE Applications of Machine Learning Conference

LLNL researchers are heavily involved in the planning of this year’s SPIE Optics + Photonics: Applications of Machine Learning Conference, which will take place in San Diego on August 18–22. The conference committee is led by Michael Zelinski alongside LLNL committee members Abdul Awwal, Cindy Gonzales, James Henrikson, Alan Kaplan, Nathan Mundhenk, and Priyadip Ray.

The conference casts a broad net over a multitude of topic areas in ML, and sessions will include remote sensing, healthcare and biomedicine, industrial applications, and physics. If you have applied ML work in imaging, physics, or photonics and want to publish or present it, paper abstracts are due February 7. Additional conference information and submission guidelines are available on the SPIE conference website.

Daniel Arnold’s portrait next to the seminars icon

Seminar Explores Cybersecurity of Distributed Energy Resources

The DSI’s December seminar was presented by Dr. Daniel Arnold of Lawrence Berkeley National Lab (LBNL). The adoption of distributed energy resources (DER), such as rooftop solar systems, behind-the-meter batteries, and electric vehicles, presents many challenges for system operators who are tasked with maintaining the safety and efficiency of the power grid. Internet of Things connectivity of these devices, coupled with emerging control paradigms being put forth in DER standards, makes it possible for these devices to be remotely accessed and utilized to disrupt the operation of the power system.

In “Operational Cybersecurity of Distributed Energy Resources using Optimization, Control Theory, and Machine Learning,” Arnold highlighted past, recent, and new LBNL-led, DOE-funded research on this issue. He showed how techniques from control theory, optimization, and ML can be used to detect and mitigate certain kinds of cyber-attacks on DER control systems. Finally, Arnold closed with an overview of cybersecurity-related challenges in power systems and some thoughts on how data science techniques can be used to address those challenges.

Arnold is a Research Scientist at LBNL and an Adjunct Professor of Civil and Environmental Engineering at UC Berkeley. He graduated from UC Berkeley with a PhD in Mechanical Engineering in 2015 and was an ITRI-Rosenfeld Postdoctoral Fellow at Lawrence Berkeley National Laboratory from 2016 to 2017. His interests are in the fields of control theory, optimization, and ML. His recent work focuses on the use of these techniques for cybersecurity of the electric power system and other critical infrastructure.

Speakers’ biographies and abstracts are available on the seminar series web page, and many recordings are posted to the YouTube playlist. To become or recommend a speaker for a future seminar, or to request a WebEx link for an upcoming seminar if you’re outside LLNL, contact DSI-Seminars [at] llnl.gov.

Mason’s portrait next to the highlights icon

Meet an LLNL Data Science Engineer

Mason Sage’s zeal for interdisciplinary research was ignited in trade school, when he discovered mechatronics, a hands-on fusion of electrical engineering, mechanical engineering, and computer science. He went on to an Engineering/Computer Science degree, a robotics engineering stint at Tesla, and work in the semiconductor field before gravitating to LLNL’s high-stakes challenges in 2022. Sage is a staff research engineer supporting the Mechanical Engineering Department, where he’s helping to build a system for the Modular Autonomous Research Systems (MARS) project that’s automated enough to perform specified processes and intelligent enough to make decisions based on experience. For the HAMMER project, he builds mechatronics elements. He especially likes designing processes for generating data and then finding automated ways to refine them, stating, “Automating projects opens up a lot of doors.” Sage has enjoyed success in his short time at the Lab and looks forward to applying his expertise to concepts for self-driving laboratories. “I like working on whatever is cutting edge and staying ahead of the curve,” he says. “Working in national security has been rewarding.”