Volume 8

June 10, 2021

Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

abstract art showing molecules and crystalline density

Research in Feedstock Optimization

A long-held goal of chemists across many industries, including energy, pharmaceutics, energetics, food additives, and organic semiconductors, is to imagine the chemical structure of a new molecule and predict how it will function for a desired application. In practice, this vision is difficult to realize, often requiring extensive laboratory work to synthesize, isolate, purify, and characterize newly designed molecules to obtain the desired information.

A team of LLNL materials and computer scientists has brought this vision to fruition for energetic molecules by creating machine learning (ML) models that can predict molecules’ crystalline properties, such as molecular density (which is strongly correlated with detonation performance), from their chemical structures alone. Predicting crystal structure descriptors (rather than the entire crystal structure) offers a new and efficient method to infer a material’s properties, thus expediting materials design and discovery.

One of the team’s most prominent ML models is capable of predicting the crystalline density of energetic and energetic-like molecules with a high degree of accuracy compared to previous ML-based methods. Even when compared to density-functional theory—a computationally expensive and physics-informed method for crystal structure and crystalline property prediction—the ML model boasts competitive accuracy while requiring a fraction of the computation time. Full details can be found in a recently accepted publication in the Journal of Chemical Information and Modeling.

Members of LLNL’s High Explosive Application Facility (HEAF) have already begun taking advantage of the model’s web interface with the goal of discovering new insensitive energetic materials. By simply inputting molecules’ 2D chemical structures, HEAF chemists have been able to quickly determine the predicted crystalline density of those molecules, which is closely correlated with potential energetics’ performance metrics. Likewise, follow-up efforts within LLNL’s Materials Science Division have used the ML model in conjunction with a generative model to search large chemical spaces quickly and efficiently for high-density candidates.


slide showing Celeste's bio with video chat windows above

New Career Panel Series

The organizing committee that brought Women in Data Science (WiDS) Livermore to the Lab community in March has launched a new career-focused panel series for LLNL staff and students. Sponsored by the DSI, the inaugural June 10 panel highlighted women in management and leadership positions at the Lab, featuring moderator Marisol Gamboa and panelists Jessie Gaylord, Katie Lewis, Celeste Matarazzo, and Kathryn Mohror.

Data scientist and panel series organizer Cindy Gonzales stated, “I was so inspired when I heard the WiDS speakers and panelists share their journeys this year. We realized the value of sharing employee journeys to provide advice as well as highlight different career paths at the Lab. We kicked off this series by featuring women in leadership positions who make an impact and inspire those around them every day.”


seminar icon next to brenden's portrait

Virtual Seminars…

LLNL researcher Dr. Brenden Petersen (pictured at left) spoke at the DSI’s April virtual seminar. His team developed a new framework that leverages deep learning for symbolic regression: a recurrent neural network emits a distribution over tractable mathematical expressions, and a novel risk-seeking policy gradient trains the network to generate better-fitting expressions. This work was accepted as an Oral Presentation at the 2021 International Conference on Learning Representations (ICLR).

In the May virtual seminar, UC Berkeley professor Dr. Yi Ma spoke about deep networks from first principles. His presentation offered a “white box” interpretation of deep convolutional networks from the perspective of data compression and group invariance. The approach reveals a fundamental tradeoff between invariance and sparsity for class separability as well as a fundamental connection between deep networks and the Fourier transform for group invariance. This research is a collaboration between UC Berkeley and Columbia University.


screen shot of Katie speaking to the camera

…and a Video Playlist

Since launching in 2018, the DSI has hosted more than three dozen speakers in its seminar series. These events invite researchers from academia, industry, and other institutions to present their work to an LLNL audience in an hour-long talk. In 2020, the series transitioned to a virtual format, and a video playlist of recently recorded seminars is available on the Livermore Lab Events YouTube channel.

Continuing the series virtually was an important decision. “Through video conferencing and now recordings of these seminars, we’ve been able to reach students and new staff, some of whom have never been onsite,” explains Dr. Kathleen Schmidt, technical coordinator for the seminars since 2019. “It’s not unusual for us to have a virtual seminar with over 100 people signing in to listen and watch.” Seminar recordings will be added to the playlist as they become available. Read more about the series on this website.


ML4I logo with neural network and starburst

Industry Opportunity

LLNL is looking for participants and attendees from industry, research institutions, and academia for the first-ever Machine Learning for Industry Forum (ML4I), a three-day virtual event starting August 10 and sponsored by LLNL’s High Performance Computing Innovation Center and the DSI. The deadline for submitting presentations or industry use cases is June 30. The deadline for attendee registration is July 29.


4x2 grid of plots in different colors

Research Highlights

  • ADAPD advances. The Advanced Data Analytics for Proliferation Detection program held a two-day virtual technical exchange meeting recently. The meeting highlighted science-based and data-driven analysis work to accelerate AI innovation and develop AI-enabled systems to enhance the United States’ capability to detect nuclear proliferation activities around the globe.
  • CVPR papers. The 2021 Conference on Computer Vision and Pattern Recognition featured two papers co-authored by LLNL computer scientist Bhavya Kailkhura and targeted at improving the understanding of robust ML models. Both papers examine the importance of data in building models, part of a Lab effort to develop foolproof artificial intelligence and machine learning systems.
  • Hundreds of downloads. The LLNL-authored paper “Uncovering interpretable relationships in high-dimensional scientific data through function preserving projections,” published in Machine Learning: Science and Technology, reached a milestone of 500 downloads on May 7. As of this newsletter’s date, the number is well over 600. Authors are Shusen Liu, Rushil Anirudh, Jayaraman J. Thiagarajan, and Peer-Timo Bremer.
  • COVID-19 detection and analysis. LLNL bioinformaticist and Data Science Summer Institute (DSSI) co-director Nisha Mulakken sat down with The Data Standard Podcast to discuss the Lawrence Livermore Microbial Detection Array (LLMDA) system, which has detection capability for all variants of SARS-CoV-2. The video episode runs 14:17.

DSSI and UC Merced logos side by side

Internships Are Under Way

The DSSI welcomed 30 student interns from all over the U.S. beginning on May 24. Students are meeting with mentors, taking short courses in data science, connecting with other students and mentors, and tackling a real-world data science challenge problem. As with last year’s program, the class of 2021 is working remotely.

The annual Data Science Challenge with UC Merced took place virtually over three weeks beginning on May 17. Students worked with LLNL scientists to solve an exciting problem in astronomy for planetary defense. Near the end of the summer, we’re adding another campus to the Challenge: UC Riverside.


diagram of ML model with input of chemical structure and output of crystalline properties

Recent Publications


highlights icon with jose's portrait

Meet an LLNL Data Scientist

Jose Cadena enjoys the discovery process when analyzing new data sets, despite the difficulties in preparing data before building ML models. “Often a data set is incomplete or contains errors from different sources. Sometimes its size makes it difficult to extract knowledge,” he says. Cadena contributes to LLNL’s brain-on-a-chip project by studying complex networks among brain cells. He also investigates ways to detect anomalous activity in networks, finds clusters of under-vaccinated populations to inform public health resources, and this spring co-organized the DSI’s AI in Healthcare workshop. Formerly a three-time LLNL summer intern, Cadena values ongoing education: “I like to keep learning about different research domains while developing a data science skill set applicable to many problems of global importance.”