Volume 6

March 10, 2021

Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

Spotlight: New Research Ranked Among Top AI Papers

Symbolic regression is the ML task of discovering tractable mathematical expressions that fit a dataset, yet the AI community has not fully explored deep learning approaches to this challenging search space. In a paper accepted as an Oral Presentation at the upcoming International Conference on Learning Representations (ICLR), an LLNL research team proposes a framework that leverages deep reinforcement learning for symbolic regression via a simple idea—use a large model (a neural network) to search the space of small models (expressions). With an Oral acceptance rate of only 1.5%, the team’s paper also ranks fifth out of approximately 3,000 scored papers.
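The core mechanic—a neural network emitting a distribution over expressions, one token at a time—can be illustrated with a toy sketch. This is not the team’s code: the token set, arities, and the stand-in “RNN” weights below are all illustrative assumptions; a real implementation would use trained recurrent weights and constrain the sampling.

```python
import numpy as np

# Operators and operands for prefix-notation expressions (illustrative set).
TOKENS = ["add", "mul", "sin", "cos", "x"]
ARITY = {"add": 2, "mul": 2, "sin": 1, "cos": 1, "x": 0}

def sample_expression(rng, hidden_dim=8, max_len=32):
    """Autoregressively sample a complete expression in prefix notation."""
    n = len(TOKENS)
    W_out = rng.standard_normal((hidden_dim, n)) * 0.1  # stand-in output weights
    W_emb = rng.standard_normal((n, hidden_dim)) * 0.1  # stand-in token embeddings
    h = np.zeros(hidden_dim)                            # stand-in recurrent state
    expr, open_slots = [], 1                            # one open slot: the root
    while open_slots > 0 and len(expr) < max_len:
        logits = h @ W_out
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                            # softmax over tokens
        idx = rng.choice(n, p=probs)                    # sample next token
        expr.append(TOKENS[idx])
        open_slots += ARITY[TOKENS[idx]] - 1            # children opened, slot filled
        h = np.tanh(h + W_emb[idx])                     # toy recurrent update
    expr += ["x"] * open_slots                          # close leftover slots with a terminal
    return expr

print(sample_expression(np.random.default_rng(0)))
```

Sampling stops once every operator’s arguments are filled, so each draw is a syntactically complete expression; a reward (e.g., fit to data) on the sampled expression is what the policy gradient would then optimize.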

Lead author Brenden Petersen explains, “From the algorithmic perspective, our approach is not specific to the problem of symbolic regression. More broadly, our framework applies to discrete optimization problems where the user may want to incorporate some knowledge into the search. We are just now beginning to apply it to other tasks, such as finding interpretable reinforcement learning policies or optimizing amino acid sequences.”

Co-authors of “Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients” are Mikel Landajuela Larma, Nathan Mundhenk, and Claudio Santiago, along with former LLNL staff Soo Kim and Joanne Kim. Petersen adds, “The success of our approach relied heavily on our team’s broad expertise in optimization, mathematics, physics, deep learning, and reinforcement learning.” The research will be featured at an upcoming DSI virtual seminar. (Image: Example of sampling an expression from the team’s recurrent neural network, which is used to emit a distribution over tractable mathematical expressions.)

Spotlight: Winter Conference Accolades

LLNL researchers are making the rounds at premier artificial intelligence (AI) and machine learning (ML) conferences, earning paper acceptances and awards along the way (preprints are available online). The SPIE Medical Imaging Conference awarded Best Paper to LLNL computer scientist Jayaraman Thiagarajan and three IBM co-authors. The paper, “Self-training with improved regularization for sample-efficient chest x-ray classification,” describes a deep learning framework that combines several key components to enable robust modeling in challenging medical scenarios.

The IEEE Winter Conference on Applications of Computer Vision recognized “Generative patch priors for practical compressive image recovery” by LLNL computer scientist Rushil Anirudh and two co-authors. The research details an approach that can recover a wide variety of natural images using a pre-trained patch generator, and received the conference’s Best Paper Honorable Mention award based on its potential impact on the field.

Three LLNL papers—all co-authored with Lab interns from Arizona State University—were accepted at the recent AAAI Conference on Artificial Intelligence. Computer scientists Thiagarajan, Anirudh, Bhavya Kailkhura, and Peer-Timo Bremer led this research on the integrity, or robustness, of ML models. The team’s papers tackle robustness from different angles: feature importance estimation under distribution shifts, attribute-guided adversarial training, and uncertainty matching in graph neural networks. Jeff Hittinger, director of LLNL’s Center for Applied Scientific Computing, states, “That our outstanding researchers receive this kind of recognition by the ML community demonstrates that we are not only applying machine learning to our challenging mission problems but also pushing the frontiers of ML methodologies.” (Image: LLNL’s ML robustness research probes how a model works, how it responds to new data variables, how it behaves under attack, and other questions of interpretability.)

ICF Researcher Honored by Alma Mater

LLNL design physicist Kelli Humbird was honored by Texas A&M University’s Department of Nuclear Engineering for her work at LLNL in combining machine learning with inertial confinement fusion (ICF) research. Humbird graduated from the university with a PhD in nuclear engineering in 2019. Since joining the Lab as an intern in 2016, she has made key contributions to the ICF program, creating a widely used neural network algorithm to help produce higher-performing implosions and applying a technique called transfer learning to create a more predictive model of ICF experiments.
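The transfer learning idea—pretrain a model on plentiful simulation data, then refit only part of it on scarce experimental data—can be sketched in a few lines. This is not Humbird’s model: the network, data, and shapes below are synthetic stand-ins, and the “fine-tuning” here is a simple least-squares refit of the output layer with the feature layer frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: "pretrain" on plentiful (synthetic) simulation data.
X_sim = rng.standard_normal((500, 4))                 # simulation inputs
y_sim = np.sin(X_sim).sum(axis=1)                     # stand-in simulated output
W1 = rng.standard_normal((4, 16)) * 0.5               # feature layer (to be frozen)
H_sim = np.tanh(X_sim @ W1)                           # feature representation
w2, *_ = np.linalg.lstsq(H_sim, y_sim, rcond=None)    # fit output layer

# Stage 2: transfer — freeze W1, refit only the output layer on a small
# "experimental" dataset whose input-output relationship differs slightly.
X_exp = rng.standard_normal((20, 4))
y_exp = np.sin(X_exp).sum(axis=1) * 1.1 + 0.05        # shifted target
H_exp = np.tanh(X_exp @ W1)                           # frozen features reused
w2_ft, *_ = np.linalg.lstsq(H_exp, y_exp, rcond=None)

print(float(np.mean((H_exp @ w2_ft - y_exp) ** 2)))   # fine-tuned error
```

Because the refit reuses the features learned from cheap simulations, only a handful of experimental points are needed to adapt the model—the same economy that makes transfer learning attractive when experiments are expensive.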

Data Science Highlights

  • New deep learning approach to designing emulators. LLNL scientists have developed a new deep learning approach to designing emulators for scientific processes that is more accurate and efficient than existing models. Emulators built with this ‘learn-by-calibrating’ method could serve as proxies for far more computationally intensive simulators.
  • Video on exascale computing for stockpile stewardship. Chris Clouse, acting director of LLNL’s Weapon Simulation and Computing program, talks about how exascale computing will enable LLNL to ensure the safety, security, and reliability of the nation’s nuclear deterrent and better position the nation to respond to evolving national security threats.
  • Molecular crystal structures pack it in. This new work uses an efficient optimization algorithm that circumvents many problems found in previously proposed packing motif labeling methods, leading to new state-of-the-art results when tested on an LLNL-curated dataset.
  • Open Data Initiative adds new datasets. The DSI’s website now includes a dataset for material point method–based simulations of object deformation under mechanical loading, as well as a video dataset for two-photon lithography additive manufacturing.

Virtual Seminar

In the DSI’s first seminar of 2021, Gavin Hartnett of the RAND Corporation presented “Deep Generative Modeling in Network Science with Applications to Public Policy Research.” His talk explained how deep generative methods can produce realistic synthetic networks useful for microsimulation and agent-based models capable of informing key public policy questions. He introduced a new generative framework that applies to large social contact networks commonly used in epidemiological modeling, comparing these recent neural network–based approaches with more traditional Exponential Random Graph Models.

Data Science Challenge to Feature Astronomy

For the third consecutive year, the DSI is teaming up with the University of California’s Merced campus to offer an intensive Data Science Challenge internship. For three weeks in May, students will work on an important data science problem while learning from experts, networking with peers, and developing skills for future internships. This year’s Challenge focuses on planetary defense. Students will apply ML methods to recent time-domain optical astronomy data to detect, distinguish, and characterize asteroids that may pass near Earth in the future.