Volume 21

Sept. 20, 2022


Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.

the letters DSO rendered as 3D blocks

Top AI Award at International Symbolic Regression Competition

An LLNL team won a top prize at the inaugural international symbolic regression competition for an artificial intelligence (AI) framework they developed that can explain and interpret real-life COVID-19 data. Hosted by the open-source SRBench project at the 2022 Genetic and Evolutionary Computation Conference, the competition invited teams to submit their best symbolic regression algorithms. Organizers trained the submitted models on datasets, assigned “trust ratings,” and evaluated them for accuracy and simplicity.

The team’s “Unified Deep Symbolic Regression” (uDSR) algorithm beat 12 other teams on the real-world track—a task to build an interpretable predictive model for 14-day forecast counts of COVID-19 cases, hospitalizations, and deaths in New York state. The uDSR method is an updated version of the team’s earlier deep symbolic regression algorithm that finds short mathematical expressions to best fit experimental data and uncovers underlying equations or dynamics of physical processes. The uDSR team includes LLNL researchers Brenden Petersen, Mikel Landajuela, Chak Lee, Jiachen Yang, Ruben Glatt, Ignacio Aravena Solis, Claudio Santiago, and T. Nathan Mundhenk. (Read more about this research in the next story.)

neural network diagram showing tokens as circles related to their distributions and traversal of the tree; red, blue, and green circles correspond to the number of children for each token; white circles represent empty tokens; and numbers indicate the order in which tokens were sampled

Advances in Deep Symbolic Optimization

Traditional reinforcement learning (RL) methods learn a decision-making strategy that maximizes expected or average rewards. However, for some problem types, one might be interested in maximizing the best-case rewards, even if the strategy doesn’t perform as well on average—for example, trying to set a new high score at an arcade. An LLNL research team has developed a framework known as deep symbolic optimization (DSO) that adapts RL to learn these best-case rewards. In DSO, the team breaks down task solutions into sequences of discrete “tokens,” or building blocks. A sequence of tokens represents a possible solution to a symbolic optimization problem, and the goal is to find the sequence that optimizes a quality metric (i.e., the reward).

Underlying DSO is a recurrent neural network (RNN) that generates tokens sequentially and learns to recognize promising token sequences. The algorithm’s “risk-seeking” policy evaluates each sampled expression by its reward, then uses only a top percentage of them to update the network, so it learns to improve the best-case rewards rather than the average. Furthermore, users can incorporate domain knowledge into the training process by specifying priors and constraints, which, respectively, bias and prune the search space.
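To make the “keep only the best” idea concrete, here is a minimal Python sketch, assuming a quantile-threshold formulation: the threshold is the empirical (1 - epsilon) quantile of a batch’s rewards, only samples at or above it are kept, and each kept sample is weighted by how far its reward exceeds the threshold. The function name `risk_seeking_filter`, the parameter `epsilon`, and the toy rewards are hypothetical illustrations, not the DSO codebase’s API.

```python
import math


def risk_seeking_filter(rewards, epsilon):
    """Return (threshold, [(index, weight), ...]) for the top-epsilon samples.

    Hypothetical helper: the threshold is the empirical (1 - epsilon) quantile
    of the batch rewards. Samples below it are discarded; each kept sample is
    weighted by (reward - threshold), so the threshold acts as a baseline.
    """
    if not 0 < epsilon <= 1:
        raise ValueError("epsilon must be in (0, 1]")
    ranked = sorted(rewards)
    # Position of the (1 - epsilon) quantile in the sorted batch.
    q_index = min(len(ranked) - 1, math.floor((1 - epsilon) * len(ranked)))
    threshold = ranked[q_index]
    kept = [(i, r - threshold) for i, r in enumerate(rewards) if r >= threshold]
    return threshold, kept


# Toy batch of ten rewards; epsilon = 0.2 keeps roughly the best 20%.
batch = [0.1, 0.9, 0.4, 0.7, 0.2, 0.95, 0.3, 0.5, 0.8, 0.6]
threshold, kept = risk_seeking_filter(batch, epsilon=0.2)
print(threshold, [i for i, _ in kept])
```

With `epsilon = 0.2`, the ten-reward batch keeps only its two best samples. In a policy-gradient step, each kept weight would multiply the gradient of that sample’s log-probability, so below-threshold samples contribute nothing to the update.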

The DSO problem-solving method has achieved state-of-the-art performance on symbolic regression when tested against baseline methods including Eureqa, a commercial product considered to be the gold standard for symbolic regression. More recently, a new version of the algorithm won the first-ever worldwide symbolic regression competition, held at the 2022 Genetic and Evolutionary Computation Conference (see previous story).

The DSO framework’s applicability extends beyond symbolic regression to other research efforts including healthcare decision making, power converter design and optimization, and antibody therapeutics development. Combining the expertise of LLNL’s Engineering and Computing Directorates, the multidisciplinary DSO team has presented their work at multiple international conferences since 2020, and most recently at the 2022 Adaptive and Learning Agents Workshop.

Image at left: (A) The RNN emits a categorical distribution over tokens, a token is sampled, and the parent and sibling of the next token are used as the next input to the RNN. Subsequent tokens are sampled autoregressively until the tree is complete. The resulting sequence of tokens is the tree’s pre-order traversal, which can be used to reconstruct the tree and instantiate its corresponding expression. (B) is the library of tokens and (C) is the expression tree sampled in (A).
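The caption’s last step, recovering an expression from its pre-order traversal, works because every token’s number of children (its arity) is known, so the sequence alone determines the tree. A minimal Python sketch follows, assuming a hypothetical token library (`add`, `mul`, `sin`, `x`, and the constant `1.0`); DSO’s actual library and data structures may differ. The same arity bookkeeping tells the sampler when the tree is complete.

```python
import math

# Hypothetical token library: name -> (arity, function). Arity = number of children.
LIBRARY = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
    "x": (0, None),    # input variable
    "1.0": (0, None),  # constant leaf
}


def is_complete(tokens):
    """True when the pre-order sequence fills every child slot exactly once."""
    open_slots = 1  # the root slot
    for token in tokens:
        if open_slots == 0:
            return False  # extra tokens after the tree was already closed
        open_slots += LIBRARY[token][0] - 1  # fill one slot, open `arity` new ones
    return open_slots == 0


def build_tree(tokens):
    """Rebuild a nested (token, [children]) tree from a pre-order traversal."""
    if not is_complete(tokens):
        raise ValueError("sequence is not a complete pre-order traversal")
    stream = iter(tokens)

    def parse():
        token = next(stream)
        arity = LIBRARY[token][0]
        return (token, [parse() for _ in range(arity)])

    return parse()


def evaluate(tree, x):
    """Evaluate the expression tree at input value x."""
    token, children = tree
    arity, func = LIBRARY[token]
    if arity == 0:
        return x if token == "x" else float(token)
    return func(*(evaluate(child, x) for child in children))


# Pre-order traversal of x * x + sin(x).
tree = build_tree(["add", "mul", "x", "x", "sin", "x"])
print(evaluate(tree, 2.0))
```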

collage of 17 students from the DSSI class of 2022

Data Science Summer Institute Completes Fifth Year

The Data Science Summer Institute (DSSI) class of 2022 wrapped up their internships in August. The students’ projects included surrogate models for tokamak divertor-plasma detachment control, deep learning for atmospheric and seismic applications, statistical modeling of high-dimensional fallout data, probabilistic programming assessment of SARS-CoV-2 antibody mutation regressions, data-driven physics-constrained reduced order modeling, machine learning (ML)–based viral protein fitness prediction, and more. Two students’ project presentations are available on YouTube:

  • Marina Dunn: “Visualizing Model Optimization for Orbital Debris Characterizations” (video 5:30)
  • Lance Fletcher: “Developing Large Network Centralities Utilizing Random Spanning Trees” (video 4:18)

This year’s DSSI cohort of 35 included 3 students sponsored by the Ministry of Japan and 2 Livermore Lab Foundation fellows. “As another year of DSSI concludes, I can’t help but reflect upon the impressive level of energy and talent displayed by the students,” stated co-director Nisha Mulakken. “Their diverse background skills came together like magic to deliver creative solutions to data science challenge group projects. Interacting with these students always gives me hope for the future!”

ten people pose as a group outside the Lab’s National Atmospheric Release Advisory Center

Japanese Delegation and the DSSI

LLNL hosted a group of government officials and representatives from Japan on August 16. The purpose of the visit was to provide an overview of the Lab’s programs, understand Japanese science and technology interests, and discuss future opportunities between LLNL and Japan. The group included Koji Aribayashi, chief of science section and science counselor from the Embassy of Japan, and Hajime Kishimori, acting consul general from the Consulate General of Japan in San Francisco. The Lab also welcomed officials from the Japan Science and Technology Agency, the Japan Agency for Medical Research and Development, and the National Institute of Information and Communications Technology.

During the visit, DSI director Michael Goldman presented the group with an update on the DSSI, including its history of outreach with underrepresented minority scholarship programs. Since 2020, seven Japanese students have participated in the DSSI, building their skills in network alignment and graph kernels, graph clustering with deep learning, natural language processing, multilevel graph neural networks, and more. “We’re excited to collaborate with international partners who share our interests in AI and solving cross-disciplinary problems, and we hope to expand our relationship with the Japanese government in the future,” said Goldman.

portraits of the five panelists arranged on a repeating background of the DSI and LLNL logos

Career Panel Spotlights Former Interns

The DSI’s career panel series continued on August 8 with a return of the “Former Interns Tell All!” theme—just in time for the conclusion of the Lab’s summer program. The panel consisted of former student interns who are now full-time staff in the Computing and Engineering Directorates. Moderated by Brian Bartoldson (shown here at center), the panelists were Sunshine Balingit, Alec Dunton, Denis Vashchenko, and Rebecca Haluska (clockwise from top left). They discussed how their internships helped pave the way for staff positions, described differences between being interns and employees, and offered advice for making the most of an internship. For instance, Dunton noted, “Don’t sweat the small things, and meet as many people as you can while you’re here.” The event included a lively audience Q&A session with many current student interns. Topics ranged from avoiding procrastination and finding meaningful projects to tips for job interviews and transitioning to a new research field.

four people pose in front of a workshop poster

26th Annual Workshop in Signal and Image Processing

LLNL’s Center for Advanced Signal and Image Sciences (CASIS) serves as a liaison between signal and image processing groups in industry, government, and academia. CASIS hosted its 26th annual workshop on September 7 for LLNL engineers, scientists, and students. In addition to a poster session, the workshop featured talks on ML/deep learning, inverse problems/remote and non-invasive sensing, non-destructive evaluation, collaborative autonomy, and signal and image processing at the National Ignition Facility. “The workshop examines current opportunities and challenges for the signal and image sciences [SIS] community and intends to enable a productive exchange of ideas on state-of-the-art technologies, recent developments, and innovative applications,” said workshop chair Ruben Glatt.

SIS play a crucial role in a wide range of LLNL research areas, such as reconstruction of computed tomography data, high-precision control of adaptive optics, and detection of radiation spectra. “Those of us working in the SIS area are dispersed across many programs in the Lab. The workshop provides a venue where we can describe ongoing problems, share new insights, and learn about new techniques,” explained CASIS acting director Dave Chambers.

After two years of virtual events, the workshop’s return to an in-person format at the Livermore Valley Open Campus was well received. Glatt noted, “Despite the heat wave, around 60 people attended throughout the day, which was a very good size for networking and talking to colleagues from different areas. Participants were happy to see others’ work and take the opportunity to discuss more details during the breaks.” Sponsored by the Engineering Directorate, CASIS hosts seminars and networking activities in addition to the workshop. (Photo at left: Chambers; Glatt; Anup Singh, LLNL’s associate director of Engineering; and Alysia Nieto, CASIS workshop co-organizer.)

collage of six national lab logos (Argonne, Lawrence Berkeley, Lawrence Livermore, Los Alamos, Sandia, and Oak Ridge) on a faded background of supercomputer racks

Workshop Series Focuses on AI for Science and Security

This summer, six Department of Energy (DOE) labs hosted three workshops aimed at defining the AI and related high-performance computing (HPC) areas needed to create AI capabilities and applications for science, engineering, and national security. These Advanced Research Directions in AI for Science and Security (AI4SS) workshops were organized around six multidisciplinary approaches relevant to the DOE and National Nuclear Security Administration’s mission space.

To increase engagement with academic and underrepresented groups, the events were held at minority-serving institutions and historically Black colleges and universities. Government, academic, and industry partners were invited, and agendas included plenary talks and breakout sessions for scientific domains such as biomedicine, physical sciences, agriculture, energy infrastructure, and engineered systems. Later this year, the group will publish a research and development roadmap along with a report detailing plans for data generation, data integration, and data management to drive AI models, the integration of AI with modeling and simulation, and the integration of AI with DOE user facilities.

  • June 14–16, Tennessee State University. Themes: AI surrogates for HPC; AI for prediction and control of complex engineered systems. Co-organized by Oak Ridge National Laboratory and Los Alamos National Laboratory.
  • July 26–28, University of California, Davis. Themes: AI for advanced property inference and inverse design; foundational AI for scientific knowledge discovery, integration, and synthesis (agenda). Co-organized by Lawrence Berkeley National Laboratory and LLNL.
  • August 16–18, Bowie State University. Themes: AI and robotics for autonomous discovery; AI for programming and software engineering. Co-organized by Argonne National Laboratory and Sandia National Laboratories.

three panels showing steps in a progression through problem formation (output type and target domain), model development (data curation, evaluation and diagnosis, features and models), and real-world performance (model generalization, success)

Machine Learning “How To” for Materials Scientists

In a new Chemistry of Materials paper, LLNL researchers Piyush Karande, Brian Gallagher, and Yong Han provide a practical guide for using ML effectively in materials science problems. “The use of ML techniques has grown tremendously in domain sciences,” says lead author Karande. “But it can be difficult for subject matter experts to navigate through the sea of ML modeling approaches and training techniques. Our goal was to illustrate the various paths a researcher can take and answer the ‘why’ behind every decision on the way.”

Through a real-world case study of TATB (2,4,6-triamino-1,3,5-trinitrobenzene) sample evaluation, the team describes how to leverage ML for scientific data. The paper focuses on four aspects of the ML pipeline:

  1. Problem formation: Which ML technique(s) to use depends on the type or desired precision of the output as well as how model training will occur.
  2. Data curation: ML model performance often depends on the size and diversity of data available for training.
  3. Feature representation and model selection: Domain knowledge helps determine an appropriate—and successful—combination of data representations and modeling approaches.
  4. Model generalizability and real-world performance: Models that can generalize well to unseen data are likely to produce more accurate predictions when used in practice.
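Point 4 is commonly checked by scoring a model only on data it was not trained on. The Python sketch below is a generic illustration, not the paper’s TATB workflow: it runs k-fold cross-validation of a hypothetical one-feature least-squares model (`kfold_indices`, `fit_line`, `cross_val_mse`, and the toy data are all assumptions) and reports the average held-out mean squared error.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds whose sizes differ by at most 1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_val_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE: each fold is predicted by a model fit on the other folds."""
    fold_errors = []
    for test_idx in kfold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


xs = [float(i) for i in range(20)]
print(cross_val_mse(xs, [3.0 * x + 1.0 for x in xs]))  # linear data: near zero
print(cross_val_mse(xs, [x * x for x in xs]))          # quadratic data: a line generalizes poorly
```

A low held-out error on the first dataset and a large one on the second shows how unseen data exposes a model whose form does not match the problem, which is the concern behind point 4.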

“In the process of applying ML approaches to solving the problem of predicting the uniaxial compressive strength of TATB, we gained a lot of experience in investigating various aspects of the ML pipeline and wading through the space of decisions,” Karande continues. “This paper shares our experience with the wider scientific community and helps them take a domain knowledge–guided approach to their problems.” The work highlights the Lab’s multidisciplinary collaborations, in this case joining expertise from the Engineering, Computing, and Physical and Life Sciences directorates. (Image at left: Schematic representation of the steps in a systematic evaluation of a data-driven approach to solving a scientific problem. The schema is divided into three main blocks interacting with one another at a high level and their internal components guiding the flow of decisions.)

4x3 grid of earthquake aftermath photos of buildings highlighted in certain areas with rainbow colors

Recent Research

portrait of Tina next to the seminar series icon

Virtual Seminar Explores Machine Learning in High-Stakes Scenarios

The DSI’s August 22 seminar featured Tina Eliassi-Rad, professor of Computer Science at Northeastern University and faculty member at Northeastern’s Network Science Institute and the Institute for Experiential AI. Her talk, “Just Machine Learning,” explored ML in high-stakes situations such as criminal justice, law enforcement, employment decisions, credit scoring, health care, public eligibility assessment, and school assignments. She discussed the popular task of risk assessment and impossibility results for group fairness, where one cannot simultaneously satisfy desirable probabilistic measures of fairness, and considered how ML can be used to generate aspirational data. A recording of the talk will be posted to a YouTube playlist.
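One widely cited example of such an impossibility result follows from confusion-matrix arithmetic alone; the talk’s own formulation may differ. For a group of N people with base rate p (the fraction who are truly positive), a risk score’s positive predictive value PPV, false negative rate FNR, and false positive rate FPR are tied together as follows:

```latex
\begin{aligned}
TP &= (1-\mathrm{FNR})\,pN, \qquad FP = \mathrm{FPR}\,(1-p)N,\\
\frac{1-\mathrm{PPV}}{\mathrm{PPV}} &= \frac{FP}{TP}
  = \frac{\mathrm{FPR}\,(1-p)}{(1-\mathrm{FNR})\,p}
\quad\Longrightarrow\quad
\mathrm{FPR} = \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot(1-\mathrm{FNR}).
\end{aligned}
```

If two groups have different base rates p, a score with equal PPV and equal FNR in both groups must therefore have unequal FPRs, unless PPV = 1 or FNR = 1 (in which case both FPRs are zero).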

Eliassi-Rad also serves as an external faculty member at the Santa Fe Institute and the Vermont Complex Systems Center. Her research lies at the intersection of data mining, ML, and network science, and her work has been applied to personalized search on the web, statistical indices of large-scale scientific simulation data, fraud detection, mobile ad targeting, cyber situational awareness, drug discovery, democracy and online discourse, and ethics in ML.

portrait of Emilia next to the highlights icon

Meet an LLNL Data Scientist

With Biomedical Engineering degrees from Duke University, Emilia Grzesiak contributes to LLNL’s COVID-19 research by comparing simulations to bioassays that measure the binding affinity between the virus’s variants and antibody candidates. She also builds analysis and visualization tools to identify antibody designs that could be useful drug candidates. “I’m excited to help with therapeutic design decision-making and speed up the drug-design process,” she says. Grzesiak joined LLNL’s Global Security Computing Applications Division in 2021 after interning with DSSI the previous year. Now, as a first-time mentor, she states, “I’m figuring out when to let go of the reins and when to step in more. Establishing trust and open communication is important, as making those judgment calls becomes easier when you understand how your intern approaches problems and what kind of advice they respond best to.” Grzesiak recently shared her career journey and research highlights during a DSI-sponsored panel discussion and a seminar for the DSSI’s class of 2022.