Welcome to the DSI Newsletter

Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more.
3x5 video chat screens on a green background

Livermore WiDS Provides Forum for Women in Data Science

LLNL celebrated the 2022 global Women in Data Science (WiDS) conference on March 7 with its fifth annual regional event, featuring workshops, mentoring sessions, and a discussion with LLNL Director Kim Budil, the first woman to hold that role. The all-day event attracted women data scientists and students inside and outside the Lab, who gathered to share coding tips and swap stories of their experiences in growing their careers. Attendees tuned in to view presentations by LLNL women data scientists, engage in breakout sessions, and view a livestream of the global WiDS conference hosted by...

Read Volume 16
WiDS Livermore logo of green ones and zeros overlaid on silhouettes of faces in profile

Register for Virtual WiDS Livermore on March 7

The annual Women in Data Science (WiDS) conference returns on Monday, March 7, which is International Women’s Day. LLNL will again host a regional event in conjunction with the worldwide conference. The all-day WiDS Livermore event will be entirely virtual and free. Everyone is welcome to attend. Registration is open until February 27.

Sponsored by the DSI and LLNL’s Office of Strategic Diversity and Inclusion Programs, WiDS Livermore will include a livestream of the Stanford conference and networking opportunities. Returning this year is the popular “speed mentoring” session, where mentees...

Read Volume 15
 3x2 grid of circular portraits of the 6 council members

Happy New Year from the DSI Council

The start of a new year is an exciting time because of the opportunity to appraise our data science community’s myriad accomplishments as well as preview upcoming projects and events. Like other areas of LLNL, the DSI has adapted to evolving pandemic restrictions and workplace policies to prioritize safety.

We were pleased to sponsor and contribute to multiple activities in modified or virtual formats: our monthly seminar series, the fourth annual Women in Data Science (WiDS) Livermore event, a new career panel series inspired by WiDS, the Machine Learning for Industry Forum (ML4I), and a...

Read Volume 14
semi-cyclical diagram showing training of a recurrent neural network leading to a sample, Gaussian processes, extraction, and combination of results

Five Papers Accepted to NeurIPS 2021

The annual Conference on Neural Information Processing Systems (NeurIPS) returns December 6–14. LLNL work has been accepted at this prestigious machine learning conference in past years, and in 2021 LLNL researchers have five accepted papers. Preprints are linked here.

Read Volume 13
Four images of celebrities’ faces with differences between bald and not bald, smiling and not smiling, which show results of the DISC inversion method

Counterfactual Generators for Deep Models

LLNL’s research into machine learning (ML) interpretability continues with an investigation of counterfactual explanations—those that synthesize a hypothetical result based on small, interpretable changes to a given query image. Existing approaches rely extensively on pre-trained generative models or access to training data to create plausible counterfactuals that support users’ hypotheses.

LLNL’s Jayaraman Thiagarajan and colleagues from Arizona State University, Stanford University, and IBM Research have developed a technique called DISC—Deep Inversion for Synthesizing Counterfactuals—that...
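One generic way to pose a counterfactual query is as a small optimization problem: find an input close to the query that the model assigns to a different class. The sketch below is not DISC. It uses a hypothetical two-feature logistic-regression classifier and plain gradient descent, with a squared-distance penalty (weight `lam`, an assumed value) that keeps the change small and interpretable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def counterfactual(w, b, x, target=1, lam=0.1, lr=0.5, steps=500):
    """Search for a nearby input x' that the classifier assigns to `target`.

    Minimizes  cross_entropy(predict(x'), target) + lam * ||x' - x||^2
    by plain gradient descent, starting from the query x itself.
    """
    xc = list(x)
    for _ in range(steps):
        p = predict(w, b, xc)
        # For a logistic model, d(cross-entropy)/dx' = (p - target) * w;
        # the penalty term pulls x' back toward the original query.
        grad = [(p - target) * wi + 2.0 * lam * (xci - xi)
                for wi, xci, xi in zip(w, xc, x)]
        xc = [xci - lr * g for xci, g in zip(xc, grad)]
    return xc

# Hypothetical classifier and query, chosen only for illustration.
w, b = [2.0, -1.0], -0.5
query = [0.0, 1.0]
cf = counterfactual(w, b, query)
print(predict(w, b, query), predict(w, b, cf))
print([round(c - q, 3) for c, q in zip(cf, query)])
```

Because this toy model is linear, the change always points along the weight vector. With a deep image classifier, the same objective on its own tends to yield imperceptible, adversarial-style changes rather than plausible ones, which is why counterfactual methods lean on generative models, training data, or techniques such as DISC to keep results realistic.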

Read Volume 12
three rows of thoracic CT images with slight variations

4D Computed Tomography Reconstructions

Computed tomography (CT) is an x-ray imaging technology with applications in clinical diagnosis, non-destructive evaluation in industry, baggage inspection, and cargo screening. CT scanners capture a sequence of x-ray images from many angles around an object, and reconstruction algorithms then estimate the scene from these measured images. 2D and 3D CT imaging of static objects are well-studied problems with established theoretical and practical algorithms. However, when the scene changes while the measurements are being taken, a setting known as dynamic 4D CT, reconstruction can suffer from spatiotemporal ambiguities. (Image at left: 4D CT of...
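The static reconstruction step can be illustrated with a toy algebraic method. Each measurement is modeled as the sum of pixel values along one ray through the object, and the Kaczmarz method (also known as ART) repeatedly nudges the image so that it agrees with one measured ray sum at a time. This is a generic textbook technique on a hypothetical 2×2 phantom, not the 4D approach from the article, and it has no time dimension.

```python
# Hypothetical 2x2 phantom, pixels stored row-major as [a, b, c, d]:
#   a b
#   c d
# Each "ray" lists the pixels it passes through (unit weights): two
# horizontal rays, two vertical rays, and the two diagonals.
RAYS = [[0, 1], [2, 3],   # rows
        [0, 2], [1, 3],   # columns
        [0, 3], [1, 2]]   # diagonal, anti-diagonal

def forward_project(image, rays):
    """Simulate measurements: each one is the sum of pixel values along a ray."""
    return [sum(image[i] for i in ray) for ray in rays]

def kaczmarz(rays, measurements, n_pixels, sweeps=20):
    """Algebraic reconstruction (Kaczmarz/ART), starting from a blank image."""
    x = [0.0] * n_pixels
    for _ in range(sweeps):
        for ray, m in zip(rays, measurements):
            # Project x onto the set of images consistent with this single
            # measurement: spread the residual evenly over the ray's pixels.
            residual = m - sum(x[i] for i in ray)
            for i in ray:
                x[i] += residual / len(ray)
    return x

phantom = [1.0, 0.0, 0.0, 2.0]
measured = forward_project(phantom, RAYS)
recon = kaczmarz(RAYS, measured, n_pixels=4)
print(measured)
print(recon)
```

Dropping the two diagonal rays leaves this tiny system underdetermined: a checkerboard pattern adds nothing to any row or column sum, so it cannot be recovered from those measurements alone.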

Read Volume 11
4x2 images showing point spread functions

New Bayesian ML Code Released

A new Bayesian machine learning (ML) code, MuyGPyS (pronounced my-jee-pies), has been developed as part of a Laboratory Directed Research and Development Strategic Initiative to address several needs: native uncertainty quantification in ML predictions, learning with bounded training times, support for combining ML and model-based Bayesian inference frameworks, and extending the data sizes allowable in Gaussian process (GP) models.

The MuyGPyS code and algorithm offer best-in-class performance on community GP regression benchmarks, as well as image classification performance competitive with or...
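MuyGPyS's own interface is not reproduced here. As a generic illustration of the native uncertainty quantification a GP provides, the sketch below implements textbook exact GP regression with a squared-exponential kernel on a hypothetical three-point dataset (kernel length scale, variance, and noise level are assumed values). The Cholesky factorization of the n×n kernel matrix costs O(n³) operations, which is the kind of scaling that limits the data sizes exact GP models can handle.

```python
import math

def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, x_star, noise=1e-4):
    """Exact GP posterior mean and variance at a single test input x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)                               # the O(n^3) step
    alpha = solve_upper_t(L, solve_lower(L, y))   # alpha = K^{-1} y
    k_star = [rbf(a, x_star) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    variance = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, variance

X = [-1.0, 0.0, 1.0]
y = [math.sin(x) for x in X]
near = gp_predict(X, y, 1.0)   # at a training input
far = gp_predict(X, y, 5.0)    # far from all training data
print(near, far)
```

The predictive variance is small at a training input and returns to the prior variance far from the data, so each prediction carries its own uncertainty estimate without any extra machinery.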

Read Volume 10
diagram of workflow including synthesis, analysis, classification, and performance property

Deep Learning for Materials Discovery

Deep learning (DL) models are proving useful for a number of materials science applications, including materials discovery, microstructure analysis, and property prediction. In a recent paper in ACS Omega, LLNL researchers propose a unified framework that leverages the predictive uncertainty from deep neural networks to answer challenging questions that materials scientists often encounter in machine learning (ML)–based materials application workflows.

Specifically, the team demonstrates that predictive uncertainty from uncertainty-aware DL approaches (particularly deep ensembles) can be used to...
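The core idea of a deep ensemble can be sketched in a few lines: train several copies of the same network from different random initializations, then use their average as the prediction and their spread as the uncertainty. The sketch assumes a hypothetical one-dimensional dataset and uses tiny single-hidden-layer tanh networks trained with plain SGD in pure Python. It is not the team's framework, and real deep ensembles use far larger networks.

```python
import math
import random
import statistics

def train_member(xs, ys, seed, hidden=8, lr=0.05, epochs=1500):
    """Train one tiny tanh network with SGD; `seed` sets its random initialization."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        return h, sum(v * hj for v, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, out = forward(x)
            err = out - y                      # gradient of 0.5 * err**2
            for j in range(hidden):
                g_hidden = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * g_hidden * x
                b1[j] -= lr * g_hidden
            b2 -= lr * err
    return lambda x: forward(x)[1]

def ensemble_predict(members, x):
    """Ensemble mean as the prediction, spread across members as uncertainty."""
    preds = [m(x) for m in members]
    return statistics.mean(preds), statistics.pstdev(preds)

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical training inputs
ys = list(xs)                      # hypothetical target: y = x
members = [train_member(xs, ys, seed) for seed in range(5)]
print(ensemble_predict(members, 0.0))   # inside the training range
print(ensemble_predict(members, 4.0))   # far outside it
```

Away from the training inputs the members are unconstrained by data, so their predictions tend to disagree more and the reported spread grows.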

Read Volume 9
abstract art showing molecules and crystalline density

Research in Feedstock Optimization

A long-held goal of chemists across many industries, including energy, pharmaceuticals, energetics, food additives, and organic semiconductors, is to imagine the chemical structure of a new molecule and predict how it will function for a desired application. In practice, this vision is difficult to realize, often requiring extensive laboratory work to synthesize, isolate, purify, and characterize newly designed molecules to obtain the desired information.

A team of LLNL materials and computer scientists has brought this vision to fruition for energetic molecules by creating machine...

Read Volume 8
diagram of pruned neural network

New Research in Machine Learning Robustness

LLNL postdoctoral researcher James Diffenderfer and computer scientist Bhavya Kailkhura are co-authors on a paper that offers a novel and unconventional way to train deep neural networks (DNNs). The LLNL team shows both empirically and theoretically that it is possible to learn highly accurate NNs simply by compressing (i.e., pruning and binarizing) randomized NNs without ever updating the weights. This is in sharp contrast to the prevailing weight-training paradigm, i.e., iteratively learning the values of the weights by stochastic gradient descent. In this process, Diffenderfer and Kailkhura...
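The compression step itself can be sketched as follows, assuming each weight carries a score that decides whether it survives. The random weights are drawn once and never updated; only the mask changes. How the scores are found, which is the substance of the team's method, is not shown: the scores here are random placeholders, and scaling survivors by the mean kept magnitude is one common binarization choice assumed for illustration.

```python
import math
import random

def prune_and_binarize(weights, scores, keep_fraction):
    """Keep the highest-scoring weights, zero the rest, and binarize the survivors.

    The weights themselves are never changed; only which ones survive (the
    mask) depends on the scores. Survivors become sign(w) * alpha, where
    alpha is the mean magnitude of the kept weights (an assumed scale).
    """
    n_keep = max(1, round(keep_fraction * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    alpha = sum(abs(weights[i]) for i in kept) / n_keep
    return [math.copysign(alpha, w) if i in kept else 0.0
            for i, w in enumerate(weights)]

rng = random.Random(0)
random_weights = [rng.gauss(0.0, 1.0) for _ in range(10)]  # frozen at initialization
scores = [rng.random() for _ in range(10)]                  # hypothetical scores
compressed = prune_and_binarize(random_weights, scores, keep_fraction=0.3)
print(compressed)
```

The result stores one shared magnitude plus a sign per surviving weight, which is what makes a pruned, binarized network compact.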

Read Volume 7
Example of sampling an expression from the team’s recurrent neural network, which is used to emit a distribution over tractable mathematical expressions.

Spotlight: New Research Ranked Among Top AI Papers

Symbolic regression is the ML task of discovering tractable mathematical expressions that fit a dataset, yet the AI community has not fully explored deep learning approaches to this challenging search space. In a paper accepted as an Oral Presentation at the upcoming International Conference on Learning Representations (ICLR), an LLNL research team proposes a framework that leverages deep reinforcement learning for symbolic regression via a simple idea: use a large model (neural network) to search the space of small models (expressions). With an Oral acceptance rate of only 1.5%, the team’s...
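In the framework described above, a recurrent neural network emits a distribution over expressions and is trained with reinforcement learning. The sketch below keeps only the surrounding machinery, using a hypothetical token library and dataset. Expressions are written as token sequences in prefix (pre-order) notation, an assumption here, and scored with a bounded reward based on normalized RMSE, assumed to be 1/(1 + NRMSE). As a stand-in for the learned RNN policy, tokens are drawn uniformly at random, so this is plain random search, not deep reinforcement learning.

```python
import math
import random
import statistics

# Hypothetical token library: name -> arity (number of child operands).
ARITY = {"add": 2, "mul": 2, "sin": 1, "x": 0, "1": 0}

def evaluate(tokens, x):
    """Evaluate an expression written in prefix (pre-order) notation at x."""
    def node(i):
        tok = tokens[i]
        if tok == "x":
            return x, i + 1
        if tok == "1":
            return 1.0, i + 1
        if tok == "sin":
            v, j = node(i + 1)
            return math.sin(v), j
        a, j = node(i + 1)          # binary operator: left subtree...
        b, k = node(j)              # ...then right subtree
        return (a + b if tok == "add" else a * b), k
    value, _ = node(0)
    return value

def sample_expression(rng, max_len=15):
    """Emit tokens one at a time until the expression tree is complete.

    Stand-in for the RNN policy: tokens are drawn uniformly at random.
    """
    while True:
        tokens, open_slots = [], 1
        while open_slots > 0 and len(tokens) < max_len:
            tok = rng.choice(list(ARITY))
            tokens.append(tok)
            open_slots += ARITY[tok] - 1   # fills one slot, opens `arity` new ones
        if open_slots == 0:
            return tokens                  # otherwise too long: resample

def reward(tokens, xs, ys):
    """Assumed reward 1 / (1 + NRMSE); equals 1.0 for a perfect fit."""
    preds = [evaluate(tokens, x) for x in xs]
    rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))
    return 1.0 / (1.0 + rmse / statistics.pstdev(ys))

xs = [i / 4 for i in range(-8, 9)]          # hypothetical dataset
ys = [x * x + math.sin(x) for x in xs]
rng = random.Random(0)
candidates = [sample_expression(rng) for _ in range(2000)]
best = max(candidates, key=lambda t: reward(t, xs, ys))
print(best, reward(best, xs, ys))
```

Replacing the uniform sampler with a trainable policy that is pushed toward high-reward sequences is the step this sketch deliberately leaves out.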

Read Volume 6
coronavirus molecule on teal background

Spotlight: Research Team Recognized for COVID-19 Model

A machine learning model developed by a team of LLNL scientists to aid in COVID-19 drug discovery efforts was a finalist for the Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research. Using the Sierra supercomputer, the team created a more accurate and efficient generative model to enable COVID-19 researchers to produce novel compounds that could possibly treat the disease.

The team trained the model on an unprecedented 1.6 billion small molecule compounds and 1 million additional promising compounds for COVID-19, which reduced the model training time from 1 day to...

Read Volume 5
portraits of Jay and Timo side by side

Spotlight: Special Recognition for Researchers

Since Jayaraman Thiagarajan joined LLNL as a postdoc in 2013, his research has grown to include multiple related fields, ranging from deep learning–based graph analysis to machine learning (ML) and artificial intelligence (AI) solutions for computer vision, healthcare, language modeling, and scientific applications. Thiagarajan recently received an LLNL Director’s Early Career Recognition Award for his authoritative work and key contributions. He earned a PhD in Electrical Engineering from Arizona State University.

Peer-Timo Bremer has accepted the role as LLNL’s Point of...

Read Volume 4
diagram of TPL system including data curation and DL model

3D Printing Meets Machine Learning

Two-photon lithography (TPL)—a widely used 3D nanoprinting technique that uses laser light to create 3D objects—has shown promise in research applications but has yet to achieve widespread industry acceptance due to limitations on large-scale part production and time-intensive setup.

LLNL scientists and collaborators are using machine learning (ML) to address two key barriers to industrialization of TPL: monitoring of part quality during printing and determining the right light dosage for a given material. The team developed an ML algorithm trained on thousands of video images of TPL builds...

Read Volume 3
molecular structure in red, blue, and silver

Spotlight: Mentoring the Next Generation

For the second year in a row, the DSI teamed up with the University of California at Merced to offer a two-week Data Science Challenge at the beginning of June. The intensive program provided mentors, assignments, virtual tours, and seminars. Under the direction of LLNL’s Marisol Gamboa and UC Merced’s Suzanne Sindi, 21 students worked from their homes through video conferencing and chat programs to develop machine learning (ML) models capable of differentiating potentially explosive materials from other types of molecules.

The UC Merced students were divided into five teams, each led by a...

Read Volume 2
5x2 grid of circular grayscale images

Spotlight: Materials Science Meets AI

LLNL scientists have taken a step forward in the design of future materials with improved performance by using AI to analyze material microstructure. The work recently appeared in the journal Computational Materials Science.

Technological progress in materials science applications spanning electronic, biomedical, alternate energy, electrolyte, catalyst design, and beyond is often hindered by a lack of understanding of complex relationships between the underlying material microstructure and device performance. But AI-driven data analytics provide opportunities that can accelerate materials design...

Read Volume 1