Volume 38

July 16, 2024


Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across LLNL. Our newsletter is a compendium of breaking news, the latest research, outreach efforts, and more. Past volumes of our newsletter are available online.


Congratulations to Distinguished Members of Technical Staff

Former DSI director Michael Goldman and DSI Council member Barry Chen were recently named as Distinguished Members of Technical Staff (DMTS) for their extraordinary scientific and technical contributions to the Lab. DMTS is the highest technical staff level achievable by a Livermore scientist or engineer.

Goldman founded the DSI in 2018 and was its first director until 2023, establishing a strong, forward-leaning vision and a consistent funding stream. He has been a technical lead on several Global Security projects and programs in computer vision, imagery, machine learning (ML), large language models, and adversarial artificial intelligence (AI). He currently serves as the associate program leader for the Advanced Exploitation program, which brings ML to many of the Lab’s national security customers. Goldman consistently leverages his passion for workforce development to establish new pipelines through university partnerships, build mentoring and upskilling programs for existing staff, and create opportunities to strengthen the Lab’s data science community. He received his M.S. in electrical and computer engineering from UC Davis.

Chen is an ML researcher with more than 19 years of experience developing and applying novel algorithms to a wide variety of projects and applications at LLNL, predominantly in the Global Security Directorate. With expertise in neural networks, random forests, and probabilistic graphical models, Chen has helped advance AI to enhance threat detection, prediction, and analysis capabilities. He currently leads several research teams developing new neural network learning algorithms with scientific and security applications. Chen holds a PhD from UC Berkeley and has been a member of the DSI Council since its inception.

“Mike and Barry are changemakers and leaders in the data science field, both in research and in their engagement with the broader community. Their research and contributions over the years have significantly elevated data science, internally and externally, and the DSI would not exist without their hard work and dedication to our field,” says DSI deputy director Cindy Gonzales.



Video: The Surprising Places You’ll Find Machine Learning

LLNL data scientists are applying ML to real-world problems at multiple scales. A new DSI-funded video highlights innovative research at the nanoscale (developing better water treatment methods by predicting the behavior of water molecules under the extreme confinement of nanotubes), the mesoscale (determining the likelihood and location of arcing, a dangerous wildfire-causing phenomenon), and the macroscale (simulating ways to make carbon removal from the atmosphere more efficient). Watch as researchers Anh Pham, Indra Chakraborty, and Youngsoo Choi explain why problems like water filtration, wildfires, and carbon capture are becoming more solvable thanks to groundbreaking data science methods running on some of the world’s fastest computers.



AI Roundtable with Silicon Valley Leadership Group

DSI deputy director Cindy Gonzales and AI Innovation Incubator director Brian Spears recently participated in an AI roundtable discussion with California state senator Anna Caballero and the Silicon Valley Leadership Group (SVLG). Among the discussion topics was the need to educate legislators and their staffers on AI/ML concepts and technologies, which would help inform policy decisions. Another need the group recognized was education/retraining programs for California residents with nontraditional backgrounds, which could present an opportunity for the DSI’s Data Science Challenge (DSC) program.

“The DSC is a perfect fit for this model. As we plan to expand our training programs next fiscal year, we could feasibly offer DSC to someone with little to no background in the field and help them reach to a level of comfortability in working with data and using well-known tools,” Gonzales notes. “The potential is endless, and partnering with local government officials to understand the needs and how we can help is a part of our larger collective responsibility.”


plot showing pink dots and green squares climbing in accuracy until reaching the 4:1 ratio where the pink line drops to the bottom while the green continues to climb

Finding the Sweet Spot in AI Model Optimization

It’s almost an understatement to say that expectations are high for AI systems. For example, AI models need to accurately detect data that is anomalous relative to the training dataset, and they also need to generalize to previously unseen (out-of-distribution, or OOD) data. Optimizing a model for one of these goals often comes at the expense of the other. However, a Livermore-led research team has found a way to balance this tradeoff by leveraging model anchoring.

The team’s paper, “The Double-Edged Sword of AI Safety: Balancing Anomaly Detection and OOD Generalization via Model Anchoring,” was accepted to the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The authors are Vivek Narayanaswamy and Jayaraman Thiagarajan alongside former LLNL computer scientist Rushil Anirudh, now at Amazon.

Anchoring is a methodology for training deep neural networks that re-parameterizes the input into anchor–residual pairs. Here the anchor is drawn at random from the training dataset itself and the residual is the difference between the input and the anchor. The team further modified this method with perturbed anchoring (PA) and residual regularization (RR) techniques. PA corrupts the anchor without modifying the residual, while RR masks the anchor after computing the residual. Controlling individual components of the pairs enables finer control of the model’s dependency on data distributions.
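To make the re-parameterization concrete, here is a minimal Python sketch of building an anchor–residual pair and applying the two modifications described above. It is an illustration under stated assumptions, not the team’s implementation: all function names are hypothetical, PA is modeled as additive Gaussian noise on the anchor, RR is modeled as zeroing out the anchor, and the pair is assumed to be concatenated into a single network input.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def anchor_residual(x: Sequence[float], train_set: Sequence[Sequence[float]],
                    rng: random.Random) -> Tuple[Vector, Vector]:
    """Re-parameterize x as an (anchor, residual) pair.

    The anchor is a training sample drawn at random; the residual is x - anchor.
    """
    anchor = list(rng.choice(train_set))
    residual = [xi - ai for xi, ai in zip(x, anchor)]
    return anchor, residual


def perturbed_anchoring(anchor: Vector, residual: Vector, rng: random.Random,
                        noise_std: float = 0.1) -> Tuple[Vector, Vector]:
    """PA: corrupt the anchor and leave the residual unchanged.

    Assumption: Gaussian noise stands in for whatever corruption the paper uses.
    """
    noisy_anchor = [a + rng.gauss(0.0, noise_std) for a in anchor]
    return noisy_anchor, residual


def residual_regularization(anchor: Vector, residual: Vector) -> Tuple[Vector, Vector]:
    """RR: mask the anchor (assumed here to mean zeroing it) after the residual is computed."""
    return [0.0] * len(anchor), residual


def to_network_input(anchor: Vector, residual: Vector) -> Vector:
    """Concatenate the pair into one input vector (an assumed layout)."""
    return anchor + residual
```

Because PA keeps the residual intact while RR leaves the model only the residual, each operation changes how much the model can lean on the anchor, which is the kind of per-component control the paragraph describes.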

As shown in the plot above, the team’s two-pronged anchoring method improves both anomaly detection and OOD generalization when PA and RR are used judiciously. The x-axis shows how often PA is invoked relative to RR during training (the PA:RR ratio), while the left y-axis tracks accuracy and the right y-axis tracks 100-FPR (100 minus the false positive rate) on standard benchmark datasets. Using only RR (left-hand pink area) does not yield the gains achieved by using both PA and RR (light green area). As the PA:RR ratio climbs, however, anomaly detection performance drops off sharply, and only generalization accuracy continues to improve (right-hand pink area).
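One simple way to realize a given PA:RR ratio during training is to interleave the two operations in a fixed repeating pattern, so that a 4:1 ratio means four PA steps for every RR step. The sketch below assumes that deterministic schedule purely for illustration; the paper may instead sample which operation to apply at each step, and the helper names are hypothetical.

```python
from itertools import cycle, islice
from typing import Iterator, List


def pa_rr_schedule(pa_count: int, rr_count: int) -> Iterator[str]:
    """Return an endless iterator of 'PA'/'RR' choices, one per training step.

    A PA:RR ratio of pa_count:rr_count is realized by repeating a block of
    pa_count 'PA' steps followed by rr_count 'RR' steps (an assumed schedule).
    """
    if pa_count < 0 or rr_count < 0 or pa_count + rr_count == 0:
        raise ValueError("counts must be non-negative and not both zero")
    block = ["PA"] * pa_count + ["RR"] * rr_count
    return cycle(block)


def first_steps(pa_count: int, rr_count: int, n: int) -> List[str]:
    """Return the first n scheduled choices, for inspection."""
    return list(islice(pa_rr_schedule(pa_count, rr_count), n))
```

With this schedule, a ratio of 0:1 corresponds to the RR-only setting at the left edge of the plot, and first_steps(4, 1, 6) returns four PA steps, one RR step, and then PA again as the pattern repeats.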

“We found that we can selectively improve anomaly detection and generalization via a novel anchored training mechanism without exposing models to additional outlier data or incurring additional computational cost,” explains Narayanaswamy. “There’s a nontrivial relationship between these objectives, which are crucial to AI interpretability and trustworthiness.”



Data Science Consultants Spread the Word

Approximately 60 LLNL employees turned out for the Data Science Institute Consulting Service’s (DSICS) “mini road shows”—events designed to raise awareness about DSI-funded consulting services, solicit projects, and recruit new consultants. The road shows described the consulting process from different perspectives, including how projects can request help and how individuals can become consultants. To cap the presentations, consultant Mike Boyle shared his story of quickly developing a database of molecular weights to enable an analysis by the Forensic Science Center (see story in newsletter volume 35). 

Giselle Fernandez, staff scientist and a DSICS deputy director, notes, “Our recent road shows promoting our consulting services were successful, with enthusiastic attendance that exceeded our expectations and generated strong interest from potential consultants and consultees. A highlight was Mike’s testimony, whose exceptional contributions as a consultant earned him a Global Security Bronze Award, exemplifying the powerful collaboration between domain science and data science.”

Attendees who expressed interest in becoming consultants have already had the opportunity to do so. Tyler Alcorn, a health physicist with a background and continued interest in data science, recently joined a consulting project on analyzing scintillator data. “I’m thankful for the opportunity to work with DSICS as it allows me to collaborate with other programs and scientists, and to be a part of interesting projects outside my current job scope. DSICS also gives me a chance to flex my data science skillset and to be mentored by more senior data scientists. It’s been a very positive experience!” he says.

The road show will likely become a recurring event to help identify new LLNL projects in need of data science expertise and to inspire future consultants. DSICS leaders are also planning a follow-up event to train new consultants.



Drug Design Milestone Relies on AI and HPC

In a substantial milestone for supercomputing-aided drug design, LLNL and BridgeBio Oncology Therapeutics announced that clinical trials have begun for a first-in-class medication that targets specific genetic mutations implicated in many types of cancer. The new drug, BBO-8520, is the result of a collaboration among LLNL, BridgeBio, and the RAS Initiative of the National Cancer Institute (NCI) at the Frederick National Laboratory for Cancer Research (FNL). In a first for a DOE national laboratory, the drug was discovered by combining DOE’s leadership in high-performance computing (HPC) for mission applications with an LLNL-developed platform that integrates AI and traditional physics-based drug discovery, together with an effective partnership with FNL and NCI.

In laboratory testing, the drug candidate has shown promise as an inhibitor of mutated KRAS proteins, which are linked to about 30% of all cancers and have long been considered “undruggable” by cancer researchers. The achievement offers hope for broad impact on cancer patients whose tumors harbor susceptible KRAS mutations, and it suggests that a computational/AI drug design approach could unlock new insights into the disease and the future of cancer treatment.

LLNL representatives said that, in addition to advancing cancer research, the milestone validates the potential of integrating supercomputing with AI- and physics-based computational platforms to further accelerate small-molecule drug discovery. Such platforms could equip DOE, the National Nuclear Security Administration, and LLNL to quickly and routinely develop medical countermeasures for disease or future pandemics, in line with broader mission focus areas in biosecurity, bioresilience, and national security.



Workshop Spotlights Signal and Image Sciences

Nearly 150 members of the signal and image sciences community recently came together to discuss the latest advances in the field and connect with colleagues, friends, and potential collaborators at the 28th annual Center for Advanced Signal and Image Sciences (CASIS) workshop. The event featured more than 50 technical contributions across six workshop tracks and a parallel tutorials session, including 40 talks and 23 posters that sparked discussion. This year’s topics included remote and noninvasive sensing, nondestructive evaluation, signal and image sciences at the National Ignition Facility (NIF), AI/ML, quantum sensing, and quantum computing and energy applications. Signal and image sciences enable efficient and accurate processing, generation, analysis, and interpretation of signals and images in fields such as telecommunications, medical imaging, and computer vision. At the Lab, they are the backbone of NIF diagnostics, nondestructive evaluation and characterization, advanced sensing, AI/ML, and various other critical mission roles. CASIS and the DSI recently co-sponsored an AI safety workshop and seminar (see next story).



Seminar Video: How Could We Design Aligned and Provably Safe AI?

On April 19, Dr. Yoshua Bengio presented “How Could We Design Aligned and Provably Safe AI?” His seminar was co-sponsored by the DSI and CASIS. A Turing Award winner, Bengio is recognized as one of the world’s leading AI experts, known for his pioneering work in deep learning. He is a full professor at the University of Montreal, and the founder and scientific director of the Mila–Quebec AI Institute. In 2022, Bengio became the most-cited computer scientist in the world.

Bengio discussed his AI research program based on risk evaluation, Bayesian priors, and uncertainty, as well as how amortized inference with large neural networks could be used to estimate the required quantities. A video of his talk is now available on YouTube. Seminar speakers’ biographies and abstracts are available on the seminar series web page, and many recordings are posted to the YouTube playlist. To become or recommend a speaker for a future seminar, or to request a WebEx link for an upcoming seminar if you’re outside LLNL, contact DSI-Seminars [at] llnl.gov.



Recent Research



Award-Winning Data Science Solutions

LLNL’s HPC and data science capabilities play a significant role in international science research and innovation, and Lab researchers have won 10 R&D 100 Awards in the Software–Services category in the past decade. The latest issue of Science & Technology Review features several award-winning projects, including ZFP and CANDLE.

ZFP introduces a new method of compressing large datasets while maintaining high-speed, on-demand access to the compressed data for both reading and writing, a capability not found in any other compressor. Researchers can continue to work with the data in real time while it remains compressed, whether they are analyzing it or producing visualizations. ZFP is downloaded more than 1.5 million times per year by users across DOE and other government and nongovernmental organizations, and its scientific applications include geographic information systems, climate science, seismology, and tornado simulations, among others.
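Part of what makes on-demand access possible is that ZFP partitions an array into small blocks (4×4×4 values in 3D) and compresses each block independently; in its fixed-rate mode, every block receives the same bit budget, so a block’s position in the compressed stream can be computed directly instead of found by decompressing from the start. The Python sketch below illustrates that arithmetic under simplifying assumptions: blocks are stored in raster order with x varying fastest, each block takes exactly 64 × bits_per_value bits, and any word-alignment padding is ignored. It is not the zfp library’s API, and the function name is hypothetical.

```python
import math
from typing import Tuple


def block_bit_offset(index: Tuple[int, int, int], shape: Tuple[int, int, int],
                     bits_per_value: int) -> int:
    """Bit offset of the 4x4x4 block containing `index` in a fixed-rate stream.

    Simplified model: raster block order with x fastest, 64 * bits_per_value
    bits per block, no alignment padding.
    """
    i, j, k = index
    nx, ny, nz = shape
    if not (0 <= i < nx and 0 <= j < ny and 0 <= k < nz):
        raise IndexError("index outside array")
    bx = math.ceil(nx / 4)  # number of blocks along x (partial blocks count)
    by = math.ceil(ny / 4)  # number of blocks along y
    block_id = (i // 4) + bx * ((j // 4) + by * (k // 4))
    bits_per_block = 4 ** 3 * bits_per_value  # 64 values per 3D block
    return block_id * bits_per_block
```

For example, in an 8×8×8 array at 8 bits per value, each block occupies 512 bits, and the element at (5, 6, 7) falls in block 7, which starts at bit 3,584. Reading or rewriting that element only requires touching that one block.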

An early adopter of ML for scientific applications, the Cancer Distributed Learning Environment (CANDLE) provides ML capabilities for cancer research. In particular, CANDLE can extract key information and find relationships within large, disconnected datasets to help solve cancer-specific drug challenges. CANDLE is a collaboration among Lawrence Livermore, Los Alamos, Oak Ridge, and Argonne national laboratories; the Frederick National Laboratory for Cancer Research; the National Institutes of Health; and the National Cancer Institute.



Watch the WiDS Livermore Video Playlist

If you missed our Women in Data Science (WiDS) datathon in February or the WiDS Livermore conference in March, videos of the technical talks, panel discussions, and a dataset tutorial are posted to the playlist on YouTube. There are 10 new videos in the playlist, which also includes highlights from our 2022 and 2023 events.

The global WiDS Conference aims to inspire and educate data scientists worldwide, regardless of gender, and to support women in the field. WiDS Livermore is independently organized by LLNL to be part of the mission to increase participation of women in data science and to feature outstanding women doing outstanding work. Learn more on the WiDS Livermore web page.



Meet an LLNL Data Scientist

Aneesha Devulapally is a data scientist in the Global Security Computing Applications Division. She is particularly interested in the interdisciplinary field of bioinformatics and applications of ML in systems biology. As part of LLNL’s GUIDE program, she develops ML frameworks and pipelines and performs data analysis on antibody–antigen complexes. “Data science brings subject matter experts from various domains together to collaborate in solving complex problems, especially at Livermore where we’re working towards solving problems for the betterment of humankind,” Devulapally says. After earning undergraduate and graduate degrees at IIIT Bangalore in India, she earned a master’s in computer science with a specialization in data science from the University of Texas at Dallas, and then joined LLNL. Devulapally recently served on the organizing committee for Livermore’s 2024 Women in Data Science event. “The enthusiasm of the participants and their interest in learning data science concepts made the experience incredibly rewarding,” she says.