AI/ML Research Spotlight

Lawrence Livermore National Laboratory is at the forefront of artificial intelligence (AI) and machine learning (ML) research for national security applications. LLNL researchers are investigating fundamental issues around the safety, security, and performance of these technologies, while simultaneously developing award-winning demonstrations of AI/ML-accelerated scientific discovery in topics ranging from additive manufacturing and inertial confinement fusion to cancer biology and antibody design.

LLNL combines state-of-the-art AI/ML technology with world-class experimental facilities, high-performance computing systems, and large-scale datasets to advance science and enhance national security.


LLNL’s AI researchers are prolific, with innovations and results regularly published in top journals and acknowledged at premier international conferences. Below is a selection of high-impact publications. (Browse a longer list at Scopus.)


AI for Mission-Critical Science

Mariscal D.A., Djordjević B.Z., Anirudh R., et al. (2023). A Flexible Proton Beam Imaging Energy Spectrometer (PROBIES) for High Repetition Rate or Single-Shot High Energy Density (HED) Experiments. Proceedings of the 24th Topical Conference on High-Temperature Plasma Diagnostics.

The PROBIES diagnostic is a new, highly flexible, imaging and energy spectrometer designed for laser-accelerated protons. The diagnostic can detect low-mode spatial variations in the proton beam profile while resolving multiple energies on a single detector or more. When a radiochromic film stack is employed for “single-shot mode,” the energy resolution of the stack can be greatly increased while reducing the need for large numbers of films; for example, a recently deployed version allowed for 180 unique energy measurements spanning ∼3 to 75 MeV with <0.4 MeV resolution using just 20 films vs. 180 for a comparable traditional film and filter stack. When utilized with a scintillator, the diagnostic can be run in high-rep-rate (>Hz rate) mode to recover nine proton energy bins. We also demonstrate a deep learning-based method that analyzes synthetic PROBIES images with greater than 95% accuracy on sub-millisecond timescales and that, once retrained with experimental data, analyzes real-world images on sub-millisecond timescales with comparable accuracy.

Liu J., Anirudh R., Thiagarajan J.J., et al. (2023). DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction. IEEE International Conference on Computer Vision (ICCV).

Limited-Angle Computed Tomography (LACT) is a non-destructive 3D imaging technique used in a variety of applications ranging from security to medicine. The limited angle coverage in LACT is often a dominant source of severe artifacts in the reconstructed images, making it a challenging imaging inverse problem. Diffusion models are a recent class of deep generative models for synthesizing realistic images using image denoisers. In this work, we present DOLCE as the first framework for integrating conditionally-trained diffusion models and explicit physical measurement models for solving imaging inverse problems. DOLCE achieves the SOTA performance in highly ill-posed LACT by alternating between the data-fidelity and sampling updates of a diffusion model conditioned on the transformed sinogram. We show through extensive experimentation that unlike existing methods, DOLCE can synthesize high-quality and structurally coherent 3D volumes by using only 2D conditionally pre-trained diffusion models. We further show on several challenging real LACT datasets that the same pre-trained DOLCE model achieves the SOTA performance on drastically different types of images.

Gaffney J.A., Humbird K.D., Jones M., et al. (2023). Data-Driven Prediction of Scaling and Ignition of Inertial Confinement Fusion Experiments. 65th Annual Meeting of the APS Division of Plasma Physics.

Recent advances in inertial confinement fusion (ICF), including ignition and energy gain, are enabled by a close coupling between experiments and high-fidelity simulations. High-fidelity radiation-hydrodynamics simulations are used to specify target and laser parameters, interpret data, and explore novel designs for future experiments. These tasks are approached by a combination of post-shot analysis, in which simulations are calibrated to existing NIF data through a range of tuning parameters, and pre-shot simulation, where the calibrated model is applied to future experiments. Quantifying the uncertainty in post-shot and pre-shot analyses is critical in measuring the state of our understanding of a given experimental platform, assigning confidence to predictions for a new design, and reliably comparing physics hypotheses; this is, however, a significant challenge because NIF experiments are sparse, incompletely diagnosed, and subject to unknown random shot-to-shot variations. We have developed a data-driven approach to uncertainty quantification for post-shot and pre-shot analysis that combines large ensembles of simulations with Bayesian inference and deep learning. The approach builds a predictive statistical model for performance parameters that is jointly informed by data from multiple NIF shots and the simulations. The prediction distribution captures experimental uncertainty, expert priors, design changes and shot-to-shot variations to provide a new capability to make uncertain performance predictions for experimental designs before they are performed at NIF.

Kustowski B., Gaffney J.A., Spears B.K., et al. (2022). Suppressing Simulation Bias in Multi-Modal Data using Transfer Learning. Machine Learning: Science and Technology.

Many problems in science and engineering require making predictions based on few observations. To build a robust predictive model, these sparse data may need to be augmented with simulated data, especially when the design space is multi-dimensional. Simulations, however, often suffer from an inherent bias. Estimation of this bias may be poorly constrained not only because of data sparsity, but also because traditional predictive models fit only one type of observed output, such as scalars or images, instead of all available output data modalities, which might have been acquired and simulated at great cost. To break this limitation and open up the path for multi-modal calibration, we propose to combine a novel transfer learning technique for suppressing the bias with recent developments in deep learning, which allow building predictive models with multi-modal outputs. First, we train an initial neural network model on simulated data to learn important correlations between different output modalities and between simulation inputs and outputs. Then, the model is partially retrained, or transfer learned, to fit the experiments, a method that has never been implemented in this type of architecture. Using fewer than 10 inertial confinement fusion experiments for training, transfer learning systematically improves the simulation predictions while a simple output calibration, which we design as a baseline, makes the predictions worse. We also offer extensive cross-validation with real and carefully designed synthetic data. The method described in this paper can be applied to a wide range of problems that require transferring knowledge from simulations to the domain of experiments.

Antoniuk E.R., Li P., Kailkhura B., Hiszpanski A.M. (2022). Representing Polymers as Periodic Graphs with Learned Descriptors for Accurate Polymer Property Predictions. Journal of Chemical Information and Modeling.

Accurately predicting new polymers’ properties with machine learning models prior to synthesis has the potential to significantly accelerate new polymers’ discovery and development. However, accurately and efficiently capturing polymers’ complex, periodic structures in machine learning models remains a grand challenge for the polymer cheminformatics community. Specifically, there has yet to be an ideal solution for the problems of how to capture the periodicity of polymers, as well as how to optimally develop polymer descriptors without requiring human-based feature design. In this work, we tackle these problems by utilizing a periodic polymer graph representation that accounts for polymers’ periodicity and coupling it with a message-passing neural network that leverages the power of graph deep learning to automatically learn chemically relevant polymer descriptors. Remarkably, this approach achieves state-of-the-art performance on 8 out of 10 distinct polymer property prediction tasks. These results highlight the advancement in predictive capability that is possible through learning descriptors that are specifically optimized for capturing the unique chemical structure of polymers.

Lapointe S., Guss G., Reese Z., et al. (2022). Photodiode-Based Machine Learning for Optimization of Laser Powder Bed Fusion Parameters in Complex Geometries. Additive Manufacturing.

The quality of parts produced through laser powder bed fusion additive manufacturing can be irregular, with complex geometries sometimes exhibiting dimensional inaccuracies and defects. For optimal part quality, laser process parameters should be selected carefully prior to printing and adjusted during the print if necessary. This is challenging since approaches to control and optimize the build parameters need to take into account the part geometry, the material, and the complex physics of laser powder bed fusion. This work describes a data-driven approach using experimental diagnostics for the optimization of laser process parameters prior to printing. A training dataset is generated by collecting high speed photodiode signal data while printing simple parts containing key geometry features with various process parameter strategies. Supervised learning approaches are employed to train both a forward model and an inverse model. The forward model takes as inputs track-wise geometry features and laser parameters and outputs the photodiode signal along the scan path. The inverse model takes as inputs the geometry features and photodiode signal and predicts the laser parameters. Given the part geometry and a desired photodiode signal, the inverse model can thus determine the required laser parameters. Two test parts which contain defect-prone features are used to assess the validity of the inverse model. The use of the model leads to improved part quality (higher dimensional accuracy, reduced dross, reduced distortion) for both test geometries.

Zhong X., Gallagher B., Liu S., et al. (2022). Explainable Machine Learning in Materials Science. npj Computational Materials.

Machine learning models are increasingly used in materials studies because of their exceptional accuracy. However, the most accurate machine learning models are usually difficult to explain. Remedies to this problem lie in explainable artificial intelligence (XAI), an emerging research field that addresses the explainability of complicated machine learning models like deep neural networks (DNNs). This article attempts to provide an entry point to XAI for materials scientists. Concepts are defined to clarify what “explain” means in the context of materials science. Example works are reviewed to show how XAI helps materials science research. Challenges and opportunities are also discussed.

Humbird K.D., Peterson J.L., Salmonson J., Spears B.K. (2021). Cognitive Simulation Models for Inertial Confinement Fusion: Combining Simulation and Experimental Data. Physics of Plasmas.

The design space for inertial confinement fusion (ICF) experiments is vast, and experiments are extremely expensive. Researchers rely heavily on computer simulations to explore the design space in search of high-performing implosions. However, ICF multiphysics codes must make simplifying assumptions, and thus deviate from experimental measurements for complex implosions. For more effective design and investigation, simulations require input from past experimental data to better predict future performance. In this work, we describe a cognitive simulation method for combining simulation and experimental data into a common, predictive model. This method leverages a machine learning technique called “transfer learning,” the process of taking a model trained to solve one task, and partially retraining it on a sparse dataset to solve a different, but related task. In the context of ICF design, neural network models are trained on large simulation databases and partially retrained on experimental data, producing models that are far more accurate than simulations alone. We demonstrate improved model performance for a range of ICF experiments at the National Ignition Facility and predict the outcome of recent experiments with less than 10% error for several key observables. We discuss how the methods might be used to carry out a data-driven experimental campaign to optimize performance, illustrating the key product—models that become increasingly accurate as data are acquired.
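The partial-retraining idea described above can be illustrated with a deliberately tiny sketch. It is not the authors' neural network: it assumes a hypothetical one-feature linear model, fits slope and offset on plentiful "simulation" points, then freezes the slope and refits only the offset on a handful of "experimental" points. All data values are made up for illustration.

```python
# Toy illustration of transfer learning: pretrain on simulations, then
# retrain only part of the model on sparse experimental data.
# The linear model and all numbers are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

def retrain_offset(w, xs, ys):
    """Keep the slope w frozen; refit only the offset b (least squares)."""
    return sum(y - w * x for x, y in zip(xs, ys)) / len(xs)

def mse(w, b, xs, ys):
    """Mean squared error of the line on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Large "simulation" set: y = 2x + 1 (the simulator's biased view).
sim_x = [i / 10 for i in range(100)]
sim_y = [2.0 * x + 1.0 for x in sim_x]

# Sparse "experiments": same trend, but the true offset is 1.5.
exp_x = [1.0, 4.0, 7.0]
exp_y = [2.0 * x + 1.5 for x in exp_x]

w_sim, b_sim = fit_line(sim_x, sim_y)          # pretraining on simulations
b_exp = retrain_offset(w_sim, exp_x, exp_y)    # partial retraining

error_before = mse(w_sim, b_sim, exp_x, exp_y)
error_after = mse(w_sim, b_exp, exp_x, exp_y)
```

In this toy setting three experimental points are enough to correct the offset because the slope learned from simulations is reused unchanged. This mirrors, in a much simpler form, why partial retraining can work with sparse data.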

Anirudh R., Thiagarajan J.J., Bremer P.-T., Spears B.K. (2020). Improved Surrogates in Inertial Confinement Fusion with Manifold and Cycle Consistencies. Proceedings of the National Academy of Sciences.

Neural networks have become the method of choice in surrogate modeling because of their ability to characterize arbitrary, high-dimensional functions in a data-driven fashion. This paper advocates for the training of surrogates that are 1) consistent with the physical manifold, resulting in physically meaningful predictions, and 2) cyclically consistent with a jointly trained inverse model; i.e., backmapping predictions through the inverse results in the original input parameters. We find that these two consistencies lead to surrogates that are superior in terms of predictive performance, are more resilient to sampling artifacts, and tend to be more data efficient. Using inertial confinement fusion (ICF) as a test-bed problem, we model a one-dimensional semianalytic numerical simulator and demonstrate the effectiveness of our approach.
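The cyclic-consistency idea above can be written as a simple training objective. The following is a minimal sketch, not the paper's implementation: `forward_model` and `inverse_model` are hypothetical stand-ins for the surrogate and its jointly trained inverse, the `weight` parameter is an assumed trade-off knob, and the manifold-consistency term is omitted.

```python
# Sketch of a cycle-consistency objective for a surrogate model.
# The callables passed in are hypothetical stand-ins for neural networks.

def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def cyclic_loss(forward_model, inverse_model, x, y, weight=1.0):
    """Prediction error plus a penalty when the inverse fails to recover x."""
    y_pred = forward_model(x)
    x_back = inverse_model(y_pred)   # back-map the prediction to inputs
    return squared_error(y_pred, y) + weight * squared_error(x_back, x)

# Toy simulator: y = (2 * x0, x1 + 1).
def forward_model(x):
    return [2.0 * x[0], x[1] + 1.0]

def exact_inverse(y):
    return [y[0] / 2.0, y[1] - 1.0]   # undoes both steps of the simulator

def poor_inverse(y):
    return [y[0], y[1]]               # ignores the simulator's structure

x = [0.5, 3.0]
y = forward_model(x)

loss_consistent = cyclic_loss(forward_model, exact_inverse, x, y)
loss_inconsistent = cyclic_loss(forward_model, poor_inverse, x, y)
```

With the exact inverse the cyclic penalty vanishes, while the mismatched inverse is penalized even though the forward prediction itself is perfect.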

Di Natale F., Bhatia H., Carpenter T.S., et al. (2019). A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer. International Conference for High Performance Computing, Networking, Storage and Analysis (SC19).

Computational models can define the functional dynamics of complex systems in exceptional detail. However, many modeling studies face seemingly incommensurate requirements: to gain meaningful insights into some phenomena requires models with high resolution (microscopic) detail that must nevertheless evolve over large (macroscopic) length- and time-scales. Multiscale modeling has become increasingly important to bridge this gap. Executing complex multiscale models on current petascale computers with high levels of parallelism and heterogeneous architectures is challenging. Many distinct types of resources need to be simultaneously managed, such as GPUs and CPUs, memory size and latencies, communication bottlenecks, and filesystem bandwidth. In addition, robustness to failure of compute nodes, network, and filesystems is critical. We introduce a first-of-its-kind, massively parallel Multiscale Machine-Learned Modeling Infrastructure (MuMMI), which couples a macro scale model spanning micrometer length- and millisecond time-scales with a micro scale model employing high-fidelity molecular dynamics (MD) simulations. MuMMI is a cohesive and transferable infrastructure designed for scalability and efficient execution on heterogeneous resources. A central workflow manager simultaneously allocates GPUs and CPUs while robustly handling failures in compute nodes, communication networks, and filesystems. A hierarchical scheduler controls GPU-accelerated MD simulations and in situ analysis. We present the various MuMMI components, including the macro model, GPU-accelerated MD, in situ analysis of MD data, machine learning selection module, a highly scalable hierarchical scheduler, and detail the central workflow manager that ties these modules together. 
In addition, we present performance data from our runs on Sierra, in which we validated MuMMI by investigating an experimentally intractable biological system: the dynamic interaction between RAS proteins and a plasma membrane. We used up to 4000 nodes of the Sierra supercomputer, concurrently utilizing over 16,000 GPUs and 176,000 CPU cores, and running up to 36,000 different tasks. This multiscale simulation includes about 120,000 MD simulations aggregating over 200 milliseconds, which is orders of magnitude greater than comparable studies.

Anirudh R., Kim H., Thiagarajan J.J., et al. (2018). Lose the Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Computed Tomography (CT) reconstruction is a fundamental component of a wide variety of applications ranging from security to healthcare. The classical techniques require measuring projections, called sinograms, from a full 180° view of the object. This is impractical in a limited angle scenario, when the viewing angle is less than 180°, which can occur due to different factors including restrictions on scanning time, limited flexibility of scanner rotation, etc. The sinograms obtained as a result cause existing techniques to produce highly artifact-laden reconstructions. In this paper, we propose to address this problem through implicit sinogram completion, on a challenging real world dataset containing scans of common checked-in luggage. We propose a system, consisting of 1D and 2D convolutional neural networks, that operates on a limited angle sinogram to directly produce the best estimate of a reconstruction. Next, we use the x-ray transform on this reconstruction to obtain a “completed” sinogram, as if it came from a full 180° measurement. We feed this to standard analytical and iterative reconstruction techniques to obtain the final reconstruction. We show with extensive experimentation that this combined strategy outperforms many competitive baselines. We also propose a measure of confidence for the reconstruction that enables a practitioner to gauge the reliability of a prediction made by our network. We show that this measure is a strong indicator of quality as measured by the PSNR, while not requiring ground truth at test time. Finally, using a segmentation experiment, we show that our reconstruction preserves the 3D structure of objects effectively.

Robustness & Safety

Bartoldson B.R., Diffenderfer J., Parasyris K., Kailkhura B. (2024). Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies. International Conference on Machine Learning (ICML).

This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR-10 as an example, SOTA clean accuracy is about 100%, but SOTA robustness to ℓ∞-norm bounded perturbations barely exceeds 70%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field. For instance, we discovered that SOTA methods diverge notably from compute-optimal setups, using excess compute for their level of robustness. Leveraging a compute-efficient setup, we surpass the prior SOTA with 20% (70%) fewer training (inference) FLOPs. We trained various compute-efficient models, with our best achieving 74% AutoAttack accuracy (+3% gain). However, our scaling laws also predict robustness slowly grows then plateaus at 90%: dwarfing our new SOTA by scaling is impractical, and perfect robustness is impossible. To better understand this predicted limit, we carry out a small-scale human evaluation on the AutoAttack data that fools our top-performing model. Concerningly, we estimate that human performance also plateaus near 90%, which we show to be attributable to ℓ∞-constrained attacks’ generation of invalid images not consistent with their original labels. Having characterized limiting roadblocks, we outline promising paths for future research.

Debenedetti E., Wan Z., Andriushchenko M., et al. (2024). Scaling Compute Is Not All You Need for Adversarial Robustness. International Conference on Learning Representations (ICLR).

The last six years have witnessed significant progress in adversarially robust deep learning. As evidenced by the CIFAR10 dataset category in the RobustBench benchmark, the accuracy under ℓ∞ adversarial perturbations improved from 44% in Madry et al. [25] to 71% in Peng et al. [29]. Although impressive, the existing state-of-the-art is still far from satisfactory. It is further observed that best-performing models are often very large models adversarially trained by industrial labs with significant computational budgets. In this paper, we aim to understand: “how much longer can computing power drive adversarial robustness advances?” To answer this question, we derive scaling laws for adversarial robustness which can be extrapolated in the future to provide an estimate of how much cost we would need to pay to reach a desired level of robustness. We show that increasing the FLOPs needed for adversarial training does not bring as much advantage as it does for standard training in terms of performance improvements. Moreover, we find that some of the top-performing techniques are difficult to exactly reproduce, suggesting that they are not robust enough to minor changes in the training setup. Our analysis also uncovers potentially worthwhile directions to pursue in future research. Finally, we make our benchmarking framework (built on top of timm [41]) publicly available to facilitate future analysis in efficient robust deep learning.

Fridovich-Keil S., Bartoldson B., Diffenderfer J., et al. (2022). Models Out of Line: A Fourier Lens on Distribution Shift Robustness. Conference on Neural Information Processing Systems (NeurIPS).

Improving the accuracy of deep neural networks on out-of-distribution (OOD) data is critical to the acceptance of deep learning in real-world applications. It has been observed that accuracies on in-distribution (ID) versus OOD data follow a linear trend and models that outperform this baseline are exceptionally rare (and referred to as “effectively robust”). Recently, some promising approaches have been developed to improve OOD robustness: model pruning, data augmentation, and ensembling or zero-shot evaluating large pretrained models. However, there still is no clear understanding of the conditions on OOD data and model properties that are required to observe effective robustness. We approach this issue by conducting a comprehensive empirical study of diverse approaches that are known to impact OOD robustness on a broad range of natural and synthetic distribution shifts of CIFAR-10 and ImageNet. In particular, we view the “effective robustness puzzle” through a Fourier lens and ask how spectral properties of both models and OOD data correlate with OOD robustness. We find this Fourier lens offers some insight into why certain robust models, particularly those from the CLIP family, achieve OOD robustness. However, our analysis also makes clear that no known metric is consistently the best explanation of OOD robustness. Thus, to aid future research into the OOD puzzle, we address the gap in publicly-available models with effective robustness by introducing a set of pretrained CIFAR-10 models—RobustNets—with varying levels of OOD robustness.

Subramanyam R., Narayanaswamy V., Naufel M., et al. (2022). Improved StyleGAN-v2 Based Inversion for Out-of-Distribution Images. Proceedings of Machine Learning Research.

Inverting an image onto the latent space of pre-trained generators, e.g., StyleGAN-v2, has emerged as a popular strategy to leverage strong image priors for ill-posed restoration. Several studies have shown that this approach is effective at inverting images similar to the data used for training. However, with out-of-distribution (OOD) data that the generator has not been exposed to, existing inversion techniques produce sub-optimal results. In this paper, we propose SPHInX (StyleGAN with Projection Heads for Inverting X), an approach for accurately embedding OOD images onto the StyleGAN latent space. SPHInX optimizes a style projection head using a novel training strategy that imposes a vicinal regularization in the StyleGAN latent space. To further enhance OOD inversion, SPHInX can additionally optimize a content projection head and noise variables in every layer. Our empirical studies on a suite of OOD data show that, in addition to producing higher quality reconstructions over the state-of-the-art inversion techniques, SPHInX is effective for ill-posed restoration tasks while offering semantic editing capabilities.

Wu F., Li L., Xu C., et al. (2022). COPA: Certifying Robust Policies for Offline Reinforcement Learning Against Poisoning Attacks. International Conference on Learning Representations (ICLR).

As reinforcement learning (RL) has achieved near human-level performance in a variety of tasks, its robustness has raised great attention. While a vast body of research has explored test-time (evasion) attacks in RL and corresponding defenses, its robustness against training-time (poisoning) attacks remains largely unexplored. In this work, we focus on certifying the robustness of offline RL in the presence of poisoning attacks, where a subset of training trajectories could be arbitrarily manipulated. We propose the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated regarding different certification criteria. Given the complex structure of RL, we propose two certification criteria: per-state action stability and cumulative reward bound. To further improve the certification, we propose new partition and aggregation protocols to train robust policies. We further prove that some of the proposed certification methods are theoretically tight and some are NP-Complete problems. We leverage COPA to certify three RL environments trained with different algorithms and conclude: (1) The proposed robust aggregation protocols such as temporal aggregation can significantly improve the certifications; (2) Our certifications for both per-state action stability and cumulative reward bound are efficient and tight; (3) The certifications for different training algorithms and environments are different, implying their intrinsic robustness properties. All experimental results are available at
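To make the per-state action stability criterion concrete, here is a minimal sketch of the general partition-and-aggregate argument. It is an illustration under stated assumptions, not COPA's published protocol or bound. The assumptions are that each training trajectory is assigned to exactly one partition, one policy is trained per partition, the final action is a majority vote, and ties go to the smaller action index. The function names and vote counts are hypothetical.

```python
# Sketch of per-state certification under partition aggregation.
# One poisoned trajectory can change at most one partition's policy, and
# therefore at most one vote. Names and vote counts are hypothetical.
from collections import Counter

def aggregate_action(votes):
    """Majority vote over per-partition actions; ties pick the smaller index."""
    counts = Counter(votes)
    return min(counts, key=lambda a: (-counts[a], a))

def tolerated_poisoning(votes, num_actions):
    """Largest number of poisoned trajectories that cannot flip the vote.

    Assumes num_actions >= 2.
    """
    counts = Counter(votes)
    top = aggregate_action(votes)
    worst = None
    for other in range(num_actions):
        if other == top:
            continue
        # A rival with a smaller index wins ties, so it needs one vote fewer.
        tie_bonus = 1 if other < top else 0
        gap = counts[top] - counts[other] - tie_bonus
        budget = gap // 2   # each flipped vote narrows the gap by 2
        worst = budget if worst is None else min(worst, budget)
    return worst

votes = [1, 1, 1, 1, 0, 0, 2]   # actions chosen by 7 partition policies
chosen = aggregate_action(votes)
budget = tolerated_poisoning(votes, num_actions=3)
```

In this example action 1 wins with four votes, but a single flipped vote would create a 3-3 tie with action 0, which wins ties. The certified budget is therefore zero, which shows how the tie-breaking rule enters the bound.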

Yang Z., Li L., Xu X., et al. (2022). On the Certified Robustness for Ensemble Models and Beyond. International Conference on Learning Representations (ICLR).

Recent studies show that deep neural networks (DNNs) are vulnerable to adversarial examples, which aim to mislead DNNs by adding perturbations with small magnitude. To defend against such attacks, both empirical and theoretical defense approaches have been extensively studied for a single ML model. In this work, we aim to analyze and provide the certified robustness for ensemble ML models, together with the sufficient and necessary conditions of robustness for different ensemble protocols. Although ensemble models are shown to be more robust than a single model empirically, we surprisingly find that, in terms of certified robustness, the standard ensemble models only achieve marginal improvement compared to a single model. Thus, to explore the conditions that guarantee certifiably robust ensemble ML models, we first prove that diversified gradient and large confidence margin are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We then provide the bounded model-smoothness analysis based on the proposed Ensemble-before-Smoothing strategy. We also prove that an ensemble model can always achieve higher certified robustness than a single base model under mild conditions. Inspired by the theoretical findings, we propose the lightweight Diversity Regularized Training (DRT) to train certifiably robust ensemble ML models. Extensive experiments show that our DRT enhanced ensembles can consistently achieve higher certified robustness than existing single and ensemble ML models, demonstrating the state-of-the-art certified L2-robustness on MNIST, CIFAR-10, and ImageNet datasets.

Diffenderfer J., Bartoldson B., Chaganti S., et al. (2021). A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness. Conference on Neural Information Processing Systems (NeurIPS).

Successful adoption of deep learning (DL) in the wild requires models to be: (1) compact, (2) accurate, and (3) robust to distributional shifts. Unfortunately, efforts towards simultaneously meeting these requirements have mostly been unsuccessful. This raises an important question: “Is the inability to create Compact, Accurate, and Robust Deep neural networks (CARDs) fundamental?” To answer this question, we perform a large-scale analysis of popular model compression techniques which uncovers several intriguing patterns. Notably, in contrast to traditional pruning approaches (e.g., fine tuning and gradual magnitude pruning), we find that “lottery ticket-style” approaches can surprisingly be used to produce CARDs, including binary-weight CARDs. Specifically, we are able to create extremely compact CARDs that, compared to their larger counterparts, have similar test accuracy and matching (or better) robustness—simply by pruning and (optionally) quantizing. Leveraging the compactness of CARDs, we develop a simple domain-adaptive test-time ensembling approach (CARD-Deck) that uses a gating module to dynamically select appropriate CARDs from the CARD-Deck based on their spectral-similarity with test samples. The proposed approach builds a “winning hand” of CARDs that establishes a new state-of-the-art [8] on CIFAR-10-C accuracies (i.e., 96.8% standard and 92.75% robust) and CIFAR-100-C accuracies (i.e., 80.6% standard and 71.3% robust) with better memory usage than non-compressed baselines (pretrained CARDs available at [8]). Finally, we provide theoretical support for our empirical findings.

Diffenderfer J., Kailkhura B. (2021). Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning a Randomly Weighted Network. International Conference on Learning Representations (ICLR).

Recently, Frankle & Carbin (2019) demonstrated that randomly-initialized dense networks contain subnetworks that once found can be trained to reach test accuracy comparable to the trained dense network. However, finding these high performing trainable subnetworks is expensive, requiring an iterative process of training and pruning weights. In this paper, we propose (and prove) a stronger Multi-Prize Lottery Ticket Hypothesis: A sufficiently over-parameterized neural network with random weights contains several subnetworks (winning tickets) that (a) have comparable accuracy to a dense target network with learned weights (prize 1), (b) do not require any further training to achieve prize 1 (prize 2), and (c) are robust to extreme forms of quantization (i.e., binary weights and/or activation) (prize 3). This provides a new paradigm for learning compact yet highly accurate binary neural networks simply by pruning and quantizing randomly weighted full precision neural networks. We also propose an algorithm for finding multi-prize tickets (MPTs) and test it by performing a series of experiments on CIFAR-10 and ImageNet datasets. Empirical results indicate that as models grow deeper and wider, multi-prize tickets start to reach similar (and sometimes even higher) test accuracy compared to their significantly larger and full-precision counterparts that have been weight-trained. Without ever updating the weight values, our MPTs-1/32 not only set new binary weight network state-of-the-art (SOTA) Top-1 accuracy—94.8% on CIFAR-10 and 74.03% on ImageNet—but also outperform their full-precision counterparts by 1.78% and 0.76%, respectively. Further, our MPT-1/1 achieves SOTA Top-1 accuracy (91.9%) for binary neural networks on CIFAR-10. Code and pre-trained models are available at:
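The "prune and quantize without training the weights" recipe can be sketched in a few lines. This is an illustrative simplification, not the authors' algorithm. It assumes a flat list of random weights, uses the weight magnitudes as a placeholder for importance scores that a real search procedure would have to learn, and uses one shared scale equal to the mean magnitude of the kept weights.

```python
# Sketch: derive a binary-weight subnetwork from fixed random weights by
# pruning and sign quantization, without updating any weight value.
# The scores are a hypothetical stand-in for learned importance scores.
import random

def prune_and_binarize(weights, scores, keep_fraction):
    """Keep the highest-scoring weights and replace each by +/- alpha."""
    k = max(1, int(len(weights) * keep_fraction))
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    # One shared scale: mean magnitude of the surviving weights.
    alpha = sum(abs(weights[i]) for i in keep) / k
    return [
        (alpha if weights[i] >= 0 else -alpha) if i in keep else 0.0
        for i in range(len(weights))
    ]

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10)]   # random, never trained
scores = [abs(w) for w in weights]                    # placeholder scores
ticket = prune_and_binarize(weights, scores, keep_fraction=0.5)
```

Each surviving entry of `ticket` carries only a sign, since the magnitude `alpha` is shared, so the subnetwork can be stored as a pruning mask plus one bit per kept weight.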

Long Y., Wang B., Yang Z., et al. (2021). G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators. Conference on Neural Information Processing Systems (NeurIPS). []

Recent advances in machine learning have largely benefited from massive, accessible training data. However, large-scale data sharing has raised great privacy concerns. In this work, we propose a novel privacy-preserving data generative model based on the PATE framework (G-PATE), aiming to train a scalable differentially private data generator that preserves high generated data utility. Our approach leverages generative adversarial nets to generate data, combined with private aggregation among different discriminators to ensure strong privacy guarantees. Compared to existing approaches, G-PATE significantly improves the use of privacy budgets. In particular, we train a student data generator with an ensemble of teacher discriminators and propose a novel private gradient aggregation mechanism to ensure differential privacy on all information that flows from teacher discriminators to the student generator. In addition, with random projection and gradient discretization, the proposed gradient aggregation mechanism is able to effectively deal with high-dimensional gradient vectors. Theoretically, we prove that G-PATE ensures differential privacy for the data generator. Empirically, we demonstrate the superiority of G-PATE over prior work through extensive experiments. We show that G-PATE is the first work able to generate high-dimensional image data with high data utility under limited privacy budgets (ε≤1). Our code is publicly available.

Mehra A., Kailkhura B., Chen P.Y., Hamm J. (2021). How Robust Are Randomized Smoothing Based Defenses to Data Poisoning? IEEE Conference on Computer Vision and Pattern Recognition (CVPR). []

Predictions of certifiably robust classifiers remain constant in a neighborhood of a point, making them resilient to test-time attacks with a guarantee. In this work, we present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality in achieving high certified adversarial robustness. Specifically, we propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers. Unlike other poisoning attacks that reduce the accuracy of the poisoned models on a small set of target points, our attack reduces the average certified radius (ACR) of an entire target class in the dataset. Moreover, our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods such as Gaussian data augmentation, MACER, and SmoothAdv that achieve high certified adversarial robustness. To make the attack harder to detect, we use clean-label poisoning points with imperceptible distortions. The effectiveness of the proposed method is evaluated by poisoning MNIST and CIFAR10 datasets and training deep neural networks using previously mentioned training methods and certifying the robustness with randomized smoothing. The ACR of the target class, for models trained on generated poison data, can be reduced by more than 30%. Moreover, the poisoned data is transferable to models trained with different training methods and models with different architectures.
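For readers unfamiliar with the metric the attack targets, the sketch below shows how an average certified radius is commonly computed under Gaussian randomized smoothing. It assumes the widely used simplified certificate, radius = sigma * Phi^{-1}(p_A), where `p_a_lower` is a lower confidence bound on the top-class probability under noise; the function names are illustrative and are not taken from the paper.

```python
from statistics import NormalDist


def certified_radius(p_a_lower, sigma):
    """L2 radius certified for one input under Gaussian smoothing.

    p_a_lower must be strictly below 1.0, since the inverse CDF diverges there.
    """
    if p_a_lower <= 0.5:
        return 0.0  # no majority class can be certified
    return sigma * NormalDist().inv_cdf(p_a_lower)


def average_certified_radius(p_a_lowers, correct, sigma):
    """Mean radius over a test set; misclassified points contribute zero."""
    radii = [
        certified_radius(p, sigma) if ok else 0.0
        for p, ok in zip(p_a_lowers, correct)
    ]
    return sum(radii) / len(radii)


acr = average_certified_radius([0.99, 0.60, 0.90], [True, True, False], sigma=0.5)
```

A poisoning attack of the kind described above succeeds when it lowers this average for the target class, even if plain accuracy barely changes.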

Mehra A., Kailkhura B., Chen P.Y., Hamm J. (2021). Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning. Conference on Neural Information Processing Systems (NeurIPS). []

Unsupervised domain adaptation (UDA) enables cross-domain learning without target domain labels by transferring knowledge from a labeled source domain whose distribution differs from that of the target. However, UDA is not always successful and several accounts of "negative transfer" have been reported in the literature. In this work, we prove a simple lower bound on the target domain error that complements the existing upper bound. Our bound shows the insufficiency of minimizing source domain error and marginal distribution mismatch for a guaranteed reduction in the target domain error, due to the possible increase of induced labeling function mismatch. This insufficiency is further illustrated through simple distributions for which the same UDA approach succeeds, fails, and may succeed or fail with an equal chance. Motivated by this, we propose novel data poisoning attacks to fool UDA methods into learning representations that produce large target domain errors. We evaluate the effect of these attacks on popular UDA methods using benchmark datasets where they have been previously shown to be successful. Our results show that poisoning can significantly decrease the target domain accuracy, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data in the source domain. The failure of these UDA methods demonstrates their limitations at guaranteeing cross-domain generalization consistent with our lower bound. Thus, evaluating UDA methods in adversarial settings such as data poisoning provides a better sense of their robustness to data distributions unfavorable for UDA.

Pan B., Yang Y., Liang K., et al. (2020). Adversarial Mutual Information for Text Generation. International Conference on Machine Learning (ICML). []

Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations at the high-level semantic space to enhance the long term dependency in the generation process. Extensive experiments based on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.

Xu K., Shi Z., Zhang H., et al. (2020). Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond. Conference on Neural Information Processing Systems (NeurIPS). []

Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods focus on simple feedforward networks and need particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structure, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior works. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA based certified defense on Tiny ImageNet and Downscaled ImageNet, to which previous approaches cannot scale due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise; for example, we create a neural network with a provably flat optimization landscape by applying LiRPA to network parameters. Our open-source library is publicly available.
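To make "provable bounds of output neurons given input perturbation" concrete, here is a minimal sketch of interval bound propagation through one linear layer and a ReLU. Interval propagation is the simplest and loosest bounding scheme of this kind; it is shown instead of CROWN for brevity, and it does not use or reproduce the API of the library described above. All function names are illustrative.

```python
def linear_bounds(weight, bias, lower, upper):
    """Propagate elementwise input bounds through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weight, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps lower to lower; a negative weight swaps them.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


x, eps = [1.0, 2.0], 0.5  # input and L-infinity perturbation budget
lower = [v - eps for v in x]
upper = [v + eps for v in x]
W, b = [[1.0, -1.0], [2.0, 1.0]], [0.0, -1.0]
pre_lo, pre_hi = linear_bounds(W, b, lower, upper)
post_lo, post_hi = relu_bounds(pre_lo, pre_hi)
```

Every output the layer can produce for any input within `eps` of `x` is guaranteed to lie between `post_lo` and `post_hi`; tighter methods such as CROWN replace these boxes with linear functions of the input.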

Yang J., Petersen B., Zha H., Faissol D. (2020). Single Episode Policy Transfer in Reinforcement Learning. International Conference on Learning Representations (ICLR). []

Transfer and adaptation to new unknown environmental dynamics is a key challenge for reinforcement learning (RL). An even greater challenge is performing near-optimally in a single attempt at test time, possibly without access to dense rewards, which is not addressed by current methods that require multiple experience rollouts for adaptation. To achieve single episode transfer in a family of environments with related dynamics, we propose a general algorithm that optimizes a probe and an inference model to rapidly estimate underlying latent variables of test dynamics, which are then immediately used as input to a universal control policy. This modular approach enables integration of state-of-the-art algorithms for variational inference or RL. Moreover, our approach does not require access to rewards at test time, allowing it to perform in settings where existing adaptive approaches cannot. In diverse experimental domains with a single episode test constraint, our method significantly outperforms existing adaptive approaches and shows favorable performance against baselines for robust transfer.

Mundhenk T.N., Ho D., Chen B.Y. (2018). Improvements to Context Based Self-Supervised Learning. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). []

We develop a set of methods to improve on the results of self-supervised learning using context. We start with a baseline of patch based arrangement context learning and go from there. Our methods address some overt problems such as chromatic aberration as well as other potential problems such as spatial skew and mid-level feature neglect. We prevent problems with testing generalization on common self-supervised benchmark tests by using different datasets during our development. The results of our methods combined yield top scores on all standard self-supervised benchmarks, including classification and detection on PASCAL VOC 2007, segmentation on PASCAL VOC 2012, and "linear tests" on the ImageNet and CSAIL Places datasets. We obtain an improvement over our baseline method of between 4.0 and 7.1 percentage points on transfer learning classification tests. We also show results on different standard network architectures to demonstrate generalization as well as portability. All data, models and programs are publicly available.

Generalization & Uncertainty

Trivedi P., Heimann M., Anirudh R., et al. (2024). Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks. International Conference on Learning Representations (ICLR). []

While graph neural networks (GNNs) are widely used for node and graph representation learning tasks, the reliability of GNN uncertainty estimates under distribution shifts remains relatively under-explored. Even post-hoc calibration strategies that can improve in-distribution calibration are not guaranteed to be effective under distribution shift. However, techniques that produce GNNs with better intrinsic uncertainty estimates are particularly valuable, as they can always be combined with post-hoc strategies later. Therefore, in this work, we propose G-ΔUQ, a novel training framework designed to improve intrinsic GNN uncertainty estimates. Our framework adapts the principle of stochastic data centering to graph data through novel graph anchoring strategies, and is able to support partially stochastic GNNs. While the prevalent wisdom is that fully stochastic networks are necessary to obtain reliable estimates, we find that the functional diversity induced by our anchoring strategies when sampling hypotheses renders this unnecessary and allows us to support G-ΔUQ on pretrained models. Indeed, through extensive evaluation under covariate, concept and graph size shifts, we show that G-ΔUQ leads to better calibrated GNNs for node and graph classification. Further, it also improves performance on safety evaluation protocols such as out-of-distribution detection and generalization gap estimation. Overall, our work provides insights into uncertainty estimation for GNNs, and demonstrates the utility of G-ΔUQ.

Chen A., Zhang Y., Jia J., et al. (2024). DeepZero: Scaling Up Zeroth-Order Optimization for Deep Model Training. International Conference on Learning Representations (ICLR). []

Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: Its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To the best of our knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinate-wise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementations of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing DL with black box. Code is publicly available.
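The two gradient estimators contrasted above can be sketched with plain finite differences. This is a generic illustration that uses only function evaluations; it is not the DeepZero implementation, and the smoothing step `mu`, the query count, and the test function `sphere` are assumptions chosen for the example.

```python
import random


def cge(f, x, mu=1e-3):
    """Coordinate-wise estimate: one forward difference per coordinate."""
    base = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += mu
        grad.append((f(shifted) - base) / mu)
    return grad


def rge(f, x, num_queries=20, mu=1e-3, rng=None):
    """Randomized estimate: average directional differences along Gaussian vectors."""
    rng = rng or random.Random(0)
    base = f(x)
    grad = [0.0] * len(x)
    for _ in range(num_queries):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - base) / mu
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_queries
    return grad


def sphere(x):
    """Toy objective whose true gradient is 2 * x."""
    return sum(v * v for v in x)


point = [1.0, -2.0, 0.5]
grad_cge = cge(sphere, point)
grad_rge = rge(sphere, point)
```

The coordinate-wise estimator costs one extra function query per parameter but has low variance, whereas the randomized estimator uses a fixed number of queries at the price of noisier estimates; that cost is why sparsity and parallel forward passes matter when scaling the coordinate-wise variant.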

Jain N., Chiang P.-Y., Wen Y., et al. (2024). NEFTune: Noisy Embeddings Improve Instruction Finetuning. International Conference on Learning Representations (ICLR). []

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune. Particularly, we see these improvements on the conversational abilities of the instruction model and not on traditional tasks like those on the OpenLLM Leaderboard, where performance is the same.
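The augmentation is simple enough to sketch in a few lines. The version below perturbs a sequence of embedding vectors with uniform noise scaled by `alpha / sqrt(seq_len * dim)`; that scaling rule and the default `alpha` are stated here as assumptions, since the abstract says only that noise is added to the embeddings during training.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a (seq_len x dim) list of embedding vectors."""
    rng = rng or random.Random(0)
    seq_len, dim = len(embeddings), len(embeddings[0])
    # Assumed scaling rule: the noise magnitude shrinks with sequence length and width.
    scale = alpha / math.sqrt(seq_len * dim)
    return [
        [value + scale * rng.uniform(-1.0, 1.0) for value in token]
        for token in embeddings
    ]


clean = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # 2 tokens, 4 dimensions
noisy = neftune_noise(clean, alpha=5.0)
```

The noise would be applied only in the training forward pass; at inference time the embeddings are used unchanged.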

Thopalli K., Subramanyam R., Turaga P., Thiagarajan J.J. (2023). Target-Aware Generative Augmentations for Single-Shot Adaptation. Proceedings of Machine Learning Research. []

In this paper, we address the problem of adapting models from a source domain to a target domain, a task that has become increasingly important due to the brittle generalization of deep neural networks. While several test-time adaptation techniques have emerged, they typically rely on synthetic toolbox data augmentations in cases of limited target data availability. We consider the challenging setting of single-shot adaptation and explore the design of augmentation strategies. We argue that augmentations utilized by existing methods are insufficient to handle large distribution shifts, and hence propose a new approach, SiSTA, which first fine-tunes a generative model from the source domain using a single-shot target, and then employs novel sampling strategies for curating synthetic target data. Using experiments on a variety of benchmarks, distribution shifts and image corruptions, we find that SiSTA produces significantly improved generalization over existing baselines in face attribute detection and multi-class object recognition. Furthermore, SiSTA performs competitively to models obtained by training on larger target datasets. Our code is publicly available.

Zhang F., Song J., Bowden J., et al. (2023). Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation. Proceedings of Machine Learning Research. []

We study Bayesian optimization (BO) in high-dimensional and non-stationary scenarios. Existing algorithms for such scenarios typically require extensive hyperparameter tuning, which limits their practical effectiveness. We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). Our approach is easy to tune, and is able to focus on local regions of the optimization space that can be tackled by existing BO methods. The key idea is to use two probabilistic models: a coarse GP to identify the ROI, and a localized GP for optimization within the ROI. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO without ROI filtering. We demonstrate empirically the effectiveness of BALLET on both synthetic and real-world optimization tasks.

Thiagarajan J.J., Anirudh R., Narayanaswamy V., Bremer P.-T. (2022). Single Model Uncertainty Estimation via Stochastic Data Centering. Conference on Neural Information Processing Systems (NeurIPS). []

We are interested in estimating the uncertainties of deep neural networks, which play an important role in many scientific and engineering problems. In this paper, we present a striking new finding that an ensemble of neural networks with the same weight initialization, trained on datasets that are shifted by a constant bias gives rise to slightly inconsistent trained models, where the differences in predictions are a strong indicator of epistemic uncertainties. Using the neural tangent kernel (NTK), we demonstrate that this phenomenon occurs in part because the NTK is not shift-invariant. Since this is achieved via a trivial input transformation, we show that this behavior can therefore be approximated by training a single neural network, using a technique that we call ∆−UQ, that estimates uncertainty around prediction by marginalizing out the effect of the biases during inference. We show that ∆−UQ's uncertainty estimates are superior to many of the current methods on a variety of benchmarks: outlier rejection, calibration under distribution shift, and sequential design optimization of black box functions. Code for ∆−UQ is publicly available.
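The inference-time marginalization described above can be sketched as follows, under the assumption that the single network receives each input as a pair (anchor c, residual x - c) and that anchors are drawn from reference data. The `model` callables here are hypothetical toy stand-ins for a trained network, not the released ∆−UQ code.

```python
from statistics import mean, pstdev


def anchored_predict(model, x, anchors):
    """Predict x once per anchor c from the pair (c, x - c), then aggregate."""
    preds = []
    for c in anchors:
        residual = [xi - ci for xi, ci in zip(x, c)]
        preds.append(model(c, residual))
    # The mean is the prediction; the spread across anchors is the uncertainty.
    return mean(preds), pstdev(preds)


def consistent_model(c, r):
    """Toy model that reconstructs x exactly, so every anchor agrees."""
    return sum(ci + ri for ci, ri in zip(c, r))


def inconsistent_model(c, r):
    """Toy model that weights the anchor differently, so anchors disagree."""
    return sum(r) + 0.5 * sum(c)


x = [3.0, 1.0]
anchors = [[1.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu_a, sigma_a = anchored_predict(consistent_model, x, anchors)
mu_b, sigma_b = anchored_predict(inconsistent_model, x, anchors)
```

In this toy setting the first model yields zero spread and the second a positive one, which is the signal the abstract interprets as epistemic uncertainty.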

Trivedi P., Lubana E.S., Heimann M., et al. (2022). Analyzing Data-Centric Properties for Graph Contrastive Learning. Conference on Neural Information Processing Systems (NeurIPS). []

Recent analyses of self-supervised learning (SSL) find the following data-centric properties to be critical for learning good representations: invariance to task-irrelevant semantics, separability of classes in some latent space, and recoverability of labels from augmented samples. However, given their discrete, non-Euclidean nature, graph datasets and graph SSL methods are unlikely to satisfy these properties. This raises the question: how do graph SSL methods, such as contrastive learning (CL), work well? To systematically probe this question, we perform a generalization analysis for CL when using generic graph augmentations (GGAs), with a focus on data-centric properties. Our analysis yields formal insights into the limitations of GGAs and the necessity of task-relevant augmentations. As we empirically show, GGAs do not induce task-relevant invariances on common benchmark datasets, leading to only marginal gains over naïve, untrained baselines. Our theory motivates a synthetic data generation process that enables control over task-relevant information and boasts pre-defined optimal augmentations. This flexible benchmark helps us identify yet unrecognized limitations in advanced augmentation techniques (e.g., automated methods). Overall, our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.

Anirudh R., Thiagarajan J.J. (2022). Out of Distribution Detection with Neural Network Anchoring. Proceedings of Machine Learning Research. []

Our goal in this paper is to exploit heteroscedastic temperature scaling as a calibration strategy for out of distribution (OOD) detection. Heteroscedasticity here refers to the fact that the optimal temperature parameter for each sample can be different, as opposed to conventional approaches that use the same value for the entire distribution. To enable this, we propose a new training strategy called anchoring that can estimate appropriate temperature values for each sample, leading to state-of-the-art OOD detection performance across several benchmarks. Using NTK theory, we show that this temperature function estimate is closely linked to the epistemic uncertainty of the classifier, which explains its behavior. In contrast to some of the best-performing OOD detection approaches, our method does not require exposure to additional outlier datasets, custom calibration objectives, or model ensembling. Through empirical studies with different OOD detection settings (far OOD, near OOD, and semantically coherent OOD), we establish a highly effective OOD detection approach. Code and models are publicly available.

Jia R., Wu F., Sun X., et al. (2021). Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification? IEEE Conference on Computer Vision and Pattern Recognition (CVPR). []

Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaptation. One simple idea is to use the leave-one-out error of each training point to indicate its importance. Recent work has also proposed to use the Shapley value, as it defines a unique value distribution scheme that satisfies a set of appealing properties. However, calculating Shapley values is often expensive, which limits their applicability in real-world applications at scale. Multiple heuristics to improve the scalability of calculating Shapley values have been proposed recently, with the potential risk of compromising their utility in real-world applications. How well do existing data quantification methods perform on existing workflows? How do these methods compare with each other, empirically and theoretically? Must we sacrifice scalability for the utility in these workflows when using these methods? In this paper, we conduct a novel theoretical analysis comparing the utility of different importance quantification methods, and report extensive experimental studies on existing and proposed workflows such as noisy label detection, watermark removal, data summarization, data acquisition, and domain adaptation. We show that Shapley value approximation based on a KNN surrogate over pre-trained feature embeddings obtains comparable utility with existing algorithms while achieving significant scalability improvement, often by orders of magnitude. Our theoretical analysis also justifies its advantage over the leave-one-out error. The code is publicly available.

Bartoldson B.R., Morcos A.S., Barbu A., Erlebacher G. (2020). The Generalization-Stability Tradeoff in Neural Network Pruning. Conference on Neural Information Processing Systems (NeurIPS). []

Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we analyze the behavior of pruning over the course of training, finding that pruning's benefit to generalization increases with pruning's instability (defined as the drop in test accuracy immediately following pruning). We demonstrate that this "generalization-stability tradeoff" is present across a wide variety of pruning settings and propose a mechanism for its cause: pruning regularizes similarly to noise injection. Supporting this, we find less pruning stability leads to more model flatness and the benefits of pruning do not depend on permanent parameter removal. These results explain the compatibility of pruning-based generalization improvements and the high generalization recently observed in overparameterized networks.

Kailkhura B., Thiagarajan J.J., Li Q., et al. (2020). A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning. Conference on Neural Information Processing Systems (NeurIPS). []

In this paper, we present a statistical mechanics framework to understand the effect of sampling properties of training data on the generalization gap of machine learning (ML) algorithms. We connect the generalization gap to the spatial properties of a sample design characterized by the pair correlation function (PCF). In particular, we express generalization gap in terms of the power spectra of the sample design and that of the function to be learned. Using this framework, we show that space-filling sample designs, such as blue noise and Poisson disk sampling, which optimize spectral properties, outperform random designs in terms of the generalization gap and characterize this gain in a closed-form. Our analysis also sheds light on design principles for constructing optimal task-agnostic sample designs that minimize the generalization gap. We corroborate our findings using regression experiments with neural networks on: a) synthetic functions, and b) a complex scientific simulator for inertial confinement fusion (ICF).
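As an illustration of what a space-filling design looks like in practice, the sketch below draws a Poisson disk sample in the unit hypercube by naive dart throwing: a uniformly random candidate is accepted only if it lies at least `min_dist` from every point accepted so far. This is a generic textbook procedure, not the construction analyzed in the paper, and the parameter values are arbitrary.

```python
import math
import random


def poisson_disk_sample(num_points, min_dist, dim=2, max_tries=10000, seed=0):
    """Dart throwing: keep random candidates that respect the minimum spacing."""
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        if len(points) == num_points:
            break
        candidate = [rng.random() for _ in range(dim)]
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    # May return fewer points if the spacing cannot be met within max_tries.
    return points


design = poisson_disk_sample(num_points=20, min_dist=0.1)
```

The enforced minimum spacing suppresses the clumps and gaps that purely random designs exhibit, which is the spatial property the pair correlation function captures.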

Zhang J., Kailkhura B., Han T.Y. (2020). Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning. International Conference on Machine Learning (ICML). []

This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data-efficiency and expressive power while provably maintaining the classification accuracy of the original classifier. Mix-n-Match strategies are generic in the sense that they can be used to improve the performance of any off-the-shelf calibrator. We also reveal potential issues in standard evaluation practices. Popular approaches (e.g., histogram-based expected calibration error (ECE)) may provide misleading results, especially in the small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of the calibration performance and prove its asymptotic unbiasedness and consistency. Our approaches outperform state-of-the-art solutions on both the calibration and the evaluation tasks in most of the experimental settings. Our code is publicly available.
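For reference, the histogram-based ECE that the abstract critiques is typically computed as below: predictions are grouped into equal-width confidence bins, and the gap between mean confidence and accuracy in each bin is averaged with weights proportional to bin size. This is the standard estimator, shown as a sketch; it is not the kernel density-based estimator proposed in the paper, and the bin count is an arbitrary choice.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Histogram-based ECE over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, 1.0 if ok else 0.0))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# Ten predictions at 80% confidence that are all correct: underconfident by 0.2.
ece_value = expected_calibration_error([0.8] * 10, [True] * 10)
```

With few samples, many bins are empty or hold only a handful of points, so the per-bin accuracy is noisy; this is one way the small-data issue raised in the abstract shows up.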

Liu S., Kailkhura B., Chen P.Y., et al. (2018). Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization. Conference on Neural Information Processing Systems (NeurIPS). []

As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order O(1/b), where b is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations). Our extensive experimental results show that our approaches outperform other state-of-the-art ZO algorithms, and strike a balance between the convergence rate and the function query complexity.

Interpretability & Trust

Sun L., Huang Y., Wang H., et al. (2024). TrustLLM: Trustworthiness in Large Language Models. International Conference on Machine Learning (ICML). []

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

Hong J., Duan J., Zhang C., et al. (2024). Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression. International Conference on Machine Learning (ICML). []

Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SOTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation of three (3) leading LLMs using five (5) SOTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range could unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to reduce trustworthiness significantly. This increased risk cannot be uncovered by looking at benign performance alone, in turn, mandating comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs.
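To make the two compression families in this abstract concrete, here is a minimal sketch, assuming a toy list of weights. It is not the paper's evaluation code and it does not implement any of the SOTA methods the study benchmarks; `quantize_uniform` and `prune_by_magnitude` are hypothetical helper names chosen for illustration. Quantization keeps every weight but snaps it to a small set of levels, while pruning sets a fraction of weights to zero.

```python
def quantize_uniform(weights, bits):
    """Round each weight to one of 2**bits evenly spaced levels over [min, max]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # a constant tensor needs no rounding
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_drop = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy weights (illustrative values only, not taken from any model).
weights = [0.9, -0.1, 0.05, -0.7, 0.3, 0.02, -0.4, 0.6]
quantized = quantize_uniform(weights, bits=4)       # every weight kept, coarser values
pruned = prune_by_magnitude(weights, sparsity=0.5)  # half of the weights set to zero
```

In this toy setting, 4-bit quantization perturbs each weight by at most half a quantization step, whereas 50% pruning discards half of the weights outright. That is only an intuition for why the two families can behave differently; it is not evidence for the paper's findings.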

Thiagarajan J.J., Narayanaswamy V., Trivedi P., Anirudh R. (2024). PAGER: Accurate Failure Characterization in Deep Regression Models. International Conference on Machine Learning (ICML).

Safe deployment of AI models requires proactive detection of failures to prevent costly errors. To this end, we study the important problem of detecting failures in deep regression models. Existing approaches rely on epistemic uncertainty estimates or inconsistency with respect to the training data to identify failure. Interestingly, we find that while uncertainties are necessary, they are insufficient to accurately characterize failure in practice. Hence, we introduce PAGER (Principled Analysis of Generalization Errors in Regressors), a framework to systematically detect and characterize failures in deep regressors. Built upon the principle of anchored training in deep models, PAGER unifies both epistemic uncertainty and complementary manifold nonconformity scores to accurately organize samples into different risk regimes.

Olson M., Liu S., Anirudh R., et al. (2023). Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Generative Adversarial Networks (GANs) are notoriously difficult to train especially for complex distributions and with limited data. This has driven the need for tools to audit trained networks in human intelligible format, for example, to identify biases or ensure fairness. Existing GAN audit tools are restricted to coarse-grained, model data comparisons based on summary statistics such as FID or recall. In this paper, we propose an alternative approach that compares a newly developed GAN against a prior baseline. To this end, we introduce Cross-GAN Auditing (xGA) that, given an established “reference” GAN and a newly proposed “client” GAN, jointly identifies intelligible attributes that are either common across both GANs, novel to the client GAN, or missing from the client GAN. This provides both users and model developers an intuitive assessment of similarity and differences between GANs. We introduce novel metrics to evaluate attribute-based GAN auditing approaches and use these metrics to demonstrate quantitatively that xGA outperforms baseline approaches. We also include qualitative results that illustrate the common, novel and missing attributes identified by xGA from GANs trained on a variety of image datasets.

Landajuela M., Lee C.S., Yang J., et al. (2022). A Unified Framework for Deep Symbolic Regression. Conference on Neural Information Processing Systems (NeurIPS).

The last few years have witnessed a surge in methods for symbolic regression, from advances in traditional evolutionary approaches to novel deep learning-based systems. Individual works typically focus on advancing the state-of-the-art for one particular class of solution strategies, and there have been few attempts to investigate the benefits of hybridizing or integrating multiple strategies. In this work, we identify five classes of symbolic regression solution strategies (recursive problem simplification, neural-guided search, large-scale pre-training, genetic programming, and linear models) and propose a strategy to hybridize them into a single modular, unified symbolic regression framework. Based on empirical evaluation using SRBench, a new community tool for benchmarking symbolic regression methods, our unified framework achieves state-of-the-art performance in its ability to (1) symbolically recover analytical expressions, (2) fit datasets with high accuracy, and (3) balance accuracy-complexity trade-offs, across 252 ground-truth and black-box benchmark problems, in both noiseless settings and across various noise levels. Finally, we provide practical use case-based guidance for constructing hybrid symbolic regression algorithms, supported by extensive, combinatorial ablation studies.

Landajuela M., Petersen B.K., Kim S., et al. (2021). Discovering Symbolic Policies with Deep Reinforcement Learning. Proceedings of Machine Learning Research.

Deep reinforcement learning (DRL) has proven successful for many difficult control problems by learning policies represented by neural networks. However, the complexity of neural network-based policies—involving thousands of composed non-linear operators—can render them problematic to understand, trust, and deploy. In contrast, simple policies comprising short symbolic expressions can facilitate human understanding, while also being transparent and exhibiting predictable behavior. To this end, we propose deep symbolic policy, a novel approach to directly search the space of symbolic policies. We use an autoregressive recurrent neural network to generate control policies represented by tractable mathematical expressions, employing a risk-seeking policy gradient to maximize performance of the generated policies. To scale to environments with multi-dimensional action spaces, we propose an “anchoring” algorithm that distills pre-trained neural network-based policies into fully symbolic policies, one action dimension at a time. We also introduce two novel methods to improve exploration in DRL-based combinatorial optimization, building on ideas of entropy regularization and distribution initialization. Despite their dramatically reduced complexity, we demonstrate that discovered symbolic policies outperform seven state-of-the-art DRL algorithms in terms of average rank and average normalized episodic reward across eight benchmark environments.

Mundhenk T.N., Santiago C.P., Landajuela M., et al. (2021). Symbolic Regression via Neural-Guided Genetic Programming Population Seeding. Conference on Neural Information Processing Systems (NeurIPS).

Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks. Source code is provided at

Petersen B.K., Larma M.L., Mundhenk T.N., et al. (2021). Deep Symbolic Regression: Recovering Mathematical Expressions from Data via Risk-Seeking Policy Gradients. International Conference on Learning Representations (ICLR).

Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that leverages deep learning for symbolic regression via a simple idea: use a large model to search the space of small models. Specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the network to generate better-fitting expressions. Our algorithm outperforms several baseline methods (including Eureqa, the gold standard for symbolic regression) in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance.
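The idea of optimizing for best-case rather than expected performance can be sketched as a per-sample weighting rule. This is a minimal illustration, not the authors' released code: the function name `risk_seeking_weights` and the toy rewards are assumptions. Within a batch, only samples whose reward exceeds the empirical (1 - epsilon) quantile contribute to the gradient, and that quantile serves as the baseline.

```python
import math


def risk_seeking_weights(rewards, epsilon):
    """Per-sample weights for a risk-seeking policy gradient estimate.

    Samples below the empirical (1 - epsilon) reward quantile get weight 0.
    The rest are weighted by their reward above that quantile, normalized
    by the number of retained samples.
    """
    n = len(rewards)
    k = max(1, math.ceil(epsilon * n))                 # size of the top-epsilon set
    threshold = sorted(rewards, reverse=True)[k - 1]   # empirical quantile (baseline)
    return [(r - threshold) / k if r >= threshold else 0.0 for r in rewards]


# Toy batch of rewards for eight sampled expressions (illustrative values only).
rewards = [0.1, 0.9, 0.5, 0.7, 0.3, 0.2, 0.8, 0.4]
weights = risk_seeking_weights(rewards, epsilon=0.25)
```

A training step would then sum `weights[i]` times the gradient of the log-probability of sample `i`, so the update is driven by the best samples in the batch and ignores the rest.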

Thiagarajan J.J., Narayanaswamy V., Rajan D., et al. (2021). Designing Counterfactual Generators using Deep Model Inversion. Conference on Neural Information Processing Systems (NeurIPS).

Explanation techniques that synthesize small, interpretable changes to a given image while producing desired changes in the model prediction have become popular for introspecting black-box models. Commonly referred to as counterfactuals, the synthesized explanations are required to contain discernible changes (for easy interpretability) while also being realistic (consistent with the data manifold). In this paper, we focus on the case where we have access only to the trained deep classifier and not the actual training data. While the problem of inverting deep models to synthesize images from the training distribution has been explored, our goal is to develop a deep inversion approach to generate counterfactual explanations for a given query image. Despite their effectiveness in conditional image synthesis, we show that existing deep inversion methods are insufficient for producing meaningful counterfactuals. We propose DISC (Deep Inversion for Synthesizing Counterfactuals), which improves upon deep inversion by (a) utilizing stronger image priors, (b) incorporating a novel manifold consistency objective, and (c) adopting a progressive optimization strategy. We find that, in addition to producing visually meaningful explanations, the counterfactuals from DISC are effective at learning classifier decision boundaries and are robust to unknown test-time corruptions.

Thiagarajan J.J., Sattigeri P., Rajan D., Venkatesh B. (2020). Calibrating Healthcare AI: Towards Reliable and Interpretable Deep Predictive Models. Preprint.

The widespread adoption of representation learning technologies in clinical decision making strongly emphasizes the need for characterizing model reliability and enabling rigorous introspection of model behavior. While the former need is often addressed by incorporating uncertainty quantification strategies, the latter challenge is addressed using a broad class of interpretability techniques. In this paper, we argue that these two objectives are not necessarily disparate and propose to utilize prediction calibration to meet both objectives. More specifically, our approach comprises a calibration-driven learning method, which is also used to design an interpretability technique based on counterfactual reasoning. Furthermore, we introduce reliability plots, a holistic evaluation mechanism for model reliability. Using a lesion classification problem with dermoscopy images, we demonstrate the effectiveness of our approach and infer interesting insights about the model behavior.

Liu S., Gaffney J., Peterson L., et al. (2019). Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications. IEEE Transactions on Visualization and Computer Graphics.

With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural networks) calls for advanced techniques in exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that require techniques that can handle millions or more samples. Although some solutions to these interpretability challenges have been proposed, they typically do not scale beyond thousands of samples, nor do they provide the high-level intuition scientists are looking for. Here, we present the first scalable solution to explore and analyze high-dimensional functions often encountered in the scientific data analysis pipeline. By combining a new streaming neighborhood graph construction, the corresponding topology computation, and a novel data aggregation scheme, namely topology aware datacubes, we enable interactive exploration of both the topological and the geometric aspect of high-dimensional data. Following two use cases from high-energy-density (HED) physics and computational biology, we demonstrate how these capabilities have led to crucial new insights in both applications.

Additional Resources


Through LLNL's Innovations and Partnerships Office, we have filed more than 45 patents since 2019.

Open Data Initiative

We share unique datasets with academic and industry collaborators through the Open Data Initiative.

Open-Source Software

We provide transparency in software development. Filter our catalog by AI & Machine Learning.