Measuring attack vulnerability in AI/ML models
LLNL is advancing the safety of AI/ML models in materials design, bioresilience, cyber security, stockpile surveillance, and many other areas. A key line of inquiry is model robustness, or how well a model withstands adversarial attacks. A paper accepted to the renowned 2024 International Conference on Machine Learning explores this issue in detail. In “Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies,” LLNL researchers studied the effect of scaling robust image classifiers, trained with a method called adversarial training, to develop the first scaling laws for robustness. The team’s adversarial training approach alters the pixels where the model appears most vulnerable, giving the model a more continuous view of the data distribution. Among the findings is that better data quality significantly improves the robustness produced by adversarial training. The team raised the state of the art to 74% adversarial robustness, and also outperformed the previous best with a model three times smaller that saw three times more data. Visit LLNL Computing to learn more about this paper, and take a quiz to see how well you can identify adversarially perturbed images.
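
As a rough illustration of the general technique rather than the paper’s specific recipe, the sketch below shows one adversarial training step in PyTorch: an FGSM-style attack nudges each pixel in the direction that most increases the classifier’s loss, and the model is then updated on those perturbed images. The model, data, and hyperparameters here are placeholders chosen only to make the example self-contained.

# Minimal sketch of adversarial training, assuming a simple FGSM-style attack.
# This is a generic illustration, not the method from the LLNL paper; the
# model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=8 / 255):
    """Perturb pixels in the direction that most increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step each pixel toward higher loss, then clamp to a valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on adversarially perturbed inputs."""
    x_adv = fgsm_perturb(model, x, y)
    optimizer.zero_grad()          # clear gradients accumulated during the attack
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Tiny stand-in classifier and random data, just to exercise the loop.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(16, 3, 32, 32)  # fake CIFAR-sized batch
    y = torch.randint(0, 10, (16,))
    print("loss:", adversarial_training_step(model, optimizer, x, y))

Training on these perturbed inputs exposes the classifier to points between and around the clean examples, which is one way to think about the “more continuous view of the data distribution” the researchers describe.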