Oct. 31, 2024
Stuart Russell Highlights Importance of AI Safety in Colloquium
LLNL’s Office of the Deputy Director for Science and Technology and the DSI co-sponsored a colloquium on October 3 featuring Stuart Russell, a distinguished professor at UC Berkeley. Russell, known for his pioneering work in artificial intelligence (AI), presented a compelling vision of the future of AI safety. He discussed AI safety principles, emphasizing human control and the need to plan for AI systems that may surpass human capabilities. He introduced the "loophole principle," indicating that efforts to correct misaligned AI might be futile. He referenced aviation and nuclear power as examples of fields prioritizing safety over innovation.
Russell also addressed the limitations of deep learning, noting the high computational demands and AI's struggles with complex patterns, using the game Go as an example. Russell compared theories for multi-human and multi-robot assistance games, stressing the importance of measurable preferences and utility. While scaling laws suggest artificial general intelligence could emerge by 2027, challenges remain in understanding human preferences and misaligned objectives.
When discussing approaches to AI safety, Russell posed the query, “How do we retain power over entities more powerful than us, forever?” He suggested methods like formal logical/probabilistic oracle systems, debate and self-play for argument optimization, and iterative amplification for enhancing AI systems. A video of the lecture will be released to LLNL’s YouTube channel in the coming weeks.
ICECap Uses Exascale Fusion Simulations to Pioneer Digital Design
A groundbreaking multidisciplinary team of LLNL researchers is combining the power of exascale computing with AI, advanced workflows, and graphics processor acceleration to advance scientific innovation and revolutionize digital design. The project, called ICECap (Inertial Confinement on El Capitan), is a transformative approach to inertial confinement fusion (ICF) design optimization targeted primarily for El Capitan.
At its core, ICECap aims to discover the next generation of robust, high-yield ICF designs, expanding the possibilities of computational science and shaping the future of plasma science through emerging technologies. ICF has implications for the National Nuclear Security Administration’s (NNSA) stockpile stewardship mission, as well as for future viable fusion power plants.
Described in a paper published in the journal Physics of Plasmas, ICECap represents a leap forward in high-performance computing, setting the stage for an era where “hero runs”—large-scale multiphysics simulations—become routine. With a focus on data-driven approaches to digital design and computational modeling, ICECap team members said the project could not only accelerate science but also transform the way scientists conduct research and drive advancements across various disciplines, offering new solutions to previously intractable problems.
“With ICECap, we’re trying to see how we can leverage AI to really change the way we do scientific discovery,” said principal investigator Luc Peterson. “We have supercomputers that can do fantastic simulations, but how can we use AI to help us take advantage of them to find new things? We’re doing this on El Capitan because we think we’re at the point where we can actually do both breadth and depth in computing, so you can search lots of parameter spaces to find what you're looking for and do it all in extremely high fidelity.”
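The outer loop Peterson describes, using a cheap learned model to decide which expensive simulations to run next, can be sketched in a few lines. The example below is a minimal, illustrative version of that idea and not the ICECap workflow: the two design parameters (shell_thickness and laser_power) and the simulate function are hypothetical stand-ins for an expensive multiphysics code.

```python
# Minimal sketch of surrogate-guided design optimization; NOT the ICECap workflow.
# The design parameters and the simulate() function are hypothetical placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def simulate(x):
    """Placeholder for an expensive ICF simulation returning a yield-like score."""
    shell_thickness, laser_power = x
    return -(shell_thickness - 0.3) ** 2 - (laser_power - 0.7) ** 2

# Start with a handful of random designs, then iterate: fit a surrogate model,
# propose the candidate with the best predicted score plus an uncertainty bonus.
X = rng.uniform(0, 1, size=(5, 2))
y = np.array([simulate(x) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor().fit(X, y)
    candidates = rng.uniform(0, 1, size=(256, 2))
    mean, std = gp.predict(candidates, return_std=True)
    pick = candidates[np.argmax(mean + std)]   # simple upper-confidence-bound rule
    X = np.vstack([X, pick])
    y = np.append(y, simulate(pick))

print("best design found:", X[np.argmax(y)], "score:", y.max())
```

In a project like ICECap, the toy function would be replaced by large ensembles of GPU-accelerated simulations, but the breadth-then-depth search pattern is the same.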
DSSI Student Internship Application Deadline
The DSSI application window is now open through January 31. The 2025 program will run for two 12-week sessions and is open to both undergraduate and graduate students. Visit the DSSI website for information about how to apply, including a list of FAQs—or share this link with students who may be interested in an internship.
Class of 2023 intern Jocelyn Ornelas Muñoz shared, “Last summer, I had the chance to tour the National Ignition Facility and learn about the amazing work that went into the Lab's historic ignition shot. As part of DSSI, I worked with a group of interns to develop and present a data-driven approach to reconstructing electro-anatomical maps of the heart. My main focus is developing, implementing, and testing deep learning computer vision models.”
We encourage qualified undergraduates and graduate students to apply to the respective job postings: undergraduate and graduate. The next two stories highlight some of our outstanding students from the class of 2024.
Kristen Hallas Wins Student Poster Award
Kristen Hallas, a DSSI intern and PhD student at the University of Texas Rio Grande Valley, won an outstanding poster presentation award at the Empowering Excellence: Women in Mathematics Workshop in San Antonio, Texas (WIMSATX). The poster discussed her research on developing a machine learning (ML) model for quickly generating equation-of-state tables for material mixtures at new compositions. Titled "Mapping Multiphase Multicomponent Mixtures via Neural Networks," Hallas’s work demonstrated that deep neural networks could predict the phase of a mixture of materials with an accuracy greater than 95%.
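As a rough illustration of the kind of model described above (and not Hallas’s actual network or data), the sketch below trains a small neural-network classifier to label the phase of a mixture from a few hypothetical input features, using synthetic data generated from a toy phase boundary.

```python
# Minimal sketch of a neural-network phase classifier on synthetic data.
# The input features and phase boundary are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(size=(n, 4))   # [composition_a, composition_b, density, temperature]
y = (X[:, 2] + 0.5 * X[:, 3] > 0.9).astype(int)   # toy boundary: 0 = solid-like, 1 = fluid-like

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```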
Collaborating with LLNL researchers Jason Bernstein and Philip Myint, Hallas is continuing her internship through the NNSA’s Minority Serving Institution Internship Program. She will contribute to two Laboratory Directed Research and Development (LDRD) projects: first by developing algorithms for machine vision and robotics control for ARMOR (Advanced Robotics for Materials and Manufacturing Optimization and Research), then by applying these algorithms to accelerate discovery for the Autonomous Alloy Prediction and EXperimentation project. The projects are led by Aldair Gongora and Mason Sage, respectively.
Reflecting on her experience at LLNL, Hallas shared, “Being at Livermore this year has been especially exciting with the upcoming deployment of NNSA’s first exascale supercomputer, El Capitan. It’s so cool to be given an opportunity to improve my skills by running AI applications on state-of-the-art supercomputing systems!” She also praised the supportive environment at LLNL, saying, “I found the work culture at Livermore to be very people-focused. I was amazed by how approachable everyone was. I felt comfortable enough to reach out to anyone, even people I expected to be way too busy to chat with an intern about their work and life at the Lab.”
Students Step into Consulting Role for Defense Project
The DSI’s Consulting Service strikes again! Two interns recently stepped into the consulting role, contributing to a project sponsored by the Defense Advanced Research Projects Agency (DARPA). Patrick Mchugh (left) and Kevin Zhu (right) worked under the mentorship of Jeff Drocco, deputy group leader of LLNL’s Genomics group.
A critical part of the U.S. Department of Defense, DARPA is concerned with warfighter readiness prediction because many stressors, such as sleep deprivation or caloric restriction, can have dramatic effects on battlefield performance. Scientific questions of interest include which biomarkers should be considered from the tens of thousands available, how control and stress-tested groups can be compared, and how model predictive performance should be evaluated.
DARPA engaged LLNL as an independent verification and validation partner in its Measuring Biological Aptitude and Smart Non-Invasive Assays of Physiology programs, with Livermore providing impartiality in assessing solutions from other contracted parties as well as expertise in the quantitative life sciences. The DSSI students identified biomarkers that may indicate an individual’s readiness to perform a strenuous physical or cognitive task.
“Working on this research project as an intern has been an amazing experience. I’ve expanded my knowledge of statistics and machine learning and was able to apply them to a new biomarkers dataset,” said Zhu, an undergraduate at the Massachusetts Institute of Technology. Patrick Mchugh, a statistics PhD student at Ohio State University, added, “I had a great experience working on this consulting project and contributing during my first few weeks at the Lab. It was a lot of fun applying my academic knowledge of statistics, such as bootstrapping and prediction interval coverage, to a real-world small-sample problem.”
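For readers unfamiliar with the techniques Mchugh mentions, the sketch below bootstraps a prediction interval and checks its empirical coverage on a synthetic small-sample dataset. The data, the simple linear model, and the variable names are illustrative assumptions, not the DARPA biomarker analysis.

```python
# Minimal sketch of bootstrap prediction intervals and coverage on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n, n_boot = 30, 2000
x = rng.uniform(0, 1, n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)   # toy "biomarker vs. readiness" relationship
x_new = 0.6                                    # point at which we want a prediction interval

preds = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, n)                        # resample the small dataset with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], 1)   # refit a simple linear model
    resid = y[idx] - (slope * x[idx] + intercept)
    preds[b] = slope * x_new + intercept + rng.choice(resid)  # add a resampled residual

lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"95% bootstrap prediction interval at x={x_new}: [{lo:.2f}, {hi:.2f}]")

# Coverage check: how often does the interval capture a fresh observation at x_new?
y_new = 2.0 * x_new + rng.normal(scale=0.5, size=5000)
print(f"empirical coverage: {np.mean((y_new >= lo) & (y_new <= hi)):.3f}")
```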
Measuring Failure Risk and Resiliency in AI/ML Models
The widespread use of AI/ML reveals not only the technology’s potential but also its pitfalls, such as how likely these models are to be inaccurate. AI/ML models can fail in unexpected ways even when not under attack, and they can fail in scenarios where humans would perform reliably. Knowing when and why failure occurs can prevent costly errors and reduce the risk of erroneous predictions—a particularly urgent requirement in high-consequence situations.
LLNL researcher Vivek Narayanaswamy and collaborators tackled the problem of detecting failures in a paper accepted to the 2024 International Conference on Machine Learning. In “PAGER: Accurate Failure Characterization in Deep Regression Models,” the team categorized model risk into three regimes: in distribution, out of support (data similar to, but not represented in, the training data), and out of distribution (unforeseen data). Their analysis spawned the PAGER framework—Principled Analysis of Generalization Errors in Regressors—which systematically detects failures and quantifies the risk of failure in these regimes.
Livermore scientists are beginning to use the PAGER framework in an autonomous multiscale simulation project, funded via the LDRD program. In a large-scale multiphysics simulation, an ML model can act as a surrogate for distinct time steps in the overall computation. Standing in for subscale calculations, these surrogates will rely on PAGER to detect failures in real time. If a failure is detected, the simulation can pivot to call the physics code for that time step, then move on to the next one. If there’s no failure, the simulation can progress seamlessly. Read more about PAGER at LLNL Computing.
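The surrogate-with-fallback pattern described above can be summarized in a short sketch. Everything in it (the surrogate, the failure score, and the physics step) is a hypothetical placeholder rather than the PAGER implementation or the LDRD project’s code.

```python
# Minimal sketch of a surrogate-with-fallback loop; all functions are placeholders.
import numpy as np

rng = np.random.default_rng(3)

def surrogate_step(state):
    """Cheap ML surrogate for one subscale calculation (placeholder)."""
    return state * 0.99 + 0.01

def failure_score(state):
    """Placeholder for a PAGER-style risk estimate of the surrogate at this state."""
    return float(rng.uniform())

def physics_step(state):
    """Expensive subscale physics calculation (placeholder)."""
    return state * 0.99 + 0.01 + 1e-4

state, threshold = 1.0, 0.9
for t in range(100):
    if failure_score(state) > threshold:
        state = physics_step(state)     # surrogate flagged as unreliable: fall back to physics
    else:
        state = surrogate_step(state)   # surrogate trusted: take the cheap path
print("final state:", state)
```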
Shusen Liu Receives DOE Early Career Award
Seven LLNL scientists are recipients of the Department of Energy’s (DOE) Office of Science Early Career Research Program award. Among them is Shusen Liu, a computer scientist in the Center for Applied Scientific Computing. His work focuses on understanding and interpreting the inner mechanisms of neural networks and integrating human domain knowledge with machine capabilities to advance scientific discovery.
“I feel very fortunate to receive this recognition and funding, which will allow me to continue developing the research agenda I have been pursuing at LLNL,” he said. “My work here focuses on the intersection of human-computer interaction and fundamental machine learning research.” After joining the Lab as a postdoc in 2017, Liu advanced to a staff scientist in 2019. With his award he plans to delve deeper into uncovering how concepts and other human-understandable structures are organized and represented in neural networks.
“By leveraging this understanding, we can facilitate machine-human knowledge exchanges at various granularities and for different audiences,” said Liu, who is pictured here second from left in the bottom row. “The proposed research is expected to yield long-term impacts in AI-driven discovery, the development of scientific foundation models and AI safety.” Read the full article here.
Using Game Theory for Automated Evaluation of LLMs’ Reasoning
Large language models (LLMs) with billions or even trillions of parameters are a popular industry technology, but the safety and effectiveness of these models in mission-critical domains have yet to be determined. Accordingly, Livermore researchers are developing evaluation pipelines like GTBench, which provides a set of competitive game theory tasks to demonstrate where LLM reasoning succeeds or fails.
Developed by LLNL computer scientists James Diffenderfer and Bhavya Kailkhura and colleagues from six universities, GTBench assesses LLM reasoning according to how models respond to different types of game scenarios: complete versus incomplete information, deterministic versus probabilistic, and static versus dynamic. In a new paper accepted to the 38th Annual Conference on Neural Information Processing Systems (NeurIPS), the team explores the factors essential to LLMs’ planning and strategic reasoning abilities alongside first-of-its-kind performance benchmarking.
“Prior to GTBench, many benchmarks relied on manually curated datasets to measure the reasoning abilities of LLMs. However, as such settings are not scalable and are static, we curated a selection of game-theoretic tasks where an LLM can be pitted against another LLM (or traditional solvers), thus automatically measuring their strategic reasoning blind spots. The tasks in our Game-Theoretic Benchmark [GTBench] have clear rules and guidelines, making them easy to understand but challenging without reasoning capabilities,” Diffenderfer states.
For example, in the dynamic gaming task called the Iterated Prisoner’s Dilemma, models cooperate with or deceive each other, then learn from previous rounds. Models must engage in sequential decision-making before making the next move in the game. One performance measurement involves comparing the actual outcome with the best possible outcome—a metric known as regret.
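To make the regret metric concrete, here is a toy, self-contained Iterated Prisoner’s Dilemma with a simple best-in-hindsight baseline. The strategies and payoff table are standard textbook choices, and the regret calculation is a simplification for illustration, not GTBench’s implementation.

```python
# Toy Iterated Prisoner's Dilemma with a simplified hindsight-regret calculation.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_cooperate(opp_history):
    return "C"

def tit_for_tat(opp_history):
    return opp_history[-1] if opp_history else "C"

def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds; each strategy sees only the opponent's previous moves."""
    history_a, history_b, score_a = [], [], 0
    for _ in range(rounds):
        a = strategy_a(history_b)
        b = strategy_b(history_a)
        score_a += PAYOFF[(a, b)][0]
        history_a.append(a)
        history_b.append(b)
    return score_a, history_b

actual, opponent_moves = play(always_cooperate, tit_for_tat)
# Simplified hindsight baseline: the best single action against each observed opponent move.
best = sum(max(PAYOFF[(a, b)][0] for a in ("C", "D")) for b in opponent_moves)
print(f"actual payoff: {actual}, hindsight payoff: {best}, regret: {best - actual}")
```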
The team evaluated three frontier open-source and two commercial LLMs, discovering that models exhibit intrinsic failure in complete and deterministic games, though they are more proficient in incomplete and probabilistic games. “Our analysis offers a more nuanced understanding of LLMs’ capabilities and limitations, highlighting that despite significant advancements in AI, many challenges remain unresolved, necessitating further efforts to develop reliable AI solutions for the DOE mission space,” notes Kailkhura. (Image at left: Performance of four LLMs is compared for two types of gaming tasks. In the static Blind Auction task, the models simultaneously submit bids without knowing the other bids. See also Table 5 in the paper linked above, showing completion rates of LLMs and agents across all the games.)
Meet an LLNL Data Scientist
Leno da Silva is a reinforcement learning researcher in LLNL’s Computational Engineering Division, where he intends to make an impact with his work. Leno works primarily on the Generative Unconstrained Intelligent Drug Engineering (GUIDE) project, developing ML approaches for the rapid design of antibody therapeutics. He says he is grateful to be working on data science and AI at such a pivotal time, noting, “It is exciting and challenging to keep up with the accelerated pace of research on AI and contribute with advances in applications of relevance to national defense.”

Leno has contributed to several other projects, including smart transportation, AI-powered power converter design, and AI-based sepsis treatments, and he now coordinates the DSI’s technical outreach activities, including the seminar series. He also mentors students and sees the experience as a way to share knowledge and learn with a new generation of researchers. Before his time at LLNL, Leno completed his PhD at the University of São Paulo, Brazil, and worked as a postdoc at the Advanced Institute for AI. He published several papers during the past year and co-organized the Lab’s AI Safety Workshop in April.