Seminar Series

Hosted onsite at LLNL—and now virtually—on an ongoing basis, our seminars feature speakers from other institutions around the Bay Area and beyond. We host these events to introduce new ideas and potential collaborators to the Lab. We are pleased to share seminar information here with the broader data science community. View our YouTube playlist.

Deep Generative Modeling in Network Science with Applications to Public Policy Research

Gavin Hartnett
Gavin Hartnett | Information Scientist | RAND Corporation

Network data is increasingly being used in quantitative, data-driven public policy research. These are typically very rich datasets that contain complex correlations and interdependencies. This richness promises to be quite useful for policy research, while at the same time posing a challenge for the useful extraction of information from these datasets, a challenge that calls for new data analysis methods. We formulate a research agenda of key methodological problems whose solutions would enable progress across many areas of policy research. We then review recent advances in applying deep learning to network data and show how these methods may be used to address many of the identified methodological problems. We particularly emphasize deep generative methods, which can be used to generate realistic synthetic networks useful for microsimulation and agent-based models capable of informing key public policy questions. We extend these recent advances by developing a new generative framework that applies to the large social contact networks commonly used in epidemiological modeling. For context, we also compare these recent neural network–based approaches with the more traditional Exponential Random Graph Models. Lastly, we discuss some open problems where more progress is needed. This talk will be based mainly on our recent report. See the project's GitHub repository.
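
The talk's generative framework is not reproduced here, but the toy sketch below (Python with networkx) illustrates the motivation for richer generators: a naive random-graph baseline matched only to average degree fails to reproduce summary statistics, such as degree variance, that matter for epidemic simulation. The choice of a Barabási–Albert graph as a stand-in "observed" contact network and all parameter values are illustrative assumptions, not anything from the report.

```python
import networkx as nx
import numpy as np

n = 2000

# Hypothetical "observed" contact network standing in for real survey data.
observed = nx.barabasi_albert_graph(n=n, m=3, seed=0)

# Naive baseline generator matched only to the average degree; it ignores
# the heavy-tailed structure that richer generative models try to reproduce.
p = 2 * observed.number_of_edges() / (n * (n - 1))
synthetic = nx.gnp_random_graph(n=n, p=p, seed=1)

# Compare a summary statistic that matters for epidemic simulation:
# the spread of the degree distribution.
for name, g in [("observed", observed), ("synthetic", synthetic)]:
    degrees = np.array([d for _, d in g.degree()])
    print(f"{name}: mean degree {degrees.mean():.2f}, variance {degrees.var():.1f}")
```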

Gavin Hartnett is an Information Scientist at the RAND Corporation and a professor at the Pardee RAND Graduate School, where he serves as the Tech and Narrative Lab AI Co-Lead. As a theoretical physicist turned machine learning (ML) researcher, he applies ML to a diverse range of public policy areas. Hartnett's recent work includes investigations into COVID-19 vaccination strategies, applications of graph neural networks to agent-based modeling, applications of natural language processing to official U.S. government policy documents, and the implications of adversarial examples in defense scenarios. He has also worked on applications of AI/ML in the physical sciences, with a particular emphasis on spin-glass systems in theoretical physics and computer science. Prior to joining RAND, Hartnett studied black holes in string theory as a postdoc at the Southampton Theory Astrophysics and Gravitation Research Centre in the UK, and before that he was a PhD student at UCSB. His doctoral research focused on the existence and stability of black holes and on using properties of black holes to understand phenomena in strongly coupled gauge theories through the gauge/gravity duality. As an undergraduate at Syracuse University, he researched gravitational waves as part of the LIGO collaboration, the expansion of the early universe, and topological defects in liquid crystals. Watch Hartnett's talk on YouTube.


Replication or Exploration? Sequential Design for Stochastic Simulation Experiments

Robert Gramacy
Robert Gramacy | Professor of Statistics | Virginia Tech

We investigate the merits of replication and provide methods that search for optimal designs (including replicates) in the context of noisy computer simulation experiments. We first show that replication can be beneficial from both design and computational perspectives in the context of Gaussian process surrogate modeling. We then develop a look-ahead-based sequential design scheme that can determine whether a new run should be placed at an existing input location (i.e., replicate) or at a new one (explore). When paired with a newly developed heteroskedastic Gaussian process model, our dynamic design scheme facilitates learning of signal and noise relationships that can vary throughout the input space. We show that it does so efficiently, on both computational and statistical grounds. In addition to illustrative synthetic examples, we demonstrate performance on two challenging real-data simulation experiments, from inventory management and epidemiology.
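
As a rough illustration of why replication helps a surrogate model (not the look-ahead design or heteroskedastic model from the talk), the sketch below fits a Gaussian process with an explicit noise term to a hypothetical noisy simulator, using repeated runs at each design point. The simulator, kernel choices, and constant-noise assumption are all illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical noisy simulator: a smooth signal plus observation noise.
def simulate(x):
    return np.sin(4 * np.pi * x) + 0.3 * rng.standard_normal(x.shape)

# A design with replicates: 10 unique inputs, 5 runs at each.
x_unique = np.linspace(0.0, 1.0, 10)
X = np.repeat(x_unique, 5)[:, None]
y = simulate(X.ravel())

# GP surrogate with an explicit noise (nugget) term. Replicated runs give
# the fit direct evidence for separating signal variation from noise.
# Unlike the heteroskedastic model in the talk, WhiteKernel assumes the
# noise level is constant across the input space.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

x_test = np.linspace(0.0, 1.0, 200)[:, None]
mean, std = gp.predict(x_test, return_std=True)
```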

Dr. Gramacy is a Professor of Statistics in the College of Science at Virginia Polytechnic Institute and State University (Virginia Tech) and affiliate faculty in Virginia Tech's Computational Modeling and Data Analytics program. Previously he was an Associate Professor of Econometrics and Statistics at the Booth School of Business and a fellow of the Computation Institute at The University of Chicago. His research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty. Dr. Gramacy recently published a book on surrogate modeling of computer experiments. Watch Gramacy's talk on YouTube.


Data Sketching as a Tool for High Performance Computing

Benjamin Priest
Benjamin W. Priest | Computing Scientist | Lawrence Livermore National Laboratory

High-throughput and high-volume data pipelines are prevalent throughout data science. Additionally, many data problems consider structured data that is representable as graphs, matrices, or tensors. Although modern high performance software solutions are sufficient to solve many important problems, the highly non-uniform structure of many realistic data sets, such as scale-free graphs, can result in high latency and poor resource utilization in distributed memory codes. In this talk we will introduce distributed data sketching – the deployment of composable, fixed-size data summaries – as a mechanism for approximately querying distributed structured data while minimizing memory and communication overhead. We will describe several specific sketch data structures, including cardinality sketches and subspace embeddings, while providing concrete examples of their application to HPC-scale computations, including local k-neighborhood estimation and vertex embedding for clustering. We will also introduce a broad cross-section of sketches and applications from the theory of computing literature, and outline their potential future applications to high performance numerical linear algebra and graph analysis codes.
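
As a concrete example of a composable, fixed-size summary, here is a minimal K-minimum-values cardinality sketch in Python. It is a standard textbook construction shown only for illustration, not code from the talk; the hash function and sketch size k are arbitrary choices.

```python
import hashlib

class KMVSketch:
    """K-minimum-values cardinality sketch: a fixed-size, mergeable summary
    that estimates the number of distinct items seen in a stream."""

    def __init__(self, k=256):
        self.k = k
        self.mins = []  # sorted k smallest normalized hash values seen so far

    @staticmethod
    def _hash01(item):
        # Map an item to a pseudo-uniform value in (0, 1].
        h = int.from_bytes(hashlib.sha1(repr(item).encode()).digest()[:8], "big")
        return (h + 1) / 2**64

    def add(self, item):
        v = self._hash01(item)
        if v in self.mins:
            return  # duplicate item; the sketch is unchanged
        if len(self.mins) < self.k:
            self.mins.append(v)
            self.mins.sort()
        elif v < self.mins[-1]:
            self.mins[-1] = v
            self.mins.sort()

    def merge(self, other):
        # Union of two sketches: keep the k smallest hash values overall.
        merged = KMVSketch(self.k)
        merged.mins = sorted(set(self.mins) | set(other.mins))[: self.k]
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # fewer than k distinct items seen
        return (self.k - 1) / self.mins[-1]

# Example: two partitions summarize their vertex labels independently, then
# the fixed-size sketches are merged to estimate the global distinct count.
a, b = KMVSketch(), KMVSketch()
for i in range(100_000):
    a.add(("vertex", i))
for i in range(50_000, 150_000):
    b.add(("vertex", i))
print(a.merge(b).estimate())  # roughly 150,000
```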

Benjamin Priest is a staff member in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. He received his PhD in 2019 from the Thayer School of Engineering at Dartmouth College. His areas of interest include streaming and sketching algorithms, high performance computing, graph analysis, numerical linear algebra, and machine learning. His recent research foci are the development of high performance algorithms and codes for the sub-linear analysis of graphs and for the scalable approximation of Gaussian processes.


Deep Symbolic Regression: Recovering Mathematical Expressions from Data via Risk-Seeking Policy Gradients

Brenden Petersen
Brenden Petersen | Group Leader | Lawrence Livermore National Laboratory

Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that leverages deep learning for symbolic regression via a simple idea: use a large model to search the space of small models. Specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the network to generate better-fitting expressions. Our algorithm outperforms several baseline methods (including Eureqa, the gold standard for symbolic regression) in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance.
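 
A minimal sketch of the risk-seeking policy gradient idea, written in PyTorch: only the top-ε fraction of sampled expressions contributes to the update, with the (1 − ε) reward quantile serving as the baseline. The function below is illustrative (the names, normalization, and use of torch.quantile are assumptions), not the authors' released implementation.

```python
import torch

def risk_seeking_pg_loss(log_probs, rewards, epsilon=0.05):
    """Surrogate loss whose gradient follows the risk-seeking policy
    gradient: only the top-epsilon fraction of sampled expressions
    contributes, with the (1 - epsilon) reward quantile as baseline.

    log_probs: per-expression log-likelihoods under the RNN policy
               (must carry gradients back to the network parameters).
    rewards:   per-expression fitness scores, detached from the graph.
    """
    with torch.no_grad():
        r_eps = torch.quantile(rewards, 1.0 - epsilon)  # batch reward quantile
        elite = rewards >= r_eps                        # top-epsilon mask
        weights = (rewards - r_eps) * elite.float()     # quantile baseline
    # Minimizing this loss ascends the risk-seeking objective.
    return -(weights * log_probs).sum() / elite.float().sum().clamp(min=1.0)
```

In the full method described in the abstract, log_probs would come from summing the token log-probabilities of expressions sampled autoregressively from the recurrent network, with constraints applied in situ during sampling.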

A team of LLNL scientists collaborated on this effort: Brenden Petersen, Mikel Landajuela Larma, Nathan Mundhenk, Claudio Santiago, Soo Kim, and Joanne Kim. The work was published at ICLR 2021.

Brenden Petersen is the group leader of the Operations Research and Systems Analysis group at Lawrence Livermore National Laboratory. He received his PhD in 2016 through a joint program between the University of California, Berkeley, and the University of California, San Francisco. His PhD background is in biological modeling and simulation. Since joining the Lab almost five years ago, he has explored the intersection of simulation and machine learning. His current research interests include deep reinforcement learning for simulation control and discrete optimization.