Seminar Series

Our seminar series features talks from innovators in academia, industry, and the Lab. These talks provide a forum for thought leaders to share their work, discuss trends, and stimulate collaboration. These monthly seminars are held onsite and virtually. Recordings are posted to a YouTube playlist.

Deep Generative Modeling in Network Science with Applications to Public Policy Research

Gavin Hartnett | Information Scientist | RAND Corporation

Network data is increasingly being used in quantitative, data-driven public policy research. These are typically very rich datasets that contain complex correlations and inter-dependencies. This richness promises to be quite useful for policy research, but it also makes extracting useful information from these datasets difficult, a challenge that calls for new data analysis methods. We formulate a research agenda of key methodological problems whose solutions would enable progress across many areas of policy research. We then review recent advances in applying deep learning to network data and show how these methods may be used to address many of the identified methodological problems. We particularly emphasize deep generative methods, which can be used to generate realistic synthetic networks useful for microsimulation and agent-based models capable of informing key public policy questions. We extend these recent advances by developing a new generative framework that applies to large social contact networks commonly used in epidemiological modeling. For context, we also compare these recent neural network–based approaches with the more traditional Exponential Random Graph Models. Lastly, we discuss some open problems where more progress is needed. This talk will be mainly based on our recent report. See the project's GitHub repository.

Gavin Hartnett is an Information Scientist at the RAND Corporation and a professor at the Pardee RAND Graduate School, where he serves as the Tech and Narrative Lab AI Co-Lead. As a theoretical physicist turned machine learning (ML) researcher, his research centers on the application of ML to a diverse range of public policy areas. Hartnett's recent work includes investigations into COVID-19 vaccination strategies, applications of graph neural networks to agent-based modeling, applications of natural language processing to official U.S. government policy documents, and the implications of adversarial examples in defense scenarios. He has also worked on applications of AI/ML in the physical sciences, with a particular emphasis on spin-glass systems in theoretical physics and computer science. Prior to joining RAND, Hartnett studied black holes in string theory as a postdoc at the Southampton Theory Astrophysics and Gravitation Research Centre in the UK, and before that he was a PhD student at UCSB. His research focused on the existence and stability of black holes, and on using properties of black holes to understand phenomena in strongly coupled gauge theories through the gauge/gravity duality. As an undergraduate at Syracuse University, he researched gravitational waves as part of the LIGO collaboration, the expansion of the early universe, and topological defects in liquid crystals. Watch Hartnett's talk on YouTube.

Replication or Exploration? Sequential Design for Stochastic Simulation Experiments

Robert Gramacy | Professor of Statistics | Virginia Tech

We investigate the merits of replication and provide methods that search for optimal designs (including replicates), in the context of noisy computer simulation experiments. We first show that replication offers the potential to be beneficial from both design and computational perspectives, in the context of Gaussian process surrogate modeling. We then develop a look-ahead based sequential design scheme that can determine if a new run should be at an existing input location (i.e., replicate) or at a new one (explore). When paired with a newly developed heteroskedastic Gaussian process model, our dynamic design scheme facilitates learning of signal and noise relationships which can vary throughout the input space. We show that it does so efficiently, on both computational and statistical grounds. In addition to illustrative synthetic examples, we demonstrate performance on two challenging real-data simulation experiments, from inventory management and epidemiology.

Dr. Gramacy is a Professor of Statistics in the College of Science at Virginia Polytechnic Institute and State University (Virginia Tech/VT) and affiliate faculty in VT's Computational Modeling and Data Analytics program. Previously he was an Associate Professor of Econometrics and Statistics at the Booth School of Business, and a fellow of the Computation Institute at The University of Chicago. His research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty. Dr. Gramacy recently published a book on surrogate modeling of computer experiments. Watch Gramacy's talk on YouTube.

Data Sketching as a Tool for High Performance Computing

Benjamin W. Priest | Computing Scientist | Lawrence Livermore National Laboratory

High-throughput and high-volume data pipelines are prevalent throughout data science. Additionally, many data problems consider structured data that is representable as graphs, matrices, or tensors. Although modern high performance software solutions are sufficient to solve many important problems, the highly non-uniform structure of many realistic data sets, such as scale-free graphs, can result in high latency and poor resource utilization in distributed memory codes. In this talk we will introduce distributed data sketching – the deployment of composable, fixed-size data summaries – as a mechanism for approximately querying distributed structured data while minimizing memory and communication overhead. We will describe several specific sketch data structures, including cardinality sketches and subspace embeddings, while providing concrete examples of their application to HPC-scale computations – including local k-neighborhood estimation and vertex embedding for clustering. We will also introduce a broad cross-section of sketches and applications from the theory of computing literature, and outline their potential future applications to high performance numerical linear algebra and graph analysis codes.
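As a rough illustration of what a composable, fixed-size cardinality sketch can look like, the following minimal Python sketch implements a K-minimum-values (KMV) summary. It is not taken from the talk: the class name `KMVSketch`, the choice of SHA-1 hashing, and the default `k=256` are assumptions made for this example. Each sketch stores at most `k` hash values, and two sketches built on different partitions of the data can be merged into the sketch of their union without revisiting the data.

```python
import hashlib
import heapq


class KMVSketch:
    """K-minimum-values cardinality sketch (illustrative, hypothetical class)."""

    def __init__(self, k=256):
        self.k = k
        self.values = set()  # at most k normalized hash values in [0, 1)

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform value in [0, 1) using 64 bits of SHA-1.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        # Keep only the k smallest hash values seen so far.
        self.values.add(self._hash(item))
        if len(self.values) > self.k:
            self.values.remove(max(self.values))

    def merge(self, other):
        # The sketch of a union is the k smallest values of the two sketches.
        merged = KMVSketch(self.k)
        merged.values = set(heapq.nsmallest(self.k, self.values | other.values))
        return merged

    def estimate(self):
        # Fewer than k values stored means the count is exact (up to collisions).
        if len(self.values) < self.k:
            return float(len(self.values))
        # Otherwise estimate from the k-th smallest normalized hash value.
        return (self.k - 1) / max(self.values)


# Two "ranks" sketch overlapping partitions, then combine their summaries.
left, right = KMVSketch(), KMVSketch()
for i in range(6000):
    left.add(i)
for i in range(4000, 10000):
    right.add(i)
union_estimate = left.merge(right).estimate()  # approximates 10000 distinct items
```

The merge step exchanges only `k` numbers per sketch, which is the property that keeps communication overhead small in a distributed setting.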


Benjamin Priest is a staff member in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. He received his PhD in 2019 from the Thayer School of Engineering at Dartmouth College. His areas of interest include streaming and sketching algorithms, high performance computing, graph analysis, numerical linear algebra, and machine learning. His recent research foci are the development of high performance algorithms and codes for the sub-linear analysis of graphs and for the scalable approximation of Gaussian processes.

Deep Symbolic Regression: Recovering Mathematical Expressions from Data via Risk-Seeking Policy Gradients

Brenden Petersen | Group Leader | Lawrence Livermore National Laboratory

Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that leverages deep learning for symbolic regression via a simple idea: use a large model to search the space of small models. Specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the network to generate better-fitting expressions. Our algorithm outperforms several baseline methods (including Eureqa, the gold standard for symbolic regression) in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance. Watch Petersen's talk on YouTube.
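To make the idea of optimizing for best-case rather than expected performance concrete, here is a minimal sketch of how per-sample weights for a risk-seeking policy gradient could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function name `risk_seeking_weights` and the parameter `epsilon` (the fraction of top samples kept) are hypothetical. Only samples whose reward reaches the (1 - epsilon) quantile of the batch contribute, and each is weighted by how far it exceeds that quantile.

```python
import numpy as np


def risk_seeking_weights(rewards, epsilon):
    """Per-sample weights that multiply grad log p(sample) in the gradient estimate.

    Hypothetical helper: samples below the (1 - epsilon) reward quantile get
    weight zero, so only the best-performing expressions drive the update.
    """
    rewards = np.asarray(rewards, dtype=float)
    threshold = np.quantile(rewards, 1.0 - epsilon)  # empirical reward quantile
    above = rewards >= threshold
    # Baseline is the quantile itself; normalize by the expected number kept.
    return np.where(above, rewards - threshold, 0.0) / (epsilon * rewards.size)


batch_rewards = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = risk_seeking_weights(batch_rewards, epsilon=0.2)
# Only the two highest-reward samples receive nonzero weight.
```

A standard policy gradient would instead weight every sample by its reward minus a mean baseline, which optimizes the average expression rather than the best one.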

A team of LLNL scientists collaborated on this effort: Brenden Petersen, Mikel Landajuela Larma, Nathan Mundhenk, Claudio Santiago, Soo Kim, and Joanne Kim. ICLR 2021 Publication.

Brenden Petersen is the group leader of the Operations Research and Systems Analysis group at Lawrence Livermore National Laboratory. He received his PhD in 2016 through a joint program at the University of California, Berkeley and the University of California, San Francisco. His PhD background is in biological modeling and simulation. Since joining the Lab almost 5 years ago, his research has explored the intersection of simulation and machine learning. His current research interests include deep reinforcement learning for simulation control and discrete optimization.

Deep Networks from First Principles

Yi Ma | Professor in Residence | University of California, Berkeley

In this talk, we offer an entirely “white box” interpretation of deep (convolution) networks from the perspective of data compression (and group invariance). We show how modern deep layered architectures, linear (convolution) operators and nonlinear activations, and even all parameters can be derived from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical interpretations. There are also several nice surprises from this principled approach: it reveals a fundamental tradeoff between invariance and sparsity for class separability; it reveals a fundamental connection between deep networks and the Fourier transform for group invariance, namely the computational advantage in the spectral domain (why spiking neurons?); and it clarifies the mathematical role of forward propagation (optimization) and backward propagation (variation). In particular, the so-obtained ReduNet is amenable to fine-tuning via both forward and backward (stochastic) propagation, both for optimizing the same objective.
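For readers unfamiliar with the rate reduction objective, the following NumPy sketch computes it for a small labeled data set. It is an illustration only: it assumes the commonly published form of the coding rate, R(Z) = 1/2 log det(I + d/(m eps^2) Z Z^T) for a d-by-m feature matrix Z, and defines rate reduction as the rate of the whole set minus the size-weighted rates of the individual classes. The function names and the default `eps` are choices made for this example and are not taken from the talk.

```python
import numpy as np


def coding_rate(z, eps=0.5):
    """Approximate coding rate of the columns of z (shape d x m) at precision eps."""
    d, m = z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * (z @ z.T))
    return 0.5 * logdet


def rate_reduction(z, labels, eps=0.5):
    """Rate of all features minus the size-weighted rates of each class."""
    _, m = z.shape
    per_class = 0.0
    for c in np.unique(labels):
        z_c = z[:, labels == c]
        per_class += (z_c.shape[1] / m) * coding_rate(z_c, eps)
    return coding_rate(z, eps) - per_class


labels = np.array([0] * 50 + [1] * 50)
e1, e2 = np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])

# Classes on orthogonal directions: large reduction. Same direction: none.
z_orthogonal = np.hstack([np.repeat(e1, 50, axis=1), np.repeat(e2, 50, axis=1)])
z_collapsed = np.repeat(e1, 100, axis=1)
delta_orthogonal = rate_reduction(z_orthogonal, labels)
delta_collapsed = rate_reduction(z_collapsed, labels)
```

Under these assumptions the objective rewards features whose classes occupy different subspaces, which is the sense in which maximizing rate reduction separates classes.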

This is joint work with students Yaodong Yu, Ryan Chan, and Haozhi Qi of Berkeley; Dr. Chong You (now at Google Research) and Professor John Wright of Columbia University.

Yi Ma is a Professor in Residence at the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received his Bachelor’s degree from Tsinghua University in 1995 and MS and PhD degrees from UC Berkeley in 2000. His research interests are in computer vision, high-dimensional data analysis, and intelligent systems. He was on the faculty of UIUC ECE from 2000 to 2011, the manager of the Visual Computing group of Microsoft Research Asia from 2009 to 2014, and the Dean of the School of Information Science and Technology of ShanghaiTech University from 2014 to 2017. He has published over 160 papers and three textbooks in computer vision, statistical learning, and data science. He received the NSF CAREER award in 2004 and the ONR Young Investigator award in 2005. He also received the David Marr prize in computer vision in 1999 and has served as Program Chair and General Chair of ICCV 2013 and 2015, respectively. He is a Fellow of IEEE, SIAM, and ACM.

Adaptive Contraction Rates and Model Selection Consistency of Variational Posteriors

Lizhen Lin | Sara and Robert Lumpkins Associate Professor | University of Notre Dame

This talk discusses adaptive inference based on variational Bayes. We propose a novel variational Bayes framework called adaptive variational Bayes, which can operate on a collection of model spaces with varying structures. The proposed framework averages variational posteriors over individual models with certain weights to obtain the variational posterior over the entire model space. It turns out that this averaged variational posterior minimizes the Kullback-Leibler divergence to the regular posterior distribution. We show that the proposed variational posterior can achieve optimal contraction rates adaptively in very general situations, as well as attain model selection consistency when the “true” model structure exists. We apply the adaptive variational Bayes to several classes of deep learning models and derive some new and adaptive inference results. Moreover, we propose a particle-based approach for the construction of a prior distribution and variational family, which automatically satisfies some of the theoretical conditions imposed in our general framework and provides an optimization algorithm applicable to general problems. Lastly, we consider the use of quasi-likelihood in our adaptive variational framework. We formulate conditions on quasi-likelihood to ensure the contraction rate remains the same. The proposed framework can be applied to a large class of problems including sparse linear regression, estimation of finite mixtures, and graphon estimation in network analysis.

Dr. Lizhen Lin is the Sara and Robert Lumpkins Associate Professor at the University of Notre Dame. Her areas of expertise are Bayesian nonparametrics, Bayesian asymptotics, statistics on manifolds, and geometric deep learning. She is also interested in statistical network analysis.

Artificial Intelligence in Support of Biomedical Data Privacy

Bradley Malin | Accenture Professor of Biomedical Informatics, Biostatistics, and Computer Science | Vanderbilt University

Privacy is a social construct that is realized in different ways under varying situations in healthcare and biomedical research. In this respect, context is king, such that the manner by which privacy can be injected into a system is dependent on a variety of factors that influence the environment. This is particularly the case when considering privacy in the big data age, or what one might call big data privacy. As computing becomes increasingly cheap and ever more ubiquitous, it seems as though upholding privacy is an impossible task. This notion is supported by the development and demonstration of a growing array of attacks on certain types of protections biomedical data managers aim to inject into clinical and genomic data shared for various purposes, such as the obfuscation of a patient’s identity or the suppression of sensitive facts about a research participant or academic medical center. At the same time, these methodologies make strong assumptions about the extent to which an adversary functions in the world, such as operating under no (or limited) constraints with respect to resources at their disposal and motivation for mounting an attack. In this brief presentation, I will review several attacks on biomedical data as they have evolved over the past several decades, but then posit a new approach to assessing data privacy risk in the real world that builds on computational economic perspectives of risk assessment and artificial data generation methods. To illustrate the potential for this approach, I will draw upon several examples of how we have applied it with respect to sharing demographic, clinical, and genomic data, both at the individual- and summary-level for several U.S.-based consortia and multinational clinical trials.

Bradley Malin, Ph.D., is the Accenture Professor of Biomedical Informatics, Biostatistics, and Computer Science at Vanderbilt University. He is the Co-Founder and Co-Director of two centers. The first is the Center for Genetic Privacy and Identity in Community Settings (GetPreCiSe), an NIH Center of Excellence in Ethical, Legal, and Social Implications Research. The second is the Health Data Science Center, which integrates over ten laboratories at Vanderbilt working on data science applications in healthcare. His research draws upon methodologies in computer science, biomedical science, and public policy to develop novel computational techniques. In addition to running a vibrant scientific research program, since 2007, he has led a data privacy consultation service for the Electronic Medical Records and Genomics (eMERGE) network, an NIH consortium.

MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local Cross-Validation

Amanda Muyskens | Applied Statistician | LLNL

The utilization of large and complex data by machine learning in support of decision-making is of increasing importance in many scientific and national security domains. However, the need for uncertainty estimates or similar confidence indicators inhibits the integration of many popular machine learning pipelines, such as those that rely upon deep learning. In contrast, Gaussian process (GP) models are popular for their principled uncertainty quantification but require quadratic memory to store the covariance matrix and cubic computation to perform inference or evaluate the likelihood function. In this talk, we present MuyGPs, a novel computationally efficient GP hyperparameter estimation method for large data that has recently been released for open-source use in the Python package MuyGPyS. MuyGPs builds upon prior methods that take advantage of nearest neighbor structure for sparsification and uses leave-one-out cross-validation to optimize covariance (kernel) hyperparameters without realizing the expensive multivariate normal likelihood. We describe our approximate methods and compare our implementations against state-of-the-art competitors in approximate GP regression on a benchmark dataset and against several competitors, including convolutional neural networks, in a space-based image classification problem. We give examples of code to fit data such as these, and finally, we discuss future directions of MuyGPs.
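The general idea of choosing a kernel hyperparameter by leave-one-out cross-validation over nearest neighbors, without ever evaluating the full likelihood, can be sketched in a few lines of NumPy. This is a simplified illustration and does not use or reproduce the MuyGPyS API: the function names, the RBF kernel, the grid search, and the `nugget` term are all assumptions made for this example, and the brute-force neighbor search is used only to keep it short.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def loo_mse(x, y, length_scale, k=10, nugget=1e-6):
    """Leave-one-out error: predict each point from its k nearest neighbors only."""
    n = x.shape[0]
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)  # a point may not be its own neighbor
    neighbors = np.argsort(sq_dists, axis=1)[:, :k]
    errors = np.empty(n)
    for i in range(n):
        idx = neighbors[i]
        k_nn = rbf_kernel(x[idx], x[idx], length_scale) + nugget * np.eye(k)
        k_cross = rbf_kernel(x[i : i + 1], x[idx], length_scale)  # shape (1, k)
        prediction = k_cross @ np.linalg.solve(k_nn, y[idx])  # local GP mean
        errors[i] = (prediction[0] - y[i]) ** 2
    return errors.mean()


def select_length_scale(x, y, candidates, k=10):
    """Pick the candidate length scale with the lowest leave-one-out error."""
    losses = [loo_mse(x, y, ls, k=k) for ls in candidates]
    return candidates[int(np.argmin(losses))], losses


x_train = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
y_train = np.sin(x_train[:, 0])
best_length_scale, cv_losses = select_length_scale(x_train, y_train, [0.01, 1.0])
```

Each prediction solves only a k-by-k linear system, so the cost grows with the number of points times k cubed rather than with the cube of the data set size.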

Dr. Amanda Muyskens is a staff member in the Applied Statistics Group (ASG) within the Computational Engineering Division (CED) here at LLNL. She received bachelor’s degrees in both mathematics and music performance from the University of Cincinnati in 2013 and an MS and PhD in statistics from NC State University in 2015 and 2019, respectively. She began her postdoc at LLNL in 2019 in her current group. Her research interests include Gaussian processes, computationally efficient statistical methods, uncertainty quantification, and statistical consulting.

A Biased Tour of the Uncertainty Visualization Zoo

Matthew Kay | Assistant Professor | Northwestern University

Uncertain predictions permeate our daily lives (“will it rain today?”, “how long until my bus shows up?”, “who is most likely to win the next election?”). Fully understanding the uncertainty in such predictions would allow people to make better decisions, yet predictive systems usually communicate uncertainty poorly—or not at all. Based on my (and others') research and my own practice, I will discuss ways to combine knowledge of visualization perception, uncertainty cognition, and task requirements to design visualizations that more effectively communicate uncertainty. I will also discuss ongoing work in systematically characterizing the space of uncertainty visualization designs and in developing ways to communicate (difficult- or impossible-to-quantify) uncertainty in the data analysis process itself. As we push more predictive systems into people’s everyday lives, we must consider carefully how to communicate uncertainty in ways that people can actually use to make informed decisions.

Matthew Kay is an Assistant Professor jointly appointed in Computer Science and Communication Studies at Northwestern University. He works in human-computer interaction and information visualization; more specifically, his research areas include uncertainty visualization, personal health informatics, and the design of human-centered tools for data analysis. His current research is funded by multiple NSF awards, and he has received multiple best paper awards across human-computer interaction and information visualization venues (including ACM CHI and IEEE VIS). He co-directs the Midwest Uncertainty Collective and is the author of the tidybayes and ggdist R packages for visualizing Bayesian model output and uncertainty.

Hypergraphs and Topology for Data Science

Emilie Purvine | Senior Data Scientist | Pacific Northwest National Laboratory

Data scientists and applied mathematicians must grapple with complex data when analyzing complex systems. Analytical methods almost always represent phenomena at a much simpler level than the complex structure or dynamics inherent in systems, through either simpler measured or sampled data, or simpler models, or both. As just one example, collaboration data from publications databases are often modeled as graphs of authors, in which pairs of authors (vertices) are connected if they published a paper together, perhaps weighted by the number of such papers. This graph view is also commonly found when analyzing many other kinds of data including biological, cyber, and social. But to better represent inherent complexity, researchers are striving to adopt hypergraphs, representing connections not only as pairwise, but as multi-way or higher order. In bibliometrics, where papers have multiple authors, and authors write multiple papers, hypergraphs can natively capture the complex ways that groups of authors form into collaborations as sets of authors on papers, where traditional collaboration networks can only do so via complex coding schemes. Our recent work has focused on first developing and implementing methods that extend common graph methods to hypergraphs – e.g., distance, diameter, centrality – and then using such methods to study real data sets from biology to cyber security. Moreover, the complexity of hypergraphs imbues them with significant topological properties, and we have been active in developing a theory and interpretation of hypergraph homology, through abstract simplicial complexes and other topological representations. Additionally, graphs and hypergraphs both arise in data systems with more than two dimensions, for example adding keywords or institutions to papers and authors. These four dimensions – authors, papers, keywords, and institutions – now can form a combinatorial number of hypergraphs (e.g., author vs. papers, papers vs. keywords, institutions vs. authors, etc.). But what mathematical structure can be formed when we consider all these dimensions simultaneously? Tensors may be one such structure, but even they may be too restrictive since tensors represent a multi-relation among all dimensions, and data may only be available on certain projections. In this talk I will provide an overview of our work on hypergraphs and topology for data science, including both theory and practice of the methods we have been developing, and provide some thoughts on going beyond hypergraphs.
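The difference between the graph view and the hypergraph view of collaboration data can be seen in a tiny example. The sketch below is illustrative only; the author and paper names and the function `weighted_projection` are invented for this example. It stores each paper as a hyperedge (a set of authors) and projects it onto the usual weighted co-authorship graph.

```python
from collections import Counter
from itertools import combinations


def weighted_projection(hyperedges):
    """Collapse a hypergraph to a graph: weight each author pair by shared papers."""
    weights = Counter()
    for members in hyperedges.values():
        for pair in combinations(sorted(members), 2):
            weights[pair] += 1
    return weights


# One three-author paper versus three separate two-author papers.
one_paper = {"p1": {"ana", "bo", "cy"}}
three_papers = {
    "p1": {"ana", "bo"},
    "p2": {"bo", "cy"},
    "p3": {"ana", "cy"},
}

same_graph = weighted_projection(one_paper) == weighted_projection(three_papers)
same_hypergraph = one_paper == three_papers
```

Both collaboration patterns collapse to the same weighted triangle of authors, so the graph cannot tell them apart, while the hypergraph keeps the distinction between one three-way collaboration and three pairwise ones.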

Dr. Emilie Purvine is a Senior Data Scientist at Pacific Northwest National Laboratory. Although her academic background is in pure mathematics, with a BS from University of Wisconsin - Madison and a PhD from Rutgers University, her research since joining PNNL in 2011 has focused on applications of combinatorics and computational topology together with theoretical advances needed to support the applications. Over her time at PNNL Emilie has been both PI and technical staff on a number of projects in applications ranging from computational chemistry and biology to cyber security and power grid modeling. She has authored over 40 technical publications and is currently an associate editor for the Notices of the American Mathematical Society. Emilie also coordinates PNNL’s Postgraduate Organization, which plans career development seminars and an annual research symposium and promotes networking and mentorship for PNNL’s post-bachelor’s, post-master’s, and postdoctoral research associates.

AI-Enabled Innovations in Validation of Sanitation and Detection of Pathogens

Nitin Nitin | Professor and Engineer | UC Davis

Food safety is one of the leading public health issues and continues to be a significant challenge for the food industry and consumers. These issues are especially critical for minimally processed food products such as fresh produce. Sanitation is a critical control step for the safety of the food supply. However, the current approaches for verification and validation are limited. Similarly, the current sanitation processes use conventional chemical sanitizers and copious amounts of water and energy. Thus, there is an unmet need to develop and validate novel technologies for the sanitation of food contact surfaces. Complementary to sanitation, food safety testing is a fundamental approach for detecting pathogens in food, water, and environmental samples. This presentation will focus on advances in verification and validation of sanitation of food contact surfaces, including the inactivation of biofilms using chemical sanitizers and non-thermal atmospheric plasma technologies and detection of target bacteria in water and food samples. The presentation focuses on the role of AI methods in enabling the validation of sanitation and the detection of pathogens. For the verification of sanitation, the research will illustrate the application of AI for the analysis of spectroscopy data sets acquired using engineered surrogates for bacteria and their biofilms. To detect bacteria, I will present applications of AI methods for both imaging and spectroscopy data sets. The results will illustrate the significant potential of AI technologies in addressing critical needs to improve food safety.

N. Nitin is a faculty member in the departments of food science and technology and biological and agricultural engineering. His research is at the interface of biomaterial science, biosensors, mathematical modeling, and data analytics. With these approaches, his research aims to enhance the quality, safety, and sustainability of food systems. In collaboration with his students, postdoctoral fellows, and faculty colleagues, he has co-authored over 145 peer-reviewed publications and is a co-inventor on ten patents and eight patent applications. Prof. Nitin also teaches courses in food processing, food safety engineering, and heat transfer in biological systems in both departments. His research has also enabled the co-founding of two early-stage companies.

Julia, The Power of Language

Alan Edelman | Professor | MIT

The Julia language has become well known for its combination of performance and ease-of-use. We argue the real power of a language is the ability to have impact. In this talk we will assume little or no familiarity with the Julia language and describe why Julia is not just another language for everyday and high-performance computing. We argue that the real power of a language is the ability to collaborate and have impact. We will discuss the application of Julia to domains like climate science, materials design, simulations that require optimization, differential equations, machine learning, and uncertainty quantification and highlight why we say, "humans compose when software composes."

Professor Edelman considers himself to be a pure mathematician and an applied computer scientist. He works in the areas of numerical linear algebra, random matrix theory, high performance computing systems, networks, software, and algorithms. He has won many prizes for his work including the prestigious Gordon Bell Prize, the Householder prize, the Sidney Fernbach award, and the Babbage Prize. He was the founder of Interactive Supercomputing, a company that employed nearly 50 people and was acquired by Microsoft in its fifth year, and is a co-creator of Julia. He is an elected fellow of ACM, AMS, IEEE, and SIAM. He believes above all that math and computing go together and both should be fun.

Harnessing the Digital Revolution to Assess Water Use Dynamics Under Climatic Stressors and Policy Regimes

Newsha Ajami | Director of Urban Water Policy | Stanford University

Understanding water demand patterns and demand dynamics is vital in achieving long-term water resiliency and reliability, especially as traditional water supply solutions are increasingly under stress due to climate change. In this seminar, using change point detection methodology, I will closely examine various drivers that affect customer-level water demand based on some of the emerging data sources. I will further assess the extent to which environmental and climatic stressors, such as droughts, and policy regimes influence transitory behavioral modifications or structural changes in water demand and rebound patterns, and how such dynamics are key in informing water supply reliability and infrastructure planning.

Newsha K. Ajami is the director of Urban Water Policy with Stanford University’s Water in the West program. A leading expert in sustainable water resource management, smart cities, and the water-energy-food nexus, she uses data science principles to study the human and policy dimensions of urban water and hydrologic systems. Her research throughout the years has been interdisciplinary and impact focused. Dr. Ajami served as a gubernatorial appointee to the Bay Area Regional Water Quality Control Board for two terms and is currently a mayoral appointee to the San Francisco Public Utilities Commission. She is a member of the National Academies Board on Water Science and Technology. Dr. Ajami also serves on a number of state-level and national advisory boards. Before joining Stanford, she worked as a senior research scholar at the Pacific Institute and served as a Science and Technology fellow at the California State Senate’s Natural Resources and Water Committee, where she worked on various water- and energy-related legislation. She has published many highly cited peer-reviewed articles, coauthored two books, and contributed opinion pieces to the New York Times, San Jose Mercury, and Sacramento Bee. Dr. Ajami received her Ph.D. in Civil and Environmental Engineering from UC Irvine, an M.S. in Hydrology and Water Resources from the University of Arizona, and a B.S. in Civil Engineering from Amir Kabir University of Technology in Tehran.