Workshop 2019

The DSI and the University of California (UC) held the 2019 Data Science Workshop on July 23–24 at Garré Vineyard & Winery in Livermore.

This year’s workshop was organized into 9 sessions with a keynote address. Please contact the Program Committee at datascience-workshop [at] llnl.gov with any questions. Thank you to all session chairs, organizers, and attendees for another successful event!

Day 1: July 23, 2019

Keynote Address—Data and Ethics: Old Issues and New Challenges

Abel Rodriguez | Professor of Statistics, Associate Dean for Graduate Affairs | UC Santa Cruz

Abstract: The increasing availability of data and raw computational power, along with recent developments in models and algorithms, are changing the way businesses, academics and governments operate. However, this revolution has both created new ethical challenges and changed the nature of many familiar ones. For example, notions of informed consent, which were originally developed in the context of biomedical research after the atrocities of the Second World War, are a poor fit to an environment in which individuals are constantly monitored by scores of agents with vague (and often unenforceable) consent disclosures. Similarly, notions of confidentiality and privacy, originally devised for a world in which governments were the only agents with detailed information about large numbers of individuals, are not necessarily appropriate for an environment in which this kind of data are in the hands of a myriad of private entities. This talk uses a number of recent case studies to explore these and other issues related to the ethics of data collection, management and analysis, in an attempt to highlight issues that would appear to be relevant to the kind of activities carried out by the National Laboratories.

Challenges Facing Predictive Science Applications: Complexity, Uncertainty, and Experimental Data

Session moderator: Brian Van Essen | Group Leader | LLNL

  • Rushil Anirudh | Research Scientist | LLNL
  • Peter Harrington | HPC Engineer | LBL
  • Naoya Maruyama | Research Scientist | LLNL
  • Gilles Kluth | Visiting Scientist | LLNL
  • Jim Gaffney | Physicist | LLNL

Abstract: In this session we will cover multiple topics that challenge many applications of scientific machine learning. As machine learning and deep learning techniques are applied at scale to traditional scientific computing simulations, they are faced with significant challenges that are not seen in traditional machine learning applications. These challenges center around 1) the sheer complexity of the data sets and models being developed, 2) the scale of the data sets, number of compute resources and modalities, as well as infrastructure challenges such as I/O, and 3) the interpretability and confidence that we have in the models that we train and how that compares to simulations. Additional challenges that we will highlight are efforts in creating neural network architectures that provide physically consistent results and the use of deep learning “in-the-loop” for scientific applications.

Computational Approaches for Prediction of Response to Small Molecule Perturbations

Session moderator: Amrita Basu | Assistant Professor | UC San Francisco

  • Denise Wolf | Computational Biologist | UC San Francisco
  • Helgi Ingolfsson | Staff Scientist | LLNL
  • Kevin McLoughlin | Computational Biologist | LLNL

Abstract: This session will discuss new and emerging analytical approaches to study perturbation using small molecule inhibitors and experimental agents. The audience will gain current perspectives on how experimental agents are used to study lipid membrane dynamics, treatment of aggressive early stage breast cancer, and prediction of toxicities from large high-throughput drug screens.

Data Science in the Earth and Atmospheric Sciences

Session moderator: Don Lucas | Staff Scientist | LLNL

  • Qingkai Kong | Assistant Data Scientist | UC Berkeley
  • Chris Sherman | Staff Scientist | LLNL
  • Gemma Anderson | Staff Scientist | LLNL
  • Karthik Kashinath | Project Scientist | LBL

Abstract: This session highlights recent impactful applications of machine learning in the earth and atmospheric sciences. The growing wealth of data collected in these fields, combined with major developments in machine learning power and accessibility, promises significant breakthroughs as more researchers turn to machine learning and other data science techniques. However, owing to the inherent complexity of machine learning methods, and characteristics unique to earth and atmospheric science applications, such methods are prone to misapplication, may produce uninterpretable models, and are often insufficiently documented.

Machine Learning for Smart Grid Systems

Session moderator: Jose Cadena Pico | Research Staff | LLNL

  • Deepjyoti Deka | Staff Scientist | LANL
  • Harish Bhat | Associate Professor | UC Merced
  • Chris Vogl | Computational Physicist | LLNL
  • Jonathan Donadee | Research Engineer | LLNL

Abstract: In recent years, there has been a significant effort from government agencies, the Department of Energy in particular, to modernize America’s power grid infrastructure. With two-way sensors and automated control systems becoming ubiquitous, the goal is to set in place a “smart grid,” which promises more efficient transmission, security, and real-time adaptation to energy demands. The goal of this session is to showcase recent applications of ML to the smart grid, including transformer health assessment, grid data analysis through collaborative autonomy, and temporal modeling of smart grid data, and to provide a space where power systems experts and data scientists / ML experts can further discuss challenges and potential applications of ML to the smart grid and foster future collaboration.

Day 2: July 24, 2019

Challenges Facing Predictive Science Applications: Workflows and Scalable Learning

Session moderator: Brian Spears | Research Scientist | LLNL

  • Kelli Humbird | Livermore Graduate Scholar | LLNL
  • Luc Peterson | Design Physicist | LLNL
  • Francesco Di Natale | Computer Scientist | LLNL
  • Peter Robinson | Computer Scientist | LLNL
  • Sam Jacobs | Research Scientist | LLNL

Abstract: This session will cover the challenges for many applications of scientific machine learning related to both large scale workflows and incorporating experimental data with scientific simulations. Each of the talks identified will cover different aspects of these challenges and how they relate to multiple scientific areas.

Artificial Intelligence in the Acute Care Setting

Session moderator: Xiao Hu | Professor | UC San Francisco

  • Christine Lee | PhD Candidate | UC Irvine
  • Ira Hofer | Assistant Professor | UC Los Angeles
  • Paria Rashidinejad | PhD Candidate | UC Berkeley
  • Priya Prasad | Assistant Professor | UC San Francisco

Abstract: Wide adoption of electronic health record (EHR) systems has transformed the landscape of the biomedical informatics field, with an appreciable shift from building EHR systems toward data-driven discoveries and applications that leverage the vast amount of healthcare data generated each day. In parallel with the accumulation of EHR data, a large amount of high-fidelity physiological data, including waveforms, is being systematically archived with linkage to EHR at several UC health campuses. This session will present a snapshot of ongoing research at four different UC campuses that aims to combine repositories of EHR data with physiological and genomic datasets to make new discoveries and improve patient care. The four talks will cover infrastructure development, specific applications and algorithms, and implementation of data-driven solutions. The primary goal is to promote more collaborations among data scientists, engineers, and health science researchers among UCs and the affiliated national labs to accelerate the pace of data-driven discovery and translation.

Machine Learning as Applied to Nondestructive Characterization

Session moderator: Harry Martz | Director, Nondestructive Characterization Institute | LLNL

  • Jian-Qiao Sun | Professor | UC Merced
  • Kyle Champley | Applied Mathematician | LLNL
  • Brian Giera | Research Engineer | LLNL

Abstract: This panel discussion examines roles for machine learning to assist Nondestructive Evaluation (NDE) methods, particularly x‐ray computed tomography (CT) and optical visualization during manufacturing. CT is used to characterize the material composition and internal structure of components and assemblies. Optical visualization is being explored to correlate to monitoring of advanced manufacturing of parts. The focus of the NDE application areas discussed by the panel is improving CT reconstruction and improving the quantitative accuracy and precision of material composition and internal structure of components and assemblies, i.e., measuring the material properties and geometric dimensions. The quality of acquired data is heavily influenced by the operational health of the NDE system, and the effects of this will be explored in depth in this panel discussion.

Machine Learning for Chemical and Material Sciences: Advances in Methodology and Tool Development

Session moderator: Yu-Hang Tang | Luis W. Alvarez Postdoctoral Fellow | LBL (Learning with Graph Kernels in the Chemical Universe)

  • Leonardo Zepeda | Postdoctoral Fellow | LBL
  • Muammar El Khatib | Postdoctoral Scholar | LBL
  • Samuel Blau | Postdoctoral Researcher | LBL

Abstract: The chemistry and materials communities have embraced data science and machine learning to bring about revolutionizing solutions to long‐standing challenges in molecular modeling, optimal experiment design, and high‐throughput structure screening. This is the result of a synergistic effort between (1) the work on adapting machine learning methodologies for atomistic systems which entail unique quality requirements concerning symmetry and dimensionality, etc., and (2) the work to bring the methods to a broader audience with user‐friendly and high performance computational software and services. This session will feature two talks on each of the two aforementioned aspects to present some of the latest developments in machine learning for chemical and materials sciences.

Machine Learning for Design and Manufacturing

Session moderator: Clara Druzgalski | Research Engineer | LLNL

  • Vic Castillo | Computational Engineer | LLNL
  • Russell Whitesides | Technical Staff | LLNL
  • Victor Beck | Research Engineer | LLNL

Abstract: Many systems in Design and Manufacturing benefit from process optimization, control, and monitoring. Algorithms from statistics, machine learning, and artificial intelligence provide high value to this problem space. This session covers a wide range of use cases where machine learning algorithms are used in conjunction with experimental measurements, sensors, and physics-based simulation data to address problems in Design and Manufacturing. An interdisciplinary approach to problem solving has led to new areas of research, identified fundamental challenges, and improved outcomes.