DOE Data Days


The Department of Energy (DOE) has joined the larger scientific community in promoting data management as a means to higher-quality, more efficient research and analysis, and as a critical component of data science. Tools and platforms that support data management and analysis are evolving rapidly and provide enormous opportunities. They also pose challenges that can be specific to DOE but are common across DOE mission areas and organizations.

The DOE Data Day workshop, abbreviated to D3, was born from this critical work. D3’s primary goals are:

  • Bring DOE institutions together to share their data management use cases, challenges, and solutions;
  • Identify potential synergies and efficiencies; and
  • Establish proactive channels for future collaborations.

The event crosses program boundaries and mission areas, with participants exploring best practices and the latest technologies to help DOE researchers leverage new techniques, respond to data security threats, and advance fundamental science in valuable ways.

DOE Data Days will be held on March 3–5, 2026, in the Washington, D.C., area. The event program will consist of two days of hybrid, unclassified sessions and a third in-person, classified day. More information on the sessions, locations, and format will be shared in early FY26.

Read about this year's event via LLNL News: Data Days workshop gathers DOE national labs to discuss future of data management

Workshop dates: October 22–24, 2024

Hosted by: LLNL



Themes:

  • Cloud and Hybrid Data Management
  • Data Intensive Computing
  • Data Curation and Governance

Resources:


Tuesday, October 22

Cloud and Hybrid Data Management

Wednesday, October 23

  • DOE Leadership Panel

Data Intensive Computing

Poster Session

Thursday, October 24

Data Curation and Governance

Read about this year's event via LLNL News: Data Days brings Department of Energy labs together for discussions on data management and more

Workshop dates: October 24–26, 2023

Hosted by: LLNL

 

Themes:

  • Data Intensive Computing
  • Cloud and Hybrid Data Management
  • Data Access, Sharing, and Sensitivity
  • Data Curation and Metadata Standards
  • Data Governance and Policy

Report: 2023 Report (PDF 997KB)

Tuesday, October 24

Wednesday, October 25

Time Topics, talks, and activities
7:00am Check-in and hospitality
8:15am Session 3: Data Access, Sharing, and Sensitivity
1:00pm Session 4: Data Intensive Computing
5:30pm Adjourn

Thursday, October 26

Time Topics, talks, and activities
7:00am Check-in and hospitality
8:15am Session 5: Cloud and Hybrid Data Management
2:30pm Adjourn

Workshop dates: June 1–3, 2022

Hosted by: LLNL

Themes:

  • Cloud and hybrid data management
  • Data-intensive computing
  • Data access, sharing, and sensitivity
  • Data policy and ethics

Resources:

Presentations and posters:

Invited speaker: Data Policy as Translational Ethics (PDF 440KB) – D. Danks (UCSD)

Invited speaker: Scientific Data Management for DOE (PDF 3.57MB) – M. Macduff (PNNL)

Putting Data in the Spotlight: The Case for a Scientific Data Federation (PDF 728KB) – S. Boehm, S. Somnath, O. Kuchar (ORNL)

Tsdat: An Open-Source Tool to Standardize Time-Series Data (.mp4 24.4MB) – C. Lansing et al. (PNNL)

Data Flow: A Data Bridge to Air-Gapped Instruments (PDF 885KB) – G. Shutt, M. McDonnell, S. Somnath (ORNL)

Invited speaker: Genomic Analysis and Learning at Scale: Mapping Irregular Computations to Advanced Architectures (PDF 6MB) – K. Yelick (LBNL)

Indexing LANL's Historical Collections in Less Than 1,000 Years (PDF 1.2MB) – J. Maze (LANL)

Data as an Ecosystem at ARM User Facilities (PDF 2.08MB) – M. Ihli et al. (ORNL)

Materials, Aging and Compatibility Data Platform: A Case Study in Data Management (PDF) – S. Butterworth (LLNL)

Making the Most of Data: Feature Engineering for Applied Supervised Machine Learning (PDF 2.6MB) – T. Martin (NETL)

Invited speaker: Scientific Research Data Initiatives, Resources, and Support (.mp4 129.7MB) – M. Cooke (DOE Office of Science)

DOE Office of Scientific and Technical Information Artificial Intelligence and Machine Learning (PDF 910KB) – M. B. West (OSTI)

Building the Next Generation Earth System Grid Federation (ESGF2) (PDF 3.61MB) – F. Hoffman (ORNL), I. Foster (ANL), S. Ames (LLNL)

APPFL: Open-Source Software Framework for Privacy-Preserving Federated Learning (PDF 2.75MB) – K. Kim et al. (ANL)

PNNL DataHub: Building a Central Data Capability (PDF 4.1MB) – M. Hofmockel et al. (PNNL)

Need-to-Know (NTK) Considerations for High Volume Data Access (PDF 263KB) – S. Byrnes (SNL)

From ENSDF to NuDat: Disseminating Digestible Data Using Modern Web-based Technologies to Search, Filter, and Visualize Nuclear Data (PDF 3.43MB) – D. Mason, E. Ricard-McCutchen, A. A. Sonzogni (BNL)

Managing Randomness to Enable Reproducible Machine Learning (PDF 426KB) – H. Ahmed, J. Lofstead (SNL)

FAIRification for HPC Datasets and AI Models (PDF 237MB) – P.-H. Lin (LLNL)

Towards a Big-Data Toolkit: Ensuring Data Governance & Ethical Considerations Are Applied to Large Datasets (PDF 241KB) – A. May et al. (ORNL)

Ontology-Driven Discovery of Legacy Data in DMS (PDF) – C. Mathieu, K. Brunner (LLNL)

U on Up: Cloud Computing Across Classification Boundaries (PDF 823KB) – W. Rosenberger (LANL)

Scientific Data Management: An Under-Recognized Grand Challenge (PDF 281KB) – S. Somnath et al. (ORNL)

Survey of Scientific Data Needs at Oak Ridge National Laboratory (PDF 331KB) – S. Somnath, O. Kuchar (ORNL)

Data Management System Enhancements for the APS Upgrade (PDF 1.01MB) – J. Hammonds et al. (ANL)

Unoccupied Aerial Systems: Development of a New ESS-DIVE Data and Metadata Reporting Format (PDF 818KB) – K. S. Ely, D. Yang, S. P. Serbin (BNL)

Exploring Lossy Compressibility through Statistical Correlations of Scientific Datasets (PDF 51MB) – J. Bessac (ANL) et al. from Clemson University

Distributed and Asynchronous Management of Science Workflows: Data and Computation Across DOE User Facilities (PDF 218KB) – S. R. Wilkinson et al. (ORNL)

Using Semantic Relationships to Describe, Enhance, and Extend United States Nuclear Tests: July 1945 through September 1992 (PDF 396KB) – K. Brunner (LLNL)

A Data Management Infrastructure for Multi-Lab Geoscience Projects (PDF 2.18MB) – K. Hodgkinson (SNL) et al. from LLNL, PNNL, LANL

Towards a DOE Data Catalog: Ensuring Access, Sharing, and Protection (PDF 50.2MB) – A. May et al. (ORNL)

The Energy Data Exchange: DOE Office of Fossil Energy Carbon Management’s Trusted Data Curation Platform (PDF 1.78MB) – C. Rowan (Maximus) et al. from NETL and Matric

National Security Data Solution: Unsiloing Weapons Research Data (PDF 900KB) – V. Feagin et al. (LANL)

Linking Legacies as a Basis for Categorizing Data in the Complex (PDF 519KB) – M. Ham (LANL)

Data Enclaves for Scientific Computing (PDF 185KB) – A. Akram (UC Davis) et al. from LBNL

The Challenges of Curating a Research Data Set from the World’s Most Complex Machine (PDF 129KB) – E. Andersen, J. Banning (PNNL)

Toward a Petascale Data Hackathon for Exploring a Digital Twin of the Earth (PDF 4.22MB) – V. Anantharaj, S. Parete-Koon, T. Papatheodore (ORNL)

Using Metadata Standards to Simplify Discovery and Navigation of Results (PDF 3.1MB) – H. Collier et al. (ORNL)

Workshop dates: October 5–7, 2020

Hosted by: LLNL, virtual only

Themes:

  • Data curation and standards: legacy data, existing data, and future data
  • Data-intensive computing, high performance computing (HPC), and data science tools for DOE's computing communities
  • Data access, sharing, and sensitivity
  • Cloud, HPC, and hybrid data management

A companion hackathon covering a selected topic from the topic areas above was held prior to D3. There was no fee to participate in D3, and an abstract was not required to attend the workshop.

Resources:

Workshop dates: September 25–26, 2019

Hosted by: LLNL

Themes:

  • Data curation and standards
  • Data-intensive computing
  • Data management in the cloud
  • Data access, sharing, and sensitivity

Resources: