Open Data Initiative

The DSI’s Open Data Initiative (ODI) enables us to share LLNL’s rich, challenging, and unique datasets with the larger data science community. Our goal is for these datasets to help support curriculum development, raise awareness around LLNL’s data science efforts, foster new collaborations, and be leveraged across other learning opportunities.

As we develop this catalog over time, the data will represent a wide variety of key LLNL mission areas and may include subsets of some of the world’s largest datasets. We plan to provide data ranging in complexity from dense, featureful, labeled datasets with well understood solutions to those that are sparse, noisy, and largely unexplored. These datasets can also be used to test novel hardware solutions for scalable machine learning platforms.


Code repository (LLNL-CODE-764041) | Download (LLNL-MI-835833) | License

High-resolution simulation of the electrical activation map in a human heart

Building on LLNL's Cardioid code, which simulates the electrophysiology of the human heart, a research team has conducted a computational study to generate a dataset of cardiac simulations at high spatiotemporal resolutions. The dataset, which is publicly available for further cardiac machine learning research, was built using real cardiac bi-ventricular geometries and clinically inspired endocardial activation patterns under different physiological and pathophysiological conditions.

The dataset consists of pairs of computationally simulated intracardiac transmembrane voltage recordings and electrocardiogram (ECG) signals. In total, 16,140 organ-level simulations were performed on LLNL's Lassen supercomputer, concurrently utilizing 4 GPUs and 40 CPU cores. Each simulation produced a 500 ms-by-12 ECG signal paired with a 500 ms-by-75 transmembrane voltage signal. The data were preprocessed and saved as NumPy arrays.
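Because the data are stored as NumPy arrays, one simulated pair can be read and sanity-checked in a few lines of Python. The sketch below is illustrative only. The file names (ecg_example.npy, vm_example.npy), the helper functions, and the array layout (time along the first axis, one sample per millisecond) are assumptions that are not confirmed by the dataset documentation, and the arrays it loads are random placeholders rather than real simulation output.

```python
import tempfile
from pathlib import Path

import numpy as np

# Dimensions taken from the description above. The axis order (time first)
# and the one-sample-per-millisecond reading of "500 ms" are assumptions.
N_TIME = 500   # samples in the 500 ms window
N_LEADS = 12   # ECG leads
N_VM = 75      # transmembrane voltage signals


def load_pair(ecg_path, vm_path):
    """Load one ECG / transmembrane-voltage pair and verify its shape."""
    ecg = np.load(ecg_path)
    vm = np.load(vm_path)
    if ecg.shape != (N_TIME, N_LEADS):
        raise ValueError(f"unexpected ECG shape {ecg.shape}")
    if vm.shape != (N_TIME, N_VM):
        raise ValueError(f"unexpected voltage shape {vm.shape}")
    return ecg, vm


def make_placeholder_pair(directory, seed=0):
    """Write random stand-in arrays (NOT real data) with the expected shapes.

    The file names are hypothetical; the real download may use a different
    naming scheme and directory layout.
    """
    rng = np.random.default_rng(seed)
    ecg_path = Path(directory) / "ecg_example.npy"
    vm_path = Path(directory) / "vm_example.npy"
    np.save(ecg_path, rng.standard_normal((N_TIME, N_LEADS)))
    np.save(vm_path, rng.standard_normal((N_TIME, N_VM)))
    return ecg_path, vm_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ecg_path, vm_path = make_placeholder_pair(tmp)
        ecg, vm = load_pair(ecg_path, vm_path)
        print("ECG:", ecg.shape, "voltage:", vm.shape)
```

To use this with the actual download, replace the placeholder paths with the paths of a real ECG file and its matching voltage file, and adjust the expected shapes if the arrays turn out to be stored with a different axis order.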

This project was funded by the Laboratory Directed Research and Development program (18-LW-078, principal investigator: Robert Blake), and the paper "Intracardiac Electrical Imaging Using the 12-Lead ECG: A Machine Learning Approach Using Synthetic Data" was accepted to the 2022 Computing in Cardiology international scientific conference. Co-authors are LLNL's Mikel Landajuela, Rushil Anirudh, and Robert Blake along with Joe Loscalzo from Harvard Medical School.

View all datasets in the UCSD LLNL collection.