Open Data Initiative

The DSI’s Open Data Initiative (ODI) enables us to share LLNL’s rich, challenging, and unique datasets with the larger data science community. Our goal is for these datasets to support curriculum development, raise awareness of LLNL’s data science efforts, foster new collaborations, and serve other learning opportunities.

As we develop this catalog over time, the data will represent a wide variety of key LLNL mission areas and may include subsets of some of the world’s largest datasets. We plan to provide data ranging in complexity from dense, feature-rich, labeled datasets with well-understood solutions to those that are sparse, noisy, and largely unexplored. These datasets can also be used to test novel hardware solutions for scalable machine learning platforms.


Code repository

3x2 grid of astronomical images of stars and galaxies

The Zwicky Transient Facility (ZTF) is a wide-field astronomical imaging survey that seeks to answer many of the universe's mysteries by frequently imaging billions of objects in the night sky. Understanding the nature of dark matter and dark energy relies on accurately mapping the universe around us, which makes distinguishing stars from galaxies a crucial task for astronomers. Mislabeling stars and galaxies poses a considerable problem for cosmological models, and current methods for separating these populations consume valuable time and resources, driving astronomers to search for new ways to classify objects quickly and effectively.

Scientists at LLNL have developed MuyGPs, a computationally efficient Gaussian process hyperparameter estimation method that can be applied to image classification tasks. This repo demonstrates the use of MuyGPyS, a Python package that implements the MuyGPs method, and contains descriptive notebooks covering each step, from curating a dataset of real stars and galaxies to evaluating classification accuracy. For more information: