Open Data Initiative

The DSI’s Open Data Initiative (ODI) enables us to share LLNL’s rich, challenging, and unique datasets with the larger data science community. Our goal is for these datasets to support curriculum development, raise awareness of LLNL’s data science efforts, foster new collaborations, and serve other learning opportunities.

As we develop this catalog over time, the data will represent a wide variety of key LLNL mission areas and may include subsets of some of the world’s largest datasets. We plan to provide data ranging in complexity from dense, featureful, labeled datasets with well understood solutions to those that are sparse, noisy, and largely unexplored. These datasets can also be used to test novel hardware solutions for scalable machine learning platforms.


Download (LLNL-MI-84834) | License

[Figure. Top: rainbow-colored plume of pollution marked with wind speed and direction, along with the colormap deposition pattern. Bottom: colormap chart showing color intensity on the y-axis and relative deposition on the x-axis.]

In fluid mechanics problems, computational fluid dynamics (CFD) uses data structures and numerical analysis to investigate the flow of liquids and gases. Researchers in LLNL’s Atmospheric, Earth, and Energy Division use CFD models to simulate atmospheric transport and dispersion. For instance, simulations of wind-driven dynamics can be used to train machine learning models that, in turn, can predict spatial patterns with high accuracy.

This dataset includes 16,000 CFD simulations (15,000 training cases and 1,000 test cases), post-processed for machine learning training. The physics problem is a 2D spatial pattern formed by a pollutant that has been released into the atmosphere and dispersed for up to an hour while undergoing deposition to the surface. The release can occur anywhere in a 2D domain of 5000 × 5000 meters. It is initialized as a small bubble with a radius of 5 meters, centered 5 meters above the surface, with internal momentum that causes it to expand within the first minute of simulation time. All the realizations use unit mass releases, so the resulting deposition patterns can be scaled proportionately for other mass amounts. The pollutant is blown in a direction controlled by the large-scale atmospheric inflow winds, which have variable speeds. The goal is to predict a deposition image given its associated release location and wind velocity. The research was funded by the National Nuclear Security Administration, Defense Nuclear Nonproliferation Research and Development (NNSA DNN R&D).
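The prediction inputs and the unit-mass scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset's actual loading code or file layout: the function names `make_features` and `scale_deposition` are hypothetical, the domain is assumed to span 0 to 5000 meters on each axis, the wind direction is assumed to be the angle the wind blows toward (measured counterclockwise from the +x axis), and a deposition pattern is represented as a plain nested list.

```python
import math

# From the dataset description: releases occur in a 5000 x 5000 meter domain.
DOMAIN_SIZE_M = 5000.0


def make_features(x_m, y_m, wind_speed, wind_dir_deg):
    """Build a model input from a release location and a wind velocity.

    Assumptions (not taken from the dataset): the domain runs from 0 to
    DOMAIN_SIZE_M on each axis, and wind_dir_deg is the direction the wind
    blows toward, measured counterclockwise from the +x axis.
    """
    if not (0.0 <= x_m <= DOMAIN_SIZE_M and 0.0 <= y_m <= DOMAIN_SIZE_M):
        raise ValueError("release location lies outside the domain")
    theta = math.radians(wind_dir_deg)
    u = wind_speed * math.cos(theta)  # wind component along x
    v = wind_speed * math.sin(theta)  # wind component along y
    # Location is normalized to [0, 1]; wind components keep their units.
    return [x_m / DOMAIN_SIZE_M, y_m / DOMAIN_SIZE_M, u, v]


def scale_deposition(unit_mass_pattern, mass):
    """Scale a unit-mass deposition pattern to another released mass.

    The simulations use unit mass releases, so deposition is taken to be
    proportional to the released mass.
    """
    if mass < 0:
        raise ValueError("mass must be non-negative")
    return [[value * mass for value in row] for row in unit_mass_pattern]


if __name__ == "__main__":
    features = make_features(2500.0, 1250.0, wind_speed=4.0, wind_dir_deg=90.0)
    print(features)
    print(scale_deposition([[0.0, 0.5], [0.25, 0.0]], mass=2.0))
```

A trained model would map a feature vector like the one from `make_features` to a deposition image; `scale_deposition` would then be applied to the predicted image when the released mass differs from one unit.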

References: (1) Predicting wind-driven spatial deposition through simulated color images using deep autoencoders; (2) Large eddy simulations of turbulent and buoyant flows in urban and complex terrain areas using the Aeolus model

View all datasets in the UCSD LLNL collection.