Data Science Summer Institute (DSSI)

You have a passion for data science and problem-solving.
We offer access to world-class staff and computational resources.

What is the Data Science Summer Institute?

Lawrence Livermore National Laboratory (LLNL) offers graduate students and advanced undergraduates in data science the opportunity to join the Data Science Summer Institute (DSSI), where interns work on real problems that matter to our country.

DSSI is a flexible 12-week summer internship program. As a member of DSSI, you will be able to take your ideas, your passion, and the skills you've acquired in machine learning, statistics, and high-performance computing and apply them to projects in areas of national importance.

The Laboratory's next-generation science and engineering are being applied to achieve breakthroughs in counterterrorism and nonproliferation, defense and intelligence, and energy and environmental security. For the summer of 2018, the program expects to bring in a limited number of undergraduate and graduate science and engineering students with backgrounds in machine learning, applied mathematics, computer science, or statistics. They will work with some of the best minds in data science on some of the world's largest problems.

Wind energy is one example. The availability of massive amounts of data, along with high-resolution models enabled by high-performance computers, will help us better understand how power production is related to atmospheric variables (such as wind speed and turbulence) across a broad range of spatial and temporal scales and in widely varying geographic areas. This understanding will be used to optimize power production from wind farms and to improve the fidelity of forecasting models that relate power output to atmospheric conditions.

You'll work with our staff scientists on high-impact problems

At the beginning of the program, you'll be paired with one of our mentors, LLNL staff scientists and engineers who are recognized experts in their fields. Your mentor will develop a research project for you that is significant, yet manageable within the length of the DSSI program. Some of these projects will be designed to lead to follow-on work for students who would like to return in subsequent years. A few of our mentors and their projects from previous years include:

  • Barry Chen: Large-Scale Self-Supervised Multimodal Deep Learning Research. We are developing new deep learning algorithms that map images, video, and text into a joint semantic feature space where conceptually related items are proximal. To make this vision a reality, we are developing new scalable training algorithms that take advantage of LLNL’s world-class supercomputers for rapidly training large neural networks on massive datasets.
  • Jonathan Allen: Molecular Markers for Diagnostic and Countermeasure Design. We are developing and applying data analytic tools to biological datasets to extract molecular markers for diagnostic and countermeasure design. A combination of machine learning, bioinformatics, and statistical analysis tools is used to detect molecular markers of disease.
  • Brian Van Essen: Deep Neural Networks (DNNs). We are applying parallel programming techniques to deep neural networks in order to leverage the unique characteristics of existing and upcoming high-performance computing systems, as well as new architectures such as IBM’s TrueNorth neurosynaptic architecture.

Students will have access to supercomputing resources including Catalyst and Surface (for GPU processing)

High-performance computing has been a central strength of the Laboratory, vital to mission success and a source of scientific discovery and technological innovation. Catalyst compute nodes include 128 gigabytes of dynamic random access memory (DRAM) per node and 800 gigabytes of non-volatile memory (NVRAM). Catalyst is particularly well suited to solving big data problems, such as those found in bioinformatics, graph networks, machine learning, and natural language processing. Surface compute nodes include dual 8-core Xeon processors and NVIDIA Tesla K40 GPUs. Surface is well suited to applications that can leverage parallel GPU cores.

What else will you do?

In addition to individual project work with your mentor, you'll be able to participate in DSSI seminars and short courses, such as:

  • Big data analysis using Spark
  • Deep Learning with Caffe
  • Practical data analysis with R

You'll also participate in the DSSI Grand Challenge, where teams of students work together to develop and present a system for solving hard problems relevant to the Laboratory's mission. In addition, you'll have opportunities to tour some of LLNL's unique facilities, such as the National Ignition Facility, the Terascale computing facility, and the Additive Manufacturing facility.

Applying to the Data Science Summer Institute program at LLNL

Applications for the 2018 program are now closed. Thank you for your interest. We expect to open the application process for the 2019 program in September.