Dec. 14, 2020

After successful virtual summer, DSSI looks ahead to 2021

Holly Auten/LLNL

In a year distinguished by the COVID-19 pandemic, LLNL’s Data Science Summer Institute (DSSI) pivoted quickly to an all-virtual program. The 28 students in the class of 2020 worked from their homes and attended online seminars, one-on-one mentoring sessions, team-building games, and other activities. Instead of an in-person poster session, the students participated in a virtual Summer Slam where they presented their projects—in just three minutes—to a review committee for feedback.

DSSI director Goran Konjevod states, “When the Laboratory reduced onsite operations in response to the pandemic, we had to move quickly to adapt the DSSI program to a fully remote setting. We kept almost all of the components of the traditional onsite program, including lectures by both Lab staff and remote visitors, and sessions for work on Challenge Problems. While some aspects of the program had to be modified, we have received great feedback from the students and mentors, and are working on additional improvements for the next iteration. Even with the ‘virtual’ internship, our interns have clearly done excellent work, some of which has already made its way into papers submitted for publication.”

Real-World Projects

Biostatistician Nisha Mulakken mentored graduate student Emilia Grzesiak (Duke University). During the virtual program, they corresponded daily via email and met regularly on video chat for a more personal connection. Their project applied machine learning to trace CRISPR vectors back to their source lab. CRISPR is an increasingly popular genome-editing tool that may not always be used ethically.

Mulakken explains, “The CRISPR process creates small signatures within the vectors that are not easy to detect without the aid of these algorithms. Emilia used a convolutional neural network to identify the source lab from the patterns in the vector sequences. She replicated the results from literature and dramatically improved the accuracy using different model optimizations.”

Andrew Gillette, a computational scientist in LLNL’s Center for Applied Scientific Computing (CASC), mentored two PhD students this summer as they set up a machine learning workflow for computational fluid dynamics simulations. Justin Crum (University of Arizona) and Craig Gross (Michigan State University) explored vortex formation using two open-source software libraries and compared the results.

Gillette says of the remote work arrangement, “With all of our meetings online, a faculty colleague with shared interests could easily join our discussions, enriching the experience for all of us. Even though we were forced into an online-only work environment, it was a welcome surprise to find that the rising generation of scientists will have great tools at their disposal to work effectively from anywhere.”

Applied statistician Jason Bernstein mentored PhD student Jordan Murphy (University of Colorado at Boulder). Bernstein explains, “Jordan has expertise in reinforcement learning and aerospace engineering, so we worked on a research project that applied these skills to a probabilistic modeling problem. The goal of the internship was to use reinforcement learning to control a spacecraft subject to random perturbations.”

Bernstein, who works in LLNL’s Computational Engineering Division, was excited to watch students make rapid progress on their research problems. “The next generation is highly motivated and technically capable,” he says, adding that the virtual setting was not a hindrance. “I was pleasantly surprised that much of the spontaneity and informal brainstorming of previous summers carried over to the virtual world.”

CASC cybersecurity expert Celeste Matarazzo regularly works with interns and this summer co-mentored four DSSI students who received GEM Fellowships. The National GEM Consortium helps students from underrepresented populations pursue internships and graduate education in applied science and engineering. The GEM DSSI students were tasked with solving problems involving network simulations, network security analysis, graph analytics, and machine learning approaches for cyber-attack detection. For example, graduate student Anthony Guerra (University of Southern California) applied a random forest classifier to network traffic to determine whether a cyber-attack was under way.

“The online summer program went very well. I had excellent interactions with the students virtually,” says Matarazzo, who mentored Guerra alongside cybersecurity colleague Kristine Monteith.

Leadership Transition

At the summer session’s conclusion, DSSI co-director Marisol Gamboa handed off her duties to Mulakken. “Since 2018, we have increased our female and underrepresented minority applicant pool significantly and transitioned many students to full-time data scientist positions at the Lab,” notes Gamboa, whose new role focuses on developing the Computing Directorate’s workforce.

Gamboa has enjoyed her time with the program and looks forward to the DSSI’s continued growth and success. She states, “I’ve met some remarkable students over the years and helped to place them with mentors leading the way in data science. Nisha is very talented and will bring her own ideas to elevate the program to the next level.”

Mulakken, who interned at LLNL four times during her undergraduate and graduate studies, says, “I hope the class of 2021 will experience the Lab’s collaborative culture, learn about academic topics and practical applications they may not have been exposed to yet, and genuinely enjoy getting to know each other and their mentors.”

Konjevod adds, “While we are hoping for an end to the pandemic, our current plan is to run the DSSI as a virtual program again in 2021. Having gone through the experience once, we can better understand the drawbacks of the virtual setting and make improvements in an attempt to match the full onsite experience from previous years.”