Data Science Workshop Embraces Multidisciplinary Community

Since launching in early 2018, Lawrence Livermore National Laboratory’s (LLNL’s) Data Science Institute (DSI) has hit the ground running with a series of seminars, collaborative sessions, reading groups and other activities for the LLNL data science community. On August 7–8, the DSI hosted its inaugural offsite workshop, which was co-sponsored by the University of California (UC).

More than 200 people gathered at Garré Winery in Livermore to share their work and explore innovative techniques in all fields of data science—from statistics and computer simulations to machine learning and computer vision. The agenda featured 46 talks and 47 posters plus networking breaks and an evening reception.

DSI director Michael Goldman opened the event by explaining that the institute was established to facilitate the growth of data science at the Lab and beyond. He stated, “One of our goals is to break down as many stovepipes as possible.”

Yong Han (standing), LLNL staff scientist and group leader for Functional Material Synthesis and Integration, described a project that combines materials science with data science. His team is using machine learning algorithms to extract targeted information from scientific papers with emphasis on nanomaterials synthesis, looking for chemical “recipes” for feedstock developments. The goal is to efficiently gather data from the literature and rapidly analyze complex relationships to accelerate the materials discovery, optimization and deployment processes.

Representing the UC Office of the President, Kim Budil, UC’s vice president for national laboratories, welcomed attendees with a charge to drive the field forward. “You name it, someone in this room is researching it and applying data science to it,” she said, emphasizing the importance of finding new collaborations.

Also on hand to welcome the audience were Anantha Krishnan and Bruce Hendrickson, LLNL associate directors for Engineering and Computation, respectively. “Data science is one of the fastest growing areas at the Lab,” noted Krishnan. Hendrickson added, “This is an opportunity for the broad community of data science researchers at LLNL to get acquainted with each other. This workshop serves an internal as well as an external purpose.”

In addition to LLNL attendees, workshop participants hailed from Los Alamos and Lawrence Berkeley national laboratories, several UC campuses (Berkeley, Davis, Irvine, Los Angeles, Merced and Santa Cruz) and Bay Area research organizations.

Leveraging the Location

The DSI’s vision for the workshop revolved around the potential of established relationships. “We wanted to share our successes, failures, experiences, knowledge and problems in data science with UC campuses and other UC-affiliated labs to help foster interest in what we do at LLNL,” explained Goldman. “UC has top-notch researchers, students and faculty across the subject matter domains relevant to LLNL’s data science mission. We wanted to leverage our proximity and affiliation as best we can.”

Face-to-face interaction was crucial for realizing workshop goals. Kassie Fronczyk, from LLNL’s Computational Engineering Division, helped organize the event. She explained, “When physically in the same room for extended periods of time, participants tend to talk about more than just their main projects. We were able to connect people with curiosity about areas they don’t necessarily work in day to day but would like to move into. This simply doesn’t happen via e-mail, phone or short meetings.”

Advancing the Discipline

Data science has the potential to transform many scientific and nonscientific fields, as illustrated by the range of topics highlighted at the event. The workshop introduced mission-driven data science projects across multiple LLNL directorates, including teams associated with the ATOM (Accelerating Therapeutic Opportunities in Medicine) consortium and ERNIE (Enhanced Radiological Nuclear Inspection and Evaluation) system.

Talks were divided into 12 sessions—cognitive simulation, uncertainty quantification, computer vision, methods, statistics and several categories of practical applications. Technical posters covered analysis and prediction techniques for scenarios as disparate as tracking gaseous chemical plumes, detecting magnetic anomalies, optimizing nanoprinted structures, controlling autonomous vehicles and evaluating congressional voting records.

Nearly 50 posters lined the perimeter of the meeting space at Garré Winery. Many external participants gave talks and submitted posters, as in this example from Lawrence Berkeley National Laboratory. A key workshop objective was finding synergies among institutions.

Goldman stated, “The national labs and UC work on a tremendous number of topics, and only a fraction was covered in the two-day workshop. We saw advancements in agriculture, climate, energy, materials science, cybersecurity, nonproliferation, precision medicine, predictive biology and space security. The list goes on.”

Several talks and posters demonstrated that the National Ignition Facility (NIF) is full of data science opportunities. Six presenters detailed inertial confinement fusion (ICF) research advancing capsule design and quality control, experiment calibration, physics simulations and implosion predictions. A seventh speaker described using machine learning to track damage on NIF’s thousands of optics.

Brian Spears, a design physicist in LLNL’s Weapons and Complex Integration directorate, presented his team’s efforts in using machine learning to compare ICF simulations and experiments. “Data science techniques are the next tool in the toolbox for predictive scientists,” he said. “We at LLNL have a chance to drive this technology for national security, using supercomputers in ways that can only be done at the Lab. We need to be looking 10 to 15 years ahead.”

According to Spears, the workshop provided attendees with several advantages such as building collaborations that might not otherwise exist. He also cited the importance of interacting with the students present. “Given the competition in Silicon Valley, this event is an efficient means of showing young researchers what the Lab does,” Spears noted.

Showcasing the Students

The workshop enabled DSSI interns like Ryan Bockmon (right) to share their summer projects with the data science community.

Data Science Summer Institute (DSSI) directors Marisol Gamboa and Goran Konjevod brought the program’s 26 students to the workshop to expose them to the larger data science community. The interns were invited to submit posters detailing their summer projects, which explored topics as diverse as energy consumption, satellite imagery, drug molecules and disease treatments.

“By attending the workshop, students were able to see the breadth of work at the Lab. This also gave Lab scientists a chance to meet potential hires,” Konjevod explained. “We got some students and postdocs interested in the Lab, which is a big win,” Goldman added.

One DSSI student’s work was selected for the presentation session on neural networks. Brian Bartoldson, a PhD candidate at Florida State University, described a project that retrieves text from human-annotated videos. Working under LLNL mentor Brenda Ng, Bartoldson developed a multimodal tool that extracts meaningful features from a sample set of 15 hours of videos with 50,000 annotations. The goal is to help nonproliferation analysts obtain important data from relevant video footage.

Bartoldson returned to LLNL for a second summer to continue applying his doctoral research at the Lab. The workshop gave him valuable experience in other skills as well. “I haven’t given a talk at an event like this before,” he stated. “I was excited to receive feedback while sharing my ideas with others, and it’s great to advance the field at the same time.”

Expanding the Community

Adam Kellerman made the trip from Los Angeles to learn more about data science techniques and find collaborators. “I wanted to understand the state of the art in this field so I can adapt it to my own research,” he said.

Kellerman’s work in UCLA’s Earth, Planetary and Space Sciences department includes investigating Earth’s radiation belts and the atmospheric phenomenon of electron precipitation. He noted, “GPS signals, spacecraft and satellites can be affected by these types of interference.” During the workshop, Kellerman gave a talk and connected with an LLNL materials scientist.

Felipe Mejia and Paul Gamble from Lab41 (Menlo Park, California) hoped to meet other data scientists with overlapping interests. Their organization tackles national security projects, so they saw common themes throughout the workshop sessions.

“I was able to connect with other speakers working on biological applications,” said Gamble, whose talk outlined the use of machine learning for detecting and characterizing synthetic DNA. “We can learn a lot from what others are doing.”

Mejia presented his project developing security measures to address vulnerability attacks on data sets. He noted, “I was impressed by other talks describing methods for interpreting models and making sure they behave properly.”

Kellerman, Mejia and Gamble all look forward to the next DSI workshop. “I’ll definitely come back and bring some of my UCLA colleagues with me,” stated Kellerman.

Planning the Future

As with any first-time event of this size and scope, the format will likely undergo adjustments in the future. “The DSI will certainly host another workshop,” asserted Goldman, citing several suggestions from attendees—some of whom were thinking about organizing their own similar events.

Fronczyk expressed hope for more student involvement next time. “The Lab has summer students who work on data science problems but were hired outside of the DSSI. Not many external people brought their students. We’ll have to figure out how to extend the invitation,” she stated.

Overall, feedback was positive. Goldman said, “We witnessed several people across campuses and organizations make new connections with those they may never have had a chance to meet or interact with in their daily work. We heard statements like ‘We didn’t know the Lab was involved in X’ and ‘I had no idea the labs worked on these types of problems,’ which I think justifies the need for this type of event.”

Acknowledgments

The DSI’s Data Science Council includes Peer-Timo Bremer, Barry Chen, Dan Faissol, Ana Kupresanin and Michael Schneider. In addition to Fronczyk, Dave Buttler, Ginny Dance-Rios and Cindy Gonzales helped organize the workshop. The 12 sessions were moderated by Rushil Anirudh, Jason Bernstein, Brenton Blair, Ryan Goldhahn, Derek Jensen, Steven Magana-Zook, Brenda Ng, Giuliana Pallotta, Brenden Petersen and Ana Paula Sales. Along with Budil, Kathy Glasgow and June Yu provided valuable support from UC.

Workshop presentations are posted here.

—Article by Holly Auten/LLNL
—Photos by Ian Fabre/LLNL