May 18, 2023

Celebrating the DSI’s first five years

Holly Auten/LLNL

View the LLNL Flickr album Data Science Institute Turns Five.

Data science—a field combining technical disciplines such as computer science, statistics, mathematics, software development, domain science, and more—has become a crucial part of how LLNL carries out its mission. Since the DSI’s founding in 2018, the Lab has seen tremendous growth in its data science community and has invested heavily in related research. Five years later, the DSI has found its stride with a multipronged strategy of raising awareness about the field, encouraging partnerships across the community, supporting researchers, and nurturing the next generation of data scientists.

In response to the worldwide emergence of artificial intelligence (AI), machine learning (ML), and related technologies, data science gained recognition as a discipline during the most recent decade of the Lab’s 70-year history. In 2013, LLNL launched a Data Science Initiative to align national security priorities with high-performance computing capabilities and data-intensive applications. The integration of data science into research had become important in part because the Lab’s unique experimental facilities generate tremendous amounts of data every day. Moreover, DSI director Michael Goldman recalls, “ML and AI were quickly becoming integral parts of every discipline at LLNL, and we needed to find alternative ways to help organize our efforts and workforce.”

Through the Initiative, data science techniques became more prevalent in Lab programs, but some needs remained: develop the workforce and adapt to the rapid pace of research in this field. In 2017, Jim Brase, Rob Sharpe, and Eric McKinzie (deputy associate directors for Computing Programs, Engineering R&D, and Global Security, respectively) proposed a new institute to advance the state of the art in LLNL’s data science capabilities as well as build a data science community and workforce pipeline. The trio recruited researchers from different areas of the Lab to form a leadership Council with Goldman, a Computing group leader and associate program leader in Global Security, as director. Funded by the three directorates, the DSI quickly gathered momentum and absorbed an existing student internship program called Data Heroes.

Research Momentum

One of the DSI’s primary goals is to support the Lab’s researchers—and thus their scientific discoveries—in two complementary pursuits: (1) mission-driven applications that benefit from data science, and (2) the advancement of foundational data science techniques and tools. For example, a team of domain scientists and ML experts may be using neural networks to optimize a material’s synthesis process, while also working to ensure that their ML algorithms are reliable and accurate for that task.

The DSI’s website showcases numerous examples of LLNL research enabled by data science expertise, categorized into precision medicine, materials and advanced manufacturing, scientific ML foundations, national security, basic science, and energy applications. The DSI does not directly fund research projects; instead, it furnishes a range of resources such as workshops, technical seminars, and reading groups.

Led by applied statisticians Jason Bernstein and Kathleen Schmidt, the DSI’s Consulting Service (DSICS) offers statistical and ML expertise to Lab research teams on a short-term basis. These consultations can turn into long-term collaborations, such as Laboratory Directed Research and Development (LDRD) projects, and help strengthen ties across the Lab.

Bernstein explains, “Consultants are brought in at all stages of a project, from determining how many experiments need to be completed, to suggesting methods to analyze already collected data. Additionally, the DSICS provides a way for data scientists to meet new people at LLNL, learn about the mission areas they work in, and potentially work in those areas themselves. The consulting service often helps the consultant as much as the consultee.”

The DSI Council was instrumental in ensuring data science representation in LDRD proposal review committees, which are made up of LLNL experts in many disciplines. For three decades, the competitive LDRD program has funded research projects with emphases on exploration, innovation, and paradigm disruption. But until recently, the proposal review process had not included data scientists, even as more proposals began to incorporate data science methods and tools.

“It takes people with data science background to appreciate these proposals,” states Ana Kupresanin, who has been on the DSI Council since its inception and serves as deputy associate director for LLNL’s Weapon Simulation and Computing Computational Physics program. “The DSI brought awareness of the discipline to the forefront and pushed for a committee of data science experts to evaluate and provide input during the LDRD proposal process.”

conference room full of people sitting at long tables while Brenden presents at the front of the room
A reading group focusing on reinforcement learning techniques kicked off in 2019.

Community Through Collaboration

The DSI has expanded its reach through many collaborative activities beyond those mentioned above, including conference participation and a monthly newsletter. The DSI’s website is home to the Department of Energy (DOE) Data Days workshop materials as well as LLNL’s AI Innovation Incubator, a hub for industry partners to form collaborations around hardware, software, tools, and utilities that accelerate AI for applied science.

The Lab’s enduring relationship with the University of California (UC) system is stronger thanks to the DSI’s workshops, student programs, and Open Data Initiative (ODI). Since 2019, the ODI has published unique public datasets under the direction of LLNL research scientist Rushil Anirudh. The UC San Diego library catalog includes several of the datasets, which represent a wide variety of scientific domains such as medical diagnostics, astrophysics, additive manufacturing, atmospheric science, and inertial confinement fusion. Students and researchers can use these datasets to explore applications of algorithms, models, and methodologies. The larger data science community has noticed: The National AI Initiative website lists the ODI as a key data resource for AI research and development.

The DSI also promotes diversity and inclusion within the community, such as through the Data Science Challenge (DSC; more on this program below), which mentors underrepresented minority students, and co-sponsorship of the Women in Data Science (WiDS) Livermore conference. Held annually on International Women’s Day and in conjunction with the global WiDS conference, WiDS Livermore provides a space for women to share their career journeys and work–life balance, learn how to navigate a male-dominated field, and discuss the importance of mentorship. In 2021, event organizers launched a year-round virtual career panel series inspired by WiDS. Panelists have come from many areas of the Lab: women in leadership positions, former interns now hired as full-time staff, scientists investigating COVID-19, advanced manufacturing researchers, software developers, and members of employee resource groups.

Marisol Gamboa advises several WiDS attendees sitting in a small group
Co-sponsored by the DSI, the 2023 WiDS Livermore event included mentoring sessions for both in-person and online attendees.

Jump-Starting Careers

LLNL’s employee population has grown substantially over the last few years, and the Lab is still hiring in anticipation of increased DOE mission requirements and upcoming retirements. The DSI has taken this recruiting charge to heart through educational programs with multidisciplinary mentoring.

Designed for undergraduate and graduate students, the Data Heroes program was rebranded as the Data Science Summer Institute (DSSI) under the DSI’s purview. The 12-week internship gives students hands-on experience with real Lab projects, such as using deep learning techniques to detect asteroids in telescope images or training a neural network to identify drug compounds from virtual molecule screening datasets. The DSSI has welcomed Japanese students into the program each summer beginning in 2020, thanks to LLNL’s relationship with the Government of Japan’s Ministry of Education, Culture, Sports, Science and Technology. The class of 2023 will include three interns from Japanese universities.

The program is highly competitive, with thousands of applicants vying each year for about 30 spots. More than 150 students have completed DSSI internships—many of them returning for a second summer—and nearly 30 have been offered full-time LLNL staff positions. Nisha Mulakken, who co-directs the DSSI alongside Amanda Muyskens, states, “I continue to be amazed by the quality of resumes we receive year after year. There are extremely bright, talented students who just need a chance to prove themselves outside of school. Through real-life work experience and exposure to novel data science applications for challenging scientific problems, the DSSI provides exactly the opportunity many students need to jump-start their data science careers.”

Building upon the Lab’s connection with the UC system, the DSI launched the DSC mini-internship in 2019 with UC Merced. The program, which has since expanded to include UC Riverside, offers an intensive two-week internship where students work in teams on a scientific problem with mentoring from LLNL data scientists and UC professors. Undergraduates tackle the problem in teams, while graduate students gain experience as team leaders. This year, students will use electrocardiogram data and machine learning techniques to reconstruct electro-anatomical maps of the human heart.

“A lot of undergrads coming into DSC hadn’t previously considered a science career. It’s great to see students get excited about this whole new world of career opportunities opening up to them,” explains DSC co-director Brian Gallagher. “Over 25 percent of DSC alumni apply for LLNL internships, and many of the undergraduates go on to grad school.”

In 2023 the DSC will, for the first time, host Merced and Riverside students during the same two-week period at the Lab’s new University of California Livermore Collaboration Center (UCLCC). Gallagher says, “The DSC program continues to evolve, but the core goals remain the same: bringing together students with diverse skills and backgrounds, providing a real-world experience of computational science careers, encouraging students to pursue graduate degrees, and strengthening the workforce pipeline between UC and LLNL.”

The Lab’s data science workforce is also growing from within. In 2019, the DSI created an Immersion Program that enables nontechnical LLNL staff to transition from an unrelated educational and/or career background into a data science field. For up to a year, the mentee works closely with mentors to gain on-the-job training, and leverages the Lab’s Education Assistance Program to earn a relevant degree. The two data scientists who have completed the program to date now work in LLNL’s Global Security Computing Applications Division.

“The Data Science Immersion Program allowed me to hone my skills without needing to leave the Lab,” states Cindy Gonzales, who joined LLNL in 2016 in an administrative position and now co-directs the DSC with Gallagher. “I’m grateful for this opportunity, as it completely changed my career trajectory at the Lab and in pretty much every other aspect of my life.”

a group of 32 students and mentors stand outside the supercomputing building
Originally dubbed Data Heroes, the Lab’s data science–focused student internship program was rebranded as DSSI when the DSI was founded.

Pandemic Pivot

The DSI was only two years old when California’s COVID-19 lockdown began. With workshops, seminars, internships, and other activities already established, director Goldman recognized that the DSI needed to adapt as much as possible to virtual formats. “We were still in the midst of building relationships and programs that had to be rethought or reprioritized,” he recalls. “Our data science workforce needs weren’t slowing down. Pivoting to an all-virtual summer program was certainly unexpected and a difficult challenge, but we now have a hybrid program and have still been able to keep pace with hiring needs.”

Responding to the virus itself, many of LLNL’s biomedical and biotechnology research efforts transitioned into coronavirus-related projects, including computational workflows incorporating ML algorithms and a public database of SARS-CoV-2 protein structures and molecules. The DSI helped promote these efforts and hosted a three-day online workshop focused on AI in healthcare applications.

More recently, the Lab has been pivoting to a hybrid environment, and the DSI is following suit as appropriate for each event. For instance, after three years of remote sessions, all DSC students and most DSSI students will work onsite at LLNL this summer. WiDS Livermore welcomed both in-person and online participation in early 2023, while technical seminars began accommodating hybrid attendance in late 2022. Council member Kupresanin notes, “We might never have gone virtual with our seminar series, and now it brings in a much larger audience because it can include external participants.”

To maximize the physical proximity of data scientists when they are onsite, the Council secured office and meeting space in a newly renovated building formerly inhabited by the Lab’s Innovations and Partnerships Office. “The idea for this space arose as the DSI was forming, and with so many moving parts and people involved, it took five years to get to where we are today. This unique area joins junior and senior data scientists from across different directorates. We hope this space will inspire new ideas and mentorship, while increasing reciprocity between LLNL programs and data science staff,” Goldman says. Kupresanin adds, “The building is a hub of activity. Now more than ever we need that kind of physical space to connect with each other.”

5x5 grid of students and mentors in a videoconference
The 2021 Data Science Challenge with UC Merced (cohort shown here) and UC Riverside was held virtually using online collaborative tools.

A New Phase

Goldman served as DSI’s director until April 2023, then passed the baton to materials and manufacturing researcher Brian Giera. “Leading the DSI was a tremendous experience for me,” Goldman states. “I’m excited to see what the next five years will look like, and I wish Brian, the Council, and the Lab’s data scientists the best of luck in continuing the DSI’s success. Back in my programmatic role, I hope to contribute to DSI as so many others have.”

The strength and potential of LLNL’s data science community inspired Giera to apply for the directorship. He says, “The DSI is the product of many contributors in data science research and workforce development. I wanted the chance to cultivate and shape these efforts that make LLNL a hub of data science.”

Indeed, in any year, the Council and supporting staff strive to establish LLNL as a top-tier destination for data science research and applications, helping the Lab attract people excited about pushing the boundaries of frontier science. Gonzales explains, “The DSI continues to grow its offerings every year and improve its ongoing activities. Whether it’s the DSC, the DSSI, or the WiDS Livermore conference, the DSI provides amazing programs for all levels of current and future Lab employees.” Kupresanin adds, “The DSI is uniquely experimental in that it spans all parts of the Lab. It provides a sense of belonging to people who do similar kinds of work. Outside of LLNL, our external engagements are a platform that tells the world who we are and what we work on.”

Looking back over the past five years, Goldman is unequivocal. “The accomplishments of DSI are not mine,” he points out. “This is a collective effort led by many individuals who have the passion to see data science thrive at LLNL.”