The DSI recognizes contributions staff make to LLNL's data science community through periodic Data Scientist Spotlights.
With an M.S. in Statistics and Applied Math from UC Santa Cruz, Mary Silva knows firsthand how the Lab’s multidisciplinary approach to teamwork can elevate everyone involved. “Even without an extensive background in biology, I can contribute to vaccine development and target identification while utilizing domain experts’ knowledge to interpret models and results. These experiences have inspired me to take computational biology courses. A data scientist is forever a student,” she explains. A former Data Science Summer Institute (DSSI) intern, Silva joined LLNL in 2020 and today works on active learning and Bayesian spatial models for rapid design of COVID-19 antibodies, as well as enhancing machine learning models through the multi-institutional Scalable Precision Medicine Open Knowledge Engine (SPOKE) project. As a mentor, she helps students work on their weaknesses. For instance, she says, “If a student doesn’t have public speaking confidence, I can give them opportunities to present their work to an audience.” Silva also co-organizes the Lab’s Women in Data Science (WiDS) event and datathon challenge. “I found my Lab internship by attending WiDS Livermore with my professor, and the ability to socialize and network with LLNL researchers kicked off my career,” she states.
According to Amanda Muyskens, the best thing about being a statistician is the opportunity to work on—and learn from—unique challenges. She joined the Lab in 2019 after earning a PhD in Statistics from North Carolina State University, and today her research includes Gaussian processes (GP), computationally efficient statistical methods, uncertainty quantification, and statistical consulting. Muyskens is the principal investigator for the MuyGPs project, which introduces a computationally efficient GP hyperparameter estimation method for large data (watch her DSI virtual seminar and the MuyGPs video). Her team used MuyGPs methods to efficiently classify images of stars and galaxies, and they developed an open-source Python library called MuyGPyS for fast implementation of the GP algorithm. Muyskens credits the team dynamic for its success, noting, “We constantly teach each other from our disciplines and achieve things together that wouldn’t have been possible alone.” In 2022, Data Science Summer Institute (DSSI) students contributed to MuyGPs with parameter estimation optimization and an interactive visualization tool. “Students bring a new perspective to the work, and I’m inspired to see them tackle problems in ways that those of us entrenched in the applications may never have considered,” Muyskens says. Most recently, she became a DSSI co-director and began a collaboration with Auburn University data science students.
With Biomedical Engineering degrees from Duke University, Emilia Grzesiak contributes to LLNL’s COVID-19 research by comparing simulations to bioassays that measure the binding affinity between the virus’s variants and antibody candidates. She also builds analysis and visualization tools to identify antibody designs that could be useful drug candidates. “I’m excited to help with therapeutic design decision-making and speed up the drug-design process,” she says. Grzesiak joined LLNL’s Global Security Computing Applications Division in 2021 after interning with the Data Science Summer Institute (DSSI) the previous year. Now, as a first-time mentor, she states, “I’m figuring out when to let go of the reins and when to step in more. Establishing trust and open communication is important, as making those judgment calls becomes easier when you understand how your intern approaches problems and what kind of advice they respond best to.” Grzesiak recently shared her career journey and research highlights during a DSI-sponsored panel discussion and a seminar for the DSSI's class of 2022.
Data Scientist and Machine Learning Researcher
Hyojin Kim is a data scientist and machine learning researcher at LLNL’s Center for Applied Scientific Computing. His machine learning and computer vision research has recently focused on applications in computed tomography, AI-driven drug discovery, scalable and distributed deep learning, and multimodal image analysis. He also has hands-on experience applying GPU computing to challenging problems in these areas. Kim considers it crucial to balance research with development and to learn the relevant domain knowledge. “I often see data scientists trying to apply a new technique to a particular domain application where it may not be suitable,” he says. This summer, Kim mentored students from two University of California campuses in DSI’s Data Science Challenge to accelerate drug discovery for COVID-19. Of the intensive two-week program, he states, “Many of the students I met were enthusiastic, and some of them came up with brilliant ideas that I never thought about before. Students majoring in fields other than computer science are quite knowledgeable in data science, and I actually feel the growing popularity of data science in recent years.” Kim joined LLNL in 2013 after earning his PhD in Computer Science from UC Davis in 2012.
Kevin McLoughlin has always been fascinated by the intersection of computing and biology. As a graduate student in the 1980s, he worked on early neural network simulations to understand the human brain. He recalls, “Computational biology as a field didn’t exist then, but that changed when the Human Genome Project launched.” After a stint with a biotech startup, McLoughlin joined the Lab in 2004 to work on pathogen bioinformatics. Since 2017, he has participated in the Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium, which combines HPC and data science techniques to design drugs for cancer, pathogens, and other diseases. “ATOM is enormously important, draws on my full skill set, and demands that I constantly learn new things. I work with extremely smart people from LLNL and our partners,” he says. McLoughlin helped develop a COVID-19 antiviral drug design pipeline that combines a computational autoencoder framework with machine learning algorithms to propose molecular structures, identify those with desirable properties, and suggest new molecules based on the best results—research that earned one of the Lab’s 2021 Excellence in Publication Awards. He holds a PhD in Biostatistics from UC Berkeley.
With a PhD in Mathematics from the University of Illinois at Urbana-Champaign, Sarah Mackay enjoys using mathematical techniques to make inferences about real-world systems. She draws on her experience in combinatorial optimization, network science, and statistics to perform risk analyses for LLNL’s Cyber and Infrastructure Resilience program. Mackay designs and implements algorithms that help secure infrastructure such as power grids, gas pipelines, and communication systems. “This work involves making assumptions about the structure of the system we’re studying. It can be challenging to know if the assumptions are valid and, thus, if we can trust our conclusions,” she explains. Mackay, who also coordinates the DSI’s virtual seminar series, thrives in the Lab’s culture of interdisciplinary teamwork. She states, “The set of problems one can tackle becomes so much larger when the pool of expertise grows.”
The Lab’s biosecurity mission relies on multidisciplinary expertise in areas such as molecular biology, bioinformatics, high-performance computing, and machine learning. LLNL’s Jonathan Allen seeks to understand biological systems that impact human health and safety, in part by working with the Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium. “An exciting challenge is to synthesize small molecule compounds proposed by a computational model, physically test them, and revise the model based on the experimental feedback,” he explains. Recently, Allen helped expand LLNL’s partnership with ATOM to include Purdue University. The collaboration gave students an opportunity to apply data science techniques to the drug discovery process, searching for novel therapeutics for cancers and other diseases; future plans include evaluation of new COVID-19 compounds. As a mentor, Allen states, “I hope to contribute to a positive learning environment and encourage a healthy, socially thoughtful research community. Every person develops their skills and interests at their own pace and has the potential to do great things.” Allen joined LLNL in 2007 after earning a PhD in Computer Science and Bioinformatics from Johns Hopkins University.
Ryan Dana is a data scientist working in the Global Security Computing Applications Division and the astrophysical analytics group. He was a student intern with LLNL’s Data Science Summer Institute (DSSI) in 2019 and graduated that year with a B.A. in Physics, Astrophysics, and Data Science from UC Berkeley. "The DSSI perfectly connected my coursework in physics, astronomy, statistics, and computer science to solving problems using real-world astronomical data," he says. Dana joined the Lab as a full-time employee in early 2020 and mentored students in the 2021 Data Science Challenge program—a position he returns to in 2022. His research interests include using machine learning techniques to approach astrophysical questions.
Computer scientist Michael Ward strives to improve the world in any way he can. “My motivation is often driven by making things better, whether fixing something that’s broken, providing a better experience for a user, or refining something to be more capable or stable,” he explains. Ward works in LLNL’s Global Security Computing Applications Division on data science projects involving geospatial intelligence, object detection, and imagery processing. “The biggest challenge is keeping up with the pace of the field and supporting technology,” he states, noting that he continually enjoys tackling “tough and unique problems with some of the smartest minds in the field.” Before joining the Lab in 2018, Ward built software for sales training, banking, inventory, and telecommunications. He also taught college-level computer science for four years, and says the experience of finding ways to convey complex ideas and technologies to others has come in handy at the Lab. Ward earned bachelor’s and master’s degrees from the University of South Alabama.
With a passion for outreach and volunteering, Kerianne Pruett enjoys encouraging and inspiring students to pursue STEM careers. She has held roles such as mentor, teacher, and organizer for various K–12 events; led telescope viewings and science demonstrations; and provided resources and support to underrepresented college students, promoting diversity and retention in STEM programs. This summer, Pruett mentored undergraduate and graduate students in two Data Science Challenge sessions—and received awards from LLNL’s National Security Engineering Division and Physical and Life Sciences Directorate for doing so. “When I was informed that this year’s Challenge was astronomy themed and help was needed, I was all over it!” she says. Since joining LLNL in 2019, Pruett has supported the Astronomy and Astrophysics Analytics Group and Space Science and Security Program, applying data science to topics such as dark matter, dark energy, and space situational awareness. She points out, “At the Lab, we’re using data science and machine learning across so many different fields and for such a diverse range of applications.” With a B.S. in Physics from UC Davis, Pruett is currently pursuing a master’s degree in Data Analytics at the Air Force Institute of Technology.
As an applied statistician who enjoys tackling interesting problems, Kathleen Schmidt is never bored. “Nearly every field with a quantitative question can benefit from a statistician, so we get to explore a wide variety of science applications,” she says. Schmidt works primarily on two projects: one with messy physics reaction history data collected from older technology, and another where statistical modeling helps optimize materials strength experiments. Her recent publications include modeling for radiation source localization and material behavior in extreme conditions. During 2019–2021, Schmidt served as technical coordinator for the DSI’s seminars and transitioned the series to a virtual format in 2020. She recently spoke at the 4th Annual Reaction History Workshop and has presented at the Lab’s regional Women in Data Science event. She states, “Each data scientist has an individual area of expertise. In coming together as a community, we all have something to contribute.” Schmidt earned a PhD in Applied Mathematics from North Carolina State University before joining the Lab as a postdoctoral researcher in 2016 and converting to full-time staff in 2018.
Associate Division Leader
Katie Lewis started working at LLNL in 1998, just days after earning a B.S. in Mathematics from the University of San Francisco. She spent 17 years on a parallel mesh generation project that she ultimately led, and has held numerous leadership positions in the Lab’s Computing, National Ignition Facility, and Weapons and Complex Integration directorates. Today, Lewis serves as Associate Division Leader for the Computational Physics Section of the Design Physics Division, where her duties span recruiting, hiring, professional development, workforce planning, staff assessment, and diversity and inclusion efforts. Lewis also leads the Vidya machine learning project in LLNL’s Weapons Simulation and Computing Program, where she applies artificial intelligence techniques to high-performance computing simulations. “AI/ML is proving to be a gamechanger in approaching challenging problems related to scientific computing. Employing these techniques to solve problems more accurately and more quickly will lead to greater scientific discovery,” she states.
Mentoring has always been important to Brian Gallagher, who joined the Lab in 2005 after completing his M.S. in Computer Science at UMass Amherst. “I feel extremely grateful for the opportunities I’ve had in my life and the people who have helped me along the way,” he states. After serving as a DSI Data Science Challenge mentor in 2020, Gallagher directs this year’s program for UC Merced and UC Riverside students. “My main goal for the Challenge is to provide an environment where everyone can grow,” he says. “You can see and feel the changes in people from day to day. That’s my favorite part of the experience.” When he’s not working with students, Gallagher leads the Data Science and Analytics Group at LLNL’s Center for Applied Scientific Computing. He contributes to LLNL projects that leverage machine learning for nuclear threat-reduction applications, optimization of feedstock materials, and design of high-entropy alloys. “Because data science is so broadly applicable, I am constantly exposed to new application areas and new people from a variety of backgrounds,” Gallagher explains.
To understand the science of nuclear weapons without underground testing, researchers at LLNL’s National Ignition Facility conduct inertial confinement fusion (ICF) experiments and optimize target designs for higher energy yields. Design physicist Kelli Humbird has developed machine learning models that accurately predict energy yield and reveal the surprising potential of ovoid-shaped targets. She explains, “This is a great example of what machine learning can do because it has no biases. It directed us to a design space we would not have typically considered.” She presented her research on transfer learning for ICF applications at LLNL’s 2020 Women in Data Science regional conference and says, “I feel really lucky to be a part of such cool science.” A former Lab intern and Livermore Graduate Scholar, Humbird holds a PhD in Nuclear Engineering and Physics from Texas A&M University and recently received her alma mater’s Nuclear Engineering 2020/2021 Young Former Student Award.
With a B.S. in Mathematics and Computer Science from UC San Diego, Olivia Miano was poised to join LLNL in the spring of 2020 as a software developer. Then she heard about the Data Science Immersion Program and immediately signed up. “I knew next to nothing about data science when I first joined the Lab, so almost everything I know I learned during the program,” she says. Under the mentorship of David Buttler and Juanita Ordoñez, Miano explored word embeddings and active learning for context-based entity classification as well as authorship attribution and verification with social media data. A year later, she works on natural language processing projects—including information extraction and authorship verification—for LLNL’s Global Security Computing Applications Division. For Miano, the challenges of applying data science are also what make it exciting. She states, “You need domain knowledge on top of your data science, computer science, and math knowledge. And I’m always eager to learn and willing to tackle a challenging assignment, especially when the work is meaningful like what we do at the Lab.”
Marisol Gamboa thrives at the intersection of solving challenges in unique ways and mentoring the next generation. Over her 18-year career at LLNL, she has honed expertise in software engineering, web applications, and big data analytics by developing solutions for numerous defense and counterproliferation programs—such as tools that help Department of Defense personnel distill, combine, relate, manipulate, and access massive amounts of data in a timely manner. “The many lessons I’ve learned over the years have positioned me to tackle any challenge knowing that I am able to learn quickly and adjust to any situation in real-time,” she says. Gamboa is the Deputy Division Leader for LLNL’s Global Security Computing Applications Division as well as Computing’s Workforce Team Lead. She formerly co-directed the Data Science Summer Institute and created the annual Data Science Challenge with UC Merced. Active in outreach to young women and underrepresented minorities in STEM—including LLNL’s Women in Data Science regional events—Gamboa holds a B.S. in Computer Science from the University of New Mexico.
Nisha Mulakken’s research lies at the confluence of biology, computer science, and statistics. Her work in LLNL’s Bioinformatics Group includes enhancing the Lawrence Livermore Microbial Detection Array (LLMDA) system with detection capability for all variants of SARS-CoV-2, as well as analyzing mutations in SARS-CoV-2 proteins to support future discovery of therapeutic compounds. In another project, she uses machine learning to trace unethical use of CRISPR technology to the source lab. Mulakken was recently named the new co-director of the Data Science Summer Institute and looks forward to working with the class of 2021. She says, “I hope the students will experience the Lab’s collaborative culture, learn about academic topics and practical applications they may not have been exposed to yet, and genuinely enjoy getting to know each other and their mentors.” A four-time LLNL summer intern and longtime employee, Mulakken holds degrees in genetics and biostatistics.
Harsh Bhatia’s research in scientific visualization is all about seeing the unseen. “In this field, we can develop new techniques that distill extremely complex data into comprehensible visual information,” he states. His wide-ranging projects include applying topological techniques to understand the behavior of lithium ions, generating topological representations of aerodynamics data, and analyzing and visualizing HPC performance data. Notably, Bhatia and his collaborators won the SC19 Best Paper Award for their work on the Multiscale Machine-Learned Modeling Infrastructure (MuMMI), which predictively models protein interactions that can lead to cancer. He notes, “MuMMI offers a new paradigm that is arbitrarily scalable and promises to solve the problems no other technology can.” Bhatia was a Lawrence Graduate Scholar and an LLNL postdoctoral researcher before joining the Lab’s Center for Applied Scientific Computing full time in 2017. He holds a PhD from the University of Utah’s Scientific Computing and Imaging Institute.
Frank Di Natale
Frank Di Natale looks for ways to more easily and effectively harness compute power, especially when a real-world problem is at stake. He says, “Leaning on simulations to better understand our world requires making compute accessible and facilitating sound simulation software and tools.” A notable example is the research described in the SC19 Best Paper. Di Natale and researchers from several organizations developed the novel Multiscale Machine-Learned Modeling Infrastructure (MuMMI) that predictively models the dynamics of RAS protein interactions with lipids in cell membranes. RAS protein mutations are linked to more than 30% of all human cancers. MuMMI’s machine learning algorithm selects lipid “patches” for closer examination while reducing the compute resources required. As lead author of the winning paper, Di Natale is proud of the team’s accomplishment and excited for MuMMI’s next phase: atomistic-scale protein simulation. “It’s exciting to design a multi-component system that produces the computational techniques that explore science in new ways,” he explains. Di Natale, who came to the Lab in 2016 after a stint at Intel Corporation, has an M.S. in Computer Science from the University of Colorado at Boulder. He is also the PI for the open-source Maestro Workflow Conductor software.
Amar Saini lives by the motto “Saving the world, one epoch at a time.” An epoch is one complete pass of a learning model over its training data. As for saving the world, Saini points to LLNL’s mission, which tackles a range of global and national security problems. “In my eyes, using deep learning, machine learning, and artificial intelligence to contribute to these solutions is saving the world,” he says. Saini works mostly with deep learning (DL) and neural networks. Recently he explored a denoising model that removed background noise from voice audio by converting the audio to images, using neural networks (U-Nets) to remove the distortion, and converting the cleaned images back to audio. His hackathon experiments include voice cloning and altering car colors with generative adversarial networks. “Everything in this field is a challenge because we’re constantly researching to find the best methods to train our DL models,” Saini explains. Last fall, he traveled to Copenhagen to give a talk at the ACM Conference on Recommender Systems. Saini is active on the FastAI and PyTorch forums, volunteers with Girls Who Code, and helps organize hackathons. With an M.S. in Electrical Engineering and Computer Science from UC Merced, he joined LLNL in 2019 after a Data Science Summer Institute internship.
COVID-19 Research Team
LLNL researchers have identified an initial set of therapeutic antibody sequences, designed in a few weeks using machine learning (ML) and high-performance computing (HPC), aimed at binding and neutralizing SARS-CoV-2, the virus that causes COVID-19. As reported by LLNL News, the research team is performing experimental testing on the chosen antibody designs and working on publishing the results. Pictured here are Dan Faissol (also a member of the DSI Council), Magdalena Franco, Adam Zemla, Edmond Lau, and Tom Desautels. They used two of the Laboratory’s supercomputers—Catalyst and the coincidentally named Corona—and an ML-driven computational platform to design antibody candidates predicted to bind with the SARS-CoV-2 Receptor Binding Domain, narrowing the number of possible designs from a nearly infinite set of candidates to 20 initial sequences predicted to target SARS-CoV-2. “The combination of all of these computational elements, including bioinformatics, simulation, and ML, means that we can flexibly and scalably follow up on mutants with promising predictions as they emerge, effectively using Livermore’s HPC systems,” said project PI Desautels.
Collaborative autonomous networks have been used in national security, critical infrastructure, and commercial applications. As hardware becomes smaller, faster, and cheaper, new algorithms are needed to collectively make sense of and collaboratively act upon the data collected from sensors. In this context, Ryan Goldhahn says, the whole can be greater than the sum of its parts. He explains, “An individual sensor may not be very capable, but the algorithms we develop at LLNL allow large networks of sensors to ‘punch above their weight’ through data fusion and cooperative behaviors.” His team recently filed a patent on a method of decentralized estimation in large autonomous sensor networks using stochastic gradient Markov chain Monte Carlo techniques. Goldhahn, who works in LLNL’s Computational Engineering Division, was a featured speaker in the DSI’s 2019 seminar series. “The DSI helps keep the Lab on the cutting edge,” he says. “I love exchanging ideas with people who have such amazing technical depth, and solving problems of national significance makes the job that much more fulfilling.” At LLNL since 2015, Goldhahn holds a PhD in Electrical and Computer Engineering from Duke University.
Machine learning (ML) and data analytics tools are rapidly proving necessary for materials discovery, optimization, characterization, property prediction, and accelerated deployment. Yong Han is at the forefront of LLNL’s efforts to integrate data science techniques into materials science research and development. For example, he leads a team that uses ML to analyze multimodal data for optimizing feedstock materials. Han explains, “We’re addressing important questions in data sparsity, explainability, reliability, uncertainty, and domain-aware model development.” His recent work in this area includes an npj Computational Materials paper, a Science & Technology Review research highlight, and a DSI research spotlight. “Not all data are created equal. We need to evaluate what data we’re collecting and how we’re collecting them,” Han states. He emphasizes that domain scientists and data scientists will benefit from working closely together, adding, “I envision permeation of data science tools in all of our projects at the Lab.” Han holds a PhD in Chemistry from UC Santa Barbara and joined LLNL in 2005.
DSSI class of 2019
Aspiring Data Scientists
The 31 students in the DSSI class of 2019 were selected from a highly competitive pool of more than 1,400 applicants. While at LLNL, they had access to the Lab’s HPC resources, participated in Grand Challenge team exercises, and displayed their research posters at LLNL's student poster symposium. These bright students are among the next generation of promising data scientists, and we look forward to seeing their careers develop.
Cindy Gonzales joined LLNL intending to continue her career as an administrator with the Computing Scholar Program, working with summer students during their internships. Then she attended a machine learning (ML) seminar, and the rest is history. She says, “I was taking an introductory statistics course as a part-time student when I learned what ML was. I thought, I could do this.” For two and a half years, Gonzales juggled a demanding workload: interning with data scientists, learning from mentors, supporting the DSI, coordinating the Scholar Program, and attending school part time. She earned her B.S. in Statistics from Cal State East Bay before beginning a distance-learning M.S. in Data Science at Johns Hopkins. Today, Gonzales uses ML to detect objects in satellite imagery—work she will present at the Applied Imagery and Pattern Recognition workshop in October. She explains, “Data science is such a diverse field, which makes it both exciting and challenging. You need a background in many different areas such as computer science plus domain knowledge. These skills will open doors to other scientific domains.”
Machine Learning Research Scientist
Dr. Amanda Minnich’s passion for socially meaningful and scientifically interesting projects converges with her machine learning (ML) and data mining expertise in the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium. Co-founded by LLNL, ATOM aims to accelerate drug development and discovery. “We want to show that ML has a place in the pharmaceutical world,” says Minnich. “I use historical drug data from pharmaceutical companies to build ML models that predict key pharmacokinetic and safety parameters.” Minnich also applies her skills to community outreach: She has served as a Girls Who Code mentor, organized a speed-mentoring session at LLNL’s 2019 Women in Data Science regional event, and recently spoke at the DSI-co-sponsored women’s lunch at the 2019 Conference on Knowledge Discovery and Data Mining. Minnich joined LLNL’s Global Security Computing Applications Division after meeting recruiters at the Grace Hopper Celebration (GHC), where she was a GHC14 scholar and now co-chairs the Artificial Intelligence Track. She holds a BA in integrative biology from UC Berkeley and an MS and PhD in computer science from the University of New Mexico.
Brian Giera thrives on problems that blend innovation and multidisciplinary teamwork, like his latest project that optimizes production of carbon capture technology with machine learning (ML). His team has created microcapsules only a few hundred micrometers in diameter—“They look like blue caviar,” Giera says—that absorb CO2 from the atmosphere. Recently featured in Lab on a Chip, the project uses ML to unlock real-time monitoring and sorting of the microcapsules, reducing production time and expenses while increasing the quality of the collected product. Giera states, “It was fun to have a computation-centric project and produce something tangible in the lab. We worked with experimentalists on an actual microfluidic device.” The system is portable to other microencapsulation devices and microfluidic production systems that can benefit from automation. With a background in the food and fragrance manufacturing industry, Giera holds a PhD in chemical engineering from UC Santa Barbara. He is active in LLNL’s Abilities Champions employee resource group and regularly mentors interns, noting, “Students are an excellent source of collaboration.”
Since joining LLNL in 2000, Ghaleb Abdulla has embraced projects that depend on teamwork and data sharing. His tenure includes establishing partnerships with universities seeking LLNL’s expertise in HPC and large-scale data analysis. He supported approximate queries over large-scale simulation datasets for the AQSim project and helped design a multi-petabyte database for the Large Synoptic Survey Telescope. Abdulla used machine learning (ML) to inspect and predict optics damage at the National Ignition Facility, and leveraged data management and analytics to enhance HPC energy efficiency. Recently, he led a Cancer Registry of Norway project developing personalized prevention and treatment strategies through pattern recognition, ML, and time-series statistical analysis of cervical cancer screening data. Today, Abdulla is co-PI of the Earth System Grid Federation—an international collaboration that manages a global climate database for 25,000 users on 6 continents. “The ability to move between different science domains and work on diverse data science challenges makes LLNL a great place to pursue a career in data science,” he says. Abdulla holds a PhD in computer science from Virginia Tech.
CASC ML team
As machine learning (ML) research heats up at LLNL, a team of computer scientists from the Center for Applied Scientific Computing (CASC) is leading the way. Pictured here are Harsh Bhatia, Shusen Liu, Bhavya Kailkhura, Peer-Timo Bremer (also a member of the DSI Council), Jayaraman Thiagarajan, Rushil Anirudh, and Hyojin Kim. Their research was recently featured in LLNL’s magazine, Science & Technology Review. As the cover story, “Machine Learning on a Mission,” explains, ML has important implications for scientific data analysis and for the Lab’s national security missions. This CASC team takes a bidirectional approach to ML, both advancing underlying theory and solving real-world problems—an effort that includes scaling algorithms for supercomputers and developing ways to analyze different types and varying volumes of data. Bremer states, “Commercial companies don’t solve scientific problems, just as national labs don’t optimize selections of movie reviews. So we build on commercial tools to create the techniques we need to analyze data from experiments, simulations, and other sources.”
Laura Kegelmeyer embraces her role as a problem solver. Since arriving at LLNL in 1988, she has brought her expertise to bear on image processing and analysis—first in biomedical applications, such as DNA mapping and breast cancer detection, and now at the National Ignition Facility (NIF), home of the world’s most energetic laser. Her Optics Inspection team combines large-scale database integration with custom machine learning algorithms and other data science techniques to analyze images captured throughout NIF’s 192 beamlines. This inspection process informs an automated “recycle loop” that extends optic lifetimes. Based on this work and previous involvement with Women in Data Science (WiDS) events, Kegelmeyer was invited to speak at the 2019 WiDS conference. “It’s an amazing opportunity to present an example of applying machine learning to ‘big science.’ NIF’s exploration of physical phenomena under extreme conditions has far-reaching impact across the globe and for future generations,” she says. “I hope to inspire data scientists to use their skills to address challenges in exciting scientific areas.” Kegelmeyer holds degrees in Biomedical Engineering and Electrical Engineering from Boston University.
Brenden Petersen isn’t content with merely applying advanced data science methods to real-world problems. He’d rather tackle challenges where, he says, “the state-of-the-art doesn’t cut it.” Since joining LLNL’s Computational Engineering Division in 2016, he has pursued deep reinforcement learning (RL) solutions in fields including cybersecurity, energy, and healthcare (see DSI workshop slides). Whereas deep learning traditionally addresses prediction problems, RL solves control problems. He explains, “RL provides a framework for learning how to behave in a task-completion scenario. Working in the field feels very goal-oriented, even competitive. Each application is a new personal challenge.” Petersen recently launched an RL reading group to help other LLNL staff get started in the field. “At the first meeting, I recognized only about 20% of the attendees, which was awesome! A major goal of the group, and DSI as a whole, is to connect researchers across the Lab,” he states. Petersen earned his biomedical engineering PhD through a joint program at UC Berkeley and UC San Francisco.
Bhavya Kailkhura thrives on solving challenging problems in data science, focusing on improving the reliability and safety of machine learning systems. “Reliability and safety in AI should not be an option but a design principle,” he states. “The better we can address these challenges, the more successful we will be in developing useful, relevant, and important ML systems.” Kailkhura also pursues mathematical solutions to open optimization problems, including a novel sphere-packing theory. He is building provably safe, explainable deep neural networks to enable reliable learning in applications for materials science, autonomous drones, and inertial confinement fusion. Thanks to his work on gradient-free algorithms and experimental design, LLNL was the only national lab with research accepted at two high-profile venues, NIPS and JMLR, in 2018. Before joining LLNL’s Center for Applied Scientific Computing, Kailkhura earned his PhD at Syracuse University, where his dissertation won an all-university prize. Recently, he co-authored the book Secure Networked Inference with Unreliable Data Sources.
Applied Statistics Group Leader
Fronczyk is a “total nerd” whose multifaceted job makes her an ideal panelist for the Women in Statistics and Data Science conference, where she recently discussed research opportunities at national labs. Fronczyk leads LLNL’s Applied Statistics Group while providing statistical analysis and uncertainty quantification for several projects, including a warhead life-extension program and the U.S. Nuclear Detection System. “I love learning new things and tackling interesting problems,” states Fronczyk. “Standard approaches rarely work on real-world data, so finding the right tool for the job often means exploring new methods and combining or modifying others.” She brings this creative mentality to on- and offsite collaborations, such as with the Innovations and Partnerships Office and the Institute of Makers of Explosives Science Panel. She also sits on LLNL’s Engineering Science & Technology Council, manages two seminar series (including DSI’s), and co-organized DSI’s inaugural workshop. Fronczyk holds a PhD in statistics and stochastic modeling from UC Santa Cruz.
Jose Cadena Pico
Cadena Pico enjoys the discovery process when analyzing new data sets, despite the difficulty of preparing data before building machine learning models. “Often a data set is incomplete or contains errors from different sources. Sometimes its size makes it difficult to extract knowledge,” he says. “Solving these challenges and knowing that I’m helping other researchers advance their work is very gratifying.” Once a PhD student at Virginia Tech, Cadena Pico now contributes to LLNL’s brain-on-a-chip project by studying complex networks among brain cells. He also investigates ways to detect anomalous activity in networks, and his recent work on a method for finding clusters of under-vaccinated populations, intended to inform the allocation of public health resources, was presented at the 24th KDD Conference. A three-time LLNL summer intern, Cadena Pico values ongoing education: “I like to keep learning about different research domains while developing a data science skill set applicable to many problems of global importance.”
DSSI class of 2018
Aspiring Data Scientists
The 26 students in the DSSI class of 2018 were selected from a highly competitive pool of more than a thousand applicants. While at LLNL, they participated in Grand Challenge team exercises and displayed their research posters at the DSI’s summer workshop. These bright students are among the next generation of promising data scientists, and we look forward to seeing their careers develop.
With a PhD in computer vision and machine learning, Rushil Anirudh joined LLNL’s Center for Applied Scientific Computing in 2016. He enjoys the challenges of an exponentially growing field, noting, “Something on a whiteboard today is likely to end up being used by someone within a few months.” Anirudh develops convolutional neural networks that can complete computed tomography (CT) images when the scanned object is only partially visible. His team’s paper, “Lose the Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion,” was one of only 7% of papers selected for a spotlight presentation at the 2018 Computer Vision and Pattern Recognition conference. Anirudh’s related work with generative adversarial networks was recently featured in NVIDIA’s developer blog. “I am very glad the Lab has the DSI,” says Anirudh. “A central institute that brings together everyone working on similar ideas is a great step toward becoming a leader in artificial intelligence and machine learning.”
T. Nathan Mundhenk
Mundhenk enjoys “nerding around” in LLNL’s Computational Engineering Division, especially when it comes to research aimed at improving people’s lives. With a PhD in computer science from the University of Southern California, he works on projects that use LLNL’s powerful computing capabilities to advance neural network technologies. Mundhenk recently co-authored a paper, “Improvements to Context Based Self-Supervised Learning,” which was accepted to the 2018 Computer Vision and Pattern Recognition conference. His team is developing a state-of-the-art technique for refining unsupervised deep learning. In their self-supervised approach, a deep neural network is first pre-trained on a large, generic, unlabeled dataset and then trained on a small labeled dataset, which improves accuracy on that second task (e.g., image recognition). “The entire field of artificial intelligence is bursting with new innovation,” says Mundhenk. “It’s challenging to keep up with the extraordinary pace of research, but also very exciting to be part of it.”
Senior Bioinformatics Software Developer
Since joining LLNL in 2002, Torres has combined her love of biology with coding. She serves as lead bioinformatics software developer on biosecurity projects supporting the Global Security Program. Her team is building the Gene Surprise Toolkit, which determines biothreat severity and detects potential genetic engineering of pathogens. In addition, Torres contributes to the Accelerating Therapeutics for Opportunities in Medicine consortium. The project aims to accelerate the drug discovery pipeline by building predictive, data-driven pharmaceutical models. In March 2018, Torres organized a regional symposium in conjunction with Stanford University’s Women in Data Science conference. She also encourages local middle school students to explore computer science through the Girls Who Code program and mentors student interns for LLNL’s Data Science Summer Institute (DSSI). “I’m interested in collaborating across domains with similar data analysis needs,” says Torres. “I look forward to strengthening networking and educational opportunities through DSI, especially for the DSSI.”