The DSI recognizes contributions staff make to LLNL's data science community through periodic Data Scientist Spotlights.
Amar Saini lives by the motto “Saving the world, one epoch at a time.” An epoch is one complete pass a learning model makes over its training data. As for saving the world, Saini points to LLNL’s mission, which tackles a range of global and national security problems. “In my eyes, using deep learning, machine learning, and artificial intelligence to contribute to these solutions is saving the world,” he says. Saini works mostly with deep learning (DL) and neural networks. Recently he explored a denoising model that removed background noise from voice audio by converting the audio to images, using neural networks (U-Nets) to remove distortion, and converting the images back to audio. His hackathon experiments include voice cloning and altering car color with generative adversarial networks. “Everything in this field is a challenge because we’re constantly researching to find the best methods to train our DL models,” Saini explains. Last fall, he traveled to Copenhagen to give a talk at the ACM Conference on Recommender Systems. Saini is active on the FastAI and PyTorch forums, volunteers with Girls Who Code, and helps organize hackathons. With an M.S. in Electrical Engineering and Computer Science from UC Merced, he joined LLNL in 2019 after a Data Science Summer Institute internship.
COVID-19 Research Team
LLNL researchers have identified an initial set of therapeutic antibody sequences, designed in a few weeks using machine learning (ML) and high-performance computing (HPC), aimed at binding and neutralizing SARS-CoV-2, the virus that causes COVID-19. As reported by LLNL News, the research team is performing experimental testing on the chosen antibody designs and working on publishing the results. Pictured here are Dan Faissol (also a member of the DSI Council), Magdalena Franco, Adam Zemla, Edmond Lau, and Tom Desautels. They used two of the Laboratory’s supercomputers—Catalyst and the coincidentally named Corona—and an ML-driven computational platform to design antibody candidates predicted to bind with the SARS-CoV-2 Receptor Binding Domain, narrowing the number of possible designs from a nearly infinite set of candidates to 20 initial sequences predicted to target SARS-CoV-2. “The combination of all of these computational elements, including bioinformatics, simulation, and ML, means that we can flexibly and scalably follow up on mutants with promising predictions as they emerge, effectively using Livermore’s HPC systems,” said project PI Desautels.
Collaborative autonomous networks have been used in national security, critical infrastructure, and commercial applications. As hardware becomes smaller, faster, and cheaper, new algorithms are needed to collectively make sense of and collaboratively act upon the data collected from sensors. In this context, Ryan Goldhahn says, the whole can be greater than the sum of its parts. He explains, “An individual sensor may not be very capable, but the algorithms we develop at LLNL allow large networks of sensors to ‘punch above their weight’ through data fusion and cooperative behaviors.” His team recently filed a patent on a method of decentralized estimation in large autonomous sensor networks using stochastic gradient Markov chain Monte Carlo techniques. Goldhahn, who works in LLNL’s Computational Engineering Division, was a featured speaker in the DSI’s 2019 seminar series. “The DSI helps keep the Lab on the cutting edge,” he says. “I love exchanging ideas with people who have such amazing technical depth, and solving problems of national significance makes the job that much more fulfilling.” At LLNL since 2015, Goldhahn holds a PhD in Electrical and Computer Engineering from Duke University.
Machine learning (ML) and data analytics tools are rapidly proving necessary for materials discovery, optimization, characterization, property prediction, and accelerated deployment. Yong Han is at the forefront of LLNL’s efforts to integrate data science techniques into materials science research and development. For example, he leads a team that uses ML to analyze multimodal data for optimizing feedstock materials. Han explains, “We’re addressing important questions in data sparsity, explainability, reliability, uncertainty, and domain-aware model development.” His recent work in this area includes an npj Computational Materials paper, a Science & Technology Review research highlight, and a DSI research spotlight. “Not all data are created equal. We need to evaluate what data we’re collecting and how we’re collecting them,” Han states. He emphasizes that domain scientists and data scientists will benefit from working closely together, adding, “I envision permeation of data science tools in all of our projects at the Lab.” Han holds a PhD in Chemistry from UC Santa Barbara and joined LLNL in 2005.
DSSI class of 2019
Aspiring Data Scientists
The 31 students in the DSSI class of 2019 were selected from a highly competitive pool of more than 1,400 applicants. While at LLNL, they had access to LLNL’s HPC resources, participated in Grand Challenge team exercises, and displayed their research posters at LLNL's student poster symposium. These bright students are among the next generation of promising data scientists, and we look forward to seeing their careers develop.
Cindy Gonzales joined LLNL intending to continue her career as an administrator with the Computing Scholar Program, working with summer students during their internships. Then she attended a machine learning (ML) seminar, and the rest is history. She says, “I was taking an introductory statistics course as a part-time student when I learned what ML was. I thought, I could do this.” For two and a half years, Gonzales juggled a demanding workload: interning with data scientists, learning from mentors, supporting the DSI, coordinating the Scholar Program, and attending school part time. She earned her B.S. in Statistics from Cal State East Bay before beginning a distance-learning M.S. in Data Science at Johns Hopkins. Today, Gonzales uses ML to detect objects in satellite imagery—work she will present at the Applied Imagery and Pattern Recognition workshop in October. She explains, “Data science is such a diverse field, which makes it both exciting and challenging. You need a background in many different areas such as computer science plus domain knowledge. These skills will open doors to other scientific domains.”
Machine Learning Research Scientist
Dr. Amanda Minnich’s passion for socially meaningful and scientifically interesting projects converges with her machine learning (ML) and data mining expertise in the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium. Co-founded by LLNL, ATOM aims to accelerate drug development and discovery. “We want to show that ML has a place in the pharmaceutical world,” says Minnich. “I use historical drug data from pharmaceutical companies to build ML models that predict key pharmacokinetic and safety parameters.” Minnich also applies her skills to community outreach: She has served as a Girls Who Code mentor, organized a speed-mentoring session at LLNL’s 2019 Women in Data Science regional event, and recently spoke at the DSI-co-sponsored women’s lunch at the 2019 Conference on Knowledge Discovery and Data Mining. Minnich joined LLNL’s Global Security Computing Applications Division after meeting recruiters at the Grace Hopper Celebration (GHC), where she was a GHC14 scholar and now co-chairs the Artificial Intelligence Track. She holds a BA in integrative biology from UC Berkeley and an MS and PhD in computer science from the University of New Mexico.
Brian Giera thrives on problems that blend innovation and multidisciplinary teamwork, like his latest project that optimizes production of carbon capture technology with machine learning (ML). His team has created microcapsules only a few hundred micrometers in diameter—“They look like blue caviar,” Giera says—that absorb CO2 from the atmosphere. Recently featured in Lab on a Chip, the project uses ML to unlock real-time monitoring and sorting of the microcapsules, reducing production time and expenses while increasing the quality of the collected product. Giera states, “It was fun to have a computation-centric project and produce something tangible in the lab. We worked with experimentalists on an actual microfluidic device.” The system is portable to other microencapsulation devices and microfluidic production systems that can benefit from automation. With a background in the food and fragrance manufacturing industry, Giera holds a PhD in chemical engineering from UC Santa Barbara. He is active in LLNL’s Abilities Champions employee resource group and regularly mentors interns, noting, “Students are an excellent source of collaboration.”
Since joining LLNL in 2000, Ghaleb Abdulla has embraced projects that depend on teamwork and data sharing. His tenure includes establishing partnerships with universities seeking LLNL’s expertise in HPC and large-scale data analysis. He supported approximate queries over large-scale simulation datasets for the AQSim project and helped design a multi-petabyte database for the Large Synoptic Survey Telescope. Abdulla used machine learning (ML) to inspect and predict optics damage at the National Ignition Facility, and leveraged data management and analytics to enhance HPC energy efficiency. Recently, he led a Cancer Registry of Norway project developing personalized prevention and treatment strategies through pattern recognition, ML, and time-series statistical analysis of cervical cancer screening data. Today, Abdulla is co-PI of the Earth System Grid Federation—an international collaboration that manages a global climate database for 25,000 users on 6 continents. “The ability to move between different science domains and work on diverse data science challenges makes LLNL a great place to pursue a career in data science,” he says. Abdulla holds a PhD in computer science from Virginia Tech.
CASC ML team
As machine learning (ML) research heats up at LLNL, a team of computer scientists from the Center for Applied Scientific Computing (CASC) is leading the way. Pictured here are Harsh Bhatia, Shusen Liu, Bhavya Kailkhura, Peer-Timo Bremer (also a member of the DSI Council), Jayaraman Thiagarajan, Rushil Anirudh, and Hyojin Kim. Their research was recently featured in LLNL’s magazine, Science & Technology Review. As the cover story, “Machine Learning on a Mission,” explains, ML has important implications for scientific data analysis and for the Lab’s national security missions. This CASC team takes a bidirectional approach to ML, both advancing underlying theory and solving real-world problems—an effort that includes scaling algorithms for supercomputers and developing ways to analyze different types and varying volumes of data. Bremer states, “Commercial companies don’t solve scientific problems, just as national labs don’t optimize selections of movie reviews. So we build on commercial tools to create the techniques we need to analyze data from experiments, simulations, and other sources.”
Laura Kegelmeyer embraces her role as a problem solver. Since arriving at LLNL in 1988, she has brought her expertise to bear on image processing and analysis—first in biomedical applications, such as DNA mapping and breast cancer detection, and now at the National Ignition Facility (NIF), home of the world’s most energetic laser. Her Optics Inspection team combines large-scale database integration with custom machine learning algorithms and other data science techniques to analyze images captured throughout NIF’s 192 beamlines. This inspection process informs an automated “recycle loop” that extends optic lifetimes. Based on this work and previous involvement with Women in Data Science (WiDS) events, Kegelmeyer was invited to speak at the 2019 WiDS conference. “It’s an amazing opportunity to present an example of applying machine learning to ‘big science.’ NIF’s exploration of physical phenomena under extreme conditions has far-reaching impact across the globe and for future generations,” she says. “I hope to inspire data scientists to use their skills to address challenges in exciting scientific areas.” Kegelmeyer holds degrees in Biomedical Engineering and Electrical Engineering from Boston University.
Brenden Petersen isn’t content merely to apply advanced data science methods to real-world problems. He’d rather tackle challenges where, he says, “the state-of-the-art doesn’t cut it.” Since joining LLNL’s Computational Engineering Division in 2016, he has pursued deep reinforcement learning (RL) solutions in many fields, including cybersecurity, energy, and healthcare (see DSI workshop slides). Whereas deep learning traditionally addresses prediction problems, RL solves control problems. He explains, “RL provides a framework for learning how to behave in a task-completion scenario. Working in the field feels very goal-oriented, even competitive. Each application is a new personal challenge.” Petersen recently launched an RL reading group to help other LLNL staff get started in the field. “At the first meeting, I recognized only about 20% of the attendees, which was awesome! A major goal of the group, and DSI as a whole, is to connect researchers across the Lab,” he states. Petersen earned his biomedical engineering PhD through a joint program at UC Berkeley and UC San Francisco.
Kailkhura thrives on solving challenging problems in data science, focusing on improving the reliability and safety of machine learning systems. “Reliability and safety in AI should not be an option but a design principle,” he states. “The better we can address these challenges, the more successful we will be in developing useful, relevant, and important ML systems.” Kailkhura also pursues mathematical solutions to open optimization problems, including a novel sphere-packing theory. He is building provably safe, explainable deep neural networks to enable reliable learning in applications for materials science, autonomous drones, and inertial confinement fusion. Thanks to his efforts with gradient-free algorithms and experiment designs, LLNL was the only national lab with research accepted at both NIPS and JMLR, two high-profile venues, in 2018. Prior to joining LLNL’s Center for Applied Scientific Computing, Kailkhura attended Syracuse University, where his PhD dissertation won an all-university prize. Recently, he co-authored the book Secure Networked Inference with Unreliable Data Sources.
Applied Statistics Group Leader
Fronczyk is a “total nerd” whose multifaceted job makes her an ideal panelist for the Women in Statistics and Data Science conference, where she recently discussed research opportunities at national labs. Fronczyk leads LLNL’s Applied Statistics Group while providing statistical analysis and uncertainty quantification for several projects, including a warhead life-extension program and the U.S. Nuclear Detection System. “I love learning new things and tackling interesting problems,” states Fronczyk. “Standard approaches rarely work on real-world data, so finding the right tool for the job often means exploring new methods and combining or modifying others.” She brings this creative mentality to on- and offsite collaborations, such as with the Innovations and Partnerships Office and the Institute of Makers of Explosives Science Panel. She also sits on LLNL’s Engineering Science & Technology Council, manages two seminar series (including DSI’s), and co-organized DSI’s inaugural workshop. Fronczyk holds a PhD in statistics and stochastic modeling from UC Santa Cruz.
Jose Cadena Pico
Cadena Pico enjoys the discovery process when analyzing new data sets, despite the difficulties in preparing data before building machine learning models. “Often a data set is incomplete or contains errors from different sources. Sometimes its size makes it difficult to extract knowledge,” he says. “Solving these challenges and knowing that I’m helping other researchers advance their work is very gratifying.” Once a PhD student at Virginia Tech, Cadena Pico now contributes to LLNL’s brain-on-a-chip project by studying complex networks among brain cells. He also investigates ways to detect anomalous activity in networks, and his recent work—developing a method for finding clusters of under-vaccinated populations to inform public health resources—was presented at the 24th KDD Conference. Formerly a three-time LLNL summer intern, Cadena Pico values ongoing education: “I like to keep learning about different research domains while developing a data science skill set applicable to many problems of global importance.”
DSSI class of 2018
Aspiring Data Scientists
The 26 students in the DSSI class of 2018 were selected from a highly competitive pool of more than a thousand applicants. While at LLNL, they participated in Grand Challenge team exercises and displayed their research posters at the DSI’s summer workshop. These bright students are among the next generation of promising data scientists, and we look forward to seeing their careers develop.
With a PhD in computer vision and machine learning, Anirudh joined LLNL’s Center for Applied Scientific Computing in 2016. He enjoys the challenges of an exponentially growing field, noting, “Something on a whiteboard today is likely to end up being used by someone within a few months.” Anirudh develops convolutional neural networks that can complete computed tomography (CT) images when the scanned object is only partially visible. His team’s paper, “Lose the Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion,” was among only 7% of papers selected for a spotlight presentation at the 2018 Computer Vision and Pattern Recognition conference. Anirudh’s related work with generative adversarial networks was recently featured in NVIDIA’s developer blog. “I am very glad the Lab has the DSI,” says Anirudh. “A central institute that brings together everyone working on similar ideas is a great step toward becoming a leader in artificial intelligence and machine learning.”
T. Nathan Mundhenk
Mundhenk enjoys “nerding around” in LLNL’s Computational Engineering Division, especially when it comes to research aimed at improving people’s lives. With a PhD in computer science from the University of Southern California, he works on projects that use LLNL’s powerful computing capabilities to advance neural network technologies. Mundhenk recently co-authored a paper, “Improvements to Context Based Self-Supervised Learning,” which was accepted to the 2018 Computer Vision and Pattern Recognition conference. His team is developing a state-of-the-art technique for refining unsupervised deep learning. In their method of self-supervision, a deep neural network can be pre-trained on a large generic dataset before training on a small labeled dataset, resulting in better accuracy (e.g., of image recognition) in the latter. “The entire field of artificial intelligence is bursting with new innovation,” says Mundhenk. “It’s challenging to keep up with the extraordinary pace of research, but also very exciting to be part of it.”
Senior Bioinformatics Software Developer
Since joining LLNL in 2002, Torres has combined her love of biology with coding. She serves as lead bioinformatics software developer on biosecurity projects supporting the Global Security Program. Her team is building the Gene Surprise Toolkit, which determines biothreat severity and detects potential genetic engineering of pathogens. In addition, Torres contributes to the Accelerating Therapeutics for Opportunities in Medicine consortium, which aims to accelerate the drug discovery pipeline by building predictive, data-driven pharmaceutical models. In March 2018, Torres organized a regional symposium in conjunction with Stanford University’s Women in Data Science conference. She also encourages local middle school students to explore computer science through the Girls Who Code program and mentors student interns for LLNL’s Data Science Summer Institute (DSSI). “I’m interested in collaborating across domains with similar data analysis needs,” says Torres. “I look forward to strengthening networking and educational opportunities through DSI, especially for the DSSI.”