The DSI’s career panel series continued on June 28 to highlight some of LLNL’s COVID-19 research projects. Three data scientists—Emilia Grzesiak, Derek Jones, and Priyadip Ray—joined moderator and data scientist Stewart He to talk about their work in drug screening, protein–drug compounds, antibody–antigen sequence analysis, and risk factor identification.
He, who earned a PhD in Computer Science from UC Davis in 2016, asked the panelists to describe their journeys into data science at the Lab. Grzesiak interned with the Data Science Summer Institute two years ago while earning a Master’s in Biomedical Engineering from Duke University. Her academic experience with the SLURM workload management software while researching wearable data related to viruses provided a foundation for the internship, which in turn exposed her to bioinformatics. “I had some baseline knowledge when COVID came along,” she noted.
Jones became interested in game theory and computational complexity during his undergraduate years and “eventually made my way to machine learning [ML].” Jones interned at Lawrence Berkeley National Laboratory, where he was introduced to supercomputers, then interned at LLNL. Now a full-time employee in the Global Security Computing Applications Division (GS-CAD), he is pursuing a Computer Science PhD at the University of California, San Diego, while continuing to work part-time at the Lab.
Ray, who joined the Lab in 2016 and holds a PhD in Electrical Engineering from Syracuse University, has a background in statistical signal processing and Bayesian modeling. After graduate school, he taught at the Indian Institute of Technology before returning to the U.S., where he learned about LLNL from an academic colleague. “Working here has been a great experience,” he said.
The Lab’s mission and varied scientific portfolio appealed to the panelists during their job searches. Solving important problems in service to the nation was a big motivator for Grzesiak. “I wanted to work for an employer that aligned with my values, and do something with high impact that’s positive for society,” she stated. Ray has researched many diseases and finds satisfaction in the opportunity for real-world solutions. Stewart He agreed, “The Lab provides an opportunity to work on tough and interesting questions, and the growth of data science enables researchers to choose which scientific challenges they want to address. You can have very eclectic tastes here.”
The panelists were able to pivot their research efforts when the pandemic began. He—who works on bioinformatics, molecular dynamics simulations, and computed tomography reconstruction projects in addition to developing ML models for COVID drug screening—pointed out, “Who wouldn’t want a cure for COVID?”
Jones and several LLNL colleagues had been exploring structure-based deep learning (DL) techniques, including developing a spatial graph convolutional neural network (SG-CNN) for a project sponsored by the American Heart Association. They were able to apply the SG-CNN and other DL models to screen protein–drug compound complexes across four SARS-CoV-2 protein targets. The team published their work in the Journal of Chemical Theory and Computation.
Grzesiak’s research compares simulations to bioassays that measure the binding affinity between COVID variants and antibody candidates. From there, she builds analysis and visualization tools to identify mutations and design patterns that will likely increase binding affinity and stability. “We’re narrowing in on some antibody designs that look promising for potential working drug candidates,” she explained.
When pandemic response became a priority for LLNL, Ray reached out to collaborators in machine learning, applied math, and statistics, as well as clinicians involved in patient care. They set out to identify novel cancer-related risk factors for COVID and disease-state-dependent risk factors for hospitalized COVID patients. For example, his team discovered that men are more likely to transition from moderate to severe disease after hospitalization, but once these patients reach a severe state, more women than men die.
The panelists agreed that teaming up with biology and biomedicine colleagues helps them focus their data science efforts in the right direction for a project. “Much of what I’ve learned about biology has come through learning to communicate with domain experts and building intuition about what’s happening on a new project,” stated He. Ray works closely with domain experts to understand how to approach problems from a data science perspective. “Biology provides complex, high-dimensional modeling problems, and the dataset is often limited,” he said.
Marisa Torres, GS-CAD Bioinformatics group leader and co-organizer of the event, stated, “The panel highlighted their diverse approaches to building new predictive disease capabilities at the Lab and how we can forge our careers to support a greater good, such as with COVID research.”
The DSI’s career panel series will continue this summer with a session featuring former LLNL interns who now hold full-time positions at the Lab, and a session with additive manufacturing researchers and materials scientists. In addition to Torres, co-organizers are Mary Silva, Cindy Gonzales, Amar Saini, Jennifer Bellig, and Holly Auten.