Our mission at the Data Science Institute (DSI) is to enable excellence in data science research and applications across the Laboratory's core missions.
Data science has become an essential discipline underpinning LLNL's key program areas, and the Laboratory is home to some of the largest, most distinctive, and most interesting datasets and supercomputers in the world. The DSI serves as LLNL's central hub for data science activity, spanning artificial intelligence, big-data analytics, computer vision, machine learning, predictive modeling, statistical inference, uncertainty quantification, and more. It works to lead, build, and strengthen the Laboratory's data science workforce, research, and outreach, advancing the state of the art of our nation's data science capabilities.
Data Scientist Spotlight
With a B.S. in Mathematics and Computer Science from UC San Diego, Olivia Miano was poised to join LLNL in the spring of 2020 as a software developer. Then she heard about the Data Science Immersion Program and immediately signed up. “I knew next to nothing about data science when I first joined the Lab, so almost everything I know I learned during the program,” she says. Under the mentorship of David Buttler and Juanita Ordoñez, Miano explored word embeddings and active learning for context-based entity classification as well as authorship attribution and verification with social media data. A year later, she works on natural language processing projects—including information extraction and authorship verification—for LLNL’s Global Security Computing Applications Division. For Miano, the challenges of applying data science are also what make it exciting. She states, “You need domain knowledge on top of your data science, computer science, and math knowledge. And I’m always eager to learn and willing to tackle a challenging assignment, especially when the work is meaningful like what we do at the Lab.”
New Research in AI
An LLNL team proposes a framework that leverages deep learning for symbolic regression via a simple idea: use a large model (a neural network) to search the space of small models (mathematical expressions). The research was accepted as an Oral Presentation (with an acceptance rate of 1.5%) at the upcoming International Conference on Learning Representations (ICLR), ranking fifth out of approximately 3,000 scored papers.
- Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients (preprint) – Brenden Petersen, Mikel Landajuela Larma, Nathan Mundhenk, Claudio Santiago, Soo Kim, and Joanne Kim
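The "large model searches the space of small models" idea can be illustrated with a toy sketch (this is not the authors' code; the operator library, the evaluator, and the reward formula are illustrative assumptions). Candidate expressions are prefix-notation token sequences drawn from a library of operators; here uniform random sampling stands in for the trained neural-network policy, and each sampled expression is scored by how well it fits data from a hypothetical target function:

```python
# Illustrative sketch of sampling and scoring candidate mathematical
# expressions as token sequences. Names and reward formula are assumptions,
# not the paper's implementation.
import numpy as np

rng = np.random.default_rng(1)

# Expression library: each entry is (name, arity, function).
LIBRARY = [
    ("add", 2, np.add), ("mul", 2, np.multiply),
    ("sin", 1, np.sin), ("x", 0, None),
]

def sample_expression(max_len=16):
    """Sample a prefix-notation token sequence until the tree is complete."""
    tokens, needed = [], 1            # 'needed' = open slots in the tree
    while needed > 0 and len(tokens) < max_len:
        name, arity, _ = LIBRARY[rng.integers(len(LIBRARY))]
        tokens.append(name)
        needed += arity - 1
    return tokens if needed == 0 else None   # None = incomplete tree

def evaluate(tokens, x):
    """Recursively evaluate a prefix-notation expression at points x."""
    def rec(i):
        name = tokens[i]
        arity, fn = next((a, f) for n, a, f in LIBRARY if n == name)
        if arity == 0:
            return x, i + 1
        if arity == 1:
            v, j = rec(i + 1)
            return fn(v), j
        u, j = rec(i + 1)
        v, k = rec(j)
        return fn(u, v), k
    return rec(0)[0]

x = np.linspace(-1, 1, 50)
target = x * x + np.sin(x)            # hypothetical ground-truth expression
expr = None
while expr is None:                   # resample until a complete tree appears
    expr = sample_expression()
# Reward in (0, 1]: higher when the sampled expression fits the data better.
reward = 1.0 / (1.0 + np.sqrt(np.mean((evaluate(expr, x) - target) ** 2)))
print(expr, round(float(reward), 3))
```

In the actual framework, a recurrent neural network rather than uniform sampling proposes the token sequences, and its parameters are updated so that high-reward expressions become more likely.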
Lead author Petersen explains, “From the algorithmic perspective, our approach is not specific to the problem of symbolic regression. More broadly, our framework applies to discrete optimization problems where the user may want to incorporate some knowledge into the search. We are just now beginning to apply it to other tasks, such as finding interpretable reinforcement learning policies or optimizing amino acid sequences.”
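The "risk-seeking" part of the title can be sketched in a few lines (a minimal illustration, assuming the standard quantile-baseline formulation; function names and the toy rewards are hypothetical, not the authors' code). Instead of improving the average reward of sampled candidates, the gradient update counts only samples whose reward exceeds the batch's top quantile, biasing the search toward best-case performance:

```python
# Hypothetical sketch of the per-sample weighting used in a risk-seeking
# policy gradient step: only rewards above the (1 - epsilon) quantile
# contribute, weighted by their margin over that quantile baseline.
import numpy as np

rng = np.random.default_rng(0)

def risk_seeking_weights(rewards, epsilon=0.05):
    """Return (weights, baseline): weight R_i - R_eps for samples above the
    (1 - epsilon) reward quantile R_eps, zero for all other samples."""
    rewards = np.asarray(rewards, dtype=float)
    r_eps = np.quantile(rewards, 1.0 - epsilon)   # quantile baseline
    weights = np.where(rewards > r_eps, rewards - r_eps, 0.0)
    return weights, r_eps

# Toy batch: rewards of 1,000 sampled candidate expressions.
rewards = rng.random(1000)
weights, baseline = risk_seeking_weights(rewards, epsilon=0.05)
# Only the best ~5% of samples receive nonzero weight.
print(int((weights > 0).sum()), round(float(baseline), 3))
```

Because average-case performance is irrelevant when only the single best expression will be kept, discarding the bulk of the batch in this way matches the objective of symbolic regression better than a standard policy gradient.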
Petersen credits the team’s expertise in optimization, mathematics, physics, deep learning, and reinforcement learning with making the approach successful. The research will also be featured at an upcoming DSI virtual seminar.