Consulting service infuses Lab projects with data science expertise
A key advantage of LLNL’s culture of multidisciplinary teamwork is that domain scientists don’t need to be experts in everything. Physicists, chemists, biologists, materials engineers, climate scientists, computer scientists, and other researchers regularly work alongside specialists in other fields to tackle challenging problems. The rise of Big Data across the Lab has created demand for data science knowledge at every stage of a team’s project.
Led by applied statisticians Jason Bernstein and Kathleen Schmidt, the Data Science Institute’s Consulting Service (DSICS) offers statistical and machine learning (ML) expertise to Lab research teams on a short-term basis. These consultations can turn into long-term collaborations, such as Laboratory Directed Research and Development (LDRD) projects, and help strengthen ties across the Lab.
Bernstein explains, “Consultants can help determine how many experiments need to be completed, or suggest methods to analyze already collected data. Additionally, the DSICS provides a way for data scientists to meet new people at LLNL, learn about the mission areas they work in, and potentially work in those areas themselves. The consulting service often helps the consultant as much as the consultee.”
Responding to Evolving Needs
Several years ago, LLNL’s Engineering Directorate recognized a gap: research teams were trying to determine the best ways to use statistical methods such as sensitivity analysis and tolerance limits. In response, the directorate created the Statistical Consulting Service (SCS), coordinated first by Kristin Lennox and later by Cory Lanker.
The effort promoted best practices for understanding data: how to collect it, how to determine a meaningful sample size, how to measure variables, and more. SCS staff found that talking with a statistician early—before experimental or simulation data is ever collected—could enrich a project’s statistical analysis, prevent costly errors, and improve modeling effectiveness.
After the Data Science Institute (DSI) launched in 2018 in response to the Lab’s growing data science research community, the SCS expanded to include ML and other data science techniques and eventually became the DSICS. Today, the service is available to LLNL researchers, experimenters, principal investigators, project leaders, and program managers. Consultants are experts in ML, deep learning, reinforcement learning, predictive modeling, experimental design, and statistical inference.
Range of Requests
The DSICS sees inquiries from all corners of LLNL and from a variety of projects. The consultation tasks are similarly wide-ranging. Bernstein states, “We help teams with experimental design, sensitivity analysis, sample size calculations, data visualization, and so much more. Sometimes we provide sample code for ML software implementation and help with debugging. Other times we help write LDRD proposals or educate scientists on statistical concepts such as confidence intervals and issues arising from multiple hypothesis testing.”
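As an illustration of the kind of sample size calculation Bernstein mentions, here is a minimal Python sketch. It is not DSICS code; the function name, the normal-approximation formula for comparing two group means, and the default significance level and power are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Hypothetical helper for illustration only.
    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation of the measurements
    alpha: significance level; power: desired probability of detection
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


# Detecting a half-standard-deviation shift at the default settings.
n_single = sample_size_two_means(delta=0.5, sigma=1.0)

# A Bonferroni-style adjustment for 10 simultaneous tests divides alpha by 10,
# which raises the required sample size.
n_adjusted = sample_size_two_means(delta=0.5, sigma=1.0, alpha=0.05 / 10)
```

The second call hints at why multiple hypothesis testing matters at the planning stage: a stricter per-test significance level means more experiments are needed to keep the same power.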
Consultants have received more than 135 requests over the SCS/DSICS lifespan, tackling key questions such as “What is the research question?”, “What is the data?”, and “How will we use the data to answer the question?” Recent topics include COVID-19, TensorFlow debugging, and R Shiny dashboards. “I’ve dealt with interesting subject areas, including reliability assessment of the National Ignition Facility and analysis of data from Formula One racing. There have also been statistically interesting problems such as sampling on the surface of a sphere,” recalls Schmidt, who has been a consultant since the DSICS’s inception. “Consulting problems are always a bit of a surprise.”
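Sampling on the surface of a sphere is a useful example of a problem that looks simple but is easy to get wrong: drawing latitude and longitude uniformly clusters points near the poles. The sketch below shows one standard approach, normalizing independent Gaussian draws. It is an illustration only, not the method used in the consultation Schmidt describes, and the function name and parameters are hypothetical.

```python
import math
import random


def sample_unit_sphere(n, seed=None):
    """Draw n points uniformly from the surface of the unit sphere in 3D.

    Hypothetical helper for illustration. A vector of independent standard
    normal coordinates has a rotationally symmetric distribution, so scaling
    it to unit length yields a uniform direction.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:  # vanishingly unlikely, but avoid dividing by zero
            continue
        points.append((x / norm, y / norm, z / norm))
    return points


points = sample_unit_sphere(2000, seed=1)
```

Each returned tuple has unit length, and for a uniform sample the average of each coordinate should be close to zero.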
Over the years, the scope of consulting requests has evolved alongside changing research goals and technology trends. Early requests focused on statistics and analysis concerning relatively small datasets. “The questions we get today are often more complex. Datasets are large and heterogeneous, and software tools are key,” Bernstein notes. “Many projects require different areas of consulting expertise, so we’re looking to collaborate with other service organizations around the Lab like the Software Development Resource Center and the Advanced Research Technologies Working Group.”
Win–Win
Consulting requires finding quick, readily implementable solutions. Amanda Muyskens, who co-directs LLNL’s Data Science Summer Institute student program, volunteered for the DSICS when she was a postdoctoral researcher. Her first consulting project helped a customer understand the potential financial benefit of cleaning up and developing brownfields, which are lands abandoned due to industrial pollutants. Four years later, she participates in every DSICS request she can.
“When consulting, I get to learn about a new project and likely new science I hadn’t heard of before. I can consider a mathematical framework to solve the problem, plus work with the scientist all in a few hours,” explains Muyskens. “My favorite part is the impact the consultant can make on a project in a short amount of time. In some cases, the scientist might use your ideas as a branching off point to take their research in a completely new direction.”