ISCP projects make machine learning advantages tangible
Data science tools are not only rapidly taking hold across disciplines but also constantly evolving. The applications, services, and techniques that one cohort of scientists and engineers learned could be out of date by the time the next cohort arrives, especially as machine learning (ML) and artificial intelligence (AI) tools become commonplace.
To keep employees abreast of the latest tools, two data science–focused projects are under way as part of Lawrence Livermore’s Institutional Scientific Capability Portfolio (ISCP). Distinct from traditional scientific research, ISCP projects improve the Laboratory’s overall ability to conduct research by overcoming technical and operational hurdles.
SEAM Makes Researchers a Cut Above
Before Laboratory employees can implement new data science tools in their work, they must learn to use them. One ISCP project, Shared Education in Artificial intelligence and Machine learning (SEAM), is a professional development program designed to equip employees with job-relevant skills in AI/ML tools. Working in three-person cadres, participants follow an increasingly challenging 12-week course, writing their own code and jointly preparing a final presentation on the technical and practical lessons they learned.
SEAM program lead Andrew Gillette is a research scientist in the Center for Applied Scientific Computing (CASC). Before coming to Livermore 5 years ago, he was a faculty member in the University of Arizona mathematics department, teaching courses ranging from introductory statistics to advanced topics. “The Laboratory is a good match for my research interests, but I also wanted to leverage my experience as an educator,” he says. “I’m grateful to CASC director Jeff Hittinger, who recognized the need for an internal professional development program early on and encouraged me to devise one.”
Gillette identified three AI and ML topics that were sure to garner interest among Lab employees. He noticed demand for Surrogate Modeling and Design Optimization, a host of techniques used to substitute statistical models for experiments and to better predict which experiments are needed next. SEAM’s course on the topic gives participants hands-on instruction using neural networks, Gaussian processes, and other ML tools to create an iteratively updated “fast inference” model of data. Because the technique is purely statistical, the skills are transferable to data modeling problems across scientific disciplines. SEAM’s two other courses are also applicable to multiple research areas. The course on Synthetic Image Generation & Analysis demonstrates how to load, manipulate, and classify images from scientific datasets. In the Single Agent Reinforcement Learning course, participants use Abmarl, a Lab-developed software package, to explore the effects of defining different reward functions and hyperparameters for agent training.
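As a rough illustration of the surrogate modeling idea (not SEAM course material), the Python sketch below fits a small Gaussian process to a stand-in “experiment” and repeatedly requests the input where the model is least certain, so the surrogate is updated iteratively. The function `expensive_experiment`, the kernel length scale, the candidate grid, and the max-variance selection rule are all assumptions chosen for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_query, noise=1e-6):
    """Return the GP posterior mean and variance at x_query."""
    # A small diagonal "noise" term keeps the linear solve well conditioned.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    v = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.sum(k_qt * v.T, axis=1)  # prior variance is 1 for this kernel
    return mean, np.maximum(var, 0.0)

def expensive_experiment(x):
    """Hypothetical stand-in for a costly simulation or experiment."""
    return np.sin(2 * np.pi * x)

def build_surrogate(n_rounds=8):
    """Start from two measurements, then add the most uncertain point each round."""
    candidates = np.linspace(0.0, 1.0, 101)
    x_train = np.array([0.0, 1.0])
    y_train = expensive_experiment(x_train)
    for _ in range(n_rounds):
        _, var = gp_predict(x_train, y_train, candidates)
        x_next = candidates[np.argmax(var)]  # where the surrogate knows least
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, expensive_experiment(x_next))
    return x_train, y_train

if __name__ == "__main__":
    x_train, y_train = build_surrogate()
    grid = np.linspace(0.0, 1.0, 201)
    mean, _ = gp_predict(x_train, y_train, grid)
    print("experiments run:", len(x_train))
    print("max surrogate error:", np.max(np.abs(mean - expensive_experiment(grid))))
```

Once trained, calling `gp_predict` is far cheaper than running the experiment itself, which is the sense in which a surrogate provides fast inference. Practical design-optimization workflows typically use richer selection criteria than maximum variance.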
“We’re at a unique moment where there’s high demand for AI and ML expertise, but many new hires may not have this expertise because their academic field hasn’t fully adopted those technologies into their degree programs. Thankfully, the Lab is a place where people want to upskill,” says Gillette.
At its launch this year, SEAM received over 200 applications from across Lab organizations. Gillette is hopeful that next year, the program can expand the number of participants enrolled while lowering the program cost per student. He points out, “The whole intent of the program is scalability. By having people work together in groups, connecting formally and informally throughout the course, we’re fostering a self-sustaining community.”
CSMS Safely Sources Cloud Solutions
Headed by CASC scientist Brian Weston, another ISCP project, Cloud Services for Mission Science (CSMS), is designed to help Livermore strategically integrate cloud-based workflows to enhance mission deliverability. “By leveraging tech industry innovations, we can significantly boost our research capabilities and throughput to reduce time-to-discovery,” says Weston. CSMS is, in a sense, a “matchmaking” effort between internal needs and external cloud services.
Livermore is a world leader in the hardware and software used to run physics solvers and simulations on high-performance computing (HPC) systems, but continually revamping data management strategies need not detract from its scientific research focus. “There are still many computing tasks that don’t require HPC. Science-enabling technologies such as databases, search engines, Jupyter notebooks, project management tools, and dashboards significantly enhance our overall productivity. These tools continue to develop, especially in the realm of machine learning operations and implementations of large language models [LLMs],” Weston says.
Naturally, the Laboratory cannot rely on just any service to handle its information, which ranges in sensitivity from “Unlimited Release” to “Top Secret.” Federal agencies look to the authorizations issued by the Federal Risk and Authorization Management Program (FedRAMP), an initiative of the U.S. General Services Administration to provide standardized security certifications for assessing private cloud solutions for federal use. Weston and collaborating scientist Joshua DeOtte pay close attention to how Livermore data is classified, liaising between LivIT (which manages and secures the Lab’s IT needs) and mission areas to detect product matches.
Weston shares that a highlight of the project so far has been augmenting Livermore’s classified document management system. CSMS helped identify how Amazon Web Services could safely perform optical character recognition to read data from scanned documents. Then, LLMs could summarize the documents, and users could execute a semantic search to quickly find documents of interest. CSMS ultimately helped to locate existing services that met Livermore’s operational needs and security requirements, preserving resources that otherwise would have been devoted to developing brand-new tools.
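The search step in a workflow like this can be pictured with a toy example. The Python sketch below is not the Laboratory’s system: it ranks hypothetical document summaries against a query by cosine similarity, using simple word counts as a stand-in for the learned embeddings a real semantic search would use. All document names and summary text are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, summaries, top_k=2):
    """Return up to top_k document names, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name) for name, text in summaries.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

if __name__ == "__main__":
    # Hypothetical summaries, as might be produced by an upstream summarizer.
    summaries = {
        "report_a": "laser fusion experiment results",
        "report_b": "budget planning for facilities",
        "report_c": "fusion target design notes",
    }
    print(search("fusion experiment", summaries))
```

Because word counts only match exact terms, this toy version would miss synonyms and paraphrases; learned embeddings are used in practice precisely because they place related wording close together.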
Going forward, Weston says CSMS is investigating further data management and engineering opportunities. Currently, a sizeable portion of the Laboratory’s data storage methods does not meet the data management principles of findability, accessibility, interoperability, and reusability (FAIR), primarily because of historical data siloing. “We produce a lot of expensive data, so we need to make sure our management systems meet these principles. Several cloud-based data engineering solutions can help us reach that goal,” he says.
Gillette notes, “CSMS is standing up cloud services at the Lab. Meanwhile, our SEAM courses utilize cloud services to deliver online Python environments for educating our staff about ML methods. These themes of technology adoption and education go hand in hand.”