“Data analysis is a foundation in all science disciplines. We need to learn, experiment, and utilize new tools that are being developed to help our causes.”
– Yong Han
LLNL’s development of innovative materials ranges from advanced metallic nanowires to high-performance alloys, from freestanding polymer films to components that refresh the aging nuclear stockpile, and much more. Regardless of the final product, ideal feedstock materials need to be synthesized and optimized for system-level integration so that they meet performance requirements and behave predictably.
At LLNL, data science techniques take center stage in a series of materials synthesis and optimization projects led by researcher Yong Han—all of which aim to accelerate the materials discovery, optimization, and deployment processes. As Han explained at the DSI’s 2018 workshop, the materials discovery process can take 10 to 15 years before application integration. “To improve on this, we should leverage new tools, especially at a place like the Lab, where people across all disciplines are working on data analysis problems and solutions,” he states.
One project was inspired by the growing volume of scientific literature. Han says, “Experimentalists must read a great deal of literature to stay up to date on their fields of research while learning new protocols and chemicals others have used in creating materials.” His team came up with a way to extract targeted information from published papers, thus automating a repetitive and formidable task.
The resulting browser-based tool contains text from hundreds of papers and allows the user to analyze data for different variables. The extraction pipeline begins with a supervised logistic regression algorithm that highlights recipe-like sentences. Another algorithm combines a conditional random field model with natural language processing to extract chemical information—formulas, concentrations, relationships, reaction times, morphology, and more. A visualization tool renders the data for further analysis by the end users.
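The first stage of that pipeline, a supervised classifier that flags recipe-like sentences, can be illustrated with a minimal sketch. The article does not describe the team's features, training data, or code, so everything here is an assumption made for illustration: the toy sentences, the binary bag-of-words features, and the helper names (tokenize, featurize, train, recipe_probability) are hypothetical, and the model is a plain logistic regression trained by stochastic gradient descent using only the Python standard library.

```python
import math
import re

# Hypothetical toy corpus: label 1 = recipe-like sentence, 0 = anything else.
TRAIN = [
    ("The solution was heated to 160 C and stirred for 1 h.", 1),
    ("We added 0.1 M AgNO3 dropwise to the ethylene glycol mixture.", 1),
    ("The mixture was stirred at 300 rpm and heated for 2 h.", 1),
    ("PVP was dissolved in ethylene glycol and added to the solution.", 1),
    ("Silver nanowires have attracted attention for flexible electronics.", 0),
    ("Previous studies have reported similar trends in conductivity.", 0),
    ("This paper is organized into four sections.", 0),
    ("Flexible electronics have attracted attention in recent studies.", 0),
]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_vocab(sentences):
    """Map every token seen in training to a feature index."""
    vocab = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(sentence, vocab):
    """Binary bag-of-words vector; tokens outside the vocabulary are ignored."""
    x = [0.0] * len(vocab)
    for token in tokenize(sentence):
        if token in vocab:
            x[vocab[token]] = 1.0
    return x


def score(x, w, b):
    """Logistic (sigmoid) output for feature vector x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = score(xi, w, b) - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


SENTENCES = [s for s, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
VOCAB = build_vocab(SENTENCES)
W, B = train([featurize(s, VOCAB) for s in SENTENCES], LABELS)


def recipe_probability(sentence):
    """Estimated probability that a sentence describes a synthesis step."""
    return score(featurize(sentence, VOCAB), W, B)


for s in ("The mixture was heated and stirred.",
          "Recent studies have attracted attention."):
    print(f"{recipe_probability(s):.2f}  {s}")
```

Sentences scoring above a chosen cutoff would be the ones highlighted for the downstream extraction step. A real system would need far more labeled sentences and richer features than this toy example uses.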
Another project goes beyond text descriptions of materials to analyzing images. In high-explosive (HE) materials, physical properties often correlate with performance. For example, in certain HEs, uniformly round particles or specific particle sizes are indicators of mechanical performance. However, Han notes, “Appearance can be difficult to quantify.” Furthermore, data volume is again an issue, as a single scanning electron microscopy image can include more than 300 extractable physical features.
The research team developed a feature extraction tool that uses computer vision and deep learning to determine which features are meaningful. The tool leverages open-source technologies that define engineered features, like edge and boundary detection, at a pixel level. The computer learns to weigh feature importance, then provides computed values that translate to mechanical performance prediction.
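To make the idea of a pixel-level engineered feature concrete, here is a small sketch of edge detection on a tiny synthetic image. The article does not name the open-source libraries or the specific features the team used, so the Sobel operator, the threshold, the edge_fraction summary, and the test images below are all assumptions chosen for illustration. The learned weighting of features and the performance prediction are not shown.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude at each interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_fraction(image, threshold=1.0):
    """Share of interior pixels whose gradient magnitude exceeds the threshold.

    A hypothetical scalar feature: a crude proxy for how much boundary
    an image contains.
    """
    magnitude = sobel_magnitude(image)
    interior = [
        magnitude[r][c]
        for r in range(1, len(image) - 1)
        for c in range(1, len(image[0]) - 1)
    ]
    return sum(m > threshold for m in interior) / len(interior)


# Synthetic stand-ins for micrographs: a featureless field and a sharp boundary.
flat = [[0.0] * 6 for _ in range(5)]
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]

print("flat:", edge_fraction(flat))
print("step:", edge_fraction(step))
```

A scalar such as edge_fraction is one example of the kind of computed value that could be passed, alongside many others, to a model that learns which features matter most.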
Han’s latest multidisciplinary project further enhances the development of HE materials by combining multimodal data for feature extraction. The effort aims to correlate additional material properties, such as crystal purity, with HE performance by identifying features in images and numerical values from varied sources. The team hopes to advance the application of machine learning algorithms for small data sets while also implementing physics-based approaches.
“Data science provides materials scientists with a whole new toolbox to make our work easier and more fruitful,” Han states. “The method you are most familiar with may not be the most efficient way to accomplish your goals.” He cautions that finding the right solution begins with identifying the problem, that is, with asking the right question. As his team’s projects illustrate, questions such as “What are the important ingredients and processing steps in silver nanowire recipes?” and “Which physical features dominate an explosive’s performance?” require tailored solutions.
Pictured (left to right): Jinkyu Han, Brian Gallagher, Bhavya Kailkhura, Anna Hiszpanski, Sookyung Kim, Peggy Li, Emily Robertson, Yong Han, T. Nathan Mundhenk, Shusen Liu, Dave Buttler, and Matt Rever.
Not pictured: Karthik Chellappan and Hyojin Kim.