Traumatic brain injuries (TBIs) affect millions of people each year, whether from car accidents, sports injuries, or combat. The Defense and Veterans Brain Injury Center reported nearly 414,000 TBIs among U.S. service members worldwide between 2000 and late 2019, yet treatment and, in many cases, even diagnosis remain elusive.
Outcomes from TBI can range from a complete recovery after a severe injury to debilitating depression or a personality disorder from what, at first, appeared to be a mild incident. Traditional clinical tools do little to explain why different injuries produce different symptoms, to predict a patient’s prognosis, or to identify precision treatment for individual patients.
Making sense of the brain, with its 100 billion neurons and tens of thousands of connections, in the service of veterans is perhaps one of the best possible applications of Lawrence Livermore’s expertise in data science and high-performance computing (HPC). “No two brains are alike. And even the same brain looks different before and after TBI. The brain seems inscrutable, but we have data from clinical tools like CT [computed tomography] and MRI [magnetic resonance imaging], combined with patient records. Data science can tell us where to look for patterns so we can make predictions and help patients,” says Shankar Sundaram, director of Livermore’s Center for Bioengineering.
In March 2018, LLNL joined the Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) initiative, a national multi-year, multi-disciplinary effort funded by the National Institute of Neurological Disorders and Stroke and led by the University of California, San Francisco, in collaboration with Lawrence Berkeley and Argonne (ANL) national laboratories and other leading research organizations and universities. TRACK-TBI aims to collect comprehensive multimodal data from thousands of TBI patients: demographics, previous medical histories, injury details, clinical information, blood-based biomarker measures, multiple imaging modalities with radiologist annotations, MRI annotations, and outcome measures at 3, 6, and 12 months post-injury.
“TBI is incredibly complex. It’s also a brand-new field for data science. We’re looking to aggregate information across multiple modalities, but our models must be reliable and interpretable so clinicians can understand and trust the results,” says Sundaram, one of the lead investigators of TRACK-TBI. “Working with the TBI dataset has presented us with one of the most complicated computing challenges we’ve ever faced. It’s really pushed us, but we’ve met the first set of challenges with success.” To advance supercomputing and machine learning (ML), uncover fundamental new insights into how to diagnose and treat TBI, and help deliver precision medicine to patients, LLNL researchers have produced a robust computational pipeline for researchers and clinicians and have deployed statistical ML to integrate anonymized TBI patient data.
The MaPPeRTrac Container
A key challenge of this project was building a suitable data infrastructure, or compute pipeline, to transfer and store complex multimodal patient data and to give clinicians and TBI researchers high-speed access to it. “We looked at all of this data and asked, ‘What can we do to make this better?’ In the past, it would take 24 hours to process a single MRI. What do you do when you have data for thousands of patients? The Lab has the HPC and data science expertise, so we developed a compute pipeline that’s smarter and faster. The size of the dataset should not be a limitation,” says Peer-Timo Bremer, research scientist in LLNL’s Center for Applied Scientific Computing, who helped develop the Massively Parallel, Portable, and Reproducible Tractography (MaPPeRTrac) brain tractography workflow with software developer Joseph Moon.
“MaPPeRTrac was a challenging project since we needed to address HPC and neuroimaging problems at the same time. We iterated many software architectures until we hit on the final arrangement: user-space scripts with access to containerized scientific programs,” says Moon. MaPPeRTrac is a containerized parallel compute infrastructure that fully automates the tractography workflow, from management of raw MRI data to edge-density visualization of a connectome, the stunning visualization of the brain’s neural connections.
MaPPeRTrac’s data and dependencies, handled by the Brain Imaging Data Structure (BIDS) and by containerization with Docker and Singularity, respectively, are decoupled from code to enable rapid prototyping and modification. Data artifacts are designed to be findable, accessible, interoperable, and reusable in accordance with FAIR principles. In collaboration with an ANL team led by Ravi Madduri, the pipeline uses the Parsl parallel programming framework developed by ANL to exploit all available HPC resources in a portable and scalable manner, allowing the team to create connectome datasets and validation studies of unprecedented size.
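The per-subject fan-out that a pipeline like this relies on can be sketched with Python's standard library. This is only an illustration under stated assumptions: MaPPeRTrac itself uses Parsl and containerized neuroimaging tools, whereas the sketch below swaps in `concurrent.futures`, and the stage names (`preprocess`, `track`, `build_connectome`), the subject IDs, and the returned dictionaries are hypothetical placeholders rather than the project's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the containerized stages of a tractography
# workflow; each real stage would call out to neuroimaging software.
def preprocess(subject_id):
    return {"subject": subject_id, "stage": "preprocessed"}

def track(prepared, n_streamlines):
    return {**prepared, "stage": "tracked", "n_streamlines": n_streamlines}

def build_connectome(tracked):
    return {**tracked, "stage": "connectome"}

def run_subject(subject_id, n_streamlines=1000):
    """Run the stages for one subject in order; subjects are independent."""
    return build_connectome(track(preprocess(subject_id), n_streamlines))

def run_cohort(subject_ids, max_workers=4):
    """Fan the independent per-subject workflows out across workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the input order of subject_ids in its results.
        return list(pool.map(run_subject, subject_ids))

if __name__ == "__main__":
    for result in run_cohort(["sub-01", "sub-02", "sub-03"]):
        print(result["subject"], result["stage"], result["n_streamlines"])
```

Because each subject's workflow shares no state with the others, the same structure scales from a laptop thread pool to many HPC nodes once the executor is replaced by a cluster-aware framework.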
The MaPPeRTrac container enables high-performance, parallel, parametrized, and portable generation of connectomes that is fast, efficient, well-tested, robust, and easy to use. To lower the barrier to entry and democratize access to this tool across the research community, Bremer’s team has made MaPPeRTrac open source via GitHub so that any researcher can generate connectomes using state-of-the-art software libraries and HPC. The team hopes the tool will also give clinicians in diverse hospital settings off-the-shelf access to clinically actionable data.
The team examined the statistical stability of probabilistic tractography and the predictability of its computations, using Department of Energy computing resources to process the massive amount of data from 88 subjects. “We spent significant time choosing the best data science techniques for MaPPeRTrac’s outputs. The neuroimaging community has high standards, and we ultimately tried over 20 different algorithms to cover the most possible interpretations of our findings,” notes Moon.
This analysis of connectomes had never been performed before because of its prohibitive computational cost and time. The team found that reducing the number of tractography streamlines can speed up processing and that this reduction does not impair the ability to identify a specific patient’s connectome among a cohort of participants, given an independently computed connectome from a prior MRI scan.
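The identification test described here can be illustrated with a small sketch. It assumes each connectome is flattened to a list of edge weights and that similarity is measured with Pearson correlation; the source does not say which metric the team used, and the toy numbers and subject labels are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of edge weights."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def identify(query, reference_cohort):
    """Return the subject whose reference connectome best matches the query."""
    return max(reference_cohort, key=lambda s: pearson(query, reference_cohort[s]))

def identification_rate(queries, reference_cohort):
    """Fraction of subjects correctly matched to their own earlier connectome."""
    hits = sum(identify(q, reference_cohort) == s for s, q in queries.items())
    return hits / len(queries)

# Toy data (invented): edge weights from a first scan ...
first_scan = {
    "A": [0.9, 0.1, 0.4, 0.7],
    "B": [0.2, 0.8, 0.6, 0.1],
    "C": [0.5, 0.5, 0.9, 0.2],
}
# ... and from a second scan, e.g. computed with fewer streamlines.
second_scan = {
    "A": [0.8, 0.2, 0.5, 0.6],
    "B": [0.3, 0.7, 0.5, 0.2],
    "C": [0.4, 0.6, 0.8, 0.3],
}

if __name__ == "__main__":
    print(identification_rate(second_scan, first_scan))
```

If the identification rate stays the same when the second set of connectomes is generated with fewer streamlines, the cheaper setting preserves the information needed to tell subjects apart.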
An Algorithm That Can Handle Messy Data and Missing-Ness
Another major challenge was developing an algorithm to process multimodal data and produce reliable, interpretable diagnostics and prognostics even when some of the data is missing or uncertain. “A research subject with incomplete data would typically be dropped from an algorithm’s analysis. But in real-world situations, life is messy: someone forgets to ask a question or leaves an answer blank, so we had to figure out how to ensure that we can utilize all subjects’ data, even if there are gaps,” says Alan Kaplan, an LLNL research scientist who developed ML methods for the TRACK-TBI consortium.
The Livermore team created a statistical model based on a latent-state representation that computes the likelihood of any combination of variables, while accounting for arbitrary “missing-ness” patterns. One important question the model can address is, ‘Are there groupings or sub-groupings that emerge despite gaps in the data?’ The approach reveals sub-groupings of subjects that correspond to varied TBI outcomes, such as memory loss versus global function, that could inform clinical decision-making.
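One simple way to see how a latent-state model can tolerate gaps is a latent-class mixture in which the observed variables are conditionally independent given the hidden state; a missing variable then marginalizes out and simply drops from the product. The sketch below assumes binary variables and hand-picked parameters. The variable names, the two states, and all probabilities are invented for illustration and are not the Livermore team's model.

```python
# Hypothetical two-state latent-class model over binary variables.
PRIOR = {"state_0": 0.6, "state_1": 0.4}
# P(variable == 1 | latent state); names and values are invented.
P_ONE = {
    "state_0": {"headache": 0.2, "memory_issue": 0.1, "abnormal_ct": 0.05},
    "state_1": {"headache": 0.7, "memory_issue": 0.8, "abnormal_ct": 0.5},
}

def joint_given_state(record, state):
    """P(observed entries | state); entries set to None are marginalized out."""
    prob = 1.0
    for name, value in record.items():
        if value is None:
            continue  # summing over both outcomes contributes a factor of 1
        p = P_ONE[state][name]
        prob *= p if value == 1 else 1.0 - p
    return prob

def likelihood(record):
    """Marginal likelihood of the observed entries under the mixture."""
    return sum(PRIOR[s] * joint_given_state(record, s) for s in PRIOR)

def posterior(record):
    """Posterior over latent states given whatever was observed."""
    total = likelihood(record)
    return {s: PRIOR[s] * joint_given_state(record, s) / total for s in PRIOR}

if __name__ == "__main__":
    complete = {"headache": 1, "memory_issue": 1, "abnormal_ct": 0}
    partial = {"headache": 1, "memory_issue": None, "abnormal_ct": None}
    print(posterior(complete))
    print(posterior(partial))
```

A subject with blanks is never dropped: the posterior is just computed from fewer factors, and assigning each subject to its most probable state yields the kind of sub-groupings the text describes.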
In addition, the model can be used to predict patient outcomes using biomarkers, moving toward the idea of enhanced specificity among subgroups of TBI patients. “We now know that labels of ‘mild, moderate, or severe’ are inadequate in terms of TBI,” explains Kaplan. “The model we’ve developed incorporates data that portrays TBI’s truly multifaceted nature. The initial findings indicate a 20% improvement over baseline predictions, and we hope that with further validation, it could help clinicians make more accurate prognoses and aid in decision-making for early clinical management and determining targeted treatment interventions.”
Having developed MaPPeRTrac to synthesize and process MRI connectome data, along with a statistical algorithm to identify and organize swaths of TBI patient data despite missing data points, the Livermore team, working with TRACK-TBI, now has a proof of concept. The team hopes to scale the project by collaborating with other institutions, including the National Collegiate Athletic Association, Veterans Affairs, and the Department of Defense, to leverage their TBI datasets and further engage the research community in identifying the best ways to help these patients.
Shankar Sundaram, Peer-Timo Bremer, Alan Kaplan, Joseph Moon, and Aditya Mohan.