Statistical framework synchronizes medical study data
The risks and benefits of heart surgery, chemotherapy, vaccination, and other medical treatments can change based on the time of day they are administered. These variations arise in part from changes in gene expression levels throughout the 24-hour day-night cycle, with around 50% of genes displaying oscillatory behavior.
To evaluate new therapies, investigators study how a gene’s oscillatory behavior changes under different experimental conditions. Yet a problem can arise when measuring this behavior relative to patients’ internal clocks, which differ from person to person. LLNL Computational Engineering staff member Tavish McDonald collaborated with computational scientist Michael Gorczyca of MTG Research Consulting and University of British Columbia PhD student Justice Sefas to improve statistical analysis of gene expression data for these types of clinical studies. The team analyzed data from several circadian transcriptomic studies and developed a statistical framework that accounts for individual differences in a gene’s oscillatory behavior (preprint available).
Statistical analyses for circadian gene expression studies typically use 24-hour day-night cycle time, or Zeitgeber time (ZT), to model these oscillations. However, oscillation timing is based on an individual’s internal circadian time (ICT), a 24-hour timing system that is offset relative to ZT due to genetics, age, and environmental conditions. Gorczyca explains, “Suppose Tavish and I live in the same place, and the ZT is 4 PM. Given differences in our biology and lifestyles, my ICT as a ‘night owl’ is 2 PM, and Tavish’s ICT as an ‘early bird’ is 6 PM.”
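The arithmetic in Gorczyca’s example can be sketched in a few lines of Python, assuming (our simplification) that a person’s ICT is just ZT shifted by a fixed personal offset in hours and wrapped onto a 24-hour clock. The helper name `zt_to_ict` and the ±2-hour offsets are hypothetical, chosen only to reproduce the quote.

```python
def zt_to_ict(zt_hours: float, offset_hours: float) -> float:
    """Hypothetical helper: shift a 24-hour clock time by a personal offset, wrapping onto [0, 24)."""
    # Assumption: ICT = ZT + offset, with no drift over the day.
    return (zt_hours + offset_hours) % 24.0


zt = 16.0                          # 4 PM on the shared external clock
night_owl = zt_to_ict(zt, -2.0)    # internal clock runs 2 hours behind
early_bird = zt_to_ict(zt, +2.0)   # internal clock runs 2 hours ahead
print(night_owl, early_bird)       # 14.0 18.0
```

The modulo keeps late-evening shifts on the clock face; for instance, `zt_to_ict(23.0, 2.0)` returns `1.0` rather than `25.0`.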
Relying on ZT instead of ICT to model gene expression over time has far-reaching consequences. “Many statistical analyses in this field assume that every study participant has the same internal timing system. This is rarely true,” says Gorczyca. “In a related project, we observed that violations of this assumption decreased the magnitude of studies’ test statistics by as much as 25%, and we concluded that p-values became large enough that investigators could miss new discoveries.”
To determine one’s ICT offset relative to ZT, a useful proxy is the start of melatonin production in the body, which begins before sleep in response to fading light. This dim-light melatonin onset (DLMO) measurement is informative but difficult to ascertain experimentally. McDonald says, “For studies that determined DLMO and considered it to be a person’s offset, we observed statistical power increased. The problem is, determining DLMO through repeated blood or saliva testing can cost hundreds of dollars per participant. Sometimes, DLMO might not be reliably determined at all.”
To address these challenges, the team used a mathematical tool called deconvolution, which spares investigators from having to determine each participant’s DLMO. Instead, the team’s framework requires only prior statistical knowledge, such as the average and standard deviation of DLMOs across participants. Because such knowledge is rarely available in practice, the team computed “rule of thumb” statistics from 13 published studies on how DLMO varies among people. Using these statistics, they consistently found that their framework’s numerical results matched those of an analysis in which an investigator knew each person’s DLMO.
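To illustrate why a mean and standard deviation can be enough, here is a minimal Python sketch under simplifying assumptions that are ours, not the team’s: a single 24-hour cosine rhythm, normally distributed ICT offsets with known mean `mu` and standard deviation `sigma`, and one sample per participant. Under those assumptions, averaging over the unknown offsets shrinks the amplitude seen on the ZT clock by the factor exp(-(ωσ)²/2) and shifts its peak by `mu`, so deconvolution reduces to dividing out that factor and shifting back. This is a textbook special case, not the published framework; the gene parameters, prior values, and the function names `fit_cosinor` and `deconvolve` are hypothetical.

```python
import numpy as np

OMEGA = 2 * np.pi / 24.0  # angular frequency of a 24-hour rhythm (radians per hour)


def fit_cosinor(zt, y):
    """Least-squares fit of y = mesor + b*cos(OMEGA*t) + g*sin(OMEGA*t) on ZT."""
    X = np.column_stack([np.ones_like(zt), np.cos(OMEGA * zt), np.sin(OMEGA * zt)])
    mesor, b, g = np.linalg.lstsq(X, y, rcond=None)[0]
    amplitude = np.hypot(b, g)
    peak_time = (np.arctan2(g, b) / OMEGA) % 24.0  # hour at which the fitted cosine peaks
    return mesor, amplitude, peak_time


def deconvolve(amplitude_zt, peak_zt, mu, sigma):
    """Undo the blurring caused by Normal(mu, sigma^2) offsets, assuming ICT = ZT + offset."""
    attenuation = np.exp(-(OMEGA * sigma) ** 2 / 2.0)  # Fourier factor of the normal density
    return amplitude_zt / attenuation, (peak_zt + mu) % 24.0


rng = np.random.default_rng(0)
n = 20_000                          # hypothetical participants, one sample each
true_amp, true_peak_ict = 1.0, 8.0  # hypothetical gene: peaks at ICT 8 h
mu, sigma = 0.5, 2.0                # hypothetical prior on offsets (hours)

zt = rng.uniform(0.0, 24.0, n)      # sampling times on the shared clock
offset = rng.normal(mu, sigma, n)   # each participant's unknown ICT - ZT offset
ict = zt + offset
y = 5.0 + true_amp * np.cos(OMEGA * (ict - true_peak_ict)) + rng.normal(0.0, 0.5, n)

_, amp_zt, peak_zt = fit_cosinor(zt, y)  # naive fit that ignores the offsets
amp_hat, peak_hat = deconvolve(amp_zt, peak_zt, mu, sigma)
print(f"naive ZT fit: amplitude {amp_zt:.2f}, peak {peak_zt:.2f} h")
print(f"deconvolved:  amplitude {amp_hat:.2f}, peak {peak_hat:.2f} h")
```

With these made-up settings the naive amplitude should land near the attenuation factor of about 0.87 for σ = 2 hours, while the deconvolved estimates should sit close to the simulated amplitude of 1.0 and peak of 8 hours.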
The team says their statistical method could remove the time and cost barriers of determining DLMO, encouraging more researchers to investigate this space and identify personalized therapies for patients. “Communicating with biologists and statisticians was essential for the success of this effort, and we were fortunate to find researchers in both spaces who were willing to share their knowledge, including their datasets,” says McDonald.