“We don’t have much ground truth in biology.” According to Barbara Engelhardt, a computer scientist at Princeton University, that’s just one of the many challenges that researchers face when trying to prime traditional machine-learning methods to analyze genomic data. Techniques in artificial intelligence and machine learning are dramatically altering the landscape of biological research, but Engelhardt doesn’t think those “black box” approaches are enough to provide the insights necessary for understanding, diagnosing and treating disease. Instead, she’s been developing new statistical tools that search for expected biological patterns to map out the genome’s real but elusive “ground truth.”
Engelhardt likens the effort to detective work, as it involves combing through constellations of genetic variation, and even discarded data, for hidden gems. In research published last October, for example, she used one of her models to determine how mutations relate to the regulation of genes on other chromosomes (referred to as distal genes) in 44 human tissues. Among other findings, the results pointed to a potential genetic target for thyroid cancer therapies. Her work has similarly linked mutations and gene expression to specific features found in pathology images.