The Exposome and Human Health: A Conversation with Dr. Chirag Patel
Interview by Evan Hsiang
HHPR Executive Content Editor Evan Hsiang interviewed Dr. Chirag Patel, PhD, Associate Professor of Biomedical Informatics at Harvard Medical School. Dr. Patel’s research seeks to address human health through computational analysis of electronic medical records, genomic data, and environmental exposures. Dr. Patel received his PhD from Stanford University. This interview has been edited for brevity and clarity.
Evan Hsiang (EH): Dr. Patel, thank you so much for taking time out of your day to speak with me. Could you explain what the exposome globe is and its medical applications?
Dr. Chirag Patel (CP): The exposome is this expansive concept that is, I believe sooner rather than later, coming to a more concrete definition. It is a complementary data representation to the genome. When we think of the genome, we think of the DNA nucleotides that make up the map of how proteins are made, which do the functions of all species. In disease research, the sequencing of the genome has been a paradigm shift in the way that we've been able to develop tools to measure the genome at a very large scale. This enables us to figure out what genetic variants, or places where you and I might differ along our DNA, are associated with disease and biological phenotypes. It has led to new insights on causality of disease, but what has been missing is the role that environment plays.
It's been hypothesized that it's not only the genome, but also the ways that our genes interact with the environment or the exposome, that bring about phenotypic change. Traditionally, we've been studying the environment or the exposome by asking people about their environment. So an epidemiological investigator might ask, “What was your diet like yesterday?” and record this into a large database and then associate that information with your future outcomes. Now, those methods have been very powerful, but what is missing is the way the environment interacts with the genome and the totality of how these different environmental factors may interact in contribution to that outcome. For example, diet is not a monolithic thing; it's correlated with our age, so when we are young, our diet and bodies are much different from when we age. Our health also covaries with where we live, other environmental exposures, and potentially infectious disease, so we missed this totality of things that could be interacting with one another in connection with the disease outcome.
The exposome tries to bring about a systematic view or taxonomization of environmental factors so that we can analyze them in relation to a phenotypic outcome. It serves as a better predictor of disease that can help us uncover the ways that these environmental factors interact so that we can develop more effective interventions. The exposome globe is a visualization of how these environmental factors are correlated with one another or how they might analytically interact. You have different exposure factors that are measured, like vitamin D levels of an individual, beta carotene levels of an individual, whether or not an individual had flu or COVID-19 at a certain age, and lead levels, and the globe shows how they're all connected to one another. The globe is depicting, for different environmental factors, how this co-occurrence occurs to show a gestalt such that if you measure just one thing, you lose the correlation of the others that may also contribute to a particular phenotype or disease outcome.
Another analogy I like to make is that the globe is a representation of correlations that have been well-studied in the genetic world. Through sequencing projects, we know how individuals of different racial or ethnic groups have different correlation structures and thus different globes than one another, which has been immensely helpful for us to understand the role of population structure in the association of genetic factors with phenotype. That is, if we identify a genetic factor with a disease outcome, it could be that the genetic factor is the causal agent, or could be coming along for the ride along with an individual's race or ancestral background. The analogy makes sense with the exposome in that all these things coming together are correlated with one another. And it may not be just one of those things that are connected with the outcome, phenotype, or disease, but maybe all those things simultaneously.
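To make the idea behind a correlation globe concrete, here is a minimal sketch of its underlying computation: pairwise correlations among measured exposures, keeping the pairs that are strongly correlated. This is an illustration only and not Dr. Patel's actual method; the exposure names, the six made-up measurements, the `correlation_edges` helper, and the 0.5 threshold are all hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical, made-up measurements for six people (not real data).
exposures = {
    "vitamin_d": [30, 25, 40, 20, 35, 28],
    "beta_carotene": [15, 12, 22, 9, 18, 14],
    "lead": [1.2, 2.0, 0.8, 2.5, 1.0, 1.8],
}


def correlation_edges(data, threshold=0.5):
    """Return (factor_a, factor_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


edges = correlation_edges(exposures)
for a, b, r in edges:
    print(f"{a} <-> {b}: r = {r:+.2f}")
```

Each retained pair would correspond to one connection drawn between two factors on the globe. Measuring a single factor in isolation discards exactly these connections.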
EH: That's fascinating. One of the limits of the exposome you alluded to is that it can only draw correlations and associations rather than causal relationships. How do you deal with that limit when you're drawing conclusions about factors for disease?
CP: It’s a great question. I think it persists throughout all of biomedicine when we're dealing with these observational cohorts. It can be a bug and a feature in that we're dealing with observational data, environmental information, environmental factors, but we can never achieve the gold standard of experimentation, which is being able to apply an exposure to somebody or to a population, randomize them to a treatment, and then observe the outcomes. If we could do that, we would break what is probably the largest threat in observational studies: the phenomenon known as confounding. It's a threat because if the apparent link between, for example, beta carotene and diabetes is confounded by a third variable, it's not causal; potentially, it's just an indicator variable. The problem is amplified by maybe hundreds or thousands of variables.
We try to mitigate this first by doing replication among different cohorts. The idea is that confounding could be a one-off thing that may be very specific to a particular cohort or setting, and if you're not able to replicate the result, that might be a good indication that the finding is not generalizable, and also probably not causal. We also try to stitch together multiple lines of evidence. For example, there may be a very compelling model system study, like a mouse model, that exposes mice to an exposure that you might have found in a population or epidemiological setting, and the mice exhibit phenotypes similar to those in your epidemiological study. Since that exposure and phenotype are seen in another organism or setting, that might add to the evidence base for a particular causal agent, or what we think is a hypothetical causal agent. There are other analytical tricks we can apply, and these are things that were passed down in our field from icons like Bradford Hill, who established these heuristics to determine causality, one of which is looking at a different setting, trying to replicate in mouse models and other biological systems. Another indicator could be the size of the correlation; we might hypothesize that the larger the correlation, the closer you might be to making a causal statement about the factors. But this is all unsettled, because when we're talking about the exposome, it kind of goes against some of these Bradford Hill criteria or heuristics: we're hypothesizing that multiple factors influence the phenotype, while the Bradford Hill criteria assume that only one factor influences the outcome. So the general challenge of the field going forward is how we make a causal statement about multiple things rather than a single thing, but right now the field tends to use that single-thing heuristic for the factors that we do find.
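The confounding problem described in this answer can be illustrated with a small simulation. The following is a sketch under stated assumptions, not an analysis from Dr. Patel's work: a hypothetical binary confounder raises both an exposure and an outcome, while the exposure has no effect on the outcome at all. The variable names, effect sizes, sample size, and seed are invented for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def simulate(n=5000, seed=0):
    """Simulate people whose exposure and outcome share a hidden cause.

    The exposure never enters the formula for the outcome, so any
    exposure-outcome correlation is produced entirely by the confounder.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        confounder = rng.random() < 0.5          # hypothetical third variable
        shift = 2.0 if confounder else 0.0
        exposure = shift + rng.gauss(0, 1)       # raised by the confounder
        outcome = shift + rng.gauss(0, 1)        # also raised by the confounder
        rows.append((confounder, exposure, outcome))
    return rows


rows = simulate()

# Crude association, ignoring the confounder.
crude = pearson([r[1] for r in rows], [r[2] for r in rows])

# Association within each level of the confounder (stratification).
within = {}
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    within[level] = pearson([r[1] for r in stratum], [r[2] for r in stratum])

print(f"crude correlation: {crude:.2f}")
for level, r in within.items():
    print(f"confounder={level}: correlation {r:.2f}")
```

The crude correlation is substantial even though the exposure does nothing, and it largely disappears once people are compared within the same level of the confounder. In real exposome data the confounders are rarely this well known or this easy to measure, which is why replication and multiple lines of evidence matter.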
EH: You had a 2018 study on HIV in Zambia that showed how factors like wealth, widowed status, and condom usage can increase risk, while other factors like breastfeeding could potentially decrease risk. How do we use the knowledge from these findings to create effective preventative measures that are still culturally sensitive and account for the lifestyles of local populations?
CP: Yeah, that's a great question and a difficult one that I probably won't be able to give a specific answer to. But I think we can start by using this database as a way of prioritizing things that we do need to study in more detail, and I think your question is getting at what those details are. We have this database with a bunch of factors that, among the myriad of things that we could have tested, seem to be causal for a particular event like HIV positivity. I think the next thing one needs to do, particularly for those studies, which happened to be cross-sectional, data-driven, and hypothesis-generating, is potentially test those things in those settings in a longitudinal fashion. So being culturally sensitive and specific might mean working with local investigators in those particular areas and recruiting a cohort that you are able to see, manage, and monitor over time, with those specific factors in mind. That entails building the instruments to measure, at scale, those variables that we're finding; it means recruiting individuals who are willing to participate in this research, while also keeping in mind that they're potentially at risk for HIV. And then the hardest part is to follow those individuals over time to see who gets HIV and who doesn’t. I think that's the first set of things that you might need to consider.
The second thing to keep in mind is the type of variable you have. Some of those factors might be intervenable in a way that allows us to make causal statements. One example that I think emerged from that study, but emerged more prominently in a second study that we did across Sub-Saharan Africa, was the emergence of circumcision as a variable for males. So this is closer to a behavioral health intervention rather than a classic exposome variable but that variable has been studied extensively. It was found in an observational study way before ours was and had been evaluated in a randomized study, and it is used as a current intervention for mitigation of HIV infection and complications. So that might be one example of a place where, among that large database of variables that you might find, some might be intervenable, and those might be ones that are ripe for a causal randomized study.
EH: As you mentioned, you did extend this study to Sub-Saharan Africa. How has your understanding of HIV risk factors changed after conducting this recent study?
CP: Yeah, it's changed in two ways. First, it was kind of surprising how some of these variables were consistent across these different countries. My prior assumption was that if I find a variable in one place, it might be generalizable for that one culture or area, but I didn't expect the consistency that we found for other countries, albeit the scale of the risk was a little bit different. The second, more surprising thing was that the overall odds ratios, or the ways that we measure risk, are quite modest across all these variables. It was obvious to me that HIV was a complex phenotype with multiple behaviors, environmental factors, and lines of individual-level susceptibility affecting who gets infected and who doesn't, but the overall distribution of that risk was surprisingly small given so many different factors, which again points to the fact that many factors might contribute to HIV.
EH: Your work lies at this intersection of bioinformatics, epidemiology, and policy. How does the heterogeneous nature of exposures lend itself to a multidisciplinary collaboration to implement preventative measures?
CP: It's a great question. The heterogeneity of these variables is a bug and a feature. Analytically, it's a bug: sometimes we get criticized because we're analyzing all of these variables at once when we should be treating each of them in different ways, since they might have different confounding structures. But I think, as you're alluding to, it's a feature in that it allows us to collaborate with investigators whom we have never worked with before. For example, several earlier studies had pointed to dietary variables coming out for certain phenotypes, so I reached out to experts in dietary methodology and instrument creation and learned a lot about the state of the field. Another instance was when we were examining these behaviors that help identify HIV positivity; that led me to look at interventions that had been done in HIV, some of which have won Nobel Prizes at MIT, for example, and that, you know, has been pretty enlightening. So yes, the heterogeneity of the variables has allowed me to think about the pathways of exposure and new ways of collaborating with people.
EH: In your paper, “Development of Exposome Correlation Globes to map out Environment-wide Associations,” you found that Type 2 diabetes is linked with exposure to manufacturing chemicals like heptachlor epoxide. What can these findings tell us about the structural and behavioral causes of diabetes?
CP: Yeah. In a way it tells us a lot, but in a way it tells us really nothing. The heptachlor epoxide and diabetes one is an interesting example. It was what we call a strong association, namely that it was one that exceeded statistical significance in a very robust way, and what we call the effect size, or the risk that we are able to find, was quite large. But again, it comes back to our question about confounding and causal statements. If you were to ask me, “If we were to remove heptachlor epoxide from the population, will our rates of diabetes lower?” I'd be doubtful because that factor comes along with many different other factors, so it's hard for me to say, “Yes, it’s heptachlor epoxide.” Or it could be that individuals who have higher levels of heptachlor epoxide might have higher BMI, be obese, or be prone to certain diets that lead to obesity and diabetes.
I will say that this chemical was a persistent pollutant that was used for termite eradication and pesticide control in the Midwest. It had been banned in the mid-’70s. So it's still floating around in our fatty tissue, and it could be that it might be found in diets that obese people are more prone to consume than non-obese people, and that's what we're picking up. That's the hypothesis I’d probably lean on. It's some indicator of some adverse diet or caloric intake that is leading to higher BMI that then is leading to diabetes, and not so much that factor itself.
EH: On the other hand, we find some things like diet and lack of physical activity that are very well known to be linked to diseases and health conditions. What are the challenges to implementing programs to reduce behaviors that are already so well recognized as detrimental to health?
CP: I'm speaking outside my area of expertise here, but there is a paper I find inspiring that came out in the New England Journal of Medicine, where individuals were randomized either to a diet and lifestyle intervention called the Diabetes Prevention Program or to a drug called metformin, which lowers glucose levels. The lifestyle program included interventions that reduced smoking, increased physical activity, and decreased caloric intake towards lowering weight to reduce diabetes incidence. The results showed that individuals who had these interventions had a dramatically lower risk for diabetes, on par with, if not better than, the drug. At what level did this intervention occur? And how strong was it? The thing is that these interventions were extraordinary and exceptional. The diabetes prevention investigators asked individuals to come in to basically get their diets monitored by an external third party, so they were given a very constrained tray of food. Their caloric intake was monitored; they were made to keep food logs and monitor their physical activity. So it’s an Orwellian type of monitoring of somebody's behavior to ensure that they were adhering to this diet, and they showed extraordinary results. So I think the next level in this area of cardio-metabolic research in diabetes is finding ways to deliver the interventions that we know are causal, like caloric restriction and physical activity, such that people are encouraged to stick to the regimen, even though it's so onerous. I think the next paradigm shift for diabetes prevention will come if we have ways of enabling people to remain on their diets.
EH: Lastly, what would you say to aspiring bioinformaticians seeking to tackle these large health crises?
CP: I would try to read a lot and pay close attention to what's happening in the scientific literature. Ask what good questions there are to research and talk to doctors and physicians, who see the problems every day, to make sure that the topics we're thinking about remain impactful. Study stats and computer science. I think I'm kind of drinking the Kool-Aid here, but I do think that the next era of research in this area will be very data focused. I think our ways of measuring, like the exposome, have become immense. Now we can measure genes on a single-cell level, measure the environment, and measure on a molecular level. We can even measure genetic variants. So I think there's this confluence of different high-dimensional data that are very tricky to sift through, and being able to do that would be a most coveted skill set for biomedical researchers to have.
EH: Again, thank you so much for this wonderful conversation.