The sequencing of the human genome, the characterization of the patterns of human genetic variation, and technological and methodological advances in genotyping and sequencing studies have underwritten a veritable explosion in genetic discovery. Crucially, these studies have queried the entire human genome in an agnostic fashion, free from the constraints of pre-existing biological knowledge and thus enabling the implication of heretofore unsuspected pathways. Larger sample sizes achieved via international collaboration, improved imputation methods and next-generation sequencing techniques have expanded the allele frequency spectrum for variant association, allowing for the detection of low-frequency variants and the targeting of specific ethnic subgroups. In this manner, over the last decade, nearly 100 loci have been associated with type 2 diabetes or related traits in multiple populations.
The Florez lab has been involved in many of these efforts. We participate in the DIAGRAM Consortium and its multiethnic counterpart DIAMANTE, focused on GWAS meta-analyses in type 2 diabetes; we co-lead MAGIC, focused on GWAS meta-analyses in quantitative glycemic traits; we co-lead GENIE and the JDRF-funded Diabetic Nephropathy Collaborative Research Initiative, focused on GWAS meta-analyses for diabetic kidney disease; we lead the SIGMA Consortium, focused on genetic discovery in Latino populations; we participate in the CHARGE and AAGILE Consortia; and we co-lead high-throughput sequencing studies in T2D-GENES. We endeavor to place all of this information in the novel AMP-T2D Knowledge Portal.
Though together these variants only explain 10-15% of the inherited cause of type 2 diabetes, the approach has proven successful and the methods have been streamlined: it is likely that the accrual of larger sample sizes (e.g. in developing nations or large health care systems) as costs continue to drop will only continue to advance discovery. In the meantime, several key insights have emerged. The GWAS we and our collaborators have undertaken have established β-cell function as the focus in type 2 diabetes pathogenesis, complementing prior observations in monogenic diabetes; they have revealed causal links between metabolism and circadian rhythmicity, fetal development or lipid regulation that were previously highlighted by epidemiological correlations; they have identified new pathways (e.g. zinc transport into β-cell granules, KLF14 target genes in adipocytes, melatonin signaling, or monocarboxylate transport) in type 2 diabetes pathogenesis; and they have enabled a more comprehensive exploration of the genetic architecture of the disease, setting boundaries for the effect sizes and allelic series that comprise the likely universe of disease-causing variation.
Confirming robust genomic associations is only the beginning. These signals serve to plant a flag in a given genomic region, where a haplotype (a linear arrangement of correlated genetic variants) is more often present in disease than in health. However, the physical proximity of the index variant to a protein-coding gene does not imply that this is the gene that, when mutated, gives rise to the phenotype. The variant could be disrupting an enhancer element or another regulatory region for more distant genes (including those that encode microRNAs or long non-coding RNAs, for instance), misleading naïve investigators about the relevant drug target or the gene to disrupt in model organisms. Thus, it is essential that genomic studies be followed by principled searches for the effector transcript that underlies each genetic association.
One potential avenue involves the discovery of coding mutations (either in isolation or in aggregate) that disrupt protein function and phenocopy the original association. For noncoding variants, ancillary information on the pattern of tissue expression of index genes can be found in the GTEx database, which combines expression and human genomic data across many human tissues. This allows one to establish the presence of the transcript of interest in physiologically relevant organs, as well as examine whether noncoding variants associated with the disease phenotype affect message levels (eQTL analysis). Experimental validation that the allelic change leads to the expected perturbation in enhancer or promoter activity is arduous to obtain, but no less crucial in demonstrating causality.
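As an illustration of the eQTL analysis mentioned above, the following is a minimal sketch that regresses transcript levels on the dosage of an associated allele (0, 1 or 2 copies) by ordinary least squares. The function name and the toy values are hypothetical; a real analysis would also adjust for covariates such as ancestry, sex and technical factors, and would correct for multiple testing.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Fit expression = intercept + slope * dosage by ordinary least squares.

    dosages: copies of the tested allele per sample (0, 1 or 2).
    expression: normalized transcript level per sample, in the same order.
    Returns (slope, intercept); the slope is the per-allele change in expression.
    """
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need two equal-length lists with at least two samples")
    d_bar = mean(dosages)
    e_bar = mean(expression)
    # Sum of squared deviations of the genotype dosage
    sxx = sum((d - d_bar) ** 2 for d in dosages)
    if sxx == 0:
        raise ValueError("all samples share the same genotype; slope is undefined")
    # Sum of cross-products between dosage and expression
    sxy = sum((d - d_bar) * (e - e_bar) for d, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = e_bar - slope * d_bar
    return slope, intercept


# Toy, made-up data: expression rises with each copy of the allele
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
slope, intercept = eqtl_slope(toy_dosages, toy_expression)
```

A nonzero slope in a physiologically relevant tissue would be consistent with the noncoding variant affecting message levels, although it does not by itself demonstrate causality.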
Identifying a likely effector transcript via the above approaches does not by itself establish the direction of effect. At times, variants that change amino acid sequence will have a clear effect on protein function, aligning the direction of the molecular consequence with the disease risk allele. Very often, however, a single amino acid change has no discernible impact, and a search for mutations that alter the protein unambiguously becomes necessary. Through large-scale sequencing approaches, investigators with access to diverse cohorts can identify protein-truncating variants (PTVs) that disrupt protein function (e.g. stop codons, intron-exon splice acceptor sites, frameshifts or read-through mutations), enabling the study of physiological consequences of haploinsufficiency at that site in living humans. If PTVs are statistically more frequent in disease than in health, it can be presumed that their effect on the protein (whether loss of function by deletion of a key activity domain, or gain of function by deletion of an inhibitory domain) is deleterious, and therapies should counteract this effect by either raising the activity or expression of the affected protein (if the PTV induces loss of function) or inhibiting its activity or expression (if the PTV induces gain of function). The reciprocal strategies would be employed if PTVs are found to be protective. Finally, corroborating proof can be obtained by overexpression, silencing or knockout experiments in appropriate cellular or animal model systems, now facilitated by genome editing technologies such as CRISPR/Cas9.
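The comparison of PTV frequency in disease versus health described above can be sketched as a simple carrier-burden test on a 2x2 table. This is an illustration only, under the assumption that PTV carriers are collapsed per gene and that cases and controls are well matched; the function names and counts are hypothetical, and published analyses typically use regression-based burden tests that adjust for ancestry and other covariates.

```python
from math import comb


def carrier_odds_ratio(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers):
    """Odds ratio for carrying a PTV in cases versus controls.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so the ratio
    stays finite when a cell is zero, which is common for rare variants.
    """
    a, b, c, d = (x + 0.5 for x in (case_carriers, case_noncarriers,
                                    control_carriers, control_noncarriers))
    return (a * d) / (b * c)


def fisher_exact_two_sided(case_carriers, case_noncarriers,
                           control_carriers, control_noncarriers):
    """Two-sided Fisher's exact test p-value for the same 2x2 table."""
    n_cases = case_carriers + case_noncarriers
    n_controls = control_carriers + control_noncarriers
    n_carriers = case_carriers + control_carriers
    total = n_cases + n_controls

    def prob(x):
        # Hypergeometric probability of x carriers among the cases
        return (comb(n_cases, x) * comb(n_controls, n_carriers - x)
                / comb(total, n_carriers))

    p_observed = prob(case_carriers)
    lowest = max(0, n_carriers - n_controls)
    highest = min(n_cases, n_carriers)
    # Sum all tables at least as unlikely as the observed one
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)
```

An odds ratio above 1 with a small p-value would suggest that PTVs are enriched in disease, and an odds ratio below 1 would suggest that they are protective; as the text notes, whether that reflects loss or gain of function still has to be established separately.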
Of the 100+ loci associated with type 2 diabetes or related traits, the Florez lab and collaborators are focusing on the functional characterization of select genes: these include SLC16A11, IGF2, HNF1A, SLC2A2 and ZMPSTE24.
We have entered a new era in which technological developments coupled with expanded computational power and increased statistical sophistication have enabled the global query of discrete biological axes. Manufacturing advances introduced arrays that could measure mRNA transcript levels or DNA single nucleotide variation comprehensively in a single experiment. A deeper knowledge of the patterns of human genetic variation allowed for the explosion of genome-wide association studies (GWAS). Remarkable efficiency gains in high-throughput next-generation sequencing seeded whole-exome and whole-genome sequencing studies, as well as ancillary explorations of the transcriptome (RNA-seq), transcription factor binding sites (ChIP-seq), open chromatin (ATAC-seq), the epigenome, or the microbiome. Mass spectrometry can be applied to the study of small metabolites or proteins in biological fluids, including post-translational modifications. In this manner, multiple dimensions of the molecular architecture of biological systems can be interrogated with respect to native and perturbed metabolic states. This technological progress has been accompanied by concomitant enhancements in bioinformatic and analytical tools, often shared publicly in the pre-competitive space.
Crucially, nowadays all of these advances can increasingly be deployed in the organism of interest, the human. Health care systems have digitalized clinical information, and increasingly made the electronic medical record available to clinical investigation. Large private and even national biobanks have been created to streamline this research function, and both funding bodies and scientific journals have required data sharing in central repositories as a condition for research support or publication. We therefore live in the midst of a revolution of big data across all domains of the human experience, ranging from the molecular to the societal dimensions. We practice medicine and conduct research within an unprecedented whirlwind of data, spanning from populations to the individual. It will soon be possible to capture the metabolic state of a single patient at the molecular and cellular levels with great precision through multiple time points in his/her development.
An outstanding but crucial challenge to the field is our ability to integrate these disparate data sources in a manner that informs a holistic view of an organism, such that synergy begets understanding. While genomic explorations have only explained a minor fraction of the genetic contribution to the phenotype, in conjunction with physiological measures they can be used to improve our nosology of the disease, and begin to characterize the clusters that may define specific subtypes.
The integration of physiologic and pharmacogenetic information with genetic discoveries can offer additional insight. By perturbing a live human with a drug that targets a given gene and assessing the response to the perturbation, one may be able to “close the loop” and demonstrate that a gene associated with disease is indeed involved in producing the phenotype of interest. Conversely, drugs that modulate a specific limb of the glucose homeostatic system (insulin secretion, central or peripheral insulin sensitivity), if shown to have differential responses depending on genotype, may serve to prioritize genes in a given associated region.
The Florez lab works with longitudinal observational cohorts (e.g. Framingham Heart Study, the CHARGE Consortium, SEARCH, CAMP), richly phenotyped clinical trials (e.g. the Diabetes Prevention Program, Look AHEAD, TODAY), healthcare biobanks (Partners Biobank, UK Biobank), or our own pharmacogenetic or nutrigenetic studies (SUGAR-MGH, SIGMA) to study genotype-phenotype correlations and analyze physiological measurements to link genetic variation to human organismal biology.
Ultimately, we wish for discovery to improve human health. Genomic inquiry may impact disease categorization, patient stratification, clinical prediction, drug discovery or therapeutic targeting. Genomic discovery that is agnostic to pre-existing knowledge has uncovered dozens of loci that influence glycemic dysregulation. Physiological investigation has begun to define disease subtypes, clarifying heterogeneity and suggesting molecular pathways for intervention. Convincing genetic associations have paved the way for the identification of effector transcripts that underlie the phenotype, and in select cases genetic or experimental proof of gain or loss of function has clarified the direction of effect to guide therapeutic development. Genetic studies can also examine off-target effects and furnish causal inference. By curating this information and making it widely available to all stakeholders, we hope to help enhance therapeutic development pipelines by accelerating efficiency, maximizing cost-effectiveness and raising ultimate success rates.
Pharmacogenetics in the Florez lab remains a key objective and approach, subdivided into three separate but related goals:
1) Patient stratification: genetic data may be used to categorize individuals into subgroups based on clinical response to the drug of interest.
2) Target identification: agnostic genome-wide studies may identify genes that encode drug targets, elucidating their mechanism of action and enabling the design of novel drugs that act on the same pathway.
3) Functional characterization: because drugs perturb the human organism in vivo, detecting a differential response based on a given genetic variant may illuminate the function of the gene product encoded by the gene that harbors the variant or whose expression is influenced by it.
Thus the Florez lab uses drugs as handles that perturb the human organism in vivo. Whether in retrospective clinical databases (e.g. PHARMGen), prospective clinical trials (e.g. the Diabetes Prevention Program), or our newly designed physiological studies (e.g. SUGAR-MGH), we capture the human response to diabetes drugs and examine how genetic variation influences this response.
We hope to transform clinical practice. In our vision, once robust pharmacogenetic associations are discovered and confirmed, they will be aggregated into genetic risk scores that explain a substantial proportion of the variance in glycemic response or the appearance of side effects. They will be incorporated into multi-trait genotyping arrays that capture all known clinically actionable genetic variants for most common diseases and approved drugs. This “megachip” would only need to be deployed once in the lifetime of an individual at an affordable cost, and the information could become part of that person’s electronic medical record. Statistical algorithms that define the likelihood of response could be created, tested, refined, and automatically triggered once a prescription order is initiated; decision support tools elaborated by experts would then inform the clinician, at the point of care, whether this specific patient is a good candidate for the selected agent. The methods and expertise exist to realize this vision, but it will not come to fruition unless we generate the required knowledge base, the data withstand rigorous scrutiny, and cost-effectiveness analyses demonstrate that the benefit in clinical outcomes outweighs the expense incurred. We are invested in carrying out the studies needed to accomplish this goal.
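To make the idea of a genetic risk score concrete, here is a minimal sketch of one common construction: a weighted sum of allele dosages, with each weight taken from a confirmed association, optionally mapped to a probability of response through a logistic function. The variant identifiers, weights and intercept below are invented for illustration and do not correspond to any real pharmacogenetic finding; a deployable algorithm would need calibration and validation in independent cohorts.

```python
from math import exp


def genetic_risk_score(dosages, weights):
    """Weighted sum of allele dosages.

    dosages: variant id -> copies of the effect allele (0, 1 or 2,
             or a fractional imputed dosage).
    weights: variant id -> per-allele effect size (e.g. a log odds ratio).
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


def response_probability(score, intercept=0.0):
    """Map a score on the log-odds scale to a probability (logistic model).

    The intercept is a hypothetical baseline log-odds of response.
    """
    return 1.0 / (1.0 + exp(-(intercept + score)))


# Hypothetical variants and weights, for illustration only
example_weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
example_score = genetic_risk_score(example_dosages, example_weights)
example_probability = response_probability(example_score)
```

In the workflow envisioned above, a calculation of this kind would run automatically when a prescription order is initiated, with the result surfaced to the clinician through a decision support tool.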