
Genetic epidemiology aims to identify genetic variation related to risk for disease. Because it is currently not feasible to fully sequence the genome of every person in a sample, the field has traditionally relied on genetic markers with known locations to act as surrogates for the surrounding sequence. These markers are typically called polymorphisms, reflecting that they are locations in the genome that vary within and across individuals (i.e., they have multiple forms or ‘spellings’). The ability of markers to act as surrogates for surrounding sequence is a function of a genetic property called linkage and the related concept of linkage disequilibrium, which results in correlation between polymorphisms and surrounding sequence. Because markers are often simply proxies for unmeasured sequences that can influence risk for disease, marker-based approaches are often termed indirect association studies. Emerging technology has greatly expanded the catalogue of such variable sites in the human genome and the ability to genotype individuals at these markers accurately and affordably, such that marker-based genetic epidemiology is now the paradigm for most studies.

The field of genetic epidemiology aims to identify genetic variation that is related to disease. This can be a daunting task, considering that the human genome is around 3 billion base pairs long. Finding a single genetic variant that influences risk for a disease is like finding a single misspelled word among 3 billion letters, a task often compared with finding a single misspelling in an entire encyclopedia. While one could carefully read the entire encyclopedia to identify misspellings, this could take an enormous amount of time, and many misspellings might simply be overlooked. Furthermore, technology has traditionally limited the ability to sequence (‘read’) the entire genome of each participant in a study. Instead, the field has relied on genetic ‘markers’ at known locations in the genome to represent the surrounding sequence. Continuing the encyclopedia example, this would be like marking the first three sentences of every entry, so that the location of each mark within the encyclopedia is known.

Genetic markers are ‘polymorphisms,’ meaning they contain variable sequence (literally ‘multiple forms’) within and across individuals of a population. The most common types of markers in the human genome are single nucleotide polymorphisms (SNPs), simple tandem repeat polymorphisms (STRs), and insertions/deletions (indels) (see Figure 1). SNPs are defined by the existence of more than one nucleotide at a particular position in the genome. For example, at a genomic location with the sequence ACCTGA in most individuals, some may contain ACGTGA instead. The third position in this example would be considered an SNP with either a C or G allele. Because each individual inherits one copy of their genome from each parent, each person has two copies, and therefore three types of individuals can be distinguished based on this polymorphism: those with two copies of the C allele (homozygous CC genotype), those with one C and one G allele (heterozygous CG genotype), and those with two G alleles (homozygous GG genotype). The three genotype groups of this marker can be used as ‘exposure’ categories to assess association with an outcome of interest in a genetic epidemiology setting. Should such an association be identified, researchers may investigate that ‘marked’ genomic region further to identify the particular DNA sequence in that region that has a direct biological effect on the outcome of interest.
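As a rough illustration of how genotype groups can serve as ‘exposure’ categories, the sketch below tabulates genotypes at a single C/G SNP against an outcome. It is a minimal example, not a method described in this entry: the function names (`genotype`, `genotype_by_outcome`), the outcome labels, and the toy data are all hypothetical and invented for illustration.

```python
from collections import Counter


def genotype(allele_a, allele_b):
    """Return an order-independent genotype label, e.g. ('G', 'C') -> 'CG'."""
    return "".join(sorted((allele_a, allele_b)))


def genotype_by_outcome(samples):
    """Count genotypes within each outcome group.

    samples: iterable of (allele_from_one_parent, allele_from_other_parent, outcome)
    Returns a dict mapping each outcome to a Counter of genotype labels.
    """
    table = {}
    for allele_a, allele_b, outcome in samples:
        table.setdefault(outcome, Counter())[genotype(allele_a, allele_b)] += 1
    return table


# Hypothetical toy data, invented purely for illustration.
toy_samples = [
    ("C", "C", "control"),
    ("C", "G", "control"),
    ("G", "C", "case"),
    ("G", "G", "case"),
    ("C", "C", "control"),
    ("G", "G", "case"),
]

table = genotype_by_outcome(toy_samples)
for outcome in sorted(table):
    print(outcome, dict(sorted(table[outcome].items())))
```

A table like this is only a starting point. A real study would apply a formal test of association to the genotype-by-outcome counts and would use far larger samples.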

...
