Whither the genotype-phenotype relationship? An historical and methodological appraisal
Abstract
More than a century ago, Wilhelm Johannsen proposed the terms “genotype” and “phenotype” to study heredity. Much of what we know about genetics and behavior has evolved since then, especially how causality from genotypes can be inferred from observational studies of phenotypes. Unfortunately, there are genotypes that produce complex clinical-behavioral phenotypes, a phenomenon known as pleiotropy. In addition, there are often many genotypes that produce the same phenotype, adding a layer of complexity in establishing valid genotype-phenotype relationships. Unlike the relative simplicity of some phenotypes, behavioral phenotypes, especially those characteristics considered aberrant, are multidimensional and often not easily defined operationally. An alternate approach, which attempts to identify less evident manifestations below the level of the phenotype but along the pathway to the prospective genotype (endophenotypes), could prove useful in detecting genes that generate these markers. However, operational definitions of intermediate phenotypes vary, less overt neurobiological expressions for some disorders, such as autism, have not been found, and studies of endophenotypes associated with schizophrenia have not been very successful. Another approach, suggested by Sewall Wright, uses path analysis to identify causal variables that produce phenotypes. Innovative models of causality have been developed recently by genetic epidemiologists that incorporate Mendel's second law, and Mendelian randomization has been successful in identifying genotypes associated with some diseases, for example, diabetes and cancer. Regrettably, shortcomings regarding genetic markers associated with intermediate phenotypes have been found, although there are statistical procedures to remedy matters. As in any science, genetic researchers need to consider carefully the models of causality they choose.
1 INTRODUCTION
… the students [of biology] have again and again tried to conceive or “explain” the presumed transmission of general or peculiar characters and qualities “inherited” from parents and remote ancestors
- Wilhelm Johannsen, “The Genotype Conception of Heredity,” March, 1911
Personal qualities are then the reactions of the gametes joining to form a zygote; but the nature of the gametes is not determined by the personal qualities of the parents or ancestors in question. (P. 130)
It is the parental genotype, not the parental phenotype, that is transmitted and that produces the phenotype in the offspring. Johannsen's exposition on heredity reflected his desire to incorporate into the Darwinian theory of evolution what had recently been rediscovered about Mendelian inheritance, and to dismiss the Lamarckian theory of acquired characteristics as a basis of inheritance. A more modern evolutionary perspective was formulated by Stadler & Stadler (2006). They refer to the concept of “fitness maps” introduced by Wright (1932), in which mutations that occur in the genotype affect the fitness (the ability to adapt) of the phenotype. That is, genotype-phenotype relationships are dynamic and continue to play a central role in the evolutionary process of living organisms.
Although Johannsen did not specifically refer to it as such, behavior is also a phenotypic trait. A half century later, behavior geneticists would embrace his archetypal model, Gene + Environment + Gene × Environment = Behavior, as the paradigm by which one could examine the genetic contributions to multidimensional behavioral phenotypes. Multifactorial statistical models developed in the 1960s employed sib pairs and twins to determine the extent to which genetics affected “traits” (behavior). Since its inception in 1970, the journal Behavior Genetics has published nearly 1000 articles on genotype-phenotype associations concerning various behavioral attributes, and on the probable extent to which the phenotype has been altered by environmental factors. The concept of a behavioral phenotype as acquired from a specific genotype has been employed in both typical and abnormal development (for reviews, see Boomsma, Busjahn, & Peltonen, 2002; Johnson, Turkheimer, & Gottesman, 2010).
Much of what we know about genetics and behavior has also evolved and progressed significantly in the 100 years since Johannsen's address to the American Society of Naturalists. Although the concept of the phenotype and its causal genotype remains a platform from which we model genotype-phenotype relationships, there are good reasons to be disquieted by Johannsen's original formulation as it relates to genetic disorders. A second, related methodological concern is the concept of causality and how causality from a given genotype can be inferred from observational studies of phenotypes.
Conceptual and methodological issues about behavioral phenotypes were addressed by Flint (1998), who raised three major challenges. First, how strict should the defining criteria be, that is, the so-called “qualities” and “reactions” of the organism noted by Johannsen? Second, what evidence would be considered sufficient for including a particular behavior or pattern of behavior as a component of the phenotype? Third, what would be achieved by delineating a behavioral phenotype?
It would seem that, having decided upon the defining criteria, one should have established the necessary and sufficient conditions for the behavioral phenotype in question. On closer inspection, however, Flint's second concern entails a detailed assessment of the first. The importance of the third question was satisfied in part by Johannsen much earlier: “The genotypes can be examined only by the qualities and reactions of the organisms in question” [P. 133]. The importance of establishing a well-defined phenotype was illustrated by Flint (1998), citing Lesch-Nyhan syndrome as an example. In elucidating the genotype-phenotype relationship, Johannsen did not consider the possibility of a genetic mutation producing a phenotype, but it does not seem inappropriate to do so. The behavioral phenotype stemming from a genetic abnormality also presumes a causal relationship, and the phenotype is not just “syndrome-specific behavior” (Flint, 1998).
The concept of a behavioral phenotype associated specifically with a genetic abnormality was first articulated by Nyhan (1972) in his Presidential Address to the Society for Pediatric Research. In it, he describes a behavioral phenotype, which came to be called Lesch-Nyhan disorder (LND) or syndrome, as a collection of abnormal behaviors and biochemistry produced by HGPRT deficiency. LND was first diagnosed as an inborn error of metabolism and later discovered to be the product of an X-linked recessive mutation in the HPRT1 gene. Another early report of an abnormal behavioral phenotype produced by an inborn error of metabolism, later found to be the result of a genetic abnormality, concerned phenylpyruvic oligophrenia, or phenylketonuria (PKU). First commented on by Penrose in 1935 (Scriver, 1995), PKU was found to produce varying degrees of intellectual disability (ID) and, later, autism (Friedman, 1965). In 1951, Louis Woolf demonstrated that the chemical phenotype of PKU responded to the dietary restriction of phenylalanine (Cf. Alonso-Fernández & Colón, 2009, for a review), and dietary restriction was subsequently shown by Dobson, Kushida, Williamson, and Friedman (1976) to improve cognitive function as expressed by increased IQ scores. Subsequently, haploinsufficiency of the gene, PAH, was found to produce PKU (Ledley, Grenett, DiLella, Kwok, & Woo, 1985), thus establishing a genotype-phenotype relationship.
Although these two disorders are presented as exemplars of abnormal behavioral phenotypes of the sort of genotype-phenotype relationship that might have been depicted by Johannsen, the link, and the model, became more complex as outcomes from later research emerged. According to Blau (2016), there are more than 950 variants of the PAH mutation producing three different phenotypes: classic PKU, mild PKU, and mild hyperphenylalaninaemia (HPA). Neurophysiological symptoms of the phenotype can be ameliorated by diet and, for some mutations, by dosing with BH4. In their study of the Japanese population, Dateki et al. (2016) noted that, in patients with HPA, more than 500 mutations have been reported; and, in individuals with PKU, more than 60 different mutations were discovered. These researchers also found several different mutations associated with the classic PKU phenotype, but milder forms of HPA were associated with two specific mutation types. Aldámiz-Echevarría et al. (2016) examined PKU genotype-phenotype relationships in a Spanish population and found that certain mutations correlated well with some known phenotypes, whereas others did not. As a consequence, phenotypic variants in PKU have been associated with different genotypes, but not all mutational genotypes correlate with a particular phenotype. In addition, it has been observed that not all forms of HPA map to the PAH locus, and that dietary treatment, even when administered from birth onward, may not ameliorate cognitive decline or neuropsychological dysfunction as individuals age (Scriver, 1995). In other words, PKU always produces HPA, but not all HPA can be defined as PKU.
Similarly, in LND there are many mutation types in the HPRT1 gene leading to deficiencies in the enzyme, HGPRT (Ceballos-Picot et al., 2015). In addition to the overproduction of uric acid, the classic clinical LND phenotype consists of motor and neurocognitive disabilities, and persistent self-injury. However, as Fu et al. (2014) note, mutations occur at various loci within HPRT1 and include a broad array of mutation types. Moreover, although the severity of the disorder is associated with enzyme activity, some mutation variants (deletions, insertions, duplications) are associated with a milder phenotype. The mildest form, although rare, consists of uric acid overproduction only (HRH). Based on Flint's (1998) second proviso, should LND be ascribed to an individual with HRH only?
Findings in which genotypes produce diverse clinical-behavioral phenotypes such as in PKU and LND suggest these disorders engender pleiotropic phenotypes.
1.1 Pleiotropy as a possible confound in genotype-phenotype associations
Contemporaneous with Johannsen's address in 1910, Ludwig Plate introduced the term “pleiotropic” to the genetic vocabulary. Although Mendel did not use that term, he described three attributes of a particular strain of flower that always occurred together. He considered the three to be correlated and produced by a single factor (Cf. Stearns, 2010). According to Stearns (2010), Plate's original definition is: “[a] unit of inheritance [is] pleiotropic if several characteristics are dependent on it … [and] will then always appear together.” In his researches, Johannsen also found that the phenotype was not always expressed in the same manner from its given genotype, and that a particular phenotype could have stemmed from one of several different genotypes. The logical consequence of both observations alters the necessary and sufficient conditions by which a causal connection between genotype and phenotype can be established.
Although studies of pleiotropic mechanisms took a backseat in genetics during the one-gene-one-protein era, interest in pleiotropy was restored as molecular genetics and genetic sequencing techniques advanced. The original paradigm of pleiotropy became more complex when gene loci were found to have multiple or overlapping reading frames. Specifically, genetic sequences could be read at one or several start/stop codons within a given locus, which produced protein variants with modified functions. More recently, it was discovered that the protein product of a single gene could serve more than one function, or could have different functions in different tissues (Stearns, 2010). Statistical modeling and analysis by Mitchell et al. (1996) found extensive pleiotropy between insulin levels and other features related to insulin resistance syndrome. Other types of genetic abnormalities, such as the microdeletion del22q11, produce manifold phenotypes. Del22q11, alternately referred to as DiGeorge syndrome, Shprintzen syndrome, or velocardiofacial syndrome, contains the COMT gene in its deleted region. The COMT gene has been reported to generate various psychiatric behavioral phenotypes, including psychosis, schizophrenia, bipolar disorder, depression, oppositional defiant disorder, ADHD, and autism (Antshel et al., 2007; Karayiorgou et al., 1999; Lazzaretti et al., 2013; Papolos et al., 1996; Radoeva et al., 2014).
1.2 The psychiatric phenotype as a behavioral phenotype
Human beings are complex and dangerous creatures.
- Andre Gregory, from “My Dinner with Andre” (1981)
Unlike the color of flowers, human traits, especially those characteristics considered aberrant, are complex behavioral phenotypes. Despite their complexity, Leboyer et al. (1998) stated that DSM criteria are moderately reliable for clinical diagnoses of psychiatric disorders. Stoltenberg & Burmeister (2000) agree with Leboyer, citing the structured algorithms employed by DSM that were tested at length and modified to increase their reliability and validity. However, reliability does not ipso facto confer valid biological or neurobiological markers strictly associated with a specific disorder. In addition to reliability, a disorder's diagnostic validity is a function of how it has been operationally defined and whether the criteria used are logically and factually sound. Serious methodological problems are incurred otherwise. As an example, over the past 75 years, the clinical definition of autism has changed considerably, from a form of childhood schizophrenia produced by “refrigerator parents” to a neurodevelopmental disorder with a convincing genetic origin, and, most recently, to perhaps a developmental neurobiological disorder with a likely genetically heterogeneous basis (Minshew, Scherf, Behrmann, & Humphreys, 2011).
In arguing for a more scientific psychiatric nosology and to establish valid hereditary genotypes with specific psychiatric phenotypes, Kendler (1990; 2006) proposed that corroborating diagnostic criteria must have extremely high sensitivity and specificity. In a related remark, Hyman (2010) noted that the criteria to be found in DSM (and, presumably, ICD) may depend too heavily on inter-rater reliability to make diagnoses of psychiatric dysfunctions and may miss the mark regarding the validity of the criteria used to identify phenotypes associated with psychiatric disorders. If behavioral researchers and clinicians cannot agree on the necessary and sufficient criteria defining any complex trait, the first criterion suggested by Flint (1998), then any attempt to determine bona fide genotype-phenotype relationships will not be methodologically sound. Hyman (2010) was aware of this issue, noting that certain elements of operational definitions of psychiatric disorders appear arbitrary. As an example, the criteria used to define autism in DSM-III [1980] included age of onset before 30 months. Later, in DSM-III-R [1987], age of onset was increased to 36 months. Now, in its most recent incarnation, DSM-5 defines not autism but autism spectrum disorder (ASD), such that “individuals with ASD must show symptoms from early childhood, even if those symptoms are not recognized until later” [DSM-5; American Psychiatric Association, 2013].
To resolve the difficulties in producing a proper operational definition, Leboyer et al. (1998) suggested two approaches. In the first, the essential features of a psychiatric disorder become the main basis for selecting study participants, so that a homogeneous phenotype emerges from the sample, yielding a set of “candidate symptoms” to which various molecular genetic techniques can be applied to bring about a genotype-phenotype connection. The second, alternate approach attempts to identify shared biological or neurobiological markers below the level of the phenotype but along the pathway to the prospective genotype, often referred to as “endophenotypes,” that could prove useful in detecting the gene or genes that generate these markers. Leboyer et al. (1998) cited examples of the function and success of each approach: candidate symptoms shared by monozygotic twins for the first, and endophenotypes as subclinical features of epilepsy for the second. Stoltenberg & Burmeister (2000) concurred, noting that the serotonin transporter promoter polymorphism, 5-HTTLPR, had been enlisted as a candidate gene associated with a wide variety of psychiatric disorders, while a behavioral marker such as temperament could be recruited as an endophenotype.
1.3 Endophenotypes as markers for behavioral phenotypes
The concept of the endophenotype was introduced to psychiatry by Gottesman & Shields (1973) at about the time Nyhan (1972) was explicating the behavioral phenotype in LND. Gottesman & Gould (2003) later extended the definition of the endophenotype, citing Johannsen's earlier work with self-fertilizing beans. More recently, Kendler & Neale (2010) emphasized the role of the endophenotype as an intermediate construct to identify genetic etiologies in psychiatric disorders. To identify genetic causes, Gottesman & Gould (2003) proposed that four criteria be applied to endophenotypes: (1) the endophenotype is associated with a putative psychiatric disorder; (2) it is heritable, that is, there is an underlying genetic component; (3) it exists whether or not it is expressed as a behavioral phenotype; and (4) it can be found in both affected and unaffected family members more frequently than in the general population.
To support their argument, Gottesman & Gould (2003) referred to the earlier work by John & Lewis (1966), in which the neologism, “endophenotype,” was introduced. John & Lewis (1966) asserted that variations in genetic polymorphisms have a more direct effect on chromosomal phenotypes compared to their effects on the structural and physiological phenotypes further downstream from the genotype and which may be influenced by other, possibly environmental, factors. John & Lewis (1966) also asserted that changes at the chromosomal phenotypes have no effect on the “exophenotype” for the individual organism in which the chromosomal changes occur, nor on its ability to adapt to the environment. On the other hand, those endophenotypic changes will affect offspring and future generations as a consequence. Gottesman & Gould's (2003) interpretation of the endophenotype differs in that respect. Gottesman & Gould (2003) claimed the endophenotype would provide evidence closer to the genotype from which a potential psychiatric phenotype in the same organism could be identified. They also differentiated the term “endophenotype”—the preferred term to be associated with hereditary etiology—from “biological marker,” which may result from other causes. Later, however, Gottesman & Gould (2003) remarked in their discussion of schizophrenia and endophenotypes “(we) are …hopeful that… current… research on families with schizophrenia will discover an endophenotype either biological or behavioral…” (Cf. Gottesman & Gould, 2003).
In summary, the chromosomal endophenotypes described by John and Lewis (1966) could produce either positive or negative outcomes only in subsequent generations of offspring. Gottesman & Gould's (2003) concept of the psychiatric endophenotype differed in that it may or may not occur concurrently with the disorder in individuals with the genetic abnormality, but could likely produce it in offspring in subsequent generations. Gottesman & Gould (2003) expanded the original definition of the endophenotype to include both current and future generations.
As noted earlier, pleiotropy limits the unambiguous affirmation of a genotype-phenotype association, as does the fact that heterogeneous genetic abnormalities can produce the same phenotype. The logical consequence of both statements challenges the necessary and sufficient conditions by which a causal relation between genotype and phenotype can be ascertained. Neurobiological, neurophysiological, neurodevelopmental, and neurobehavioral processes that may also insinuate themselves along the path between the two endpoints will make attempts at establishing valid genotype-phenotype associations less certain. Moreover, as Smoller (2013) and others have observed, genotype-phenotype associations may not be well-established since many psychiatric disorders have overlapping symptoms and the likelihood of misclassification is considerable. For psychiatric disorders, therefore, Gottesman & Gould (2003) argued that the nosology for a syndromic phenotype may not be the best approach for ascertaining the underlying genetic etiology. Their solution was to identify an endophenotype detectable by “biochemical test or microscopic examination” (Gottesman & Gould, 2003), the second approach suggested by Leboyer et al. (1998).
Flint & Munafò (2007) concur with Gottesman and Gould, stating that “(e)ndophenotypes in psychiatry retain the notion of an internal process, but one that can be objectively measured…” (Flint & Munafò, 2007, p. 163). In their meta-analysis of psychiatric disorders and their putative genotypes, Flint & Munafò (2007) state that, by identifying endophenotypes in psychiatric disorders, researchers would have a more parsimonious pathway to the genotype. To that end, they analyzed the results from seven well-known psychiatric disorders in which putative endophenotypes have been identified, five case-control studies of different genes that may be associated with schizophrenia, and 26 samples from 16 studies examining the COMT genotype with two measures of cognitive “endophenotypes” associated with schizophrenia. Results from all three analyses were less than convincing, with effect sizes ranging from 0.1% to 0.5%. Flint & Munafò (2007) concluded that the effect sizes they found for endophenotypes were not much different from those found at the level of the phenotype. Moreover, they argue that the genetic architecture may be as multifaceted as the psychiatric disorder itself, a point made earlier about heterogeneous genotypes.
1.4 Modeling causality in a genotype-phenotype environment
The epistemological paradigm for genotype-phenotype associations is a model of causality. However, given what is known about pleiotropy as well as multiple genetic abnormalities producing the same phenotype, modeling causality is not as simple as the genotype-phenotype prototype proposed by Johannsen. In addition to these difficulties, attempts to circumvent problems associated with phenotype assessment by evaluating an intermediate type or endophenotype have met with limited success. In all these instances, the obstacles to establishing valid genotype-phenotype associations in the manner originally described by Johannsen can be considered a subcategory of the problem of ascertaining a valid model of causality in an observational setting as opposed to an experimental environment.
The earliest methodical attempt at assigning causality in observational studies specifically related to behavioral-psychiatric disorders can probably be attributed to Robert Burton's (1621) Anatomy of Melancholy. In it, Burton lists a variety of causes of melancholy (depression), including “a heap of accidents,” events inherent in an open system such as the observational study. One important difference between experiments and observational studies is that, in a closed system, the putative causal factor is manipulated first, after which its expected outcome is assessed; whereas, in an open system, one typically observes the effect first and then attempts to infer the cause or causes afterward. This is particularly problematic in non-randomized observational studies (Cochran & Chambers, 1965) and was a predicament Johannsen faced more than a century ago.
… if the appearance of the first event is followed with a high probability [my italics] by the appearance of the second, and there is no third event that [can be used] to factor out the probability relationship between the first and second events. (P. 10)
The utility of Suppes’ definition will become more evident as models of causality are considered.
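Read probabilistically, the definition quoted above can be sketched in two conditions. This is one common way of writing it, not a formula given in the source: the symbols C (the earlier event), E (the later event), and F (any third, still earlier event) are assumptions introduced here, and “high probability” is rendered as probability-raising.

```latex
% C: candidate cause (earlier event); E: effect (later event); F: any third, earlier event.
% Condition 1: the appearance of C raises the probability of E.
P(E \mid C) > P(E), \qquad P(C) > 0
% Condition 2: no third event F factors out (screens off) the C-E relationship.
\nexists\, F : \; P(E \mid C, F) = P(E \mid F)
```

In genotype-phenotype terms, the second condition is the one that a background variable would violate if it produced both the genotype and the phenotype.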
1.5 Path models of causality
A decade after Johannsen's paper was published, Wright (1921) wrote about the problem of causality in the biological sciences in which there exists a “complex of interacting, uncontrollable, and often obscure causes.” (P. 557) According to Wright (1921), there are studies in which it would be known that several specific factors were causes of variation, but there may be other (unknown) influences which may affect the outcome measure as well. That is, there may be multiple “paths” other than from the known or strongly suspected variables which could influence the outcome. As an example, Wright provided a study of guinea pigs and their birth weights in which he constructed a path diagram of known hereditary and other factors which may have produced the outcome.
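The logic of a path diagram with known and unknown causes can be illustrated with a small simulation. The sketch below is not Wright's guinea pig analysis: the variables `x1`, `x2`, and `u`, the coefficients `p1` and `p2`, and the sample size are all hypothetical. It assumes uncorrelated standardized causes, in which case the standardized regression coefficients play the role of path coefficients and their squares partition the outcome variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical path model: two known causes (x1, x2) and one unmeasured
# residual cause (u) jointly determine an outcome y.
p1, p2 = 0.6, 0.3                     # assumed path coefficients (illustrative)
pu = np.sqrt(1 - p1**2 - p2**2)       # residual path chosen so that var(y) = 1
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
u = rng.standard_normal(n)
y = p1 * x1 + p2 * x2 + pu * u

def standardize(a):
    """Center and scale to unit variance."""
    return (a - a.mean()) / a.std()

# With standardized variables, least-squares coefficients estimate the paths.
X = np.column_stack([standardize(x1), standardize(x2)])
coef, *_ = np.linalg.lstsq(X, standardize(y), rcond=None)
est_p1, est_p2 = coef

# Because x1 and x2 are simulated as uncorrelated, squared paths sum to the
# share of variance explained; the remainder belongs to the unknown causes.
explained = est_p1**2 + est_p2**2
residual_share = 1 - explained
print(round(est_p1, 2), round(est_p2, 2), round(residual_share, 2))
```

The residual share is the portion of variation attributable to the “obscure causes” that the diagram cannot name, which is why omitting a path can distort the estimates for the paths that are drawn.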
From a statistician's point of view, the essential difference between a controlled experiment and an observational study with humans is the ability to implement randomization. In observational studies, participants are not always selected at random from a known population by the investigator, nor are they randomly assigned to “treatment” and “control” groups. The inability to randomize will likely produce selection bias. Under such circumstances, researchers typically apply techniques for balancing out and adjusting features, for example, matching participants along salient dimensions between the two group types that may otherwise confound identification and detection of the sought after causal factor(s) (Cochran & Chambers, 1965). Observational studies which have presumed treatment-like conditions, outcome measures, and “experimental units” (participants) over which the investigator is unable to exercise randomization procedures, but from which inferences are to be drawn about a treatment-cause, have been referred to as quasi-experiments (Campbell & Stanley, 1963). Importantly, there are circumstances in quasi-experimental settings in which nature has performed the requisite randomization, for example, natural disasters.
While not a natural disaster, the possibility of randomization in genetics embodied in Mendel's second law has been reintroduced recently in commentaries by genetic epidemiologists and is currently referred to as Mendelian randomization. Mendelian randomization has been likened to randomized controlled trials, both in terms of design and analysis, and sources of bias (Nitsch et al., 2006). Its application to an epidemiological study was first described by Katan (1986), when he examined the relationships among APOE isoforms, serum cholesterol, and cancer. The controversy at the time was whether low serum cholesterol was a causal factor in cancer, or if the opposite was the case (reverse causation); or, perhaps diet or some other confounding background factor was involved. Katan's solution was to investigate a genetic determinant of low cholesterol, the APOE gene and its polymorphisms. In effect, Katan (1986) argued that a causal link between low cholesterol, an intermediate phenotype, and cancer could be established by examining the relationship between the APOE gene and serum cholesterol. If low cholesterol were a cause of cancer, then individuals with the E2 allele, which is associated with low cholesterol levels, would be at greater risk of developing cancer than those with other allele types. Otherwise, the relationship between low cholesterol and cancer could be attributed to reverse causation.
Subsequently, Trompet et al. (2009) surveyed 5804 elderly individuals stratified by allele type and found no significant increase in cancer among those with E2 alleles compared with those with other allele types, suggesting that any relationship between low cholesterol and cancer was one in which cancer was the cause, not the effect. In his later commentary, Katan (2004) remarked that in order to perform studies utilizing Mendelian randomization properly, two important issues need to be considered: (1) a simple, well-defined phenotype, as suggested earlier by Leboyer et al. (1998) and Flint (1998); and (2) a sample large enough to provide satisfactory power, as described by Flint, Timpson, and Munafò (2014).
Since Katan's landmark study, there have been many applications of Mendelian randomization attempting to establish genotype-phenotype associations in human disorders (Smith & Ebrahim, 2003). However, for Mendelian randomization to be serviceable as a valid model of causality, Smith & Hemani (2014) state that three additional criteria must be met: (1) the genotype (sometimes referred to as a genetic variant or instrumental variable) must be reliably associated with a known intermediate phenotype; (2) the genotype itself must be associated with the (outcome) phenotype only through the intermediate phenotype; and (3) the genotype must be independent of possible confounding factors that may influence the intermediate phenotype and the phenotype, which can be examined by analyzing a model that includes suspected covariates.
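A simulation can make the three criteria concrete. The sketch below is an illustration under stated assumptions, not an analysis from any study cited here: the allele frequency, effect sizes, confounder `u`, and the choice of the ratio (Wald) estimator are all introduced for the example. Data are generated so that the genotype affects the intermediate phenotype (criterion 1), reaches the outcome only through it (criterion 2), and is independent of the confounder (criterion 3).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

true_effect = 0.5                       # assumed causal effect of IP on P
g = rng.binomial(2, 0.3, size=n)        # allele count at a hypothetical variant
u = rng.standard_normal(n)              # unmeasured confounder, independent of g
ip = 0.4 * g + u + rng.standard_normal(n)            # criterion 1: G -> IP
p = true_effect * ip + u + rng.standard_normal(n)    # criterion 2: no direct G -> P

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

naive = slope(ip, p)                    # observational estimate, biased by u
wald = slope(g, p) / slope(g, ip)       # ratio estimate using G as instrument
print(round(naive, 2), round(wald, 2))
```

Under these assumptions the observational slope overstates the effect because `u` drives both variables, while the ratio estimate lands near the simulated value. If any of the three criteria is broken in the data-generating lines, for instance by adding a direct `g` term to `p`, the ratio estimate no longer recovers the causal effect.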
The importance of addressing issues of causality in observational studies and the need to extend Sewall Wright's path analysis into statistical modeling was addressed by Cochran & Chambers (1965) and recapitulated by Cox & Wermuth (2004). Cox & Wermuth (2004) distinguished between real-world properties and statistical models of causality, but sought to construct statistical models that would be useful in the real world. To avoid confusion, Cox & Wermuth (2004) refer to their argument as models of causality rather than causal modeling. Their models of causality are based on Pearl's (2000) extensive work with path diagrams and causal analysis, and Lindley's (2002) analysis of Pearl's description of causality as it pertains to the broad field of statistics. The issues raised by Cox & Wermuth (2004) and others as they would pertain to genotype-phenotype associations can be illustrated by the following sets of models of causality.
Four basic models of causality associated with pre-GWAS, pre-endophenotype era cytogenetics are presented in Figure 1, in which a genotype (G), phenotype (P), and background variables (B) are the only factors. In some of these models, genetic factors are also contained in background factors. The first model, in Figure 1a, indicates that the phenotype is contingent on both genetic and background factors, the latter of which may consist of other genetic, epigenetic, physiological, neurobiological, developmental, or environmental variables. Background factors modeled in this way would confound the genotype-phenotype association. It is also possible in this model that the genotype is influenced by background factors. As Cox & Wermuth (2004) observed, in models comparable to Figure 1a, not all relevant background variables may be identified and included in B. For these reasons, the extent to which phenotype and genotype are linked may be estimated incorrectly. In Figures 1b and 1d, the direct link between genotype and phenotype is missing. In Figure 1b, the phenotype, P, and the genotype, G, are each contingent upon B. In such a model, a correlation between genotype and phenotype could be observed but would be spurious. In Figure 1c, P and B are contingent upon G, but there is no link between P and B. Thus, any association that studies found between P and some factor in B would be spurious. In situations such as this, investigators would, in addition to demonstrating an association between the presumed genetic factor and the phenotype, need to establish that there was no association between some background factor and the phenotype. An example of such a model using Mendelian randomization might be the study cited by Little & Khoury (2003), which examined the mechanism by which the MTHFR C677T polymorphism had been associated with cardiovascular disease (CVD). Using Mendelian randomization, Yang et al. (2012) found that the MTHFR C677T polymorphism (G) was associated with higher homocysteine levels (B), but that homocysteine concentration was not statistically associated with CVD or mortality (P). On the other hand, these researchers found that the TT genotype (G) was associated with a significantly lower risk for CVD (P).
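The spurious correlation described for Figure 1b can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: B is taken to be a binary background factor (for instance, membership in one of two subpopulations), and the allele frequencies, effect size, and function names are hypothetical choices, not values drawn from any study cited here.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_figure_1b(n=20000, seed=1):
    """Simulate B -> G and B -> P with no G -> P path (hypothetical values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        b = rng.randint(0, 1)                # background factor B: two strata
        freq = 0.2 if b == 0 else 0.6        # B -> G: allele frequency differs by stratum
        g = (rng.random() < freq) + (rng.random() < freq)  # allele count 0, 1, or 2
        p = 1.0 * b + rng.gauss(0.0, 1.0)    # B -> P: phenotype depends on B only
        rows.append((b, g, p))
    return rows

rows = simulate_figure_1b()

# Marginal association between G and P, ignoring B.
r_marginal = pearson([g for _, g, _ in rows], [p for _, _, p in rows])

# Association between G and P within each stratum of B.
r_within = {
    b: pearson([g for bb, g, _ in rows if bb == b],
               [p for bb, _, p in rows if bb == b])
    for b in (0, 1)
}

print("marginal r(G, P):", round(r_marginal, 3))
print("within-stratum r(G, P):", {b: round(r, 3) for b, r in r_within.items()})
```

Under these assumptions, G and P are correlated in the pooled sample, yet the correlation is close to zero within each stratum of B, which is the pattern expected when B alone drives both variables.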

In Figure 1d, P is contingent upon B and B is contingent upon G, but there is no direct link between G and P. Such a model would satisfy the three criteria proposed by Smith & Hemani (2014) for establishing a valid model of causality. Genes which modify the causal effect of an (intermediate) environmental factor on the phenotype may be considered a case in point. Smith & Ebrahim (2003) cite as an example the association of the MTHFR polymorphism in parents and the risk of neural tube defects (NTD) in offspring. It was found that mothers with the TT genotype have twice the risk of having a child with NTD as do mothers with the CC genotype, but the paternal TT genotype has no statistically significant effect on NTD risk. From what is known about the effect of maternal folate intake on reducing the risk of NTD, Smith & Ebrahim (2003) surmise that the maternal TT genotype influences the intrauterine environment, but is not directly linked to NTD.
Simple genotype-phenotype models of the types shown in Figure 1a-d are relatively rare, as Wright (1921) illustrated, and as studies of LND and PKU have shown. Models become increasingly complex when an intermediate phenotype (IP) is identified and introduced into the model from among the background variables. Indeed, for most studies in genetic epidemiology in which Mendelian randomization has been employed, the search for a genetic variant begins with an association between an intermediate phenotype and the phenotypic outcome of interest. Archetypal models of causality for studies of this type are illustrated in Figure 2a, which attempts to examine all paths among genotype, intermediate phenotype, phenotype, and background variables. However, given the condition proposed by Smith & Hemani (2014) that the genotype must be independent of possible confounding factors that may influence the intermediate phenotype and phenotype, Figure 2b is the model generally employed.
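The logic of the model in Figure 2b can be sketched numerically. The example below assumes a single genetic variant, linear effects, and an unmeasured background variable B that influences both IP and P but is independent of G; all coefficients and names are hypothetical. It contrasts the naive regression slope of P on IP with the ratio (Wald) estimator, cov(G, P) / cov(G, IP), which is one common way of using a genotype as an instrument.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def simulate_figure_2b(n=50000, true_effect=0.5, seed=7):
    """Simulate G -> IP -> P with unmeasured B affecting IP and P (hypothetical values)."""
    rng = random.Random(seed)
    g, ip, p = [], [], []
    for _ in range(n):
        b = rng.gauss(0.0, 1.0)                          # unmeasured background variable
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count, independent of B
        ipi = 0.5 * gi + 1.0 * b + rng.gauss(0.0, 1.0)   # G -> IP and B -> IP
        pi = true_effect * ipi + 1.0 * b + rng.gauss(0.0, 1.0)  # IP -> P and B -> P
        g.append(gi)
        ip.append(ipi)
        p.append(pi)
    return g, ip, p

g, ip, p = simulate_figure_2b()

# Naive observational estimate: slope of P on IP, confounded by B.
naive_slope = cov(ip, p) / cov(ip, ip)

# Ratio (Wald) estimate: uses only the variation in IP attributable to G.
wald_ratio = cov(g, p) / cov(g, ip)

print("true effect of IP on P: 0.5")
print("naive regression slope:", round(naive_slope, 3))
print("Wald ratio estimate:   ", round(wald_ratio, 3))
```

In this simulated setting the naive slope overstates the effect of IP on P because B is unmeasured, whereas the ratio estimate stays close to the value used to generate the data. That result depends entirely on the assumed independence of G and B and on the absence of any other path from G to P.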

Mendelian randomization investigations involving models of causality of the type represented in Figure 2b have examined disease phenotypes such as cancer (Bonilla et al., 2013), multiple sclerosis (Devorak et al., 2016; Mokry et al., 2016), and cardiovascular disease (Hägg et al., 2015; Nelson et al., ; Yang et al., 2012), as well as behavioral phenotypes involving cognitive function (Almeida, Hankey, Yeap, Golledge, & Flicker, 2014; Zhan et al., 2015; Zhao et al., 2016) and psychiatric disorders: ADHD (Attermann et al., 2012; Nigg, Elmore, Natarajan, Friderici, & Nikolas, 2016), depression (Sallis, Steer, Paternoster, Davey Smith, & Evans, 2014; Taylor et al., 2014; Wium-Andersen, Ørsted, Tolstrup, & Nordestgaard, 2015), and bipolar disorder and schizophrenia (Hartwig et al., 2016; Prins et al., 2016; Wium-Andersen, Ørsted, & Nordestgaard, 2016).
While the model of causality depicted in Figure 2b has been useful in establishing the strength of genotype-phenotype relations, violations of any of the three criteria essential for a valid model of causality will likely introduce bias into the analysis. The types of errors that can occur vary. They may stem from the genotype (linkage disequilibrium or pleiotropy); from the phenotype (for example, misclassification of bipolar disorder as major depression); from sampling heterogeneous populations, that is, population stratification, in which the genetic variant affects diverse subpopulations differentially; from misspecification of the statistical model, such as using linear instead of non-linear relationships; or from canalization or developmental compensation, that is, developmental processes that influence the outcome measure (Lawlor, Harbord, Sterne, & Timpson, 2008; Smith & Ebrahim, 2003, 2004; Smith & Hemani, 2014).
Vander Weele, Tchetgen Tchetgen, Cornelis, and Kraft (2014) noted shortcomings regarding the genetic marker associated with the intermediate phenotype: if the association is weak, biases with respect to other criteria will be inflated. Violations of the so-called “exclusion criterion,” in which the phenotype is affected by the genotype solely through the intermediate phenotype, will produce bias in Mendelian randomization. Such violations can occur in several ways, which are depicted in Figure 3 (models of causality in Figure 3 are adapted from Vander Weele et al., 2014, their Figure 3).

In Figure 3a, the phenotype is contingent not only on the intermediate phenotype and background factors, but on an unknown factor in the path between the genotype and the intermediate phenotype. In Figure 3b, the phenotype is contingent upon two independent intermediate phenotypes, each of which in turn is contingent upon the genotype. In Figure 3c, the phenotype is contingent upon two intermediate phenotypes, and each intermediate phenotype is contingent upon the genotype, but at different points in the sequence. Figure 3d depicts a situation in which the intermediate phenotype is measured incorrectly (IPerror).
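A violation of the kind shown in Figure 3b can be made concrete with a brief simulation. The sketch assumes linear effects, two independent intermediate phenotypes (here called IP1 and IP2) that both depend on G, and an investigator who measures only IP1; the path coefficients are hypothetical. Under these assumptions the ratio estimate for IP1 absorbs the second pathway, so that it approaches b1 + (a2 × b2) / a1 rather than b1.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Hypothetical path coefficients (assumptions for illustration only).
A1, A2 = 0.5, 0.4   # G -> IP1 and G -> IP2
B1, B2 = 0.5, 0.5   # IP1 -> P and IP2 -> P

def simulate_figure_3b(n=50000, seed=11):
    """Simulate G -> IP1 -> P and G -> IP2 -> P; IP2 is not observed."""
    rng = random.Random(seed)
    g, ip1, p = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count
        x1 = A1 * gi + rng.gauss(0.0, 1.0)                # measured intermediate phenotype
        x2 = A2 * gi + rng.gauss(0.0, 1.0)                # unmeasured intermediate phenotype
        pi = B1 * x1 + B2 * x2 + rng.gauss(0.0, 1.0)      # phenotype depends on both
        g.append(gi)
        ip1.append(x1)
        p.append(pi)
    return g, ip1, p

g, ip1, p = simulate_figure_3b()

wald_ratio = cov(g, p) / cov(g, ip1)      # estimate that assumes IP1 is the only path
expected_biased = B1 + (A2 * B2) / A1     # value the estimate converges to here

print("true effect of IP1 on P:", B1)
print("Wald ratio estimate:    ", round(wald_ratio, 3))
print("expected biased value:  ", expected_biased)
```

The size of the bias in this sketch is inversely proportional to the strength of the G to IP1 association (A1), which is consistent with the observation that a weak genetic marker inflates bias arising from violations of the other criteria.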
Errors affecting the exclusion criterion may be corrected by using sensitivity analysis (Burgess, Bowden, Fall, Ingelsson, & Thompson, 2017). Sensitivity analysis may also be employed when another genetic variant may affect the phenotype through linkage disequilibrium (Vander Weele et al., 2014). As a statistical technique, sensitivity analysis is a means by which alternative specifications of the model are tested to determine whether they alter the genotype-phenotype association. When additional factors are inserted into the statistical model but have little if any effect on the outcome, the results of the original model are considered robust (Burgess et al., 2017; Thabane et al., 2013).
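One simple form that a sensitivity analysis can take, when several genetic variants are available, is to re-estimate the effect after omitting each variant in turn. The sketch below is offered only as an illustration of that general idea and is not the specific procedure of any study cited here. It assumes per-variant summary statistics (variant-IP association, variant-P association, and the standard error of the latter) and uses an inverse-variance weighted (IVW) estimate; the data values are invented so that one variant is an outlier.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate of the IP -> P effect from summary statistics."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y))
    return num / den

def leave_one_out(beta_x, beta_y, se_y):
    """Re-estimate the effect with each variant omitted in turn."""
    estimates = []
    for j in range(len(beta_x)):
        keep = [i for i in range(len(beta_x)) if i != j]
        estimates.append(ivw_estimate([beta_x[i] for i in keep],
                                      [beta_y[i] for i in keep],
                                      [se_y[i] for i in keep]))
    return estimates

# Hypothetical summary statistics: the first four variants imply a ratio of 0.5,
# while the fifth implies a much larger ratio (1.2).
beta_x = [0.10, 0.20, 0.30, 0.40, 0.25]   # variant -> intermediate phenotype
beta_y = [0.05, 0.10, 0.15, 0.20, 0.30]   # variant -> phenotype
se_y = [0.02, 0.02, 0.02, 0.02, 0.02]     # standard errors of beta_y

full_estimate = ivw_estimate(beta_x, beta_y, se_y)
loo_estimates = leave_one_out(beta_x, beta_y, se_y)

print("estimate using all variants:", round(full_estimate, 3))
for j, est in enumerate(loo_estimates):
    print(f"omitting variant {j}:", round(est, 3))
```

If the estimate changes little regardless of which variant is omitted, the original result would ordinarily be regarded as robust in this respect; a large shift when a single variant is dropped, as with the last variant in these invented data, suggests that variant deserves closer scrutiny.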
While sensitivity analysis is an extremely useful statistical technique for validating a Mendelian randomization genotype-phenotype model, it is not a panacea for establishing valid genotype-phenotype associations. There are many factors that may moderate the efficacy of the model, as noted earlier: weak genotype-phenotype associations produce small effect sizes, accounting for fractions of a percent of the phenotypic variance, as Flint & Munafò (2007) found in studies of schizophrenia. The resulting low statistical power would argue for an unusually large sample size in such studies.
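The relation between small effect sizes and required sample size can be approximated with a standard power calculation. The sketch below uses the Fisher z approximation for detecting a correlation, with r taken as the square root of the proportion of phenotypic variance explained. The function name, the two-sided alpha of 0.05, and the power of 0.80 are illustrative assumptions; a genome-wide study would use a far more stringent alpha and would therefore need still larger samples.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

def required_sample_size(variance_explained, alpha=0.05, power=0.80):
    """Approximate n needed to detect a correlation of sqrt(variance_explained).

    Uses the Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))**2 + 3.
    """
    r = sqrt(variance_explained)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

# Hypothetical effect sizes expressed as fractions of phenotypic variance.
n_half_percent = required_sample_size(0.005)    # 0.5% of variance
n_tenth_percent = required_sample_size(0.001)   # 0.1% of variance

print("n for 0.5% of variance:", n_half_percent)
print("n for 0.1% of variance:", n_tenth_percent)
```

Because the required n grows roughly in inverse proportion to the variance explained, an effect accounting for one tenth of a percent of the variance needs about five times as many participants as one accounting for half a percent.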
Violations of the third criterion, that the genotype be independent of possible confounding factors which may influence the intermediate phenotype, typically occur when genetic variants produce pleiotropy. Fortunately, pleiotropy can be detected by statistical methods. Burgess et al. (2017) suggest the use of Egger regression to test for directional pleiotropy. Del Greco, Minelli, Sheehan, & Thompson (2015) offer a somewhat different statistical approach for assessing pleiotropy that utilizes Cochran's Q and the I² index. Cochran's Q is often used to detect heterogeneity resulting from pleiotropy in meta-analytic studies that collect data from multiple sources, and I² is defined as the proportion of total variation explained by heterogeneity as opposed to sampling error. Del Greco et al. (2015) provide power analysis tables to detect heterogeneity that results from pleiotropy.
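The heterogeneity statistics and Egger regression mentioned above can be sketched from per-variant summary statistics. The example assumes the usual textbook forms: ratio estimates weighted by first-order inverse variances for Cochran's Q, I² computed as (Q − df) / Q truncated at zero, and Egger regression as a weighted regression of variant-phenotype associations on variant-IP associations with an intercept. The data are invented to contain a constant pleiotropic offset of 0.05, and the sketch reports point estimates only, without the standard errors or tests a real analysis would require.

```python
def cochran_q_i2(beta_x, beta_y, se_y):
    """Return the IVW estimate, Cochran's Q, and I-squared across variants."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]       # per-variant ratio estimates
    weights = [(bx / se) ** 2 for bx, se in zip(beta_x, se_y)]  # first-order inverse variances
    theta = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - theta) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return theta, q, i2

def egger_regression(beta_x, beta_y, se_y):
    """Weighted regression of beta_y on beta_x with an intercept (point estimates only)."""
    # Orient every variant so that its association with the IP is positive.
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_x]
    x = [s * bx for s, bx in zip(signs, beta_x)]
    y = [s * by for s, by in zip(signs, beta_y)]
    w = [1.0 / se ** 2 for se in se_y]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    slope = (sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
             / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x)))
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical summary statistics: true slope 0.5 plus a pleiotropic offset of 0.05.
beta_x = [0.10, 0.15, 0.20, 0.25, 0.30]
beta_y = [0.05 + 0.5 * bx for bx in beta_x]
se_y = [0.01] * len(beta_x)

theta, q, i2 = cochran_q_i2(beta_x, beta_y, se_y)
intercept, slope = egger_regression(beta_x, beta_y, se_y)

print("IVW estimate:", round(theta, 3))
print("Cochran's Q:", round(q, 2), "on", len(beta_x) - 1, "df")
print("I-squared:", round(i2, 3))
print("Egger intercept:", round(intercept, 3), "slope:", round(slope, 3))
```

In these invented data the Egger intercept recovers the offset and the slope recovers the generating value, while the IVW estimate is pulled upward and Q exceeds its degrees of freedom. A nonzero intercept is the quantity interpreted as evidence of directional pleiotropy.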
2 SUMMARY AND DISCUSSION
Studies of genotype-phenotype relationships have evolved from the nascent model proposed by Johannsen more than a century ago to the sophisticated modeling techniques and complex statistical procedures based, ironically, on Mendel's second law. Mendelian randomization and GWAS have, in many important ways, revolutionized the way in which researchers attempt to ascertain genetic etiologies in typically and atypically developing humans. Mendelian randomization has been remarkably successful both in establishing the extent to which genetic variants produce diseases such as obesity and diabetes, and in identifying instances of reverse causation in studies that involve intermediate phenotypes and disease. Less successful have been studies that investigate complex psychiatric disorders. DSM-V criteria notwithstanding, defining and assessing complex behavior has always been problematic. Diagnostic instruments and algorithms have not produced particularly high sensitivity and specificity estimates, especially as overlapping criteria among many disorders produce both false positive and false negative diagnoses. As important is the lack of, or conflicting findings from, investigations of physiological or neurobiological expression as intermediate phenotypes that were supposed to ameliorate the difficulties inherent in multifaceted phenotypes. Results from studies of ASD, schizophrenia, and other Axis I disorders come to mind. The inability to detect and identify associated intermediate phenotypes will limit the prospects of ascertaining valid genotype-phenotype relationships in psychiatric and behavioral disorders by whatever model of causality is invoked.
Furthermore, high heritability does not necessarily translate into convincing genetic models of causality, as Geschwind (2008) and Szatmari et al. (2007) indicated, and as Vander Weele et al. (2014) also discovered. Despite remarks to the contrary, it may be difficult to ascertain a valid association of a genetic abnormality with a psychiatric disorder, next generation sequencing notwithstanding. For example, Vorstman, Parr, Moreno-De-Luca, Anney, & Hallmayer (2017) observed that more than 800 genes have been linked to ASD, but the strength of the relationship varies markedly since the appearance of an unusual copy number variant or SNP is neither necessary nor sufficient to establish causation. Penetrance and expressivity of ASD vary markedly as well. Consequently, as Fritzsch, Jahan, Pan, and Elliott (2015) also noted, the ability of neurodevelopmental researchers to detect genetic markers associated with phenotypic disorders has been limited. Zhu, Need, Petrovski, and Goldstein (2014) acknowledged that neurodevelopmental and neurobiological disorders, given their multifaceted nature, are susceptible to stochastic developmental processes. The complexity of the phenotype likely increases with the age of the organism as a consequence of the extent to which other genes have had time to interact with the genetic cause. Therefore, identifying models of causality based on neurobiological intermediate phenotypes may prove somewhat daunting, as Flint & Munafò (2007) discovered.
Although not considered in this review, research has shown that infrahuman organisms and humans have more in common than previously thought. Over the past 30 years, a raft of mouse models has been created to simulate human genetic disorders in order to identify the mechanisms associated with dysfunction. To that end, infrahuman models may be useful in identifying intermediate phenotypes associated with behavioral disorders. Katayama et al. (2014) found that CHD8 mutations in mice affect social behavior in an ASD-like manner, and that average brain weight in embryonic heterozygous mutant mice was greater than that found in wild-type (wt) controls, not unlike what has been observed in human neonates. Mouse models created with mutations in the FMR1, TSC1, TSC2, SHANK, UBE3A, or PTEN genes, which have been associated with ASD in humans, have overlapping signaling pathways in brain and are associated with synaptic dysfunction. Unfortunately, a pathophysiological model embracing those genetic abnormalities associated with ASD has remained elusive (Hulbert & Jiang, 2016) and may continue to do so, owing to the fact that any one of these mutations likely affects the entire interconnected neural network, a variant of the model in Figure 3c.
What is more, genetically altered mice may not always produce the desired deficits analogous to those observed in humans or even other infrahuman species. Mice are not small rats, rodents are not small humans, and one of the most important distinctions between human and nonhuman organisms concerns language (Fisch, 2007). Consequently, another level of complexity needs to be layered into genotype-phenotype models of causality when employing animal models to investigate mechanisms associated with multidimensional behavioral disorders. Criteria for diagnosing psychiatric disorders such as schizophrenia, bipolar disorder, and ASD are based on distorted speech and language plus attendant deviant social behavior. Accordingly, the standard criteria used to establish the validity of animal models of ASD have often been less than convincing (Hulbert & Jiang, 2016). In effect, mouse models can never be completely analogous to human psychopathology. Thus, it remains an open question whether intermediate phenotypes identified in mouse or other infrahuman models will find suitable correspondence with humans.
All of the aforementioned issues related to establishing valid genotype-phenotype associations suggest that researchers carefully consider the models of causality they choose, as well as take into account the biological, physiological, developmental, and neurological mechanisms and systems that interpose themselves between both endpoints.
ACKNOWLEDGMENT
I thank John Opitz for taking the time and interest to engage in our many correspondences connected with this topic. Any errors of fact or interpretation are mine alone.
Biography
G. S. Fisch is a retired Biostatistician and Professor of Research from the NYU College of Dentistry in New York. His research interests include research methodology, the genetics of autism, and the developmental trajectories of cognitive and adaptive behavior in children and adolescents with genetic disorders. He is also an Associate Editor for AJMG; and, as an Adjunct Professor, currently teaches courses in Statistics at CUNY/Baruch College in New York City.