Genomic approaches for studying craniofacial disorders
Correspondence to: Huiqing Zhou, Department of Molecular Developmental Biology, Radboud University Nijmegen, Nijmegen, The Netherlands; Department of Human Genetics, Radboud University Medical Center, Nijmegen Centre for Molecular Life Sciences, Geert Grooteplein 26/28, 6525GA Nijmegen, The Netherlands.
Co-correspondence to: Carine E.L. Carels, Department of Orthodontics and Craniofacial Biology, College of Dentistry, Radboud University Medical Centre, Philips van Leydenlaan 25, 6525 EX Nijmegen, The Netherlands.
Abstract
Rapidly developing technologies in genomics have driven genetic studies of human diseases from classical candidate approaches toward hypothesis-free, genome-wide screening methods. Compared to the low-resolution cytogenetic techniques that, until some 15 years ago, were the only methods available to visualize genomic changes at the chromosomal level, genome-wide studies including analyses of copy number variation (CNV), genome-wide association and linkage studies, and exome sequencing (ES) provide more accurate information for unraveling the genetic causes of diseases. Moreover, genome sequencing (GS), which interrogates the genome of a single individual at nucleotide resolution, has also been applied in genetic studies. Here we review genomic approaches in craniofacial disorders, with an emphasis on orofacial clefts, and discuss their applications, advantages, limitations, challenges, and future perspectives. © 2013 Wiley Periodicals, Inc.
INTRODUCTION
Craniofacial developmental disorders are congenital structural defects of the face and/or cranium that arise from disruptions in various developmental processes. Orofacial clefts (OFCs) are among the most common craniofacial birth defects and may present in isolated form or as part of a syndrome [Jugessur and Murray, 2005; Hennekam et al., 2010]. OFCs are characterized by incomplete formation or fusion of the structures separating the nasal and oro-pharyngeal cavities and can affect the upper lip, the hard and/or soft palate, as well as other parts of the face. OFCs encompass a broad clinical spectrum and may be uni- or bilateral, with complete or incomplete clefts of the lip and/or the (hard or soft) palate. Previous genetic and embryologic studies suggest that cleft lip with or without cleft palate (CL/P) and cleft palate only (CPO) result from distinct etiological mechanisms [Fraser, 1955; Sivertsen et al., 2008; Rahimov et al., 2012]. Recent studies also suggest that cleft lip only (CLO) may have an etiology distinct from that of cleft lip with cleft palate (CLP) [Harville et al., 2005; Jugessur et al., 2011; Rahimov et al., 2012].
Complex inheritance patterns are associated with OFC-related disorders. While a variety of syndromic as well as non-syndromic (NS) familial forms of OFCs with presumed Mendelian inheritance have been described, the majority of isolated NS OFCs are considered to be complex multifactorial diseases involving gene–gene interactions, gene–environment interactions, and environmental and mechanical factors [Wyszynski et al., 1997].
Collectively, NS OFCs affect approximately 1/700 live births worldwide [Dixon et al., 2011]. Evidence for a genetic role in the complex etiology of NS OFCs comes from the observation that the risks of CL/P and CPO are, respectively, 32- and 56-fold higher in siblings of affected individuals than in the general population [Sivertsen et al., 2008]. In addition, the concordance rate in monozygotic twins is approximately 25–45%, compared to 3–6% in dizygotic twins [Mitchell and Risch, 1992].
The presence of both Mendelian and complex forms of OFCs in humans requires tailored analysis approaches to identify specific etiologies. Given that OFCs are among the most common craniofacial disorders [Hennekam et al., 2010], we focus on them in this review. Cytogenetic studies were among the first approaches to study the whole genome. Such studies have identified a broad range of structural and numerical chromosome aberrations that involve large numbers of genes and cause multiple congenital malformations, including OFCs. With the recent availability of novel genomic approaches such as exome and genome sequencing (GS), it is now possible to identify single-gene disruptions in individuals with OFCs. In this review we discuss genome-wide approaches used in genetic studies of OFC, including analyses of copy number variations (CNVs), genome-wide linkage studies, genome-wide association studies (GWAS), exome sequencing (ES), and the identification of genetic variants in non-coding regulatory elements. The applicability of these approaches and future perspectives of genomic studies in OFCs, including GS, are also discussed.
DETECTION OF CHROMOSOMAL ABERRATIONS
Many deletions, insertions, inversions, and translocations associated with syndromic CL/P have been identified by traditional cytogenetic techniques such as G-banding and fluorescent in situ hybridization (FISH), and by comparative genomic hybridization (CGH), array comparative genomic hybridization (aCGH), and single nucleotide polymorphism (SNP) arrays. Although the technology has clearly progressed, these techniques are still often used in combination to provide confirmatory information. In this review, we summarize key findings in OFC-related chromosomal aberrations (Table I) and discuss the advantages and limitations of each technique.
Locus | Chromosomal aberration | Disease/syndrome | CNV length | Techniques | Number of genes | Best candidate genes | Refs. |
---|---|---|---|---|---|---|---|
2q32–q33 | Translocation | CPO | — | FISH | — | SATB2 | FitzPatrick et al. [2003] |
6q25.1–q25.2 | Deletion | CL/P | 3.2 Mb | aCGH | 11 | ESR1 | Osoegawa et al. [2008] |
10q26.11–q26.13 | Deletion | CL/P | 2.2 Mb | aCGH | 9 | FGFR2 | Osoegawa et al. [2008] |
5p15.33 | Duplication | Syndromic CLP | 300 kb | aCGH | 5 | CLPTM1L | Izzo et al. [2013] |
1p32.2–p32.3 | Deletion | Syndromic CLP | 5.4 Mb | aCGH | 55 | LRP8, DAB1 | Mulatinho et al. [2008] |
8p23.1 | Duplication | 8p23.1 duplication syndrome | 3.68 Mb | aCGH | 26 | — | Barber et al. [2013] |
3q29 | Deletion | Syndromic CLP | 1.5 Mb | Human SNP Array | 22 | — | Petrin et al. [2011] |
G-banding of metaphase chromosomes has been used for more than 50 years and has detected a large number of translocations and other structural chromosomal anomalies associated with isolated and familial forms of CL/P [Lindsten et al., 1965; Walzer et al., 1966]. However, the resolution of G-banding chromosome analysis is low and re-arrangements of genomic segments smaller than 5–10 Mb are usually not detected [Bejjani and Shaffer, 2008; Ledbetter, 2008; Takenouchi et al., 2012]. FISH uses fluorescent probes that can bind to specific regions of the chromosome by sequence complementarity and has better resolution than G-banding chromosome analysis. It can detect genomic re-arrangements as small as 35 kb [Bejjani and Shaffer, 2008]. The utility of these techniques was demonstrated by the identification of SATB2 as a cleft palate gene on 2q32–q33 by high-resolution FISH mapping of a translocation breakpoint [FitzPatrick et al., 2003]. The limitation of FISH is that it can only target pre-selected small regions of interest.
In CGH, differentially labeled test (affected individual) and reference (normal individual) genomic DNAs are co-hybridized to normal metaphase chromosomes, and the fluorescence ratios along the length of the chromosomes provide a cytogenetic representation of relative DNA copy-number variation. This technique visualizes chromosomal abnormalities with a resolution of 5–10 Mb. A method that combines the strength of the genome-wide analysis of CGH with the high resolution offered by FISH is aCGH. In the early versions of aCGH, genomic fragments such as yeast or bacterial artificial chromosomes (YACs and BACs, respectively), P1 phagemids, cosmids, or cDNA clones were used as hybridization targets instead of the metaphase chromosomes used in conventional CGH. An early aCGH study was performed on syndromic and NS CL/P cases [Osoegawa et al., 2008]. Out of 63 syndromic cases, a deletion of 2.7 Mb at chromosome 22q11.21 was identified in one patient. In 104 NS cases, two deletions were identified: a 3.2 Mb deletion at chromosome 6q25.1–q25.2 in one patient and a 2.2 Mb deletion at 10q26.11–q26.13 in another. These deleted genomic regions included eleven and nine genes, respectively. Further prioritization of the genes in these regions was based on the hypothesis that novel candidate genes play a similar role to, or share biological processes with, genes known to be associated with CL/P. Estrogen receptor 1 (ESR1) and fibroblast growth factor receptor 2 (FGFR2) were ranked as the most likely causative genes at the 6q25.1–q25.2 and 10q26.11–q26.13 loci, respectively.
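The fluorescence-ratio principle described above can be illustrated with a small Python sketch. It is not taken from any of the cited studies: the function names, the diploid reference, and the ±0.3 cutoffs are assumptions chosen for illustration, and real array data are noisier than these idealized values.

```python
import math

def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) signal for a region with a given copy number."""
    if test_copies == 0:
        return float("-inf")  # homozygous deletion: no test signal at all
    return math.log2(test_copies / reference_copies)

def call_segment(log2_ratio: float,
                 loss_cutoff: float = -0.3,
                 gain_cutoff: float = 0.3) -> str:
    """Classify a segment as loss, gain, or neutral (cutoffs are illustrative only)."""
    if log2_ratio <= loss_cutoff:
        return "loss"
    if log2_ratio >= gain_cutoff:
        return "gain"
    return "neutral"

# Heterozygous deletion (1 copy), normal (2 copies), duplication (3 copies)
for copies in (1, 2, 3):
    ratio = expected_log2_ratio(copies)
    print(copies, round(ratio, 3), call_segment(ratio))
```

Under these assumptions a heterozygous deletion gives a log2 ratio of -1 and a single-copy duplication about 0.585, which is why gains are typically harder to call than losses at the same noise level.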
The clone-based aCGH has been further refined by oligonucleotide microarrays. A SNP array is a type of DNA microarray which is used for linkage and association studies as well as for the detection of CNVs. These arrays offer a good genome-wide coverage with millions of SNPs and CNV probes. With these arrays, microdeletions and microduplications of a length as small as a few hundred base pairs have been detected in studies of syndromic forms of CL/P [Mulatinho et al., 2008; Petrin et al., 2011; Barber et al., 2013; Izzo et al., 2013]. One example of using oligoarrays is shown by Izzo et al. [2013]. In this study, a patient with syndromic CLP and developmental delay was initially analyzed with G-banding chromosome analysis and was shown to have normal karyotype. With oligoarray analysis, an approximately 300 kb microduplication was identified at 5p15.33 encompassing only five protein-coding genes, with CLPTM1L hypothesized to be the best candidate gene based on previous cytogenetic and linkage studies [Wyszynski et al., 1996; Yoshiura et al., 1998].
With the advances of these array methods, previously unidentifiable mutations can now be identified as causes for many congenital anomalies. CNVs of the size of a few kb or even a single exon may be detected and therefore these microarray analyses are currently used as the first investigation step in many laboratories [Miller et al., 2010]. The identified CNVs can further be examined by searching CNV databases such as Database of Genomic Variants [Iafrate et al., 2004] to exclude common CNVs and to identify rare ones. However, there are limitations to these methods. Balanced chromosomal rearrangements such as translocations and inversions cannot be identified using aCGH or SNP arrays, as these methods only detect gains and losses of chromosomal material. Recently, CNV studies using Next Generation DNA Sequencing (NGS) approaches (CNV-seq) have been reported [Xie and Tammi, 2009]. Although the CNV-seq method needs further validation, balanced and unbalanced re-arrangements may be detected by NGS genomic sequencing methods, which may provide new possibilities for developing diagnostic tools.
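Excluding common CNVs by comparison with a reference catalogue can be sketched as follows. This is a minimal illustration, not the procedure of any cited study: the `CNV` record, the 50% reciprocal-overlap rule, and all coordinates are assumptions, and no real database is queried.

```python
from typing import List, NamedTuple

class CNV(NamedTuple):
    chrom: str
    start: int   # inclusive start coordinate
    end: int     # exclusive end coordinate
    kind: str    # "loss" or "gain"

def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two fractions of each CNV that is covered by the other."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))

def rare_cnvs(calls: List[CNV], common: List[CNV], min_overlap: float = 0.5) -> List[CNV]:
    """Keep calls that do not match any catalogued common CNV (threshold is an assumption)."""
    return [c for c in calls
            if all(reciprocal_overlap(c, k) < min_overlap for k in common)]

# Hypothetical coordinates, for illustration only
common_catalogue = [CNV("chr1", 1_000, 2_000, "loss")]
patient_calls = [CNV("chr1", 1_100, 2_100, "loss"),   # 90% reciprocal overlap: treated as common
                 CNV("chr5", 100, 400, "gain")]       # no match: retained as rare
print(rare_cnvs(patient_calls, common_catalogue))
```

Requiring the overlap to be reciprocal avoids discarding a large patient CNV merely because it contains a small common polymorphism.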
GENOME-WIDE LINKAGE STUDIES
Genome-wide linkage analysis has been used for more than 20 years in classical genetic studies. The aim of linkage studies is to map a disease locus to a specific genomic region by identifying genetic markers that co-segregate with the disorder in a family. It is a parametric method and is only useful for disorders that are hypothesized to be Mendelian traits. Large families with multiple affected individuals are most suitable for linkage studies. Highly polymorphic DNA markers are ideally used in genome-wide linkage studies, as they maximize the ability to detect recombination events. Such analyses narrow down the regions carrying co-segregating variants and thus ultimately aid identification of the causal gene or variant by Sanger sequencing. Such positional cloning approaches have been used successfully to identify CL/P genes, such as IRF6, PVRL1/nectin-1, MSX1, TBX22, and TP63, for syndromic forms of CL/P [Dixon et al., 2011] (Table II).
Gene | Syndrome | Refs. |
---|---|---|
MSX1 | Tooth agenesis and various combinations of CPO and CLP | Van den Boogaard et al. [2000] |
PVRL1 | CL/P-ectodermal dysplasia (ED) syndrome (CLPED1) | Suzuki et al. [2000] |
TBX22 | X-Linked cleft palate and ankyloglossia | Braybrook et al. [2001] |
IRF6 | Van der Woude syndrome and Popliteal Pterygium syndrome | Kondo et al. [2002] |
TP63 | Limb Mammary syndrome/Ectrodactyly-Ectodermal dysplasia-Cleft syndrome | Celli et al. [1999] |
In many of these studies, large families with clinically recognizable syndromes involving an OFC phenotype were used. For example, the identification of mutations in IRF6 as responsible for the CL/P disorders Van der Woude syndrome (VWS) and popliteal pterygium syndrome (PPS) was assisted by linkage analysis in large families [Lees et al., 1999] followed by sequencing of the IRF6 gene [Kondo et al., 2002]. PVRL1/nectin-1 was identified by homozygosity mapping followed by sequencing as the causative gene for a specific form of CL/P associated with ectodermal dysplasia that is found in the population of Margarita Island [Suzuki et al., 2000]. Another example is the identification of mutations in TP63, a gene that encodes a transcription factor, as the cause of Ectrodactyly-Ectodermal dysplasia-Cleft syndrome (EEC) and related disorders [Celli et al., 1999; Van Bokhoven et al., 2001; Rinne et al., 2007]. EEC syndrome is an autosomal dominant disorder characterized by ectrodactyly, ectodermal dysplasia, and OFC. Genome-wide linkage analysis was initially carried out in a large family with a syndrome resembling EEC, limb-mammary syndrome (LMS), and was later followed by locus-specific linkage analysis in five small families with EEC syndrome [Van Bokhoven et al., 1999]. Sequencing of genes in the identified 3q27 locus showed that mutations causing EEC syndrome are localized in the region of the TP63 gene encoding the DNA binding domain, and therefore these mutations may disrupt p63 DNA binding and transactivation of p63 target genes (Fig. 1A,B). In addition to EEC and LMS, two other syndromes exhibiting CL/P as a feature, Hay–Wells syndrome and Rapp–Hodgkin syndrome, are associated with mutations in the TP63 gene [McGrath et al., 2001; Rinne et al., 2007]. Later, TP63 mutations were also reported for NS CL/P (NSCL/P) [Leoyklang et al., 2006].
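The co-segregation logic behind the linkage studies in this section is usually summarized as a LOD score. The sketch below shows the simplest textbook case, a two-point LOD score for phase-known, fully informative meioses; it is an illustration under those assumptions, not the analysis performed in the cited studies, and the counts in the example are hypothetical.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score: log10 of L(theta) / L(theta = 0.5) for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = meioses * math.log10(0.5)   # free recombination (no linkage)
    return log_l_linked - log_l_unlinked

# Hypothetical family data: 1 recombinant among 10 informative meioses
for theta in (0.01, 0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 2))
```

By convention a LOD score of 3 or more is taken as evidence for linkage, which is one reason why large families with many informative meioses are preferred.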

CANDIDATE GENE AND GENOME-WIDE ASSOCIATION STUDIES
Association studies have been extensively applied to non-Mendelian forms of OFCs. They compare the frequencies of single-locus alleles, genotypes of a particular variant, or haplotypes between affected subjects and healthy controls. In candidate-gene association studies, genes predicted to be involved in the disease on the basis of biological hypotheses are selected. For example, common variants at the IRF6 gene, which is known to be involved in VWS [Zucchero et al., 2004] and PPS [Kondo et al., 2002], have also been associated with isolated CL/P in several studies [Jugessur et al., 2008; Birnbaum et al., 2009a]. Association studies using the candidate gene approach have confirmed a number of genes, such as MSX1 and the family of TGF genes, in NSCL/P [Dixon et al., 2011].
As opposed to the candidate-gene approach, GWAS investigate association of variants distributed across the entire genome with disease phenotypes. This is commonly done using SNP microarrays. The haplotypes which are found more frequently in patients than in controls are considered to be “associated” with the disease. Such loci are called “risk loci” and they influence the risk of disease. By calculating which variants co-occur with the disease symptoms, a statistical estimate for increased risk associated with each SNP is made. So far, GWAS have identified many genomic regions significantly associated with NSCL/P in different populations (Table III).
| Initial sample | Replication sample | Platform used | Region | Strongest SNP-risk allele | Refs. |
|---|---|---|---|---|---|
| 224 cases, 383 controls (CEU) | 462 cases, 954 controls (CEU) | Illumina HumanHap550V3 BeadChip | 8q24.21 | rs987525 | Birnbaum et al. [2009b] |
| | | | 1q32.2 | rs642961 | |
| 111 cases, 5,951 controls (CEU) | NRa | Illumina Infinium II HumanHap550 BeadChip | 8q24 | rs987525 | Grant et al. [2009] |
| 401 cases, 1,323 controls (CEU) | 793 trios (CEU) | Illumina Human610-Quad and HumanHap 550k BeadChips | 8q24 | rs987525 | Mangold et al. [2010] |
| | | | 10q25 | rs7078160 | |
| | | | 17q22 | rs227731 | |
| 1,908 case–parent trios (825 CEU, 1,038 Asian) | 8,115 individuals (1,965 families) (CEU, Asian, South Asian, South/Central America) | Illumina Human610-Quad v.1_B BeadChip | 8q24.21 | rs987525 | Beaty et al. [2010] |
| | | | 1q32.2 | rs10863790 | |
| | | | 20q12 | rs13041247 | |
| | | | 1p22 | rs560426 | |
| 1,461 trios [Beaty et al., 2010] and 399 cases, 1,318 controls [Mangold et al., 2010] | Meta-analysis of Beaty et al. [2010] and Mangold et al. [2010] | | 1p22 | rs560426 | Ludwig et al. [2012] |
| | | | 1q32.2 | rs861020 | |
| | | | 8q24 | rs987525 | |
| | | | 10q25 | rs7078160 | |
| | | | 17q22 | rs227731 | |
| | | | 20q12 | rs13041247 | |
| | | | 1p36 | rs742071 | |
| | | | 2p21 | rs7590268 | |
| | | | 3p11.1 | rs7632427 | |
| | | | 8q21.3 | rs12543318 | |
| | | | 13q31.1 | rs8001641 | |
| | | | 15q22.2 | rs1873147 | |
- a NR, not recorded.
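The per-SNP risk estimate described before Table III can be illustrated with a basic allelic case–control test. The sketch below is a simplified, assumed workflow (a 2×2 allele-count table, a Pearson chi-square test with one degree of freedom, and a Wald confidence interval); the allele counts are hypothetical and are not taken from any study in Table III.

```python
import math

def allelic_association(case_risk: int, case_other: int,
                        control_risk: int, control_other: int) -> dict:
    """Odds ratio, 95% CI, chi-square (1 df), and p-value from allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    return {"odds_ratio": odds_ratio, "ci95": ci, "chi2": chi2, "p_value": p_value}

# Hypothetical allele counts: 200 cases and 200 controls, two alleles each
result = allelic_association(case_risk=120, case_other=280,
                             control_risk=80, control_other=320)
print(result)
```

In a genome-wide setting the same test is repeated for every SNP, so a much stricter threshold than 0.05 is needed; a value of 5 × 10⁻⁸ is the commonly used convention for genome-wide significance.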
The first GWAS for NSCL/P involved 224 CL/P affected individuals and 383 controls of Central European origin and identified loci on 8q24.21 and 1q32.2 with strong evidence of association [Birnbaum et al., 2009b]. These findings were confirmed in larger patient–control sample groups in the same study. As the 1q32.2 locus contains the IRF6 gene, these GWAS supported the notion that IRF6 variants contribute to multifactorial NSCL/P as well as to Mendelian forms of OFCs. In contrast, the locus on 8q24.21 is a gene desert, and therefore the causative element within this region is still to be identified. An extension of this study with increased sample size and added replication populations (401 affected individuals and 1,323 controls) confirmed both the associated regions [Mangold et al., 2010]. This study also identified two additional loci on 17q22 near NOG and 10q25.3 near VAX1. Another group also found strong evidence for involvement of the 8q24 locus with 111 NSCL/P patients and 5,951 controls of European descent [Grant et al., 2009]. However, this study lacked a replication sample. In addition, another confirmation of 8q24 region was provided by a nuclear-trio-based GWAS of Europeans and Asians [Beaty et al., 2010] which included 1,908 case–parent trios of various ancestries with 825 trios of European ancestry and 1,038 of Asian ancestry. In the replication sample, 8,115 individuals from 1,965 families from different populations were included. Two new susceptibility loci, 1p22 and 20q12, were identified with genome-wide significance, and their signals were found to be stronger in Asians than in Europeans. This study was replicated with an independent sample of trios and all previous loci were confirmed [Beaty et al., 2013].
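Case–parent trio designs, such as the one described above, test for association within families rather than against unrelated controls. The source does not state which statistic was used; the transmission disequilibrium test (TDT) is one common choice for trios and is sketched here purely as an illustration, with hypothetical transmission counts.

```python
import math

def tdt(transmitted: int, untransmitted: int) -> dict:
    """McNemar-type TDT statistic from heterozygous parents.

    transmitted:   times the candidate risk allele was passed to the affected child
    untransmitted: times the other allele was passed instead
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square upper tail, 1 df
    return {"chi2": chi2, "p_value": p_value,
            "transmission_ratio": transmitted / total}

# Hypothetical counts from 100 informative parental transmissions
print(tdt(transmitted=60, untransmitted=40))
```

Because each transmitted allele is compared with the untransmitted allele of the same parent, this kind of test is robust to the population-mixture problem that can affect case–control comparisons.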
Recently, the first genome-wide meta-analysis of NSCL/P was published. It included 666 European trios (including European Americans) and 795 Asian trios from Beaty et al. [2010] and 399 patients and 1,318 controls of European origin from Mangold et al. [2010]. This combined sample represented approximately 95% of all previously reported individuals with NSCL/P [Ludwig et al., 2012]. The meta-analysis supported association with all previously identified loci (8q24.21, 1q32.2, 17q22, 10q25.3, 1p22, and 20q12) and identified six additional susceptibility regions (1p36, 2p21, 3p11.1, 8q21.3, 13q31.1, and 15q22).
A further study has also recently confirmed the previously identified loci on chromosomes 1p36 and 10q25.3 [Butali et al., 2013]. Thus, meta-analysis in GWAS may boost the power to detect more associations by providing larger patient and control groups, and it enables an examination of the consistency or heterogeneity of previously identified associations across diverse datasets and study populations [Zeggini and Ioannidis, 2009]. However, in order to identify the true causal variants, extensive fine-mapping, targeted re-sequencing, and finally functional analyses are required [McCarthy et al., 2008].
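How combining studies increases power and exposes heterogeneity can be shown with a generic fixed-effect, inverse-variance meta-analysis. This sketch is an assumption for illustration and is not the method of Ludwig et al. [2012], who combined trio and case–control data; the per-study effect sizes below are hypothetical.

```python
import math
from typing import List, Tuple

def fixed_effect_meta(studies: List[Tuple[float, float]]) -> dict:
    """Pool per-study (log odds ratio, standard error) pairs by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (beta - pooled) ** 2 for w, (beta, _) in zip(weights, studies))
    return {"log_or": pooled, "se": pooled_se, "z": z, "p_value": p_value,
            "cochran_q": q, "q_df": len(studies) - 1}

# Two hypothetical studies of the same SNP
print(fixed_effect_meta([(0.4, 0.1), (0.6, 0.1)]))
```

The pooled standard error is smaller than that of either study alone, which is the source of the gain in power, while a large Cochran's Q relative to its degrees of freedom would suggest that the effect differs between study populations.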
Compared to candidate-gene association studies, which usually involve a few hundred SNPs at most and rely on assumptions about the genomic location or the identity of the potential causal variants, the GWAS approach is genome-wide and hypothesis-free. However, many issues related to the GWAS approach still need to be addressed. GWAS rely on the “common disease, common variant” hypothesis [Reich and Lander, 2001]. SNPs used in GWAS are often common SNPs, and therefore they might not have a major effect on the phenotype. Larger sample sizes are crucial in order to detect risk variants that have smaller odds ratios and lower allele frequencies. However, large samples may include mixtures of ethnic groups and phenotypes. As different ethnic subgroups may differ in disease prevalence, a mixture of ethnic groups may lead to detection of a false-positive association related to ethnicity rather than phenotype. In addition, most studies have combined both non-syndromic cleft lip (NSCL) and NSCL/P patients as one group, although these two conditions may have different etiological mechanisms [Marazita, 2012; Rahimov et al., 2012]. Therefore, the collection of large, consistently and accurately phenotyped cohorts is the main challenge in GWAS. Another important issue is understanding the disease mechanism underlying statistically significant GWAS variants. For example, the SNP rs987525 in the 8q24 locus has been found to be significantly associated with NSCL/P in various studies, but it is not yet understood how exactly this SNP, or other variants co-segregating with it, contributes to higher susceptibility to CL/P. As mentioned above, rs987525 lies within the 640-kb locus of association on 8q24.21, which is a gene desert. No cis- or trans-regulatory effects have been identified for rs987525 or for other SNPs in the same haplotype block. Thus, functional studies are necessary to determine the role of the identified SNP in the biological mechanisms and pathways of the disease.
EXOME SEQUENCING
Since the release of the first NGS platform in 2005 [Margulies et al., 2005], many NGS platforms have become available and are increasingly applied in gene discovery strategies [Bamshad et al., 2011; Veltman and Brunner, 2012]. NGS reduces the cost of DNA sequencing by four orders of magnitude relative to Sanger sequencing [Metzker, 2010]. Although the protein-coding regions cover less than 2% of the genome, they are estimated to contain the majority of the disease-causing mutations in Mendelian disorders. Almost 85% of mutations at Mendelian loci have been found in coding regions [Cooper et al., 1995], although this may be somewhat biased as only positive results are typically reported. Exome sequencing (ES) is an efficient NGS method to selectively sequence the coding regions (exons) of the genome. ES is therefore a powerful and cost-effective tool for understanding the genetic basis of diseases that have proven difficult to resolve using traditional gene-discovery methods, especially those for which the genetic basis is not evident from the phenotype [Choi et al., 2009].
ES is also sufficiently robust to allow the detection of variants in diseases that have variable clinical features (phenotypic heterogeneity) or disorders with causal variants in different genes (genetic heterogeneity) [Ku et al., 2011]. The successful examples include many non-Mendelian diseases such as cancers [Gartner et al., 2013; Makishima et al., 2013; Sato et al., 2013; Stephens et al., 2013] and neurodevelopmental diseases including autism and intellectual disability [Veltman and Brunner, 2012]. ES analysis sometimes starts with Sanger sequencing to pre-screen predicted candidate genes based on the phenotype. If mutations are not identified in candidate genes, ES is performed usually in two or more individuals depending on the pedigree or patient cohort. Many filters and prioritization criteria can be applied to ES data to enrich for causal variants. Filtering and prioritization of variants include the sequencing quality (the number of independent reads and read depth), the inheritance model (percentage variant reads), whether the variants are localized in the coding regions and whether they are synonymous, known or common SNPs, etc. [Gilissen et al., 2011]. These filtering and prioritization steps can also be combined with mapping data from SNP arrays or linkage studies. The presence of candidate variants and their appropriate segregation is confirmed by Sanger sequencing. Further evidence of causality can be obtained by identifying mutations in the same gene in other unrelated but similarly affected patients and with the addition of functional analysis. So far, no genes have been identified by ES for NS OFCs, whereas a number of genes, including many for syndromes with OFC, have been identified by ES [De Ligt et al., 2012; Koolen et al., 2012; O'Roak et al., 2012; Rauch et al., 2012; Roscioli et al., 2012]. 
We have summarized some of these studies where syndromic OFC patients were included in the studies (Table IV) and discuss one example where CL/P is one of the main features of the condition.
Disease/Trait | Inheritance mode | OFC | Enrichment | Sequencing | Genes | Refs. |
---|---|---|---|---|---|---|
Bartsocas–Papas syndrome | Autosomal recessive | CL/P | Agilent SureSelect Human All Exon 50 MB Kit | SOLiD 4 sequencer (Life Technologies) | RIPK4 | Mitchell et al. [2012] |
Say–Barber–Biesecker–Young–Simpson syndrome (SBBYSS or Ohdo syndrome) | Autosomal dominant | CPO | Agilent SureSelect Human All Exon 50 MB Kit | SOLiD 4 sequencer (Life Technologies) | KAT6B | Clayton-Smith et al. [2011] |
Nager syndrome | Autosomal dominant/recessive | CPO | Agilent SureSelect Human All Exon 50 MB Kit | Illumina HiSeq2000 sequencer | SF3B4 | Bernier et al. [2012] |
Nager syndrome | Autosomal dominant/recessive | CPO | Roche NimbleGen Human SeqCap EZ v2.0 Kit | Illumina HiSeq2000 sequencer | SF3B4 | Czeschik et al. [2013] |
Kabuki syndrome | Autosomal dominant | CPO | Custom array (Agilent) covering 31,922,798 bases | Illumina Genome Analyzer II (GAII) | MLL2 | Ng et al. [2010] |
Auriculocondylar syndrome (ACS) | Autosomal dominant | CPO | Roche NimbleGen SeqCap EZ v1.0 Kit | Illumina GAIIx and HiSeq 2000 sequencer | PLCB4, GNAI3 | Rieder et al. [2012] |
Mandibulofacial dysostosis with microcephaly (MFDM) | Autosomal dominant | CPO | Agilent SureSelect Human All Exon 50 MB Kit | Illumina Hiseq | EFTUD2 | Lines et al. [2012] |
Oto-facial syndrome | Autosomal recessive | CPO | Roche NimbleGen Human SeqCap EZ v3.0 Kit | Illumina HiSeq2000 sequencer | EFTUD2 | Voigt et al. [2013] |
Orofaciodigital syndromes (OFD IV, Mohr–Majewski syndrome) | Autosomal recessive | CPO | A 5.3 Mb customized Agilent SureSelect Target Enrichment library | SOLiD 4 sequencer (Life Technologies) | TCTN3 | Thomas et al. [2012] |
Sensenbrenner syndrome | Autosomal recessive | CPO | Roche NimbleGen SeqCap EZ Human Exome Library v2.0 | Illumina HiSeq | WDR35 | Bacino et al. [2012] |
Hartsfield syndrome | Autosomal dominant/recessive | CL/P | TruSeq Capture kit (Illumina) | Illumina HiSeq2000 sequencer | FGFR1 | Simonis et al. [2013] |
- a As CPO is often one of the features in many syndromes, this list of CPO syndromes might not be complete.
Bartsocas–Papas syndrome is an autosomal recessive disorder characterized by popliteal pterygia, ankyloblepharon, filiform bands between the mandible and maxilla, cleft lip and palate, and syndactyly [Veenstra-Knol et al., 2003]. ES was performed in a consanguineous family in which one of the children displayed the features of Bartsocas–Papas syndrome [Mitchell et al., 2012]. Analysis under an autosomal recessive disease model identified a homozygous nonsense mutation in RIPK4, a gene that encodes an ankyrin repeat-containing kinase essential for keratinocyte differentiation. Subsequently, additional mutated alleles were identified in RIPK4 in an unrelated affected individual. Functional analyses demonstrated that RIPK4 is a direct transcriptional target of the transcription factor p63, which is also involved in syndromic OFCs [Celli et al., 1999; Van Bokhoven et al., 2001; Rinne et al., 2007]. Only a limited number of ES studies of conditions that include CL/P have been reported to date. There are, however, many successful examples of ES studies in patients with syndromes that include a variable CPO phenotype (Table IV). The inheritance of these syndromes can be autosomal dominant (e.g., Auriculocondylar syndrome, Kabuki syndrome) or autosomal recessive (e.g., Miller syndrome). It is important to note that the identified variants in some of these syndromes, such as Auriculocondylar syndrome, could exhibit incomplete penetrance.
As the costs of ES rapidly decrease and data analysis strategies improve, ES is becoming the methodology of choice in many research and diagnostic laboratories for identifying the genetic basis of Mendelian disorders. However, many challenges remain for the ES approach. Not all genes can be sequenced, owing to technical issues such as the exome enrichment methods applied, repetitive sequences, and GC-rich regions [Ku et al., 2012]. Not all conditions can easily be resolved by ES. The success of ES analysis also depends on the inheritance model, the pedigree or patient cohorts, the accuracy and reliability of the phenotypic data, and the availability of additional information such as mapping or linkage analyses. Published ES data confirm that it is a successful approach in families with autosomal recessive disorders, whereas for dominant disorders more affected individuals and even other mapping information seem to be necessary [Gilissen et al., 2011]. Furthermore, lack of penetrance is an additional factor that may complicate the analysis of families with autosomal dominant OFCs [Dixon et al., 2011]. This raises the intriguing question of whether ES trio studies are applicable to orofacial disorders. ES of parent–child trios has been demonstrated to be a powerful approach for identifying de novo mutations in various diseases, which may be causative in a dominant model of inheritance [Bamshad et al., 2011; Xu et al., 2012; Chesi et al., 2013; Zaidi et al., 2013]. However, to our knowledge, this approach has not yet been applied to selected cohorts of OFC patients, possibly because of the often incomplete penetrance and heterogeneous presentation of OFCs [Dixon et al., 2011]. As the genetic basis of isolated OFC is more complex and may be significantly influenced by non-genetic/environmental factors, ES is currently not applied to large numbers of isolated OFC patients, as GWAS may be the most appropriate methodology to identify causative variants. However, ES might be a powerful approach for identifying novel causative genes in selected OFC families with multiple affected individuals and a Mendelian inheritance pattern.
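The trio strategy mentioned above rests on a simple comparison: a de novo candidate is heterozygous in the child and absent from both parents, at sites where all three samples are covered well enough to trust the genotypes. The following is a minimal sketch under those assumptions. The dictionary layout, the variant identifiers, and the depth threshold of 10 reads are hypothetical and are not taken from any of the cited studies.

```python
def de_novo_candidates(calls, min_depth=10):
    """Return IDs of variants that are heterozygous in the child and
    homozygous reference in both parents, requiring a minimum read depth
    in all three samples so that low coverage in a parent does not
    masquerade as absence of the variant."""
    candidates = []
    for call in calls:
        child, father, mother = call["child"], call["father"], call["mother"]
        if any(s["depth"] < min_depth for s in (child, father, mother)):
            continue  # genotype not reliable enough in at least one sample
        if (child["gt"] == "0/1"
                and father["gt"] == "0/0"
                and mother["gt"] == "0/0"):
            candidates.append(call["variant_id"])
    return candidates


# Made-up trio calls for illustration only.
trio_calls = [
    {"variant_id": "var1",
     "child": {"gt": "0/1", "depth": 42},
     "father": {"gt": "0/0", "depth": 38},
     "mother": {"gt": "0/0", "depth": 35}},
    {"variant_id": "var2",  # inherited from the father
     "child": {"gt": "0/1", "depth": 40},
     "father": {"gt": "0/1", "depth": 30},
     "mother": {"gt": "0/0", "depth": 33}},
    {"variant_id": "var3",  # paternal coverage too low to call absence
     "child": {"gt": "0/1", "depth": 45},
     "father": {"gt": "0/0", "depth": 4},
     "mother": {"gt": "0/0", "depth": 31}},
]
de_novo = de_novo_candidates(trio_calls)
print(de_novo)
```

Note that incomplete penetrance, discussed above, undermines exactly this filter: a causative variant carried by an unaffected parent would be discarded as inherited.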
IDENTIFICATION OF GENETIC VARIANTS IN THE NON-CODING GENOME
Genetic studies of human diseases have so far focused mainly on protein-coding genes, which represent about 2% of the human genome. The functional relevance of the non-coding genome has not been explored in detail in most diseases. However, non-coding regions of the genome clearly play important roles in human diseases. For example, Pierre Robin sequence, which manifests with a U-shaped cleft palate and mandibular micrognathia, has been reported to be secondary to disruption of non-coding regulatory elements for SOX9 in some cases [Benko et al., 2009].
In GWAS, the vast majority (>80%) of the disease- or trait-associated SNPs are found outside protein-coding regions [Manolio et al., 2009]. It has become clear that disturbance of gene regulation during embryonic development can give rise to developmental disorders such as OFC. Orchestrated gene expression often depends on elements in the non-coding regions of the genome, on the arrangement of the chromosomes, and on nuclear domains. These aspects have recently been characterized by large-scale (epi-)genomics projects such as ENCODE [Dunham et al., 2012] and NIH Roadmap Epigenomics [Bernstein et al., 2010].
Non-coding RNAs (ncRNAs) are one class of functional non-coding elements. Different types of ncRNAs, including long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), have been shown to regulate gene expression and have potentially been linked to human diseases [Calin et al., 2005; Hirota et al., 2008; Sahoo et al., 2008; Mencia et al., 2009; Batista and Chang, 2013]. Relevant to OFC, most miRNA studies have been performed in model systems, showing that several miRNAs play a role in palatogenesis and gene regulation in orofacial tissues [Eberhart et al., 2008; Mukhopadhyay et al., 2010; Sheehy et al., 2010; Nie et al., 2011]. In one study of NS CPO in the Chinese population, a SNP in the precursor of miRNA-140, which is involved in dispersion and migration of neural crest cells during palatogenesis, was shown to confer an increased risk for NS cleft palate [Li et al., 2010]. Genomic studies in OFCs should therefore also include the analysis of miRNAs.
In addition to ncRNAs, non-coding regulatory elements, sometimes acting over long distances, have also been associated with developmental diseases including orofacial abnormalities [Benko et al., 2009; Anderson et al., 2012; Spielmann et al., 2012]. For many years, the identification of these non-coding functional elements was based mainly on the evolutionary (ultra-)conservation of their sequences, and OFC research offers one excellent example. In a study by Rahimov and colleagues, the protein-coding and splice-site regions of IRF6 were sequenced in 160 individuals with NSCL/P, and no obvious causative variants were detected [Rahimov et al., 2008]. The authors therefore searched for possible regulatory elements within an approximately 140 kb linkage disequilibrium (LD) block around IRF6 in which the NSCL/P-associated SNP rs2235371 lies [Birnbaum et al., 2009b]. In this genomic region, they identified 41 multispecies conserved sequences (MCSs). A SNP, rs642961, identified in one of the MCSs (MCS-9.7) showed significant differences in allelic frequencies and genotypic distribution between patients with CLO and controls and was in strong LD with rs2235371. To understand the mechanism underlying MCS-9.7 and this SNP, transcription factor binding analyses such as chromatin immunoprecipitation (ChIP) were performed, and the data showed that the risk allele alters DNA binding of AP-2α, a transcription factor involved in craniofacial development. In a mouse transgenic experiment, gene expression controlled by the MCS-9.7 region was consistent with the expression pattern of endogenous Irf6, which suggested that the MCS-9.7 region functions as an enhancer element for IRF6 expression. This study thus provided proof-of-principle that a non-coding regulatory element can play a role in the pathogenesis of OFCs.
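A comparison of allelic frequencies between patients and controls, as described for the SNP in MCS-9.7, can be sketched as a chi-square test on a 2×2 table of allele counts. This is a generic illustration rather than the statistical procedure of the cited study, and the genotype counts below are invented. The p-value uses the identity that, for one degree of freedom, the chi-square survival function equals erfc(√(χ²/2)).

```python
import math


def allele_counts(hom_ref, het, hom_alt):
    """Convert genotype counts into (risk-allele, reference-allele) counts."""
    return 2 * hom_alt + het, 2 * hom_ref + het


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df, no continuity correction) on allele
    counts. Each argument is a (risk_allele, reference_allele) tuple.
    Returns (chi2, p_value, odds_ratio)."""
    a, b = case_counts      # cases: risk, reference
    c, d = control_counts   # controls: risk, reference
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Invented genotype counts: 50 cases and 50 controls.
cases = allele_counts(hom_ref=25, het=20, hom_alt=5)     # 30 risk, 70 reference
controls = allele_counts(hom_ref=41, het=8, hom_alt=1)   # 10 risk, 90 reference
chi2, p_value, odds_ratio = allelic_chi_square(cases, controls)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
```

The sketch assumes that no cell of the table is zero. Real analyses would also consider genotypic tests, multiple-testing correction, and population stratification.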
Although evolutionary conservation has in some cases led to the successful identification of causative variants in non-coding genomic regions, it is evident that non-coding regulatory elements are not always conserved [Blow et al., 2010; Schmidt et al., 2010], and that the factors controlling regulatory elements may not easily be deduced from genomic sequences. Fast developing genomic technologies have provided opportunities to study gene regulation with unprecedented speed. For example, in a ChIP-seq study that mapped the genome-wide p63 DNA binding profile, a p63 binding site was found to co-localize with the regulatory region MCS-9.7, which functions as an enhancer element of IRF6 [Thomason et al., 2010], suggesting that p63 and AP-2α co-regulate IRF6 expression. Disturbance of the fine-tuned expression of IRF6 and other OFC genes by sequence variants in regulatory elements could lead to OFCs (Fig. 1C). Therefore, a systematic study of non-coding regulatory elements, such as p63 and AP-2α binding sites, and of the regulated expression of OFC genes will be of great interest for understanding orofacial development and related pathogenesis. Such studies will assist in resolving OFCs with currently unknown genetic bases, and will provide mechanistic insights into genomic loci that show statistically significant association in CL/P GWAS (Fig. 1D).
GENOME SEQUENCING AND FUTURE PERSPECTIVES
Genome sequencing (GS) has been used in several studies to identify genes for human diseases [Morin et al., 2013; Shi et al., 2013; Zhang et al., 2013]. Studies have also shown that GS data can be used for CNV analyses [Xie and Tammi, 2009] and for studying the combined effect of common and rare variants in association studies [Aschard et al., 2011]. Until now, most GS studies have focused on the coding regions, mainly because of the difficulties in interpreting the non-coding genomic regions. As the 1000 Genomes Project proceeds, common and rare genetic variants in individuals from different populations are being identified by GS and ES and have become accessible to the public [Abecasis et al., 2012]. However, the acquisition of sequences alone is not sufficient for understanding the etiology of developmental diseases, especially diseases such as OFCs that have heterogeneous phenotypes and are caused by multiple genetic and environmental factors. Additional, multilayered information is required. At the individual and population levels, more patients with different ethnic backgrounds and more detailed phenotyping are important. At the molecular and cellular levels, -omics studies including genomics, epigenomics, transcriptomics, and proteomics in various cells and tissues will assist in elucidating the underlying mechanisms of these diseases. In this respect, the international ENCODE and Roadmap Epigenomics consortia have provided global maps of regions of transcription, functional cis-regulatory DNA elements, chromatin structure, and genomic organization in a large number of cell lines and tissues [Bernstein et al., 2010; Dunham et al., 2012]. Epigenomic profiling of enhancer-associated chromatin features, such as co-activator (e.g., p300) occupancy, DNaseI hypersensitivity sites, and enrichment of certain histone marks, allows the genome-wide identification of cell-type specific enhancers [Maurano et al., 2012; Thurman et al., 2012]. Gene transcription is also regulated at the level of higher-order chromatin structure within domains, and chromosomal interaction assays such as 4C [Splinter et al., 2012; Van de Werken et al., 2012], 5C [Sanyal et al., 2012], Hi-C [Dixon et al., 2012; Dekker et al., 2013], and ChIA-PET [Fullwood et al., 2009; Handoko et al., 2011; Li et al., 2012] have shown that enhancers can regulate gene expression from a distance. The data provided by these consortia enable interpretation of the functionality of the non-coding genome and of rare non-coding variants detected by GS in studies of Mendelian disorders. Furthermore, ENCODE data have shown that functional SNPs located in regulatory elements can be associated with phenotypes either through direct implication from GWAS or indirectly through linkage disequilibrium with a GWAS association [Schaub et al., 2012]. Therefore, multi-layered epigenomic profiling data obtained in relevant cell types can also be used to dissect the disease mechanisms underlying GWAS signals detected in complex diseases such as OFC.
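The idea of linking a GWAS signal to a regulatory element, either directly or through SNPs in linkage disequilibrium with the lead SNP, can be sketched as a simple interval overlap. The example below is a toy illustration and not the annotation method of the cited ENCODE analyses. All SNP identifiers, coordinates, element labels, and the precomputed table of LD proxies are hypothetical.

```python
from typing import NamedTuple


class Element(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    label: str   # e.g. an enhancer predicted from epigenomic profiling


def overlapping_elements(chrom, pos, elements):
    """Labels of all regulatory elements that contain the given position."""
    return [e.label for e in elements
            if e.chrom == chrom and e.start <= pos < e.end]


def annotate_lead_snps(lead_snps, proxies, positions, elements):
    """For each lead SNP, list the regulatory elements hit by the lead SNP
    itself ("direct") or by one of its LD proxies ("via LD")."""
    report = {}
    for lead in lead_snps:
        hits = []
        for snp in [lead] + proxies.get(lead, []):
            chrom, pos = positions[snp]
            support = "direct" if snp == lead else "via LD"
            for label in overlapping_elements(chrom, pos, elements):
                hits.append((snp, label, support))
        report[lead] = hits
    return report


# Hypothetical inputs for illustration only.
elements = [
    Element("chr1", 1000, 1500, "enhancer_1"),
    Element("chr1", 5000, 5400, "enhancer_2"),
]
positions = {
    "lead_1": ("chr1", 1200),
    "lead_2": ("chr1", 3000),
    "proxy_2a": ("chr1", 5100),
    "proxy_2b": ("chr1", 8000),
}
proxies = {"lead_2": ["proxy_2a", "proxy_2b"]}  # assumed precomputed LD proxies

report = annotate_lead_snps(["lead_1", "lead_2"], proxies, positions, elements)
for lead, hits in report.items():
    print(lead, hits)
```

A linear scan is used for clarity. For genome-scale element sets, an interval tree or sorted-interval search would be the more usual design choice.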
CONCLUSIONS
In the past 20 years, genetic studies of human diseases have evolved from low-resolution cytogenetic techniques (G-banding, FISH, and CGH) and candidate gene-based molecular analyses to state-of-the-art genome-wide arrays (CNV and GWAS) and NGS (ES and GS) at single-nucleotide resolution. These techniques have been developed and used to examine genetic variants at different levels in various disease models, including syndromic and NS OFCs. Although the classical genetic tools are still used and remain informative, fast developing sequencing technology has made ES feasible, and GS is also likely to become routine soon in many research labs and diagnostic settings. Large international collaborations and consortia that combine different areas of expertise have been set up to obtain sequences at the base-pair level in healthy and diseased individuals from different populations and to understand the functional relevance of detected variants. These efforts need to be continued and expanded. The establishment of a disease encyclopedia is the prerequisite for prediction, prevention and treatment of diseases and for the realization of personalized medicine.
ACKNOWLEDGMENTS
H.Z. is supported by NWO-ALW MEERVOUD (836.12.010). We thank Dr. Martin Oti for suggestions and comments on the manuscript. We confirm that all authors have no conflict of interest to declare.