Whole genome sequencing and mutation rate analysis of trios with paternal dioxin exposure
Communicated by George P. Patrinos
Abstract
2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD), or dioxin, is commonly considered the most toxic man-made substance. Dioxin exposure affects human health and disease, and birth defects and teratogenesis have frequently been observed in children of persons exposed to dioxin. However, the impact of dioxin on the human mutation rate in trios has not yet been elucidated at the whole-genome level. To identify and characterize genetic alterations in individuals exposed to dioxin, we performed whole genome sequencing (WGS) of nine Vietnamese trios whose fathers were exposed to dioxin. In total, 846 de novo point mutations, 25 de novo insertions and deletions, 3 de novo structural variations, and 1 de novo copy number variation were identified. The number of point mutations was positively correlated with dioxin concentration (P-value < 0.05). Considering the substitution pattern, the number of A > T/T > A mutations was also positively correlated with dioxin concentration (P-value < 0.05). Our analysis also identified one possible disease-related mutation in LAMA5 in one trio. These findings suggest that dioxin exposure might affect the fathers' genomes, leading to de novo mutations in their children. Further analysis with larger sample sizes is required to clarify the mutation rates and substitution patterns caused by dioxin in trios.
1 INTRODUCTION
From 1961 to 1971, US forces sprayed more than 19 million gallons of herbicide mixtures, including Agent Orange (AO), over many regions of Southern Vietnam for defoliation (Stellman, Stellman, Christian, Weber, & Tomasallo, 2003). The most affected zones were the Truong Son mountains and 28 former US military bases, including the Da Nang and Bien Hoa airbases. About 50% of AO consisted of 2,4,5-trichlorophenoxyacetic acid (2,4,5-T; Stellman et al., 2003), which was contaminated with varying levels of the most toxic congener, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), or dioxin. TCDD was formed as a by-product in the production of 2,4,5-T, and its average concentration in AO was 13 mg/kg (Stellman et al., 2003). TCDD is more stable than 2,4-D and 2,4,5-T and has persisted in the environment for decades. Its half-life in the body has been estimated at 7 to 11 years (reviewed by Milbrath et al., 2009). About 40 years after the war, the concentration of dioxin was still very high in the soil around the Da Nang and Bien Hoa airbases (Banout, Urban, Musil, Szakova, & Balik, 2014; Huyen, Igarashi, & Shiraiwa, 2015). The concentration of TCDD in the blood or milk of people who lived in or around these hotspot regions was also high (Manh et al., 2015; Schecter et al., 2001; Scialli, Watkins, & Ginevan, 2015). Dioxin is highly toxic and can lead to reproductive and developmental problems, damage to the immune system, and interference with hormones, and can also cause cancers such as soft tissue sarcoma and lymphoma (summarized in Veterans and Agent Orange: Update 2014). TCDD was classified into Group 1 (carcinogenic to humans) by the International Agency for Research on Cancer (IARC; Steenland, Bertazzi, Baccarelli, & Kogevinas, 2004). In Vietnam and the United States, teratogenesis or birth defects have been observed in children of persons exposed to dioxins (National Academies of Sciences, Engineering and Medicine, 2016).
These data suggested that dioxin exposure might induce mutations of the human genome.
De novo mutations (DNMs) are either newly formed during gamete formation or occur very early in embryonic development, and are unique to the child when compared with the parents. In each generation, approximately 30–100 DNMs arise in the whole human genome, but individual mutation rates may vary considerably, and advanced paternal age at conception can increase the de novo mutation rate (Francioli et al., 2015; Jonsson et al., 2017; Kong et al., 2012). DNMs occur as single nucleotide variants (SNVs), short insertions/deletions (Indels), copy number variations (CNVs), or structural variations (SVs), and have been shown to contribute significantly to sporadic genetic disorders, such as intellectual disability (Gilissen et al., 2014; Hamdan et al., 2014; Vissers, Gilissen, & Veltman, 2016) and autism spectrum disorder (Iossifov et al., 2014; Jiang et al., 2013; Neale et al., 2012). DNMs have also been identified as an important cause of sporadic developmental diseases (developmental delay, congenital heart disease, or hearing loss; Deciphering Developmental Disorders Study, 2015; Hofrichter et al., 2015; Homsy et al., 2015) and degenerative diseases (Alzheimer's disease, Parkinson's disease; Kun-Rodrigues et al., 2015; Rovelet-Lecrux et al., 2015).
Although the biological influence of dioxin on human health and disease has been established by many epidemiological studies (summarized in Veterans and Agent Orange: Update 2014), its impact on the human genome is not clear, and direct measurement of the mutation rate in people exposed to dioxin has not yet been performed. In this study, we sequenced the whole genomes of nine Vietnamese trios whose fathers had elevated residual dioxin content in their sera, and analyzed their germline variations and de novo mutations, including point mutations, Indels, CNVs, and SVs. We also identified candidate variants that might be associated with diseases in their offspring.
2 MATERIALS AND METHODS
2.1 Subjects
The subjects for this study were selected based on the following criteria: (i) they were all of Kinh ethnicity, the major ethnic group in Vietnam, for at least three generations, as determined by questionnaires; (ii) the fathers had been exposed or had carried out missions for more than 2 years in the sprayed regions of Southern Vietnam and then lived in different unsprayed areas; and (iii) the wives and offspring of the veterans lived in nonexposed regions and had no history of dioxin exposure.
In total, 56 Vietnamese veterans were screened for the content of dioxin in their sera by high-resolution gas chromatography–mass spectrometry (HRGC–MS) analysis. According to Schecter et al. (2006), typical blood TCDD levels in Vietnam are about 2 ppt in the South and 1 ppt in the North. Recently, Manh et al. (2014) reported that the mean level of TCDD in unsprayed areas was about 1.5 ppt. We therefore took 1–1.5 ppt TCDD in serum as the cutoff between background and elevated concentrations.
Of the 56 men, only the nine with elevated concentrations of dioxin in their sera were recruited into this study, together with their spouses and children, as nine trios (Table 1). Biological sampling and genome analysis in this study were approved by the Institutional Review Board of Hanoi Medical University, Hanoi, Vietnam and the Institutional Review Board of RIKEN, Japan.
Levels of dioxin in the blood sera of veterans (ppt) and number of de novo mutations in their offspring

Trio | TCDD (ppt) | PeCDD (ppt) | TCDD + PeCDD (ppt) | TEQ (PCDD/F) (ppt) | (TCDD + PeCDD)/TEQ (%) | Father's age at conception (years) | De novo point mutations | De novo Indels | De novo SVs | De novo CNVs |
---|---|---|---|---|---|---|---|---|---|---|
Trio1 | 1.79 | 2.95 | 4.74 | 9.28 | 51.07 | 41 | 79 | 1 | 1 | 0 |
Trio2 | 1.45 | 2.33 | 3.78 | 7.5 | 31.06 | 34 | 103 | 4 | 1 | 0 |
Trio3 | 2.14 | 3.62 | 5.76 | 10.74 | 53.63 | 32 | 92 | 0 | 0 | 0 |
Trio4 | 3.99 | 2.66 | 6.65 | 11.94 | 55.69 | 29 | 83 | 2 | 1 | 0 |
Trio6 | nd | nd | nd | 8.6 | nd | 29 | 60 | 5 | 0 | 1 |
Trio7 | 12 | 20 | 32 | 69.52 | 46.02 | 29 | 117 | 1 | 0 | 0 |
Trio9 | 8.19 | 14.68 | 22.87 | 51.86 | 44.09 | 37 | 91 | 2 | 0 | 0 |
Trio10 | 4.33 | 3.58 | 7.91 | 18.01 | 43.94 | 33 | 104 | 5 | 0 | 0 |
Trio11 | 2.06 | 2.54 | 4.6 | 8.98 | 51.22 | 30 | 117 | 5 | 0 | 0 |
Total | | | | | | | 846 | 25 | 3 | 1 |
Mean | 3.99 | 5.81 | 9.80 | 21.82 | | 33.3 | 94 | 2.77 | | |
- nd, not detected; ppt, pg TEQ/g lipid.
2.2 Measurement of dioxins in blood samples
The blood samples (15–20 ml) of the fathers were collected during 2006–2007, immediately frozen on dry ice, and stored at −80°C until use. Potassium dichromate was added to the blood samples just before delivery to the ERGO Laboratory, Hamburg, Germany (Eurofins ERGO Forschungsgesellschaft mbH), where HRGC–MS analyses were performed according to the protocol of Schecter, Pavuk, Papke, and Malisch (2004). Seven polychlorinated dibenzodioxin (PCDD) and 10 polychlorinated dibenzofuran (PCDF) congeners were determined at ERGO, a WHO-certified laboratory for human tissue and food dioxin analysis. Briefly, lipid was extracted from whole blood with n-hexane and hexane/2-propanol and quantified by gravimetry. Before lipid extraction, 13C-UL was added to the samples as internal standards. After extraction, the lipid was cleaned up on a multicolumn system. PCDDs and PCDFs were measured by HRGC–MS with a VG-AutoSpec or VG 70–250 instrument using SP2331 or DB-5 capillary columns. The toxic equivalency value (TEQ) was calculated according to the WHO 2005/I-TEQs. For quality control, one pooled sample with a known concentration of PCDD/Fs was analyzed in parallel with the unknown samples.
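As a rough illustration of how a TEQ is assembled, the sketch below multiplies each congener concentration by its toxic equivalency factor (TEF) and sums the products. Only four of the 17 measured congeners are listed, with their WHO 2005 TEFs; the function and variable names are hypothetical, and the Trio1 values are taken from Table 1.

```python
# WHO 2005 toxic equivalency factors for a small subset of congeners
# (illustration only; the full scheme covers all 17 PCDD/F congeners).
WHO2005_TEF = {
    "2,3,7,8-TCDD": 1.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "2,3,7,8-TCDF": 0.1,
    "OCDD": 0.0003,
}


def teq(concentrations):
    """Sum of concentration x TEF over the congeners supplied (pg/g lipid)."""
    return sum(conc * WHO2005_TEF[name] for name, conc in concentrations.items())


def share_of_teq(partial_teq, total_teq):
    """Percentage of a total TEQ contributed by a subset of congeners."""
    return 100.0 * partial_teq / total_teq


# Father of Trio1, values from Table 1 (ppt).
trio1 = {"2,3,7,8-TCDD": 1.79, "1,2,3,7,8-PeCDD": 2.95}
tcdd_pecdd_teq = teq(trio1)  # both TEFs equal 1, so this is the plain sum
trio1_share = share_of_teq(tcdd_pecdd_teq, 9.28)  # 9.28 = TEQ (PCDD/F) of Trio1
print(round(tcdd_pecdd_teq, 2), round(trio1_share, 1))
```

Because TCDD and PeCDD both carry a TEF of 1, their contribution to the TEQ equals the sum of their concentrations, which is how the (TCDD + PeCDD)/TEQ column of Table 1 can be read.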
2.3 Genomic DNA extraction and whole genome sequencing
Blood samples (1–2 ml) of the trio members were collected into EDTA-containing tubes and stored at −80°C for genetic studies. Genomic DNA was extracted and purified using the phenol-chloroform method. Whole-genome libraries with 500–600-bp inserts were prepared according to the protocol provided by Illumina. The libraries were sequenced on the HiSeq2000 platform with 101-bp paired reads.
2.4 Germline SNV and short Indel calling
Reads were mapped to the reference sequence (GRCh37/hg19) using BWA (Li & Durbin, 2009) and duplicates were marked with Picard (https://picard.sourceforge.net). SNVs and short Indels were identified with Variant Caller with Multinomial Probabilistic Model (VCMM) program (Shigemizu et al., 2013). The pileup file generated by SAMtools (Li et al., 2009) was used as input for VCMM.
2.5 De novo mutation calling
Mutation calling was performed as described previously (Fujimoto et al., 2012; Fujimoto et al., 2015; Fujimoto et al., 2016). In brief, point mutations had to satisfy the following criteria: (1) nonreference calls with a frequency ≥ 0.15, base quality ≥ 10, and mapping quality ≥ 20; (2) supported by at least two base calls, including one base call with base quality ≥ 30; (3) a SAMtools consensus quality ≥ 20 and root mean square mapping quality ≥ 40; (4) did not have three or more SNVs within any 10-bp window; (5) noncoding SNVs were not in a tandem repeat region suggested by Tandem Repeats Finder (Benson, 1999); (6) were not in RepeatMasker repeat regions (https://www.repeatmasker.org) within 1 Mb of the centromeric or telomeric gaps; and (7) did not have a base with consensus quality lower than 20 occurring within 3 bp on either side of the target SNV. For short Indels, in addition to the filters used in previous studies (Fujimoto et al., 2012, 2015), we also removed short Indels that were supported only by the edges of reads (10 bp from the start and end of the read) to exclude false positives. We separately compared the mother with the child and the father with the child. Variants identified in both comparisons were considered de novo mutation candidates. CNVs were identified from depth of coverage with the DNAcopy software (Andersson et al., 2008), and de novo CNVs were selected manually.
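The read-level criteria (1) to (4) and the two-way parent comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Candidate` record and all field and function names are hypothetical, the mapping-quality filter on individual reads is assumed to have been applied upstream, a 10-bp window is read as positions less than 10 bp apart, and criteria (5) to (7) are omitted because they need external annotation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    chrom: str
    pos: int
    alt_freq: float            # fraction of reads carrying the non-reference base
    alt_base_quals: List[int]  # base qualities of the supporting base calls
    consensus_qual: int        # SAMtools consensus quality
    rms_mapq: float            # root mean square mapping quality


def passes_quality(c):
    """Criteria (1)-(3): allele frequency, supporting calls, consensus and mapping quality."""
    support = [q for q in c.alt_base_quals if q >= 10]
    return (
        c.alt_freq >= 0.15
        and len(support) >= 2
        and any(q >= 30 for q in support)
        and c.consensus_qual >= 20
        and c.rms_mapq >= 40
    )


def drop_clusters(cands, window=10, max_in_window=2):
    """Criterion (4): drop SNVs lying in any window that holds three or more SNVs."""
    by_chrom = {}
    for c in cands:
        by_chrom.setdefault(c.chrom, []).append(c)
    kept = []
    for group in by_chrom.values():
        group.sort(key=lambda c: c.pos)
        clustered = set()
        for i in range(len(group)):
            j = i
            # extend the window while the next SNV is less than `window` bp away
            while j + 1 < len(group) and group[j + 1].pos - group[i].pos < window:
                j += 1
            if j - i + 1 > max_in_window:
                clustered.update(range(i, j + 1))
        kept.extend(c for k, c in enumerate(group) if k not in clustered)
    return kept


def de_novo_candidates(child_vs_father, child_vs_mother):
    """Variants found in both parent-child comparisons are de novo candidates."""
    return child_vs_father & child_vs_mother
```

Here `child_vs_father` and `child_vs_mother` are sets of `(chrom, pos)` pairs for variants present in the child but not in the respective parent; their intersection corresponds to the candidates described above.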
2.6 Identification of origin of mutation
To identify the origin of de novo mutations, we detected informative SNVs within 1,000 bp of each de novo point mutation. Informative SNVs were heterozygous in the child and: (1) heterozygous in one parent and absent in the other; or (2) homozygous in one parent and absent in the other; or (3) homozygous in one parent and heterozygous in the other. For example, if the informative SNV originated from the father, then the de novo mutation linked to that SNV also originated from the father, and vice versa.
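The three informative configurations can be written as a small decision function. The sketch below assumes genotypes are encoded as alternate-allele counts (0 = absent, 1 = heterozygous, 2 = homozygous) and that read-pair linkage has already established whether the de novo mutation sits on the same haplotype as the alternate allele of the informative SNV; both the encoding and the function names are hypothetical.

```python
def alt_allele_parent(child, father, mother):
    """Return the parent that transmitted the alternate allele, or None if uninformative.

    Genotypes are alternate-allele counts: 0 (absent), 1 (heterozygous), 2 (homozygous).
    """
    if child != 1:
        return None  # only SNVs heterozygous in the child are informative
    if father == mother:
        return None  # identical parental genotypes cannot be phased
    if father == 0:
        return "mother"  # cases (1) and (2): allele present in the mother only
    if mother == 0:
        return "father"  # cases (1) and (2): allele present in the father only
    # Case (3): one parent homozygous, the other heterozygous. The homozygous
    # parent must have transmitted the alternate allele.
    return "father" if father == 2 else "mother"


def dnm_origin(child, father, mother, dnm_on_alt_haplotype):
    """Assign a de novo mutation to a parent from its linkage to an informative SNV."""
    alt_parent = alt_allele_parent(child, father, mother)
    if alt_parent is None:
        return None
    if dnm_on_alt_haplotype:
        return alt_parent
    return "mother" if alt_parent == "father" else "father"
```

For instance, `dnm_origin(1, 2, 1, False)` returns `"mother"`: the father must have passed on the alternate allele, so a mutation seen on reads carrying the reference allele lies on the maternal haplotype.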
2.7 Statistical analysis
Since the number of de novo mutations is expected to follow a Poisson distribution, we used a Poisson regression model to examine the influence of dioxin and parental age on the mutation rate (Yuen et al., 2016). Because the amounts of the different dioxins were highly correlated with each other and it was difficult to select the dioxin with the strongest impact, we tested the impact of each dioxin measure (TCDD, 1,2,3,7,8-pentachlorodibenzo-p-dioxin [PeCDD], TCDD + PeCDD, and toxic equivalency value [TEQ; PCDD/F]) and of parents' age separately. In the Poisson regression model, the number of mutations in sample i (yi) follows a Poisson distribution whose mean (λ) depends on a log-scaled parameter (x) (age or dioxin concentration), so that λ = exp(β1 + β2x), where β1 is the intercept and β2 is the regression coefficient of x, which measures the impact of dioxin or age. β1 and β2 were estimated with the glm() function in R (https://www.r-project.org). To test the result of the Poisson regression analysis, we used a likelihood ratio test (LRT), in which the deviance of the null model (β2 = 0) was compared with that of the alternative model (β2 ≠ 0). We calculated the difference in deviance between the null and alternative models for each independent variable (dioxin or age). We then simulated the number of mutations in each sample from a Poisson distribution and calculated the deviance for the simulated data. This process was repeated 100,000 times to obtain a null distribution, against which the deviance of the real data was tested.
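The regression and the simulation-based LRT can be sketched in plain Python as follows. This is an illustration under several assumptions rather than the R analysis itself: the model is fitted by Newton-Raphson instead of glm(), x is taken to be the natural logarithm of TEQ (one reading of "log-scaled parameter"), counts are simulated under the fitted null model, and only 1,000 replicates are drawn to keep the run short. The data are the TEQ and point-mutation columns of Table 1; all function names are hypothetical.

```python
import math
import random


def fit_poisson(x, y, max_iter=100, tol=1e-10):
    """Fit lambda_i = exp(b1 + b2 * x_i) by Newton-Raphson; return (b1, b2)."""
    b1, b2 = math.log(sum(y) / len(y)), 0.0  # start at the null model
    for _ in range(max_iter):
        lam = [math.exp(b1 + b2 * xi) for xi in x]
        g1 = sum(yi - li for yi, li in zip(y, lam))                 # score for b1
        g2 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))   # score for b2
        h11 = sum(lam)                                              # Fisher information
        h12 = sum(li * xi for xi, li in zip(x, lam))
        h22 = sum(li * xi * xi for xi, li in zip(x, lam))
        det = h11 * h22 - h12 * h12
        d1 = (h22 * g1 - h12 * g2) / det
        d2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 + d1, b2 + d2
        if abs(d1) + abs(d2) < tol:
            break
    return b1, b2


def deviance_drop(x, y):
    """Null deviance minus alternative deviance (twice the log-likelihood ratio)."""
    b1, b2 = fit_poisson(x, y)
    mean_y = sum(y) / len(y)
    ll_alt = sum(yi * (b1 + b2 * xi) - math.exp(b1 + b2 * xi) for xi, yi in zip(x, y))
    ll_null = sum(yi * math.log(mean_y) - mean_y for yi in y)
    return 2.0 * (ll_alt - ll_null)


def rpois(lam, rng):
    """Draw one Poisson variate (Knuth's method; adequate for means near 100)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulated_p_value(x, y, n_sim=1000, seed=1):
    """Compare the observed deviance drop with drops simulated under the null model."""
    rng = random.Random(seed)
    observed = deviance_drop(x, y)
    mean_y = sum(y) / len(y)
    hits = sum(
        deviance_drop(x, [rpois(mean_y, rng) for _ in y]) >= observed
        for _ in range(n_sim)
    )
    return observed, hits / n_sim


teq = [9.28, 7.5, 10.74, 11.94, 8.6, 69.52, 51.86, 18.01, 8.98]  # Table 1, TEQ (ppt)
dnm = [79, 103, 92, 83, 60, 117, 91, 104, 117]                   # Table 1, point mutations
log_teq = [math.log(v) for v in teq]
slope = fit_poisson(log_teq, dnm)[1]
observed_drop, p_value = simulated_p_value(log_teq, dnm)
print(slope, observed_drop, p_value)
```

The printed P-value depends on the assumptions listed above and on the number of replicates, so it is not expected to reproduce the reported values exactly.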
3 RESULTS
3.1 Dioxin level in the father blood samples
TCDD and PeCDD have been shown to be the most toxic dioxin congeners in laboratory animal studies (Bruggeman et al., 2003; Peters et al., 1999). In this study, we obtained the concentrations of TCDD and PeCDD and the toxic equivalency value (TEQ, ppt) of dioxin and dioxin-like compounds for the fathers of the nine trios.
TCDD and PeCDD were not detected in the serum of the father of Trio6, although other PCDD and PCDF congeners were observed. In the sera of the other eight fathers, the concentration of TCDD ranged from 1.45 to 12 ppt (mean 3.99 ppt) and that of PeCDD from 2.33 to 20 ppt (mean 5.81 ppt) (Table 1, Supporting Information Table S1). Given that the half-life of TCDD is 7 to 11 years (Milbrath et al., 2009), the father of Trio6 may have had little exposure at the time he carried out missions in the sprayed regions.
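The half-life argument can be made concrete with a simple first-order elimination sketch. It assumes a single constant half-life, no intake after exposure ended, and roughly 35 years between the end of spraying (1971) and blood sampling (2006–2007); the elapsed time and the function names are illustrative assumptions, not values reported in this study.

```python
def remaining_fraction(years_elapsed, half_life_years):
    """Fraction of an initial body burden left after first-order elimination."""
    return 0.5 ** (years_elapsed / half_life_years)


def back_extrapolate(measured_ppt, years_elapsed, half_life_years):
    """Initial concentration implied by a later measurement under the same model."""
    return measured_ppt / remaining_fraction(years_elapsed, half_life_years)


YEARS_ELAPSED = 35  # assumption: about 1971 to 2006

for half_life in (7, 11):  # range reviewed by Milbrath et al. (2009)
    fraction = remaining_fraction(YEARS_ELAPSED, half_life)
    print(half_life, round(fraction, 3), round(back_extrapolate(1.45, YEARS_ELAPSED, half_life), 1))
```

Under these assumptions only about 3% to 11% of the original burden would remain at sampling, so a modest initial exposure could plausibly have fallen below the detection limit.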
Concerning TEQ, the TEQ concentration based on PCDD and PCDF congeners in all samples ranged from 7.5 to 69.52 ppt (mean: 21.82 ppt) (Table 1, Supporting Information Table S1), which was two times higher than the levels reported for unexposed regions (Manh et al., 2014) and for the general population in the review by Consonni, Sindaco, and Bertazzi (2012).
3.2 Whole genome sequencing and identification of germline variants
Whole genomes of nine Vietnamese trios were sequenced to an average of 32× coverage (Supporting Information Table S1). The VCMM (Shigemizu et al., 2013) program was used to call SNVs and short Indels. We detected between 3,553,852 and 3,738,090 germline SNVs, and the average number at the whole-genome level was 3,695,798. The average number of short Indels was 441,051 (Supporting Information Table S1).
We used ANNOVAR (Yang & Wang, 2015) to annotate SNVs and short Indels with variant databases downloaded from the ANNOVAR website (https://annovar.openbioinformatics.org). After filtering out SNVs present in dbSNP version 138, the 1000 Genomes databases (Genomes Project Consortium et al., 2015), and other variant databases, the remaining SNVs were considered novel. We obtained at least 13,440 novel SNVs for each individual (Supporting Information Table S3), mainly located in intronic regions. In the coding regions, the number of novel SNVs ranged from 204 to 297 per individual, while the number of novel Indels ranged from 45 to 98 (Supporting Information Table S3).
Using depth of coverage, we detected two homozygous deletions, on chromosomes 4 and 5, in the child of Trio1. The offspring of Trio2 and Trio3 each had one homozygous deletion, on chromosome 11 and chromosome 2, respectively.
3.3 Identification of de novo mutations
We detected de novo point mutations by identifying offspring-specific variants as previously mentioned (Fujimoto et al., 2016; Kong et al., 2012). In total, 846 de novo point mutation candidates, and 25 de novo short Indel candidates were identified in the nine trios (Table 1 and Supporting Information Tables S3 and S4). To examine the specificity of our analysis, we performed validation by the Sanger sequencing method. We randomly selected 158 de novo point mutations for the validation and confirmed 150 of 158 (specificity = 94.5%) as correct. All de novo Indels were successfully validated (specificity = 100%).
Most de novo point mutations were located in intergenic regions (58%), followed by intronic regions (37%) and coding regions (5%; 43 of the 846 mutations). More than half of the de novo Indels were located in intergenic regions, and 10 were in intronic regions. One de novo deletion (NC_000007.13:g.105177154_105177155delAT) was found in an exonic region of the RINT1 (RAD50-Interacting Protein 1) gene in Trio10 (Supporting Information Table S5).
We identified a de novo variant at position NC_000020.10:g.60913153G > A (p.R604C) in the LAMA5 (Laminin Subunit Alpha 5) gene in the offspring of Trio11, which was predicted to affect protein function by PolyPhen-2 (Adzhubei et al., 2010), PROVEAN (Choi, Sims, Murphy, Miller, & Chan, 2012), and SIFT (Kumar, Henikoff, & Ng, 2009; Table 2), using the default parameters of PolyPhen-2 (https://genetics.bwh.harvard.edu/pph2/bgi.shtml) and PROVEAN (https://provean.jcvi.org/genome_submit_2.php?species=human), respectively. The child was born in 1982 and has shown quadriplegia and mental retardation from an early age. A previous study identified a LAMA5 mutation in a patient with presynaptic congenital myasthenic syndrome, a disease characterized by muscle weakness and fatigability (Maselli et al., 2017). We also detected a potentially damaging de novo point mutation, as predicted by the above in silico tools, at NC_000016.9:g.18908115C > A (p.G86C) in the SMG1 (nonsense mediated mRNA decay associated PI3K related kinase) gene of the Trio3 child (Table 2).
Trio | Gene | Position | Reference | Child | Father | Mother | AA change | PolyPhen-2 prediction (score) | PROVEAN prediction (score) | SIFT prediction (score) |
---|---|---|---|---|---|---|---|---|---|---|
Trio3 | SMG1 | NC_000016.9:g.18908115 | C | CA | CC | CC | G86C | Probably damaging (0.998) | Deleterious (-2.56) | Damaging (0.000) |
Trio11 | LAMA5 | NC_000020.10:g.60913153 | G | GA | GG | GG | R604C | Probably damaging (0.999) | Deleterious (-5.43) | Damaging (0.001) |
Using read-pair information, we found one de novo inversion of about 16.4 Mb (NC_000012.11:g.46243504_62644564inv) in the child of Trio2 (Supporting Information Figure S1). We also identified two de novo deletions, of 6.5 kb (NC_000004.11:g.58333755_58340282del) and 1.2 kb (NC_000004.11:g.46220192_46221398del), in the offspring of Trio4 and Trio1, respectively. Using depth of coverage, one de novo deletion of 174.7 kb (NC_000015.9:g.44100561_44275292del) was detected in the child of Trio6 (Supporting Information Figure S2).
3.4 Pattern of de novo mutations
The number of point mutations corresponded to a rate of 1.42 × 10⁻⁸ mutations per nucleotide per generation, which was consistent with previous reports (1.1 × 10⁻⁸ to 3.8 × 10⁻⁸ mutations per nucleotide per generation; Conrad et al., 2011; Jonsson et al., 2017; Kong et al., 2012). The ratio of transitions to transversions was 1.82, which was also similar to a previous study (Jiang et al., 2013). Of the point mutations, 706 (83.5%) were in non-CpG regions, and the remaining 16.5% were located in CpG regions. Among the substitution patterns, C > T/G > A and A > G/T > C were predominant. At CpG sites, C > T mutations were more than 20 times as frequent as other substitutions (Figure 1a). The pattern of substitution was similar to a previous study (Figure 1a; Jonsson et al., 2017).
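The substitution classes used here pool each change with its reverse complement, so that, for example, G > A is counted together with C > T. A minimal sketch of this bookkeeping and of the transition/transversion ratio is given below; the function names are hypothetical, and the two example calls use the reference and alternate bases of the SMG1 and LAMA5 variants in Table 2.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Six strand-collapsed classes, keyed by the change written from an A or C reference base.
CLASSES = {
    ("A", "C"): "A>C/T>G",
    ("A", "G"): "A>G/T>C",
    ("A", "T"): "A>T/T>A",
    ("C", "A"): "C>A/G>T",
    ("C", "G"): "C>G/G>C",
    ("C", "T"): "C>T/G>A",
}
TRANSITIONS = {"A>G/T>C", "C>T/G>A"}  # purine<->purine or pyrimidine<->pyrimidine


def substitution_class(ref, alt):
    """Collapse a single-base change and its reverse complement into one class."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("expected two different bases from A, C, G, T")
    if ref in ("G", "T"):  # rewrite from the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return CLASSES[(ref, alt)]


def ti_tv_ratio(changes):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    classes = [substitution_class(ref, alt) for ref, alt in changes]
    transitions = sum(c in TRANSITIONS for c in classes)
    transversions = len(classes) - transitions
    if transversions == 0:
        raise ValueError("no transversions in input")
    return transitions / transversions


print(substitution_class("C", "A"))  # SMG1 variant in Table 2
print(substitution_class("G", "A"))  # LAMA5 variant in Table 2
```

Whether a C > T/G > A change falls at a CpG site additionally requires the flanking reference base, which this sketch does not model.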

De novo mutations arise in the germ cells of either the mother or the father. We estimated the parental origin of the mutations using read pairs. In total, 151 de novo point mutations were linked to informative SNVs. Of these, 116 (76.8%) were estimated to originate from the father, suggesting a higher male mutation rate (Figure 1b). The higher mutation rate in fathers could be caused by the larger number of cell divisions in sperm development than in egg development. This higher proportion is consistent with the male-driven evolution hypothesis and with previous studies (Francioli et al., 2015; Jonsson et al., 2017; Kong et al., 2012).
3.5 Influence of dioxin on mutation rate
To examine the influence of dioxin on the mutation rate, we carried out a statistical analysis. Since parental age is known to influence the mutation rate (Kong et al., 2012), we performed Poisson regression analyses for the number of point mutations, with the age at conception and the dioxin concentration as independent variables. The total number of point mutations was not significantly correlated with age (Figure 1c, d). However, a positive correlation was observed between the total number of point mutations and the concentration (ppt) of TCDD, PeCDD, TCDD + PeCDD, and TEQ (PCDD/F), as well as (TCDD + PeCDD)/TEQ (%) (TCDD, P-value = 0.0089; PeCDD, P-value = 0.017; TCDD + PeCDD, P-value = 0.015; TEQ (PCDD/F), P-value = 0.039; (TCDD + PeCDD)/TEQ (%), P-value = 0.0039) (Figure 1e–i). Considering the substitution pattern, the number of A > C/T > G substitutions was negatively correlated with parental age at conception (father, P-value = 0.016; mother, P-value = 0.0083) (Supporting Information Figures S4 and S5). The number of A > T/T > A substitutions was positively correlated with the concentration of TCDD, PeCDD, TCDD + PeCDD, and TEQ (PCDD/F) (TCDD, P-value = 0.019; PeCDD, P-value = 0.013; TCDD + PeCDD, P-value = 0.015; TEQ (PCDD/F), P-value = 0.017) (Figure 2, Supporting Information Figures S6–S10). The number of C > T/G > A substitutions at CpG sites was negatively correlated with the mother's age (P-value = 0.027; Supporting Information Figure S4). The number of C > T/G > A substitutions at non-CpG sites was negatively correlated with (TCDD + PeCDD)/TEQ (%) (P-value = 0.026; Supporting Information Figure S4).

4 DISCUSSION
We conducted a WGS study of nine Vietnamese trios with highly elevated concentrations of dioxin in the fathers' sera. To our knowledge, this is the first genome sequencing study of dioxin-exposed individuals and their family members. Our comprehensive analysis identified de novo SVs and CNVs, in addition to point mutations and short Indels, suggesting that WGS enables us to analyze genome variation comprehensively. The sensitivity and specificity of the sequencing analysis are important criteria for comparing de novo mutation rates, and they depend mainly on the depth of coverage and the data analysis method. In our analysis, the average depth of coverage was 32.4× (Supporting Information Figure S3) and the depth was quite uniform among the samples, indicating that the amount of data should be sufficient to evaluate the de novo mutation rates and compare them with previous studies (Supporting Information Table S1). The false discovery rate was estimated to be 5.5% by Sanger sequencing, and the false negative rate was estimated to be 8.5% in our previous study (Fujimoto et al., 2012), suggesting that our analysis method has sufficient accuracy for analyzing de novo mutations.
The estimated mutation rate and the substitution pattern in these dioxin-exposed trios were not different from those in previous studies (Conrad et al., 2011; Kong et al., 2012). However, unlike previous reports, no strong positive correlation was observed with paternal or maternal age. Possible reasons for the lack of correlation are the small sample size and the narrower distribution of the parents' ages. In previous studies, the age at conception was distributed from the teens to the 50s (Jonsson et al., 2017). In contrast, the majority of the parents in this study were in their 30s. This narrower age distribution, together with the small sample size, would reduce the statistical power for analyzing the age effect and could explain the lack of correlation between age and the number of mutations.
In addition to the small sample size, we note that our method has several limitations to be addressed in the future. First, it is difficult to analyze repetitive regions (microsatellite, transposon, telomeric, and centromeric regions) with high accuracy using current short-read technologies; therefore, mutations in such regions were not analyzed in the current study. Second, the current depth of coverage (>30×) is considered sufficient for identifying SNVs (Bentley et al., 2008), but accurate detection of structural variations and short Indels would require a higher depth. A larger sample size and higher depth may overcome these limitations in the future.
Our analysis identified a significant positive correlation between the total number of point mutations and TCDD, PeCDD, TCDD + PeCDD, and TEQ (PCDD/F). These results suggest that dioxin has an influence on the human germline mutation rate. The effect of the dioxin concentration seems to be linear (Figure 1). The trio with the lowest dioxin concentration (Trio6) had the smallest number of point mutations (60), and the trio with the highest dioxin concentration (Trio7) showed the highest number of point mutations (117). This result suggests that additional trios exposed to high doses of dioxin, as well as unexposed trios, would be important to clarify the impact of dioxin. Considering the substitution pattern, the number of A > T/T > A substitutions was significantly correlated with TCDD, PeCDD, TCDD + PeCDD, and TEQ (PCDD/F). Although the number of A > T/T > A mutations is small in each trio, dioxin may preferentially cause A > T/T > A substitutions. Since the dioxin concentrations (TCDD, PeCDD, and TCDD + PeCDD) were strongly correlated with each other, we could not determine the dioxin with the strongest impact (Supporting Information Figure S11). However, analysis of a larger number of trios may reveal the dioxin compounds with strong effects.
Our study detected a significant positive correlation between the dioxin concentration and the number of mutations. Further studies are required to clarify in more depth the mutation rate and substitution pattern caused by dioxin. However, as mentioned above, individuals with highly elevated dioxin content of ∼10 ppt or higher are extremely rare, and a larger sample size would not be easily available.
ACCESSION CODES
Sequencing data were deposited into NBDC database (https://biosciencedbc.jp/en/) with accession number (Study: JGAS00000000137).
ACKNOWLEDGMENTS
This work was supported by the Scientific Program KHCN-33/11-15 (grant KHCN-33.06/11-15), the Ministry of Natural Resources and Environment, and by the Ministry of Science and Technology (grant DTDL.CN-05/15). N.D.T. was grateful to RIKEN for the financial support for his research stay at RIKEN. Supercomputing resource ‘SHIROKANE’ was provided by the Human Genome Center at The University of Tokyo. We would like to thank all trio members who have participated in this study.