Volume 39, Issue 9 pp. 1238-1245
RESEARCH ARTICLE

Targeted resequencing reveals genetic risks in patients with sporadic idiopathic pulmonary fibrosis

Yanhan Deng

Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Site of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

The authors contributed equally to this work.

Zongzhe Li

Division of Cardiology, Departments of Internal Medicine and Genetic Diagnosis Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

The authors contributed equally to this work.

Juan Liu

Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Site of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Zheng Wang

Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou, China

Yanyan Cao

Division of Cardiology, Departments of Internal Medicine and Genetic Diagnosis Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Yong Mou

Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Site of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Bohua Fu

Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Site of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

Biwen Mo

Department of Respiratory Medicine, Affiliated hospital of Guilin Medical University, Guilin, China

Jianghong Wei

Department of Respiratory Medicine, Affiliated hospital of Guilin Medical University, Guilin, China

Zhenshun Cheng

Department of Respiratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan, China

Liman Luo

Department of Pediatrics, The 306 Hospital of People's Liberation Army, Beijing, China

Jingping Li

Department of Respiratory Medicine, Qianjiang Central Hospital, Qianjiang, China

Ying Shu

Department of Respiratory Medicine, Qianjiang Central Hospital, Qianjiang, China

Xiaomei Wang

Department of Geriatrics, Southwest Hospital, Army Medical University, Chongqing, China

Guangwei Luo

Department of Respiratory Medicine, Wuhan No. 1 Hospital, Wuhan, China

Shuo Yang

Department of Respiratory Medicine, Wuhan No. 1 Hospital, Wuhan, China

Yingnan Wang

Department of Respiratory and Critical Care Medicine, Renmin Hospital of Three Gorges University, Yichang, China

Jing Zhu

Department of Respiratory and Critical Care Medicine, Renmin Hospital of Three Gorges University, Yichang, China

Jingping Yang

Department of Respiratory and Critical Care Medicine, The Third Affiliated Hospital of Inner Mongolia Medical University, Baotou, China

Ming Wu

Department of Respiratory and Critical Care Medicine, The Third Affiliated Hospital of Inner Mongolia Medical University, Baotou, China

Xuyan Xu

Department of Respiratory Medicine, Xianning Center Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Xianning, China

Renying Ge

Department of Respiratory Medicine, Xianning Center Hospital, The First Affiliated Hospital of Hubei University of Science and Technology, Xianning, China

Xueqin Chen

Department of Respiratory and Critical Care Medicine, Wuhan University Renmin Hospital, Wuhan University, Wuhan, China

Qingzhen Peng

Department of Respiratory Medicine, Xiaogan Central Hospital, Xiaogan, China

Guang Wei

Department of Respiratory Medicine, Xiaogan Central Hospital, Xiaogan, China

Yaqing Li

Department of Respiratory Medicine, Zhejiang Provincial People's Hospital, Hangzhou, China

Hua Yang

Department of Respiratory Medicine, University Hospital of Hubei University for Nationalities, Enshi, China

Shirong Fang

Department of Respiratory Medicine, University Hospital of Hubei University for Nationalities, Enshi, China

Xiaoju Zhang

Corresponding Author

Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou, China

Correspondence

Weining Xiong, Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Site of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.

Email: [email protected]

Xiaoju Zhang, Department of Respiratory Medicine, Henan Provincial People's Hospital & the People's Hospital of Zhengzhou University, Zhengzhou 450003, China.

Email: [email protected].

Weining Xiong

Corresponding Author

Department of Respiratory and Critical Care Medicine, Key Laboratory of Pulmonary Diseases of Health Ministry, Key Site of National Clinical Research Center for Respiratory Disease, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

First published: 19 June 2018
Citations: 26

Communicated by Garry R. Cutting

Funding information:

Integrated Innovative Team for Major Human Diseases Program of Tongji Medical College, Huazhong University of Science and Technology; Clinical Research Physician Program of Tongji Medical College, Huazhong University of Science and Technology.

Abstract

Idiopathic pulmonary fibrosis (IPF) is a genetically heterogeneous disease with high mortality and poor prognosis. However, a large fraction of its genetic causes remains unexplained, especially in sporadic IPF (∼80% of IPF cases). By systematically reviewing the related literature and potential pathogenic pathways, we selected 92 potentially IPF-related genes and sequenced them in genomic DNA from 253 sporadic IPF patients and 125 matched healthy controls using targeted massively parallel next-generation sequencing. The identified risk variants were confirmed by Sanger sequencing. We identified two pathogenic and 10 loss-of-function (LOF) candidate variants, accounting for 4.74% (12 out of 253) of all the IPF cases. In burden tests, rare missense variants in three genes (CSF3R, DSP, and LAMA3) showed a statistically significant relationship with IPF. Four common SNPs (rs3737002, rs2296160, rs1800470, and rs35705950) were statistically associated with increased risk of IPF. In the cumulative risk model, high risk subjects had a 3.47-fold (95% CI: 2.07–5.81, P = 2.34 × 10⁻⁶) risk of developing IPF compared with low risk subjects. We drafted a comprehensive map of genetic risks (including both rare and common candidate variants) in patients with IPF, which could provide insights into disease mechanisms, genetic diagnosis, and risk prediction for IPF.

1 INTRODUCTION

Idiopathic pulmonary fibrosis (IPF) is a chronic fatal interstitial pulmonary disease characterized by the progressive loss of lung function with diagnosis based on clinical and radiologic or histologic criteria (Raghu et al., 2011). Typically, IPF presents as late-onset pulmonary dysfunction with an average onset age of 60–70 years and a median survival of 2–3 years after the initial diagnosis (Raghu et al., 2011). Because the pathogenesis is poorly understood, there are no curative treatments except for lung transplantation (Raghu et al., 2011).

In recent years, there has been growing evidence that genetic factors play an important role in both sporadic and familial IPF cases (Fernandez et al., 2012; Garcia-Sancho et al., 2011). Recent independent studies have shown that up to 20% of IPF patients have a family history and can present earlier, indicating that both the frequency of familial pulmonary fibrosis (FPF) and the genetic risk of sporadic IPF could be underestimated (Fernandez et al., 2012; Garcia-Sancho et al., 2011). Previous investigations of genetic data from FPF cases and sporadic patients have led to the identification of rare pathogenic variants in multiple genes, such as surfactant-associated genes (surfactant protein C, SFTPC; surfactant protein A2, SFTPA2; and ATP-binding cassette member A3, ABCA3) (Lawson et al., 2004; Wang et al., 2009) and telomerase-related genes (telomerase reverse transcriptase, TERT; telomerase RNA component, TERC; regulator of telomere elongation helicase 1, RTEL1) (de Leon et al., 2010). In addition, two large genome-wide association studies (GWASs) conducted in patients with sporadic and familial IPF not only confirmed known associations with TERC, TERT, and the mucin 5B gene (MUC5B), but also found novel variants associated with IPF susceptibility, including variants within toll interacting protein (TOLLIP) and signal peptide peptidase like 2C (SPPL2C) (Fingerlin et al., 2013; Noth et al., 2013). Additionally, a common polymorphism (rs35705950) in the promoter of MUC5B is significantly more prevalent in individuals with both sporadic and familial IPF (Peljto et al., 2013; Zhu et al., 2015). Importantly, pulmonary fibrosis can occur in some rare genetic disorders such as dyskeratosis congenita, Hermansky-Pudlak syndrome (HPS), and tuberous sclerosis, indicating a shared genetic pathogenesis (Hisata et al., 2013; Islam & Roach, 2015; Vicary, Vergne, Santiago-Cornier, Young, & Roman, 2016).
Current data suggest that at least one-third of sporadic and familial IPF can be explained by common genetic variants identified in large GWASs; some of these variants differ between populations, and some are associated with disease prognosis or response to treatment (Fingerlin et al., 2013).

Since no previous study has comprehensively investigated all of the above candidate genes in Chinese IPF patients, we used high-throughput targeted resequencing to sequence 92 potentially IPF-related genes in 253 Chinese patients with IPF and 125 matched controls. We report here the spectrum of variants in these genes and identify novel rare variants and common SNPs that may be associated with IPF.

2 MATERIALS AND METHODS

2.1 Study population

In this study, a total of 253 IPF patients and 125 matched controls were enrolled. All subjects were unrelated and of Han origin, from seven different provinces in mainland China. Controls were selected according to the following criteria: (1) matched for gender and age, (2) unrelated individuals of Han ancestry, and (3) free of pulmonary fibrosis and genetic disease. The diagnosis of IPF was based on the ATS/ERS/JRS/ALAT guidelines published in 2011, which include the exclusion of other known causes of interstitial lung disease, the presence of a usual interstitial pneumonia pattern on high-resolution computed tomography (HRCT) in patients not subjected to surgical lung biopsy, specific combinations of HRCT and surgical lung biopsy findings in patients subjected to surgical lung biopsy, and abnormalities of lung function tests (Raghu et al., 2011). At least two experts in pulmonary disease and two radiologists independently reviewed each patient's clinical and biopsy findings and HRCT scans. For each participant, medical history, family history, other basic information, and peripheral blood were collected after informed consent was obtained. Ethical approval for this study was obtained from the Institutional Review Board of Tongji Hospital.

2.2 Gene selection and primer design

We based the selection of potentially IPF-related genes on the literature (up to January 2016), several online databases (OMIM: https://www.ncbi.nlm.nih.gov/omim/; GeneCards: https://www.genecards.org/; HGMD: https://www.hgmd.cf.ac.uk/ac/index.php; GEO: https://www.ncbi.nlm.nih.gov/geoprofiles/), and our previous IPF data. The target panel included genes accounting for FPF, genes from GWASs and animal experiments, and genes for rare genetic syndromes that may be associated with pulmonary fibrosis, such as dyskeratosis congenita, HPS, and tuberous sclerosis. This list of 92 genes was submitted to Ion AmpliSeq Designer software (Version 4.24) for primer design (Supp. Table S1). Primers covering all coding regions and at least 5 bp of flanking regulatory regions were then synthesized and pooled into two multiplex reactions (Supp. Table S2).

2.3 Library preparation and next-generation sequencing

Genomic DNA (gDNA) was extracted from peripheral blood samples by a Blood DNA kit (TIANGEN BIOTECH, Beijing, China) and was diluted to 5 ng/μl. Then gDNA was amplified and libraries were constructed using the Ion AmpliSeq™ Library Kit 2.0 and customized multiplex PCR primer pools (Life Technologies, Carlsbad, California, USA) on the Ion Torrent platform (Thermo Fisher, San Jose, California, USA) as we previously described (Li et al., 2017). Briefly, after purification, libraries were quantitated using a Qubit 2.0 fluorometer (Invitrogen, Carlsbad, California, USA) and pooled at equal ratios for emulsion PCR on an Ion OneTouch System. Then, templated Ion Sphere particles were enriched by using the Ion OneTouch ES (Life Technologies, Carlsbad, California, USA). The template-positive Ion Sphere particles were loaded for sequencing on the Ion Torrent Proton (Life Technologies, Carlsbad, California, USA).

2.4 Bioinformatics analysis

The bioinformatics analysis workflow is shown in Supp. Figure S1. Raw sequencing data were first processed with the Ion Torrent Suite v5.0.4 to align reads to the human genome reference (hg19/GRCh37), to call variants, and to analyze coverage. All variants were then annotated in detail using Ion Reporter v5.0 and ANNOVAR software (July 2017) to obtain information including ExonicFunc, AAChange, minor allele frequency (MAF) in 1000Genome, MAF in the Exome Aggregation Consortium (ExAC) database, MAF in the Exome Sequencing Project (ESP), SIFT and PolyPhen-2 scores and predictions, SNP entries, and InterVar prediction (Wang, Li, & Hakonarson, 2010). Finally, the degree of conservation of the nonsynonymous variants across multiple species was estimated using GERP++ scores.

The DNA variant numbering system is based on the cDNA sequence, where +1 corresponds to the A of the ATG translation initiation codon in the reference sequence and the initiation codon is codon 1.

In filtering, variants with a read depth <20 or with an imbalanced reference/variant allele read depth ratio >3:1 were first considered false calls and removed. We then divided the remaining variants into loss-of-function (LOF) variants (frameshift, nonsense, splicing-site, initiation codon break), rare nonsynonymous coding variants, and common variants according to variant type and MAF. Rare nonsynonymous coding variants (MAF < 0.01) were selected and evaluated by burden tests. Common variants with a MAF ≥ 0.01 in the 1000Genomes Project database, the UCSC common SNP database, the ExAC, or the ESP database were defined as common SNPs and selected for association analysis. Finally, variants predicted as benign or likely benign by InterVar were removed. Potentially functional LOF variants were retained if their MAF was no more than 0.1% in the sequenced population, and those observed in the control group were filtered out.
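The quality and frequency filters described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' pipeline: the `Variant` record, the effect labels, and the reading of the 3:1 criterion as a reference-to-variant read ratio are assumptions made for the example, and the InterVar and control-group filters are omitted.

```python
from dataclasses import dataclass

MIN_DEPTH = 20        # minimum total read depth stated in the text
MAX_REF_TO_ALT = 3.0  # assumed reading of the ">3:1" allele imbalance rule
RARE_MAF = 0.01       # boundary between rare and common variants

# Hypothetical effect labels standing in for the annotation output.
LOF_EFFECTS = {"frameshift", "nonsense", "splice_site", "start_lost"}


@dataclass
class Variant:
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the variant allele
    effect: str      # e.g. "missense", "frameshift"
    maf: float       # population minor allele frequency


def passes_quality(v: Variant) -> bool:
    """Drop low-depth calls and calls dominated by reference reads."""
    if v.ref_reads + v.alt_reads < MIN_DEPTH:
        return False
    if v.alt_reads == 0:
        return False
    return v.ref_reads <= MAX_REF_TO_ALT * v.alt_reads


def classify(v: Variant) -> str:
    """Assign a variant to one of the analysis groups."""
    if v.effect in LOF_EFFECTS:
        return "LOF"
    if v.maf >= RARE_MAF:
        return "common"
    if v.effect == "missense":
        return "rare_nonsynonymous"
    return "other"
```

Under this reading, a call with 30 reference and 10 variant reads sits exactly at the 3:1 limit and is kept, whereas a call with 40 reference and 10 variant reads is discarded.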

We divided the IPF-associated variants into pathogenic, likely pathogenic and LOF (frameshift, nonsense, and splice site) variants according to the ACMG guideline (Richards et al., 2015).

2.5 Sanger sequencing

PCR primers for all LOF variants, likely pathogenic variants, and the MUC5B rs35705950 SNP reported to be associated with IPF were designed using Primer Premier 5.0 software and confirmed by UCSC in-silico PCR (https://genome.ucsc.edu/cgi-bin/hgPcr) to yield a unique genomic product of 300–800 bp. PCR amplification was optimized in accordance with the manual for Taq™ Hot Start version (TaKaRa, Kyoto, Japan). Sanger sequencing was performed using the Big Dye v.1.1 terminator cycle sequencing kit and an Applied Biosystems 3500xl capillary sequencer (Applied Biosystems, Foster City, CA).

2.6 Relative telomere length measurement

The relative repeat copy number of telomere (T) and single copy gene (36B4a) (S) were measured by real-time PCR in a StepOne Plus real-time PCR system (Applied Biosystems) as described previously (Cawthon, 2009). The primers for the telomere and single copy gene PCRs were: Telo-F (forward), ACACTAAGGTTTGGGTTTGGGTTTGGGTTTGGGTTAGTGT; Telo-R (reverse), TGTTAGGTATCCCTATCCCTATCCCTATCCCTATCCCTAACA; 36B4-F (forward), CAGCAAGTGGGAAGGTGTAATCC; 36B4-R (reverse), CCCATTCTATCATCAACGGGTACAA.

All telomere and single copy gene amplifications were performed in triplicate in a 10 μl reaction system. One reference DNA was serially diluted (twofold) with deionized water to create eight DNA concentrations ranging from 1.0 to 128 ng/μl, which were used to determine the standard curve. The relative telomere length was expressed as the T/S ratio, reflecting the average telomere repeat copy number of each DNA sample calculated relative to the reference DNA.
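The T/S calculation can be illustrated with a short Python sketch. It assumes a log-linear standard curve (Ct against log2 of input DNA) fitted separately for the telomere and single copy gene reactions; the function names and all Ct values below are hypothetical and are not taken from the study.

```python
import math


def fit_standard_curve(concs, cts):
    """Least-squares fit of Ct = slope * log2(conc) + intercept."""
    xs = [math.log2(c) for c in concs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity(ct, curve):
    """Reference-DNA equivalent (ng/ul) that would give this Ct."""
    slope, intercept = curve
    return 2 ** ((ct - intercept) / slope)


def ts_ratio(tel_cts, scg_cts, tel_curve, scg_curve):
    """T/S ratio from triplicate Cts of telomere and single copy gene."""
    tel_ct = sum(tel_cts) / len(tel_cts)
    scg_ct = sum(scg_cts) / len(scg_cts)
    return quantity(tel_ct, tel_curve) / quantity(scg_ct, scg_curve)


# Eight twofold dilutions from 1 to 128 ng/ul, as in the text.
standards = [2 ** i for i in range(8)]
# Hypothetical Ct values for an ideal reaction (one cycle per doubling).
tel_curve = fit_standard_curve(standards, [20.0 - i for i in range(8)])
scg_curve = fit_standard_curve(standards, [25.0 - i for i in range(8)])

same_as_reference = ts_ratio([17.0] * 3, [22.0] * 3, tel_curve, scg_curve)
shorter_telomeres = ts_ratio([18.0] * 3, [22.0] * 3, tel_curve, scg_curve)
```

A sample whose telomere and single copy gene signals both match the same amount of reference DNA has a T/S ratio of 1; a telomere Ct one cycle later at the same single copy gene Ct halves the ratio in this idealized setting.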

2.7 Copy number variation analysis

To evaluate the copy number across the targeted genes and to identify potential large heterozygous or homozygous deletions, we analyzed the copy number of all sequenced regions in the 253 cases and 125 controls using Ion Reporter 5.0.

2.8 Network analysis

Prediction of gene–gene networks for candidate genes and their potential interactions with IPF and related phenotypes was performed using Phenolyzer, a tool for phenotype-based prioritization of candidate genes in human diseases (Yang, Robinson, & Wang, 2015). Each candidate gene was given a normalized score ranging from 0 to 1 and ranked according to its relationships with the disease/phenotype and related genes (Yang et al., 2015).

2.9 Statistical analysis

Statistical analyses were carried out with SPSS version 19.0, and results were expressed as the mean ± SD (continuous variables) or as percentages (categorical variables). Associations of common SNPs with IPF susceptibility were evaluated by Fisher's exact test, providing odds ratios (ORs), 95% confidence intervals (CIs), and levels of significance (P). The cumulative effect of associated alleles on the risk of IPF was estimated by ORs and 95% CIs from multivariate logistic regression analyses. The association between telomere length and age was assessed by linear regression analysis in IPF patients and age-matched controls.

We tested the associations between 647 rare nonsynonymous variants (MAF < 1%) and IPF using burden tests, including the adaptive sum test (ASUM), the cumulative minor-allele test (CMAT), and the weighted sum statistic (WSS) (Han & Pan, 2010; Madsen & Browning, 2009; Zawistowski et al., 2010). These tests were performed in R with the AssotesteR package (https://github.com/gastonstat/AssotesteR). Genes were considered to show evidence of disease-associated rare variants if they had a significant association (P < 0.05) in at least one burden test.

Common SNPs (MAF ≥ 0.05) were tested for Hardy–Weinberg equilibrium by Pearson's Chi-square (χ2) test. Associations of common SNPs with IPF susceptibility under the allelic model were evaluated by Fisher's exact test, providing ORs, 95% CIs, and levels of significance (P < 0.05).
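As an illustration of the Hardy–Weinberg check, the sketch below computes Pearson's χ2 statistic from genotype counts at a biallelic SNP and converts it to a P value with one degree of freedom. The function name and the genotype counts are hypothetical; the source does not describe how the test was implemented.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Pearson chi-square test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value) for genotype counts AA, AB, and BB.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic site: the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: exact equilibrium, then complete heterozygote deficit.
chi2_eq, p_eq = hwe_chi_square(25, 50, 25)
chi2_dev, p_dev = hwe_chi_square(50, 0, 50)
```

The first call returns a statistic of 0 with P = 1, since the observed counts equal their expectations; the second returns a statistic of 100 and a vanishingly small P value.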

In the cumulative analysis, the risk score of each subject was calculated as the number of risk alleles carried. A subject with a single identified risk allele had a risk score of 1, and the maximum risk score was 6. Subjects with risk scores of 0–3 were assigned to the low risk group, and those with risk scores of 4–6 were assigned to the high risk group. The cumulative effect of associated alleles on the risk of IPF was estimated by ORs and 95% CIs from multivariate logistic regression analyses. A P value < 0.05 was considered significant.
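The scoring rule can be written down in a few lines of Python. The sketch assumes that each of the four associated SNPs contributes 0, 1, or 2 risk alleles and that a missing genotype counts as 0; both the data layout and that convention are assumptions for illustration, not details given in the text.

```python
# The four SNPs reported as associated with IPF in this study.
RISK_SNPS = ("rs3737002", "rs2296160", "rs1800470", "rs35705950")
HIGH_RISK_THRESHOLD = 4  # scores of 4 or more fall in the high risk group


def risk_score(risk_allele_counts: dict) -> int:
    """Sum the number of risk alleles carried across the four SNPs."""
    score = 0
    for snp in RISK_SNPS:
        count = risk_allele_counts.get(snp, 0)  # assumption: missing = 0
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
        score += count
    return score


def risk_group(score: int) -> str:
    """Map a risk score to the low (0-3) or high (4 and above) group."""
    return "high" if score >= HIGH_RISK_THRESHOLD else "low"


# Hypothetical subject: homozygous at one SNP, heterozygous at two others.
example_score = risk_score({"rs3737002": 2, "rs2296160": 1, "rs1800470": 1})
example_group = risk_group(example_score)
```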

3 RESULTS

3.1 Baseline characteristics

A total of 253 IPF patients and 125 matched controls were included in this study (Table 1). No significant difference was found in age (65.4 vs. 65.3 years), proportion of males (66.8% vs. 67.2%), or tobacco use (38.7% vs. 37.6%) between the cases and controls.

Table 1. Baseline characteristics
Variables                              IPF cases (n = 253)   Controls (n = 125)   P value
Age (years)                            65.4 ± 11.1           65.3 ± 10.8          0.24
Male (%)                               66.8                  67.2                 0.94
Tobacco use (%)                        38.7                  37.6                 0.83
Body mass index (kg/m²)                23.4 ± 4.1            24.2 ± 3.8           0.31
Cough, n (%)                           247 (97.6)            0
Chronic exertional dyspnea, n (%)      142 (56.1)            0
Finger clubbing, n (%)                 74 (29.2)             0
Bibasilar inspiratory crackles, n (%)  137 (54.2)            0
Pulmonary function test
  FVC% pred                            75.2 (28.5–122.6)
  DLCO% pred                           55.4 (15.2–85.6)
  • Age is shown as mean ± SD. For IPF cases, age means onset age. FVC% pred, percent predicted forced vital capacity; DLCO% pred, percent predicted diffusion capacity for carbon monoxide.

3.2 Targeted sequencing output

A total of 1,451 amplicons for the 92 targeted genes were amplified and sequenced in 253 sporadic IPF cases and 125 healthy controls. High-throughput sequencing covered 94.33% of the target region with an average base coverage depth of 776.5-fold, and 98.88% of the amplicons had at least 20 independent reads, indicating the high quality of the targeted sequencing (Supp. Table S3).

3.3 Identification of pathogenic, LOF, likely pathogenic variants, and copy number variation

We applied stringent filter criteria: (1) LOF variants (frameshift, nonsense, splicing-site, initiation codon break) with a MAF of no more than 0.1% in the sequenced population, or missense variants predicted to be pathogenic or likely pathogenic by InterVar according to the ACMG guideline; (2) absence from the control group; and (3) validation by Sanger sequencing. With these criteria, we identified two reported pathogenic variants (TERT rs121918666; TERT rs199422294) and 10 LOF variants, comprising three frameshift insertion variants, two frameshift deletion variants, four stop-gain variants, and one splicing variant (Table 2). The two RTEL1 variants were predicted to be likely pathogenic by InterVar. In total, pathogenic, LOF, or likely pathogenic variants in the 92 genes were found in 4.74% (12 out of 253) of the IPF patients, and six of the 12 variants (50%) were previously unreported. All 12 variants were validated by Sanger sequencing and were absent from the 125 controls. Potential copy number variations in the 92 selected genes were searched for and further validated by real-time PCR, but no large deletions were detected.

Table 2. Pathogenic and loss-of-function variants identified in 253 IPF cases
Gene    OMIM    Type       Variant function  Variant                             Novelty       Pathogenicity
CTC1    613129  Candidate  Frameshift        NM_025099.3:c.400dupT:p.Y134fs      Novel
DTNBP1  607145  Candidate  Nonsense          NM_183040.5:c.G286T:p.E96X          Novel
HPS4    606682  Candidate  Frameshift        NM_152841.9:c.1087dupG:p.D363fs     Novel
LAMA3   600805  Candidate  Nonsense          NM_001127717.18:c.C2116T:p.R706X    rs759225610
MMP1    120353  Candidate  Frameshift        NM_002421.7:c.988delG:p.A330fs      rs753853224
MMP19   601807  Candidate  Nonsense          NM_002429.8:c.T1155A:p.Y385X        Novel
RTEL1   608833  Known      Frameshift        NM_032957.4:c.387_388del:p.T129fs   Novel         PVS1/Likely pathogenic
RTEL1   608833  Known      Nonsense          NM_032957.34:c.C3631T:p.Q1211X      Novel         PVS1/Likely pathogenic
IL1RN   147679  Candidate  Splicing          NM_173841.3:c.74-2A>G               rs763872895
TERT    187270  Known      Missense          NM_198253.10:c.G2594A:p.R865H       rs121918666   Pathogenic
TERT    187270  Known      Missense          NM_198253.4:c.G1892A:p.R631Q        rs199422294   Pathogenic
RTKN2           Candidate  Frameshift        NM_145307.9:c.952dupT:p.Y318fs      rs563733406
  • Novelty indicates whether the variant has been reported previously; PVS1 indicates that the variant is regarded as a "very strong evidence of pathogenicity" variant based on the ACMG guideline; Pathogenic indicates that the variant is reported as pathogenic in the ClinVar database.

3.4 Burden tests of rare missense variants

To reveal novel associations between the selected candidate genes and IPF, we performed three kinds of burden tests (ASUM, CMAT, and WSS) on genes with identified rare missense variants in 253 IPF cases and 125 healthy controls. These tests are alternative approaches for testing associations of rare and low-frequency variants. Three genes were statistically significant (P < 0.05) in at least one test and had a higher burden of variants in the IPF group than in controls (Table 3).

Table 3. Gene-based burden tests of rare missense variants
Gene name   ASUM    CMAT    WSS
CSF3R       0.138   0.012   0.04
DSP         0.02    0.122   0.11
LAMA3       0.004   0.568   0.356
  • ASUM, adaptive sum test; CMAT, cumulative minor-allele test; WSS, weighted sum statistic. Values shown are P values for the burden tests; P < 0.05 was considered significant, and genes with no significant result are not shown. CSF3R, colony stimulating factor 3 receptor; DSP, desmoplakin; LAMA3, laminin subunit alpha 3.

3.5 Risk stratification model construction using common SNPs

To identify novel SNPs associated with IPF, we performed association analysis under an allelic genetic model. Four SNPs in three genes differed significantly between IPF cases and controls (Table 4). Two SNPs (rs3737002 and rs2296160), located in complement C3b/C4b receptor 1 (CR1), are reported here for the first time as associated with IPF susceptibility. The other two SNPs (TGFB1 rs1800470 and MUC5B rs35705950) were previously reported risk SNPs for IPF.

Table 4. Significant common SNPs for IPF
Gene    SNP          Allele   OR (95% CI)         P
CR1     rs3737002    C/T      1.77 (1.28–2.45)    0.001
CR1     rs2296160    A/G      1.54 (1.13–2.09)    0.006
TGFB1   rs1800470    G/A      1.47 (1.08–2.00)    0.013
MUC5B   rs35705950   G/T      4.84 (1.12–20.94)   0.018

To construct a risk stratification model using these identified risk SNPs, we evaluated the cumulative effects of risk scores in our study (Table 5). As the data show, subjects with higher risk scores had a higher risk of IPF. Specifically, compared with individuals with risk scores of 0–3 (low risk group), individuals with risk scores of 4–6 (high risk group) had a higher risk (OR = 3.47, 95% CI: 2.07–5.81, P = 2.34 × 10⁻⁶).

Table 5. Cumulative effect of risk alleles for IPF
Risk alleles   Control (n = 125)   IPF (n = 253)   OR (95% CI)        P value
0–3            102 (81.6%)         142 (56.1%)     1.00
4–6            23 (18.4%)          111 (43.9%)     3.47 (2.07–5.81)   2.34 × 10⁻⁶
  • 0–3, low risk group; 4–6, high risk group.
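The odds ratio in Table 5 can be checked directly from the group counts. The sketch below uses the unadjusted cross-product odds ratio with a Woolf (log-scale) 95% confidence interval and a Wald test; this is a simplification chosen for illustration, since the study reports estimates from multivariate logistic regression, and the function name is hypothetical.

```python
import math


def odds_ratio_woolf(case_high, case_low, ctrl_high, ctrl_low, z=1.96):
    """Unadjusted odds ratio with a Woolf 95% CI and a Wald P value."""
    odds_ratio = (case_high * ctrl_low) / (case_low * ctrl_high)
    log_or = math.log(odds_ratio)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / case_high + 1 / case_low + 1 / ctrl_high + 1 / ctrl_low)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(log_or) / se / math.sqrt(2.0))
    return odds_ratio, lower, upper, p_value


# Counts from Table 5: high risk (4-6) and low risk (0-3) subjects.
or_, ci_low, ci_high, p_value = odds_ratio_woolf(
    case_high=111, case_low=142, ctrl_high=23, ctrl_low=102
)
print(f"OR = {or_:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), P = {p_value:.2e}")
```

With the Table 5 counts, this gives an OR of about 3.47 with a 95% CI of roughly 2.07 to 5.81, in line with the reported values, and a P value on the order of 10⁻⁶.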

3.6 Relative telomere length in IPF patients and controls

The mean telomere length of IPF patients (0.96 ± 0.32) was substantially shorter than that of age-matched controls (1.14 ± 0.27, P < 0.001). The distribution of telomere length with age is depicted in Figure 1. The decline in telomere length with age was steeper in IPF patients (b = −0.015) than in the age-matched controls (b = −0.010).

Figure 1. Age-adjusted telomere lengths in patients with idiopathic pulmonary fibrosis (IPF) and controls. The mean age-adjusted telomere length of IPF patients (0.96 ± 0.32) was substantially shorter than that of age-matched controls (1.14 ± 0.27, P < 0.001). The decline in telomere length with age was steeper in IPF patients (b = −0.015) than in the age-matched controls (b = −0.010).

3.7 Network analysis among candidate genes

Our 15 statistically significant candidate genes (genes carrying LOF variants, genes identified by burden tests, and genes harboring SNPs identified by the allelic association analysis) were analyzed by Phenolyzer. Network analysis of these genes showed potential gene–gene interactions and interactions with IPF-related phenotypes (Supp. Figure S2). Of these, five are known IPF causal genes, and another five (IL1RN, MMP1, MMP19, CSF3R, and LAMA3) were identified as significantly associated with IPF in Chinese patients in our study. More details are available online (https://phenolyzer.wglab.org/done/56161/FEmwlynpQDNTsaIs/index.html).

4 DISCUSSION

This is the first comprehensive study using a targeted resequencing approach (92 IPF-related genes) to assess the role of both common and rare variants in IPF risk in the Chinese Han population. We identified two reported FPF pathogenic variants (TERT rs121918666; TERT rs199422294) and 10 LOF variants (including two PVS1 likely pathogenic variants) in 12/253 (4.74%) of our cohort of sporadic IPF patients, which provide potential new clues to the pathogenesis of IPF. Our study suggests that previously reported FPF-related genes and additional candidate genes may also contribute to sporadic IPF, and that some apparently sporadic cases may actually be undetected familial cases. We also found that likely pathogenic variants in telomerase-related genes are still the leading genetic causes of IPF. We performed burden tests of rare variants in selected genes and found that three genes (CSF3R, DSP, and LAMA3) have a statistically significant relationship with IPF. For common variants, we not only revealed four SNPs that had a statistically significant relationship with the risk of IPF, but also constructed a new IPF risk-stratification model with them. In our cumulative risk model, high risk subjects had a 3.47-fold (95% CI: 2.07–5.81, P = 2.34 × 10⁻⁶) risk compared with low risk subjects.

4.1 Variants in telomerase-related genes

In our study, five out of 253 (∼2%) sporadic IPF cases were identified to bear pathogenic, likely pathogenic or LOF variants in three telomerase-related genes (CTC1, RTEL1, TERT). The two likely pathogenic variants (PVS1) of RTEL1 (c.387_388del, p.T129fs; c.C3631T, p.Q1211X) are unreported novel variants.

According to previous studies, pathogenic variants of TERT and TERC occur in approximately 8%–15% of FPF and 1%–3% of sporadic IPF patients (Armanios et al., 2007; Tsakiri et al., 2007). Shorter telomeres are found in IPF patients and are associated with decreased survival (Dai et al., 2015; Stuart et al., 2014). In addition, recent studies showed that rare variants in regulator of telomere elongation helicase 1 (RTEL1) are also involved in telomere shortening and FPF (Alder et al., 2015; Hisata et al., 2013; Kannengiesser, Borie, & Revy, 2014; Kropski et al., 2014; Stuart et al., 2015). Another telomerase-related gene is CST telomere replication complex component 1 (CTC1), a causal gene of dyskeratosis congenita, a rare genetic disorder in which pulmonary fibrosis and shortened telomeres may occur (Mason & Bessler, 2011). We report here a CTC1 frameshift insertion (c.400dupT, p.Y134fs) in a Chinese sporadic IPF patient.

4.2 Variants in collagen and extracellular matrix related genes

Current knowledge of IPF pathogenesis suggests that genetic factors trigger repetitive epithelial cell injury, abnormal repair responses and matrix accumulation, which subsequently lead to progressive fibrosis and loss of lung function (Puglisi, Torrisi, Giuliano, Vindigni, & Vancheri, 2016). Excess matrix accumulation is thought to be an important part of the pathological process of IPF, and MMP1, a related protein that is strongly upregulated in IPF, has been proposed as a potential peripheral blood biomarker (Rosas et al., 2008). In addition, a case–control study found that polymorphisms in the MMP1 promoter may confer increased risk for IPF (Checa et al., 2008). A knockout mouse model of another related gene, MMP19, showed a significantly increased lung fibrotic response to bleomycin compared with WT mice (Yu et al., 2012). Based on these observations, we hypothesize that variants of collagen and extracellular matrix related genes may also be relevant to IPF.

Our study identified a frameshift deletion (c.988delG, p.A330fs) in MMP1 and a stop gain (c.T1155A, p.Y385X) in MMP19. Interestingly, we found that our youngest patient, with onset at 35 years of age, carried three potentially functional variants in related genes (MMP1, c.988delG, p.A330fs, LOF; ITGA3, c.C469T, p.R157C, rs557579280; MMP19, c.C712T, p.R238W, rs754912368). The two missense variants (in ITGA3 and MMP19) were extremely rare in the ExAC database and were predicted in silico to change protein function. These findings suggest that interactions between multiple variants may predispose to IPF. Functional studies are needed for confirmation.

4.3 Variants in HPS-related genes

Hermansky-Pudlak syndrome (HPS) is a heterogeneous and rare autosomal recessive genetic disorder (El-Chemaly & Young, 2016; Vicary et al., 2016). Patients with HPS-1, HPS-2, and HPS-4 tend to develop pulmonary fibrosis (Vicary et al., 2016). We included HPS-related genes in our study, and identified one novel frameshift insertion in HPS4 and one stop gain variant in DTNBP1 (HPS7) (Table 2). The IPF patients who carried these candidate variants were carefully examined, and no features of HPS were found. We propose that some subtypes of IPF and HPS may result from the same genetic factors.

4.4 Correlations between phenotype and genotype

Thus far, at least 13 genes are known to cause IPF, but their genotype–phenotype correlation is poorly understood. We found, to our knowledge for the first time, that candidate variants in genes known to cause specific syndromes with lung fibrosis (CTC1, dyskeratosis congenita; DTNBP1 and HPS4, HPS) may also contribute to the pathogenesis of non-syndromic sporadic IPF. We did not find signs or symptoms of these disorders in our IPF patients, despite careful examination. We compared the average onset age of our IPF cases with the number of identified variants (pathogenic, likely pathogenic and LOF) (Figure 2). Our results suggest that IPF cases who carried more variants had an earlier average onset age.

Figure 2. Correlation between age-related penetrance and genotype. Average onset age of IPF patients carrying different numbers of pathogenic or loss-of-function variants.

4.5 Common SNPs and cumulative risk analysis for IPF

Previous studies indicate that the gain-of-function promoter variant (MUC5B, rs35705950) is associated with both FPF and sporadic IPF in different populations (Horimasu et al., 2015; Lee & Lee, 2015; Peljto et al., 2013; Seibold et al., 2011). In our study, the frequencies of the high-risk T allele were 3.70% and 0.80% in IPF patients and healthy controls, respectively (P = 0.018), and no TT homozygote was detected, similar to previous results in Chinese subjects (Wang et al., 2014). Considering that IPF is a highly heterogeneous and complex disease, we performed a cumulative risk analysis of four significant common SNPs (rs3737002, rs2296160, rs1800470, and rs35705950). In the analysis, subjects with a risk score of 4–6 (high-risk group) had a 3.47-fold increased risk compared with subjects with a risk score of 0–3.
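As a consistency check on the reported cumulative risk estimate, the standard error, z statistic and P value can be back-calculated from the point estimate and its confidence interval. The sketch below assumes that the 3.47-fold risk is a ratio measure (such as an odds ratio) with a Wald 95% CI that is symmetric on the log scale; the section does not state this explicitly, and the function name is hypothetical.

```python
from math import erfc, log, sqrt

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959964


def wald_from_ci(estimate, lower, upper):
    """Recover the log-scale SE, z statistic and two-sided P value from a
    ratio estimate and its 95% CI, assuming a Wald interval on the log scale."""
    se = (log(upper) - log(lower)) / (2 * Z_95)
    z = log(estimate) / se
    # Two-sided tail probability of the standard normal distribution.
    p = erfc(abs(z) / sqrt(2))
    return se, z, p


# Values reported for the high-risk versus low-risk comparison.
se, z, p = wald_from_ci(3.47, 2.07, 5.81)
print(f"SE(log ratio) = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```

Under these assumptions the recovered P value is on the order of 10⁻⁶, in line with the reported P = 2.34 × 10⁻⁶; small differences are expected because the published estimate and interval are rounded.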

Although our study identified several pathogenic, likely pathogenic and LOF variants in 253 sporadic IPF patients in the Chinese population, it has several limitations. First, only 92 candidate IPF genes were included and whole exome sequencing was not done. Second, we only studied pathogenic, likely pathogenic and LOF variants located in exons or short exon-flanking regions; synonymous variants and variants in intronic or untranslated regions were not studied. Thus, the contribution of synonymous and noncoding variants is likely underestimated. Third, functional studies of our candidate variants are needed to confirm their causative role.

In conclusion, we report here the first study of the role of both rare and common variants in IPF risk in the Chinese population. Our study identified multiple novel rare and common variants that are associated with increased risk of IPF. The results of our cumulative risk model analysis suggest that risk prediction and stratification for IPF may be possible in the Chinese population.

ACKNOWLEDGMENTS

We would like to acknowledge all the participants who volunteered for this study.

DISCLOSURE STATEMENT

The authors declare no conflict of interest. All variants in Table 2 have been submitted to the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/).
