RESEARCH ARTICLE

Open Access

Cell-free epigenomes enhanced fragmentomics-based model for early detection of lung cancer

Yadong Wang
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Qiang Guo
Department of Thoracic Surgery, Affiliated Hospital of Hebei University, Baoding, China

Zhicheng Huang
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Liyang Song
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Fei Zhao
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Tiantian Gu
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Zhe Feng
Department of Cardiothoracic Surgery, the Sixth Hospital of Beijing, Beijing, China

Haibo Wang
Department of Thoracic Surgery, Affiliated Hospital of Hebei University, Baoding, China

Bowen Li
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Daoyun Wang
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Bin Zhou
Department of Thoracic Surgery, Affiliated Hospital of Hebei University, Baoding, China

Chao Guo
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Yuan Xu
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Yang Song
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Zhibo Zheng
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Zhongxing Bing
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Haochen Li
orcid.org/0000-0003-0104-8818
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Xiaoqing Yu
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Ka Luk Fung
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Heqing Xu
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Jianhong Shi
orcid.org/0000-0003-2232-1000
Department of Scientific Research, Affiliated Hospital of Hebei University, Baoding, China

Meng Chen
Department of Scientific Research, Affiliated Hospital of Hebei University, Baoding, China

Shuai Hong
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Haoxuan Jin
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Shiyuan Tong
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Sibo Zhu
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Chen Zhu
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Jinlei Song
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Jing Liu
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Shanqing Li
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Hefei Li (Corresponding Author)
Department of Thoracic Surgery, Affiliated Hospital of Hebei University, Baoding, China

Xueguang Sun (Corresponding Author)
Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China

Naixin Liang (Corresponding Author)
orcid.org/0000-0001-7995-4226
Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Correspondence

Hefei Li, Department of Thoracic Surgery, Affiliated Hospital of Hebei University, Baoding, China.

Xueguang Sun, Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China.

Naixin Liang, Department of Thoracic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.


First published: 05 February 2025

https://doi.org/10.1002/ctm2.70225


Yadong Wang, Qiang Guo, Zhicheng Huang, Liyang Song and Fei Zhao contributed equally to this work.


Abstract

Background

Lung cancer is a leading cause of cancer mortality, highlighting the need for innovative non-invasive early detection methods. Although cell-free DNA (cfDNA) analysis shows promise, its sensitivity for early-stage lung cancer remains limited. This study aimed to integrate epigenetic modifications and fragmentomic features of cfDNA using machine learning to develop a more accurate lung cancer detection model.

Methods

To address this issue, a multi-centre prospective cohort study was conducted in which participants with suspicious malignant lung nodules and healthy volunteers were recruited from two clinical centres. The epigenetic and fragmentomic profiles of plasma cfDNA were analysed using chromatin immunoprecipitation sequencing, reduced representation bisulphite sequencing and low-pass whole-genome sequencing. Machine learning algorithms were then used to integrate the multi-omics data and develop a lung cancer detection model.

Results

Cancer-related changes in cfDNA fragmentomics were significantly enriched in specific genes marked by cell-free epigenomes. A total of 609 such genes were identified, and their corresponding cfDNA fragmentomic features were used to construct an ensemble model. In the independent validation set, this model achieved a sensitivity of 90.4%, a specificity of 83.1% and an AUC of 0.94. Notably, its sensitivity reached 95.1% for stage I lung cancer and 96.2% for minimally invasive adenocarcinoma, highlighting its potential for early detection in clinical settings.

Conclusions

With feature selection guided by multiple epigenetic sequencing approaches, the cfDNA fragmentomics-based machine learning model demonstrated outstanding performance in the independent validation cohort. These findings highlight its potential as an effective non-invasive strategy for the early detection of lung cancer.

Keypoints

Our study elucidated the regulatory relationships between epigenetic modifications and the fragmentomic features of cfDNA.
Identifying epigenetically regulated genes provided a critical foundation for developing the cfDNA fragmentomics-based machine learning model.
The model demonstrated exceptional clinical performance, highlighting its substantial potential for translational application in clinical practice.

1 INTRODUCTION

Lung cancer is one of the leading causes of cancer mortality worldwide.¹ A major reason is that approximately 75% of patients with lung cancer are diagnosed at an advanced stage, when the 5-year survival rate is less than 10%.² In contrast, patients diagnosed at an early stage have 5-year survival rates of 68–92%, underscoring the critical importance of early detection and timely intervention.

While low-dose computed tomography (LDCT) screening is effective for early lung cancer detection and reduces mortality,³ its high false-positive rate often leads to unnecessary psychological distress and radiation exposure during follow-up procedures. Therefore, there remains an urgent need for more precise, non-invasive screening methods for lung cancer, particularly for early-stage disease. Liquid biopsy techniques, including the analysis of circulating tumour cells (CTCs), circulating tumour DNA (ctDNA) and exosomes, have emerged as promising non-invasive alternatives for the early detection of lung cancer.⁴ However, their clinical translation is hindered by the difficulty of accurately capturing these biomarkers, owing to their typically low abundance, particularly in early-stage lung cancer.⁵

Cell-free DNA (cfDNA), released through apoptosis, necrosis and active secretion, carries abundant epigenomic signatures that make it a promising biomarker for cancer detection.⁶ For instance, the fragmentation profiles of cfDNA, including fragment size, genomic distribution, breakpoint locations and end motifs, provide insights into nucleosome positioning, chromatin structure and nuclease activity during cell death.⁷⁻¹⁰ Patients with cancer exhibit distinct cfDNA profiles, characterised by a higher proportion of shorter fragments and a reduced preference for C-end motifs compared with healthy controls.¹¹ Leveraging these fragmentomic features, machine learning models have shown strong performance in detecting advanced lung cancer¹² and hold promise for multi-cancer early detection.¹³ Despite this potential, their sensitivity for early-stage lung cancer, especially stage I cases, which would benefit most from early detection and intervention, remains unsatisfactory.¹⁴ This highlights an urgent need for more accurate approaches to analysing fragmentomic features.

Recent studies have confirmed a strong link between fragmentomic patterns and gene expression in the cells of origin.¹⁵⁻¹⁷ Because the gene expression landscape of cancer cells cannot be measured directly from blood, a promising strategy to enhance diagnostic performance is to integrate fragmentomic data with the epigenomic information carried by cell-free nucleosome complexes. Cell-free epigenomic features such as CpG DNA methylation,¹⁸,¹⁹ histone modifications²⁰,²¹ and chromatin accessibility, particularly in nucleosome-depleted regions (NDRs) at transcription factor binding sites,²² are recognised as valuable markers for early cancer detection. Hypermethylation of promoter CpG islands and altered histone modification states can be sufficient to alter cell fate and drive tumourigenesis, even in the absence of driver mutations.²³ Profiling histone modifications in plasma cfDNA has enabled non-invasive inference of lung cancer transcriptomes, with concordant enrichment profiles across various lung cancer subtypes.²⁴,²⁵ H3K4me3, a common histone modification, is predominantly enriched at promoters and correlates strongly and positively with transcriptional activity.²⁰ Furthermore, NDR formation induced by transcription factor binding is a critical mechanism for regulating gene expression and modulating chromatin structure.²⁶,²⁷ Therefore, integrating these epigenomic layers with fragmentomic features could provide deeper insights into cfDNA-derived cancer signals and facilitate the development of more accurate detection models.

The detection performance of integrated models has surpassed that of single-fragmentomic approaches in hepatocellular carcinoma²⁸ and early-stage breast cancer.²⁹ However, only a limited number of epigenomic layers were included in these studies, and research applying such integrated models to the early detection of lung cancer remains scarce. Furthermore, the development of lung cancer, particularly adenocarcinoma, is recognised as a continuous progression from atypical adenomatous hyperplasia (AAH) to adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) and ultimately invasive adenocarcinoma (IAC).³⁰ The dynamic changes in the epigenetic modifications and fragmentomic features of cfDNA during this process also warrant investigation.

To address this gap, we conducted a multi-centre prospective cohort study to integrate multiple epigenomic layers with diverse fragmentomic features and establish an accurate model for the early detection of lung cancer. Plasma cfDNA was profiled using cell-free chromatin immunoprecipitation sequencing (cfChIP-seq), cell-free reduced representation bisulphite sequencing (cfRRBS) and low-pass whole-genome sequencing (lpWGS). These three approaches target the same molecular entity, the cell-free nucleosome complex, and offer a synergistic framework for exploring the crosstalk between these layers, an area that remains underexplored. This study aims to provide novel insights into cfDNA biology and open new avenues for the early detection of lung cancer.

2 METHODS

2.1 Study design and participants enrolment

This was a multi-centre, prospective cohort study. Participants with suspicious malignant lung nodules and healthy volunteers were consecutively enrolled at the Affiliated Hospital of Hebei University (AHHU; training cohort) and the Peking Union Medical College Hospital (PUMCH; validation cohort) from November 2022 to December 2023.

Inclusion criteria for individuals with suspicious malignant lung nodules were: (1) age 18 years or older; (2) surgery or biopsy yielding a definitive pathological diagnosis; and (3) ability to provide written informed consent and a qualified blood sample. Exclusion criteria were: (1) a history of cancer; (2) anti-cancer therapy received before blood sampling; and (3) multiple primary lung cancers. Age- and sex-matched healthy volunteers were required to be at least 18 years old, have no history of cancer and show no suspicious malignant lung nodules on chest CT screening, as carefully evaluated by two experienced clinicians. Demographic and clinical information was collected for all participants and used in subsequent analyses. Tumour stages were determined according to the eighth edition of the American Joint Committee on Cancer classification.

The study was conducted in accordance with the Declaration of Helsinki and approved by the ethics committees of AHHU (Approval No. HDFYLL-IIT-023-005) and PUMCH (Approval No. I-23PJ1205). Written informed consent was obtained from all enrolled participants prior to participation.

2.2 Collection and preparation of samples

Peripheral blood (10 mL) was collected by venipuncture from each participant into Cell-Free DNA BCT tubes (Streck). Plasma was separated within 2 h by centrifugation at 1600×g for 10 min, followed by a second centrifugation at 16 000×g for 10 min at 4°C. Haemolysed samples and samples with insufficient material could not be assayed and were excluded. One-millilitre plasma aliquots for cfChIP-seq were stored at −80°C until analysis. cfDNA was extracted from plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen) according to the manufacturer's instructions. cfDNA concentration and size distribution were assessed using a Qubit fluorometer (Invitrogen; Q33230) and the LabChip GX Touch DNA High Sensitivity Assay (PerkinElmer; CLS140158). Quantified cfDNA was stored at −20°C for cfRRBS and lpWGS.

2.3 Library preparation for cfChIP-seq, cfRRBS and lpWGS

For nucleosome-capture cfChIP-seq, 200 µg of H3K4me3 Recombinant Polyclonal Antibody (Invitrogen; 711958) was conjugated to 20 mg of epoxy M270 Dynabeads (Invitrogen; 14301) according to the manufacturer's instructions. The antibody-conjugated beads were washed, resuspended at 30 mg/mL in PBS containing 0.01% azide as a preservative and stored at 4°C for use on the same day. To 1 mL of plasma thawed in a water bath, 6.6 µL of antibody-conjugated beads were added, together with 1× protease inhibitor cocktail (Roche; 4693132001) and 10 mM EDTA. The mixture was rotated overnight at 4°C. The beads were then magnetised and washed eight times with 200 µL of blood wash buffer (50 mM Tris–HCl, 150 mM NaCl, 1% Triton X-100, 0.1% sodium deoxycholate, 2 mM EDTA, 1× protease inhibitor cocktail) and three times with 150 µL of 10 mM Tris pH 7.4 on ice. The beads were resuspended in 50 µL of chromatin elution buffer (10 mM Tris pH 8.0, 5 mM EDTA, 300 mM NaCl, 0.6% SDS, 2.5 µL of NEB proteinase K) and incubated for 1 h at 55°C. After magnetisation, the cfDNA-containing supernatant was purified using 1.8× Agencourt AMPure XP beads (Agencourt). Double-stranded libraries were constructed from the cfDNA using the NEBNext Ultra II End Prep Kit, NEBNext Ultra II Ligation Master Mix and NEBNext Ligation Enhancer (NEB). The library was amplified by 15 PCR cycles and purified using 1× AMPure XP beads. Sequencing was performed on an Illumina platform in 150 bp paired-end mode, yielding 10 Gbp of data.

For cfRRBS, 10 ng of purified cfDNA was digested with the MspI restriction enzyme (NEB; R0106L) at CCGG sites. Bisulphite conversion of the adapter-ligated product was carried out using the EZ DNA Methylation-Lightning Kit (ZYMO; D5030). cfRRBS libraries were constructed following a previously described protocol with minor modifications.³¹ The converted library was amplified for 15 cycles using the KAPA HiFi HotStart Uracil+ ReadyMix Kit (Roche; 07959079001). After purification with 1.3× AMPure beads, 150 bp paired-end sequencing was performed on an Illumina platform to generate 20 Gbp of data.

For lpWGS, 5 ng of purified cfDNA was processed using the xGen Prism DNA Library Prep Kit (IDT; 10006202) according to the manufacturer's instructions and amplified by seven PCR cycles. The library was purified with 1.3× AMPure XP beads and quantified using the Qubit 1× dsDNA HS Assay and the LabChip GX Touch DNA High Sensitivity Assay. Sequencing was performed on an Illumina platform in 150 bp paired-end mode, yielding 10 Gbp of data.

2.4 Processing of high-throughput sequencing data

FASTQ files were processed using fastp (v0.23.4) to remove adapters and reads with low average sequencing quality. For cfChIP-seq data, bowtie2 (v2.5.3) and sambamba (v1.0) were used for alignment and deduplication, respectively. Enriched peaks were called and visualised using the MACS2 narrow-peak BAMPE mode (v3.0.1) and bedtools (v2.31.0). After on-target reads within consensus peaks were counted with featureCounts (v2.18.0), samples with fewer than 0.1 million on-target reads or an on-target rate below 30% were excluded. For cfRRBS data, trimmed reads were aligned to the hg19 reference genome using the Bismark aligner (v0.24.1), and methylation calls were extracted with the Bismark methylation extractor script. The results were converted to bedGraph format using Bismark for subsequent analysis. For lpWGS data, clean reads were aligned to the hg19 reference genome using bwa-mem2 (v2.2.1) and sorted using samtools (v1.17) to obtain positional information for each DNA fragment. PCR duplicates were removed using sambamba, and reads that were unmapped, not properly paired or flagged as low quality (secondary, supplementary, QC-failed or duplicate alignments) were filtered out using samtools view (-f 3 -F 3852). The remaining DNA fragments were converted to BEDPE format using bedtools for subsequent analysis.
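The `-f 3 -F 3852` filter combines standard SAM FLAG bits: `-f` keeps alignments with all listed bits set and `-F` drops alignments with any listed bit set. The sketch below is a hypothetical helper, not part of the authors' pipeline; it decodes the two values and mimics that include/exclude logic for a single FLAG.

```python
# Standard SAM FLAG bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails platform/vendor quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(value: int) -> list[str]:
    """Return the names of all SAM FLAG bits set in `value`, in bit order."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def passes_filter(flag: int, require: int = 3, exclude: int = 3852) -> bool:
    """Mimic `samtools view -f require -F exclude` for one read's FLAG."""
    return (flag & require) == require and (flag & exclude) == 0


print(decode_flag(3))      # bits required by -f 3
print(decode_flag(3852))   # bits rejected by -F 3852
print(passes_filter(99), passes_filter(99 | 0x400))  # proper pair kept; its duplicate dropped
```

Note that 3852 = 4 + 8 + 256 + 512 + 1024 + 2048, so this filter does not itself apply a mapping-quality threshold.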

2.5 Calculation of cfDNA fragmentomic features

We developed in-house scripts to extract two fragmentomic features of cfDNA from lpWGS data: the 6 bp end motif and the 2+4 bp breakpoint motif. The 6 bp end motif was defined as the terminal 6-nucleotide sequence at each cfDNA fragment end. The 2+4 bp breakpoint motif was defined as the reference-genome sequence spanning 2 bp in the 5′ direction and 4 bp in the 3′ direction from each aligned cfDNA 5′ breakpoint. For each sample, the frequency of each motif was calculated relative to the total number of motifs, so that the frequencies summed to 1.
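The motif definitions above can be sketched as follows. This is not the authors' in-house script: it assumes fragments are given as 0-based, half-open reference coordinates, reads both motif types from a toy reference string (hypothetical), and treats the lower-strand end by reverse-complementing the corresponding reference slice.

```python
from collections import Counter

# Complement table for A/C/G/T; N is kept as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def end_motifs(ref: str, start: int, end: int, k: int = 6) -> list[str]:
    """k-mer end motifs at both 5' ends of a fragment covering ref[start:end]."""
    # Upper-strand end is read forward; lower-strand end is the reverse
    # complement of the last k bases of the fragment.
    return [ref[start:start + k], revcomp(ref[end - k:end])]


def breakpoint_motifs(ref: str, start: int, end: int,
                      up: int = 2, down: int = 4) -> list[str]:
    """2+4 bp breakpoint motifs: `up` bases outside and `down` bases inside each 5' end."""
    motifs = []
    if start - up >= 0:                      # skip ends too close to the sequence edge
        motifs.append(ref[start - up:start + down])
    if end + up <= len(ref):
        motifs.append(revcomp(ref[end - down:end + up]))
    return motifs


def motif_frequencies(motifs) -> dict[str, float]:
    """Frequency of each motif relative to all counted motifs (values sum to 1)."""
    counts = Counter(m for m in motifs if "N" not in m)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}


# Toy reference and fragments (hypothetical), 0-based half-open coordinates.
REF = "GGCCATGACGTTAGCCTAGG"
FRAGMENTS = [(2, 15), (4, 18)]

ends = [m for s, e in FRAGMENTS for m in end_motifs(REF, s, e)]
breaks = [m for s, e in FRAGMENTS for m in breakpoint_motifs(REF, s, e)]
print(motif_frequencies(ends))
print(motif_frequencies(breaks))
```

In real data the end motif would usually be taken from the read sequence rather than the reference; the reference is used here only to keep the example self-contained.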

The fragmentation size ratio (FSR) was calculated as the proportion of 151–220 bp fragments among all fragments within each 1 Mb window. The fragmentation size distribution (FSD) was obtained by grouping fragments of 65–400 bp into 5 bp intervals and calculating the proportion of each interval on every chromosome arm.
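A minimal sketch of both size features follows. The coordinate convention (0-based, half-open), the assignment of each fragment to the 1 Mb window containing its midpoint and the half-open 65 to 400 bp range (giving 67 bins of 5 bp) are all assumptions; the source does not specify these details.

```python
from collections import defaultdict


def fragment_size_ratio(fragments, window=1_000_000, lo=151, hi=220):
    """FSR per (chrom, window index): share of lo..hi bp fragments among all fragments.

    Fragments are (chrom, start, end) tuples; each is assigned to the window
    containing its midpoint (an assumption).
    """
    total = defaultdict(int)
    in_range = defaultdict(int)
    for chrom, start, end in fragments:
        key = (chrom, ((start + end) // 2) // window)
        total[key] += 1
        if lo <= end - start <= hi:
            in_range[key] += 1
    return {key: in_range[key] / total[key] for key in total}


def fragment_size_distribution(lengths, lo=65, hi=400, step=5):
    """FSD for one chromosome arm: proportion of fragments in each 5 bp bin.

    Bins cover the half-open range [lo, hi): 65-69, 70-74, ..., 395-399 bp.
    """
    kept = [n for n in lengths if lo <= n < hi]
    bins = [0] * ((hi - lo) // step)
    for n in kept:
        bins[(n - lo) // step] += 1
    return [c / len(kept) for c in bins] if kept else [0.0] * len(bins)


# Toy fragments (hypothetical).
frags = [("chr1", 100, 267), ("chr1", 500, 620), ("chr1", 1_500_000, 1_500_180)]
print(fragment_size_ratio(frags))
fsd = fragment_size_distribution([65, 69, 70, 167, 400, 30])
print(len(fsd), fsd[:3])
```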

Transcription factor binding often creates NDRs at regulatory regions, influencing local chromatin accessibility and thereby cfDNA fragmentation patterns. To assess this, we calculated transcription start site (TSS) NDR scores by quantifying the relative depletion of cfDNA fragments within promoter regions, which are strongly associated with transcriptional activity and nucleosome dynamics. Following a previously reported approach,²² the relative coverage was defined as the ratio of the mean read coverage within the NDR to the mean coverage of its upstream and downstream flanking regions, reflecting the degree of nucleosome depletion at the promoter. Specifically, the mean raw coverage of the promoter region (−150 to 50 bp relative to the TSS) was divided by the mean coverage of the upstream (−2000 to −1000 bp relative to the TSS) and downstream (1000 to 2000 bp relative to the TSS) flanking regions. This normalisation accounts for potential biases in sequencing depth and focuses on the relative depletion of nucleosomes in NDRs.
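The relative-coverage calculation can be sketched on a per-base depth array. Assumptions: depth is a 0-based list for one chromosome, windows are half-open in TSS-relative offsets, minus-strand genes are handled by mirroring the offsets, and the flank mean is the average of the two equal-length flanks.

```python
def window_mean(depth, tss, rel_start, rel_end, strand="+"):
    """Mean depth over TSS-relative offsets [rel_start, rel_end) on a 0-based per-base array."""
    if strand == "+":
        lo, hi = tss + rel_start, tss + rel_end
    else:  # minus-strand genes: upstream lies at higher coordinates (assumption)
        lo, hi = tss - rel_end + 1, tss - rel_start + 1
    if lo < 0 or hi > len(depth):
        raise ValueError("window extends beyond the coverage array")
    return sum(depth[lo:hi]) / (hi - lo)


def tss_ndr_score(depth, tss, strand="+"):
    """Relative coverage of the TSS NDR: promoter mean divided by flank mean."""
    promoter = window_mean(depth, tss, -150, 50, strand)
    upstream = window_mean(depth, tss, -2000, -1000, strand)
    downstream = window_mean(depth, tss, 1000, 2000, strand)
    flank = (upstream + downstream) / 2  # both flanks are 1 kb, so this is the pooled mean
    return promoter / flank if flank > 0 else float("nan")


# Toy coverage (hypothetical): uniform depth 10 with a depleted promoter (depth 4).
depth = [10.0] * 5000
for i in range(2350, 2550):
    depth[i] = 4.0
print(tss_ndr_score(depth, 2500))  # 0.4
```

A score below 1 indicates depletion of cfDNA coverage at the promoter relative to its flanks.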

2.6 Gene-level multi-omics feature analysis

Using data from the training set, we calculated cell-free epigenomic and fragmentomic features for each gene based on the UCSC hg19 knownCanonical gene annotation. H3K4me3 levels were quantified as normalised RPKM values within each promoter region. DNA methylation was measured as the mean CpG methylation ratio within each promoter region, derived from cfRRBS data. Fragmentomic features, including the proportions of 0–150 and 151–220 bp fragments, motif proportions and deconvolution contributions, were computed using reads from the region spanning 1500 bp upstream of each gene through its gene body. The deconvolution contribution of end motifs was calculated as the dot product of 4-mer end-motif frequencies with the pre-trained F-profile frequency matrix from Zhou et al.³² Following the biological interpretation of the six major end-motif components in that study,³² F-profiles I to VI were annotated as DNASE1L3, DNASE1, DFFB, Non-DNase C-end, Non-DNase G-end and Non-specific diverse end motifs, respectively. Motif entropy was calculated as the Shannon entropy of the 4-mer end-motif frequencies.
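The dot-product deconvolution and the motif entropy can be sketched as below. The 256 × 6 matrix here is a hypothetical placeholder, not the pre-trained F-profile matrix from Zhou et al.; the lexicographic 4-mer ordering and the use of log base 2 are assumptions.

```python
import itertools
import math

# All 256 4-mers in lexicographic order; rows of the F-profile matrix must follow it.
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=4)]
PROFILE_NAMES = ["DNASE1L3", "DNASE1", "DFFB",
                 "Non-DNase C-end", "Non-DNase G-end", "Non-specific diverse"]


def deconvolution_contributions(freqs, f_profiles):
    """Dot product of a 256-long 4-mer frequency vector with a 256 x 6 F-profile matrix."""
    vec = [freqs.get(k, 0.0) for k in KMERS]
    return {
        name: sum(vec[i] * f_profiles[i][j] for i in range(len(KMERS)))
        for j, name in enumerate(PROFILE_NAMES)
    }


def motif_entropy(freqs):
    """Shannon entropy (in bits) of a 4-mer end-motif frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Placeholder matrix (hypothetical): NOT the pre-trained matrix from Zhou et al.
toy_profiles = [[1.0 if j == i % 6 else 0.0 for j in range(6)] for i in range(len(KMERS))]
uniform = {k: 1 / len(KMERS) for k in KMERS}
print(deconvolution_contributions(uniform, toy_profiles))
print(motif_entropy(uniform))  # 8.0 bits, the maximum for 256 motifs
```

Higher entropy indicates a more diverse end-motif repertoire; a gene-level value would be computed from the 4-mer frequencies of reads in that gene's region.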

For the comparison between cancer and non-cancer samples, multi-omics features were averaged within each group, and genes with detectable signals in fewer than 10 samples were filtered out. Cancer-specific differences were quantified using the Z-score derived from the Mann–Whitney U rank sum test. The Z-score was calculated using the formula:

\begin{equation*}
Z_{\mathrm{score}} = \frac{U_{\mathrm{statistic}} - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 \left(n_1 + n_2 + 1\right)}{12}}}
\end{equation*}
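A minimal Python sketch of this normal approximation follows, assuming SciPy's `mannwhitneyu` (in SciPy 1.7 and later its `statistic` is the U of the first sample). The helper name is hypothetical, and, as in the formula, no tie or continuity correction is applied.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def mann_whitney_z(cancer, non_cancer):
    """Normal-approximation Z-score of the Mann-Whitney U statistic.

    Positive values mean the feature tends to be higher in cancer samples.
    No tie or continuity correction is applied, mirroring the formula.
    """
    cancer = np.asarray(cancer, dtype=float)
    non_cancer = np.asarray(non_cancer, dtype=float)
    n1, n2 = cancer.size, non_cancer.size
    # U statistic of the first sample (cancer) under SciPy >= 1.7 semantics.
    u = mannwhitneyu(cancer, non_cancer, alternative="two-sided").statistic
    return float((u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12))


print(mann_whitney_z([4, 5, 6], [1, 2, 3]))  # about 1.964
```

For features with many tied values (for example genes with zero signal in many samples), a tie-corrected variance would give slightly larger magnitudes. The uncorrected form is kept here only because it matches the formula as written.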

In this formula, U_statistic refers to the Mann–Whitney U statistic, n₁ is the number of cancer samples and n₂ is the number of non-cancer samples. To improve the performance of rank sum methods for genes with high dynamic range H3K4me3 levels, background correction was applied using the H3K4me3 program from Nir Friedman's cfChIP-seq software.²⁰

2.7 Identification and functional analysis of multi-epigenetically regulated genes

To quantify cfChIP-seq and cfRRBS data, we utilised the promoter annotations provided by the Ensembl Regulatory Build (GRCh37). Promoter raw count data from cfChIP-seq were normalised, and differential analysis between lung cancer patients and non-cancer controls was performed using edgeR software. Meanwhile, the Mann–Whitney U test was employed to compare the mean methylation levels of promoter regions obtained from cfRRBS and the NDR scores between the groups.

To capture as many signal differences as possible between patients with lung cancer and non-cancer controls, we conducted three sets of comparisons: (1) lung cancer versus benign lung nodules; (2) lung cancer versus healthy volunteers; (3) lung cancer versus benign lung nodules and healthy volunteers. Results from the three sets of comparisons were integrated to identify genes regulated by multiple epigenetic modifications in lung cancer. A gene was classified as a multi-epigenetically regulated gene (MERGE) if it consistently showed significant results in at least two differential analyses, with directions of change that were biologically consistent across omics layers. For instance, significant upregulation of cfChIP-seq H3K4me3 together with significant downregulation of cfRRBS DNA methylation is consistent with gene activation.
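One possible reading of this consistency rule is sketched below. The interpretation is an assumption: it treats "at least two differential analyses" as at least two significant omics layers within one comparison that agree on activation or repression, and it lets any of the three comparisons qualify a gene. The significance threshold `alpha = 0.05` and the function names are hypothetical. The NDR encoding follows the text, where a lower NDR score means stronger depletion and hence activation.

```python
# Direction of a significant cancer-vs-control change that indicates gene activation
# in each layer (assumed encoding): H3K4me3 up, promoter methylation down, and NDR
# score down (relative coverage falls as nucleosome depletion strengthens).
ACTIVATION_SIGN = {"h3k4me3": +1, "methylation": -1, "ndr": -1}


def merge_call(layer_results, alpha=0.05, min_layers=2):
    """Classify one gene in one comparison (hypothetical helper).

    layer_results maps a layer name to (effect, adjusted_p), where effect is a
    signed change in cancer (e.g. logFC or Z-score). Returns "activated",
    "repressed" or None if the gene does not qualify.
    """
    votes = []
    for layer, (effect, p_adj) in layer_results.items():
        if p_adj < alpha and effect != 0:
            direction = 1 if effect > 0 else -1
            votes.append(ACTIVATION_SIGN[layer] * direction)
    if len(votes) < min_layers or len(set(votes)) != 1:
        return None  # too few significant layers, or the layers disagree
    return "activated" if votes[0] > 0 else "repressed"


def is_merge(comparisons, **kwargs):
    """A gene qualifies if any comparison (e.g. cancer vs benign) yields a call."""
    return any(merge_call(res, **kwargs) is not None for res in comparisons.values())


gene = {
    "cancer_vs_benign": {"h3k4me3": (1.2, 0.01), "methylation": (-0.3, 0.02), "ndr": (0.1, 0.40)},
    "cancer_vs_healthy": {"h3k4me3": (0.4, 0.20), "methylation": (0.1, 0.60), "ndr": (-0.2, 0.03)},
}
print(merge_call(gene["cancer_vs_benign"]), is_merge(gene))  # activated True
```

Encoding every layer as a vote for activation or repression makes the biological-consistency check a single equality test, whatever combination of layers is significant.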

The clusterProfiler³³ tool was utilised to perform gene function and pathway enrichment analysis, encompassing GO Molecular Function, Reactome and WikiPathway databases. Motif enrichment analysis was conducted using the MEME Suite tool in conjunction with the JASPAR2024 CORE non-redundant database, focusing on regions within ±1 kb of the TSS of genes. Additionally, Genetic Perturbation Similarity Analysis was carried out using the GPSAdb database.³⁴

2.8 Machine learning model construction and cross-validation analyses

Fragmentomic features from both the whole genome and MERGE regions were screened for subsequent model training and validation. For MERGE regions specifically, the FSR for each gene interval was calculated as the proportion of 151–220 bp fragments relative to the total number of fragments within that gene. Motif analysis within MERGE regions followed the same methodology as the genome-wide analysis but was restricted to the defined MERGE intervals.
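The per-gene FSR defined here can be sketched as follows. Assumptions, not taken from the text: fragments are assigned to an interval by their midpoint, coordinates are half-open on a single chromosome, and the gene names in the toy call are placeholders.

```python
import numpy as np


def fsr_by_interval(frag_starts, frag_ends, intervals, long_range=(151, 220)):
    """Proportion of 151-220 bp fragments among all fragments in each interval.

    Fragments are assigned to an interval by their midpoint (an assumption);
    coordinates are half-open [start, end) on a single chromosome.
    """
    starts = np.asarray(frag_starts)
    ends = np.asarray(frag_ends)
    lengths = ends - starts
    mids = (starts + ends) // 2
    is_long = (lengths >= long_range[0]) & (lengths <= long_range[1])
    fsr = {}
    for name, (lo, hi) in intervals.items():
        inside = (mids >= lo) & (mids < hi)
        n = int(inside.sum())
        fsr[name] = float(is_long[inside].sum() / n) if n else float("nan")
    return fsr


starts = np.array([100, 120, 300, 400, 5000])
ends = np.array([200, 280, 500, 700, 5170])
print(fsr_by_interval(starts, ends, {"GENE_A": (0, 1000), "GENE_B": (4000, 6000)}))
```

Here GENE_A holds fragments of 100, 160, 200 and 300 bp, two of which fall in the 151–220 bp range, so its FSR is 0.5.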

Initially, each fragmentomic feature was modelled independently to estimate the probability of lung cancer for every participant in the training dataset. The performance of whole genome-based models and the corresponding MERGE-based models was compared, and the better-performing models were selected as base models for further analysis. Subsequently, an ensemble model was developed by integrating the predicted probabilities from each base model using the Extra Trees algorithm from the scikit-learn Python library. The ensemble model trained with the best set of hyper-parameters (n_estimators = 1000, max_depth = 5 and min_samples_split = 5) was then used for performance measurement. To assess the accuracy of the ensemble model, 10-fold cross-validation was performed during the training phase. The score calculated for each participant by the ensemble model was termed the MERGE score, ranging from 0 to 1, with higher scores indicating a greater likelihood of being predicted as lung cancer. The untouched validation dataset was then used to evaluate the detection performance of the multi-dimensional ensemble machine learning model. Owing to clinical controversy regarding its true classification, AAH was not included in the formal efficacy evaluation of the ensemble model and was used only for exploratory analyses.
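The stacking step can be sketched with scikit-learn as follows. This is illustrative only: the logistic-regression base learners, the synthetic data and the function name `fit_merge_ensemble` are hypothetical. Only the Extra Trees hyper-parameters and the 10-fold cross-validation come from the text, and `n_estimators` is lowered in the toy call to keep it fast.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score


def fit_merge_ensemble(base_feature_sets, y, n_estimators=1000, seed=0):
    """Stack base-model probabilities with Extra Trees (illustrative sketch).

    base_feature_sets: list of (n_samples, n_features) arrays, one per feature
    type (e.g. MERGE-based BPM, MERGE-based FSR, FSD). The base learner is a
    placeholder logistic regression, not necessarily the authors' choice.
    """
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # Out-of-fold probabilities keep training labels out of the meta-features.
    meta = np.column_stack([
        cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv,
                          method="predict_proba")[:, 1]
        for X in base_feature_sets
    ])
    ensemble = ExtraTreesClassifier(
        n_estimators=n_estimators, max_depth=5, min_samples_split=5,
        random_state=seed, n_jobs=-1,
    )
    cv_auc = cross_val_score(ensemble, meta, y, cv=cv, scoring="roc_auc")
    ensemble.fit(meta, y)
    merge_scores = ensemble.predict_proba(meta)[:, 1]  # probabilities in [0, 1]
    return ensemble, merge_scores, cv_auc


# Synthetic stand-in data: 120 participants, three feature types.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 60)
feature_sets = [rng.normal(size=(120, k)) + 0.8 * y[:, None] for k in (20, 10, 5)]
model, scores, cv_auc = fit_merge_ensemble(feature_sets, y, n_estimators=200)
print(round(cv_auc.mean(), 3), scores.min() >= 0, scores.max() <= 1)
```

In this design the meta-features for training participants are out-of-fold probabilities. To score an untouched validation set, each base model would be refit on the full training set and its probabilities passed to the locked ensemble.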

2.9 Clinical benefits estimation

To evaluate the clinical utility of the ensemble model, an interception model³⁵ was introduced and adapted using data from China on lung cancer epidemiology,³⁶ stage at diagnosis³⁷ and 5-year survival rate.³⁸ Under the most conservative dwell time scenario (aggressive fast mode), the shift in cancer stage at diagnosis from late (stage III/IV) to early (stage I/II) and the resulting improvement in 5-year survival rate were evaluated at different screening intervals (from 6 months to 5 years). The original code is available at Hubbell_CEBP_Inteerception.

2.10 Statistical analysis

All statistical analyses were performed using R version 4.4.0. Continuous variables were described with medians and interquartile ranges and categorical variables with numbers and percentages. The Mann–Whitney U rank sum test was used to compare continuous variables between two groups and the Kruskal–Wallis test for multiple groups. Categorical variables were analysed using the Chi-squared test or Fisher's exact test, as appropriate. Based on true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN) of lung cancer prediction, the sensitivity [TP/(TP + FN)], specificity [TN/(TN + FP)], positive predictive value (PPV) [TP/(TP + FP)] and negative predictive value (NPV) [TN/(TN + FN)] values were calculated. The receiver operating characteristic (ROC) curves were generated using the pROC package. Areas under the ROC curves (AUC) of prediction models were compared using the DeLong test. For the AUC, the 95% confidence interval was computed using 2000 stratified bootstrap replicates. For sensitivity, specificity, PPV and NPV, the Wilson method from the binom package was employed to calculate the corresponding 95% confidence intervals.
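The analyses above were run in R (binom, pROC). The Python sketch below only mirrors the same formulas and is not the authors' code: a Wilson score interval, the four diagnostic metrics at a cutoff, and a percentile CI from stratified bootstrap resampling of the AUC. Treating scores equal to the cutoff as positive is an assumption. As a check, the Wilson interval for 86 of 97 training-set cancers (a count back-calculated from the reported 88.7% sensitivity) reproduces the reported 0.808–0.935.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        return float("nan"), float("nan")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return float(centre - half), float(centre + half)


def diagnostic_metrics(y_true, scores, cutoff=0.5):
    """Sensitivity, specificity, PPV and NPV, each as (estimate, lower, upper).

    Scores >= cutoff are called positive (an assumption about tie handling).
    """
    truth = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores) >= cutoff
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {k: (s / n if n else float("nan"), *wilson_ci(s, n))
            for k, (s, n) in counts.items()}


def stratified_bootstrap_auc(y_true, scores, n_boot=2000, seed=0):
    """AUC with a percentile 95% CI from resampling within each class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_true).astype(int)
    s = np.asarray(scores, dtype=float)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    aucs = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.concatenate([rng.choice(pos, pos.size), rng.choice(neg, neg.size)])
        aucs[b] = roc_auc_score(y[idx], s[idx])
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return float(roc_auc_score(y, s)), float(lo), float(hi)


print(wilson_ci(86, 97))  # approximately (0.808, 0.935)
```

Resampling cases and controls separately keeps the class balance of every bootstrap replicate equal to that of the original data, which is what "stratified" refers to here.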

3 RESULTS

3.1 Study overview and cohort characteristics

The cell-free epigenome landscape is distinct from that in cells. Owing to the complex processes of cell death and digestion in the circulation, as well as the mixing of cfDNA from various sources and at different digestion stages, cfDNA carries a rich array of epigenetic modifications with unique biases.³⁹ We therefore reasoned that, for cfDNA-based liquid biopsy, directly examining the epigenetic information carried by cfDNA should perform better than relying solely on prior knowledge from tissues. Accordingly, we hypothesised that fragmentomics-based cancer screening would benefit substantially from systematic analysis of the cell-free epigenome.

To test this hypothesis, we focused on three types of cell-free epigenomic characteristics: histone modification, DNA methylation and chromatin accessibility. The study measured cell-free nucleosome H3K4me3 levels at promoter regions, assessed the CpG methylation status of these regions and examined NDRs induced by transcription factor binding. Nucleosome depletion refers to the removal or reduced abundance of nucleosomes in specific genomic regions, which enhances the accessibility of the underlying DNA to the transcriptional machinery.⁴⁰ By comprehensively investigating the synergistic impact of these epigenomic features on fragmentomics, we aimed to identify a series of cancer-derived MERGE regions for a fragmentomics-based cancer screening model (Figure 1A).

**FIGURE 1**

Schematic of the overall approach for early detection of lung cancer. (A) Diagram illustrating the sequencing, data analysis and modelling methodology. cfDNA extracted from plasma samples underwent cfChIP-seq, cfRRBS and lpWGS. Cell-free epigenomic features were extracted and comprehensively analysed. Cancer-derived epigenetically altered genes were screened based on their cell-free epigenomic profiles to identify MERGE candidates. Fragment features, including fragment size and end motifs from lpWGS, were analysed to develop a MERGE-enhanced cancer detection model. (B) Cohorts used for the development and validation of the lung cancer screening model. The training cohort was used for MERGE selection, model training and cut-off determination. The external validation cohort was used for model validation and further biological function research. cfChIP-seq, cell-free chromatin immunoprecipitation sequencing; cfRRBS, cell-free reduced representation bisulphite sequencing; lpWGS, low-pass whole-genome sequencing; NDR, nucleosome-depleted region; MERGE, multi-epigenetically regulated gene; BN, benign nodules; HC, healthy controls.

A total of 376 participants were included in this study for model construction (AHHU; n = 191) and validation (PUMCH; n = 185) (Figure 1B). The training cohort included 97 subjects with malignant nodules, 14 with benign nodules and 80 healthy controls, whereas the validation cohort comprised 114 subjects with malignant nodules, 19 with benign nodules and 52 healthy controls. All participants were then stratified into cancer (malignant) and non-cancer (benign and healthy) groups for subsequent analysis. Among those with lung cancer, the training set included 60 (61.9%) subjects with stage I and 20 (20.6%) with stage II disease, while the validation set comprised 24 (23.1%) subjects with stage 0, 61 (58.7%) with stage I and 2 (1.9%) with stage II disease (validation-set percentages are of the 104 cases remaining after exclusion of the 10 AAH cases). The detailed clinical characteristics of participants in each cohort are summarised in Tables S1 and S2.

3.2 Multiple cell-free epigenomes synergistically affect fragmentomic features

We first characterised the relationships among multiple cell-free epigenomes. Because H3K4me3 modification, CpG methylation and nucleosome depletion by transcription factor binding are all closely related to gene expression,^{22, 41} we focused on gene-level analysis. We calculated the average epigenomic features in cancer and non-cancer samples from the training set. All genes were ranked by their H3K4me3 levels, from highest to lowest, and grouped into 100 percentile bins. As the H3K4me3 level of a gene increased, DNA methylation and NDR occupancy near the promoter decreased consistently (Figure 2A). Our cell-free epigenome analysis thus aligned with cellular studies showing that highly expressed genes exhibit lower promoter methylation and stronger nucleosome depletion (i.e., lower NDR scores).⁴² This clear trend was also confirmed in cancer samples (Figure S1A).
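The percentile-bin profiling can be sketched in a few lines. This is a minimal sketch under stated assumptions: the helper name is hypothetical, `numpy.array_split` is used to form 100 near-equal bins, and the synthetic gene data are constructed so that methylation and NDR score fall as H3K4me3 rises. It illustrates the procedure, not the study's data.

```python
import numpy as np


def profile_by_h3k4me3_rank(h3k4me3, features, n_bins=100):
    """Average gene-level features within H3K4me3 percentile bins.

    Genes are sorted from highest to lowest H3K4me3 and split into `n_bins`
    near-equal groups; bin 0 holds the most strongly marked genes.
    """
    order = np.argsort(-np.asarray(h3k4me3, dtype=float), kind="stable")
    bins = np.array_split(order, n_bins)
    return {
        name: np.array([np.nanmean(np.asarray(values, dtype=float)[b]) for b in bins])
        for name, values in features.items()
    }


# Synthetic genes in which methylation and NDR score fall as H3K4me3 rises.
rng = np.random.default_rng(0)
h3 = rng.gamma(2.0, 5.0, size=5000)
feats = {
    "methylation": np.clip(0.8 - 0.015 * h3 + rng.normal(0, 0.05, 5000), 0, 1),
    "ndr_score": 1.0 - 0.01 * h3 + rng.normal(0, 0.05, 5000),
}
profiles = profile_by_h3k4me3_rank(h3, feats)
print(profiles["methylation"][:3], profiles["methylation"][-3:])
```

The same binning can be reused for fragment-size proportions or end-motif summaries, which is how the fragmentomic features are laid along the H3K4me3 ranking in the following paragraphs.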

The dynamic changes in fragmentomics arise from the behaviour of various DNases on different chromatin states during the cell death process, reflecting the influence of epigenetic regulation before cell death.⁴³ To elucidate the impact of cell-free epigenomes on fragmentomics, we computed fragment size and end-motifs within gene bodies and their 1 kb upstream regions. We included the 1 kb upstream region due to the distinct characteristics observed in the promoter region's fragmentome. In both H3K4me3-enriched cfDNA and whole-genome cfDNA, promoter region fragments were found to be shorter than those in the background genome, showing a noticeable left shift in the distribution plot (Figure S2A-D). This underscored the significance of analysing the upstream region of genes.

After building a single-gene-resolution fragment feature matrix, we profiled these features according to the H3K4me3-based gene ranking. In terms of fragment size, activated genes marked by cell-free epigenomes (with higher H3K4me3 levels, lower DNA methylation and reduced NDR occupancy) tended to have shorter fragments. For more highly ranked genes, mononucleosomal fragments were enriched on the left side of the peak (<167 bp) and depleted on the right, while dinucleosomal fragments showed both enrichment on the left side of the peak and a gradual leftward shift of the peak summit (Figure 2B). This distribution pattern indicated that epigenetically upregulated regions undergo more intense digestion at both the mononucleosome and dinucleosome levels. Additionally, the leftward shift of the dinucleosome peak summit in promoters and activated genes implied shorter nucleosome spacing. This aligns with previous findings that open chromatin and transcribed regions are associated with more intense degradation and shorter nucleosome repeat lengths.⁷

For fragment end motifs, we calculated motif entropy and performed F-profile deconvolution³² for each gene in our lpWGS data, which allowed us to infer the potential cleavage origins of fragments during nucleosome release. For epigenetically upregulated genes, fragments showed a more disordered motif pattern and significantly fewer DNASE1L3- and DFFB-contributed motifs (Figure 2C). This suggests that upregulated genes are strongly associated with disorganised DNA cleavage during cell death, which makes the DNA more likely to form diverse, irregular end motifs.

In lung cancer samples, fragment size and end motifs correlated with H3K4me3 levels in a trend similar to that observed in non-cancer samples, highlighting the widespread impact of epigenomes on fragmentomics (Figure S1B,C). Moreover, there was no apparent difference in the distribution of fragment size proportions among cancerous, benign and healthy samples (Figure S2E). Therefore, a detailed analysis of fragmentomics at the specific gene level, rather than at the whole genome scale, may provide more valuable information for the accurate detection of lung cancer.

3.3 Cancer-derived fragmentomic changes enriched in epigenetically dysregulated gene hotspots

Recognising that dynamic changes in fragmentomic features are marked by cell-free epigenomes, we focused on capturing the subtle signals of lung cancer at a high-resolution scale and exploring their relationship with these epigenomic markers. By combining all non-cancer and cancer samples in the training set, we were able to analyse fragmentomic features in bins at near-nucleosome resolution (500 bp). Indeed, fragment size and end motif features showed fluctuations around the H3K4me3 peak, CpG islands and regions of open chromatin (Figure 3A).

Surprisingly, while fragmentomics exhibited minimal cancer-specific alterations in the majority of genomic regions, significant changes almost perfectly matched the positions of these multi-epigenetic modifications. For example, on chromosome 4 at q21.22, significant changes (>2σ) appeared at the promoters of the HNRNPD and HNRNPDL genes, aligning with cfChIP-seq H3K4me3 peaks (Figure 3A). In contrast, nearby regions such as the TMEM150C gene and intergenic areas showed very few cancer-derived changes. We observed a tendency toward shorter fragments in the HNRNPD gene region in cancer samples, but not in TMEM150C (Figure 3B). Moreover, HNRNPD was functionally active in cancer samples, as indicated by cell-free epigenomes with upregulated H3K4me3 levels, lower DNA methylation and reduced NDR occupancy (Figure 3C). HNRNPD and its paralogue HNRNPDL have been reported to be strongly upregulated in various cancers, including lung cancer, and to play key roles in inducing tumour growth and metastasis,⁴⁴ whereas TMEM150C is not well studied in cancer biology. Together, these results suggested that cancer-specific alterations in fragmentomics were enriched in genes regulated by epigenetic modifications, rather than being evenly distributed across all genomic regions.
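A minimal sketch of 500-bp binning with a 2σ flag is given below. The text does not define how σ was computed, so this sketch assumes one reading: the standard deviation, across bins, of the cancer-minus-control difference in a fragment feature (here the 0–150 bp fraction). Fragments are assumed to be assigned to bins by midpoint. Both helper names are hypothetical.

```python
import numpy as np


def short_fraction_per_bin(midpoints, lengths, n_bins, bin_size=500, short_max=150):
    """Fraction of 0-150 bp fragments in consecutive 500 bp bins (NaN if empty)."""
    bin_idx = np.asarray(midpoints) // bin_size
    keep = bin_idx < n_bins
    bin_idx = bin_idx[keep].astype(int)
    is_short = (np.asarray(lengths)[keep] <= short_max).astype(float)
    total = np.bincount(bin_idx, minlength=n_bins)
    n_short = np.bincount(bin_idx, weights=is_short, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return n_short / total


def flag_shifted_bins(cancer, non_cancer, n_sigma=2.0):
    """Flag bins whose cancer-minus-control difference lies beyond n_sigma SDs.

    cancer / non_cancer: (samples, bins) matrices of one fragment feature. The
    SD is taken across bins of the mean difference (an assumed reading of '>2σ').
    """
    diff = np.nanmean(cancer, axis=0) - np.nanmean(non_cancer, axis=0)
    z = (diff - np.nanmean(diff)) / np.nanstd(diff)
    return z, np.abs(z) > n_sigma


print(short_fraction_per_bin([100, 200, 600], [100, 200, 120], n_bins=2))

non_cancer = np.zeros((5, 50))
cancer = np.zeros((5, 50))
cancer[:, 10] = 1.0  # one bin with a strong cancer-specific shift
z, flagged = flag_shifted_bins(cancer, non_cancer)
print(np.flatnonzero(flagged))  # [10]
```

Standardising across bins means that most regions sit near zero and only bins that depart strongly from the genome-wide behaviour are flagged. This matches the pattern described above, in which flagged changes cluster at epigenetically marked promoters.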

To further investigate these relationships across all genes, we used the single-gene-level multi-omics feature matrix to conduct non-parametric statistical tests comparing cancer samples with non-cancer samples in the training set. This allowed us to quantify the direction and magnitude of cancer-derived changes by calculating a gene-level Z-score from the Mann–Whitney U test. We then ranked all genes by their H3K4me3 changes in cancer and mapped the corresponding fragmentomic changes onto that ranking (Figure 3D).

For cancer-upregulated genes with increased H3K4me3 levels, we observed a relatively higher distribution of fragments in the 50 to 160 bp range, along with a greater prevalence of cancer-enriched motifs, DNASE1-origin motifs and non-DNase C-end motifs (Figure 3D). In contrast, cancer-downregulated genes with decreased H3K4me3 levels showed a notable increase in size distribution around 200 bp and an abundance of cancer-depleted motifs (Figure 3D). Statistical comparisons between non-cancer and cancer samples for the top 1% of genes with H3K4me3 upregulation, downregulation and minimal change revealed distinct differences in fragmentomic features (Figures 3E and S3A,B). Similar statistical analyses were performed for cfDNA methylation and NDR rank, confirming these trends (Figure S3B).

These results support our hypothesis that cancer-specific fragmentomic features are enriched in hotspots near gene promoters and gene bodies, which are epigenetically dysregulated and marked by cell-free epigenomes. This insight provides a more targeted approach for isolating cancer-derived signals from background noise.

3.4 Identification and characterisation of MERGEs in lung cancer

Our comprehensive analysis of cell-free epigenomics revealed that cancer-derived epigenetically regulated genes exhibited more significant fragmentomic characteristics compared with the whole genome, potentially enhancing the precision of lpWGS-based early cancer screening methods.

To identify these genes, we conducted statistical tests on histone modification, DNA methylation and chromatin accessibility signals across three comparative groups: cancer versus non-cancer, cancer versus healthy and cancer versus benign (Figure S4). Integrating the results from these comparisons, we identified a total of 609 MERGEs: 245 in the cancer versus non-cancer group, 323 in the cancer versus healthy group and 187 in the cancer versus benign group, with some genes identified in more than one comparison (Figure 4A). Among these genes, 217 were co-regulated by H3K4me3 and NDR only, 185 by H3K4me3 and methylation only and 180 by methylation and NDR only, while the remaining 27 genes exhibited abnormal regulation by all three epigenetic modifications (Figure 4B and Tables S3–S6).

Functional annotation of MERGEs using multiple databases revealed significant enrichment in GTPase-mediated signal transduction (GO: 0007264, padj = .001; Reactome: R-HSA-9013148, padj = .016) and the EGF/EGFR signalling pathway (WikiPathway: WP437, padj = .068) (Figure 4C). Key genes within these pathways, including CAV2, AP2A1 and PRKCI, exhibited significant multi-omics epigenetic regulatory associations (Figure 4D). These pathways are well-established contributors to tumour invasion, metastasis and metabolic reprogramming,^{45, 46} underscoring the potential role of epigenetic dysregulation in driving their aberrant activation during early-stage lung cancer.

Motif enrichment analysis within ±1 kb of MERGEs’ TSS revealed significant enrichment of binding motifs for C2H2 zinc finger DNA-binding proteins, particularly members of the Sp/KLF family (Figure 4E). Corroborating these findings, genetic perturbation similarity analysis demonstrated enrichment of MERGEs regulated by C2H2 zinc finger transcription factors, including SP2, KLF5/6 and CTCF (Figure 4F). These converging lines of evidence suggested that MERGEs are likely subject to transcriptional regulation by the Sp/KLF family through epigenetic mechanisms during early-stage lung cancer development.

3.5 Fragmentomics-based ensemble MERGE model enables accurate detection of lung cancer

The cfDNA fragmentomic features, including break point motifs (BPM), MERGE-based BPM, end motifs (EDM), MERGE-based EDM, FSR, MERGE-based FSR and FSD, were screened for subsequent model construction. First, the single-feature classification performance of BPM, EDM and FSR was compared between the whole genome and the MERGE regions. Overall, each MERGE-based model showed superior discrimination to its whole-genome counterpart in the training set, as indicated by higher AUCs (Figures 5A and S5A). Moreover, because BPM and EDM were features of similar dimensionality and BPM had better predictive ability, MERGE-based BPM was retained. Finally, the three base classifiers, MERGE-based BPM, MERGE-based FSR and FSD, were integrated to construct the MERGE-based ensemble model.

Based on 3× coverage lpWGS data, the MERGE-based ensemble model outperformed the three base classifiers and the whole genome-based ensemble model, achieving an AUC of 0.94 (95% CI: 0.90–0.97) in the training set (Figures 5B and S5B). The locked ensemble model was then independently verified in the external validation set, where it reached an AUC of 0.94 (95% CI: 0.90–0.98) (Figure 5C). Furthermore, MERGE scores of lung cancer subjects were significantly higher than those of non-cancer controls in both the training and validation sets (Figure 5D and Table S7). Using 0.5 as the cutoff, the ensemble model achieved a sensitivity of 88.7% at 85.1% specificity in the training set and a sensitivity of 90.4% at 83.1% specificity in the external validation set (Table 1). Notably, the ensemble model maintained its sensitivity even in stage I cases (86.7% in the training set, 95.1% in the validation set and 90.9% in the combined set) (Figure 5E). Model performance across pathological and radiological subgroups was also examined (Figure 5F and Table S8). The ensemble model correctly classified 96.2% of MIA and 75% of AIS cases in the validation set, and seven of ten AAH subjects had MERGE scores above the cut-off value.

TABLE 1. The diagnostic performance of the MERGE-based ensemble model in the training, validation and combined sets.

| Subgroup | Metric | Training set (n = 191) | Validation set (n = 175) | Combined set (n = 366) |
|---|---|---|---|---|
| All | Sensitivity | 0.887 (0.808–0.935) | 0.904 (0.832–0.947) | 0.896 (0.846–0.931) |
| | Specificity | 0.851 (0.765–0.909) | 0.831 (0.727–0.900) | 0.842 (0.779–0.890) |
| | PPV | 0.86 (0.779–0.915) | 0.887 (0.812–0.934) | 0.874 (0.821–0.912) |
| | NPV | 0.879 (0.796–0.931) | 0.855 (0.753–0.919) | 0.869 (0.808–0.913) |
| Stage I | Sensitivity | 0.867 (0.758–0.931) | 0.951 (0.865–0.983) | 0.909 (0.845–0.948) |
| | Specificity | 0.851 (0.765–0.909) | 0.831 (0.727–0.900) | 0.842 (0.779–0.890) |
| | PPV | 0.788 (0.675–0.869) | 0.829 (0.724–0.899) | 0.809 (0.735–0.866) |
| | NPV | 0.909 (0.831–0.953) | 0.952 (0.867–0.983) | 0.927 (0.873–0.959) |
| MIA | Sensitivity | 1 (0.806–1) | 0.962 (0.811–0.993) | 0.976 (0.877–0.996) |
| | Specificity | 0.851 (0.765–0.909) | 0.831 (0.727–0.900) | 0.842 (0.779–0.890) |
| | PPV | 0.533 (0.361–0.698) | 0.676 (0.515–0.804) | 0.612 (0.492–0.72) |
| | NPV | 1 (0.954–1) | 0.983 (0.911–0.997) | 0.993 (0.961–0.999) |
| <1 cm | Sensitivity | 0.778 (0.453–0.937) | 0.765 (0.6–0.876) | 0.767 (0.623–0.868) |
| | Specificity | 0.851 (0.765–0.909) | 0.831 (0.727–0.900) | 0.842 (0.779–0.890) |
| | PPV | 0.333 (0.172–0.546) | 0.684 (0.525–0.809) | 0.559 (0.433–0.678) |
| | NPV | 0.976 (0.915–0.993) | 0.881 (0.782–0.938) | 0.933 (0.881–0.963) |
| pGGO | Sensitivity | 0.815 (0.633–0.918) | 0.864 (0.733–0.936) | 0.845 (0.743–0.911) |
| | Specificity | 0.851 (0.765–0.909) | 0.831 (0.727–0.900) | 0.842 (0.779–0.890) |
| | PPV | 0.611 (0.449–0.752) | 0.76 (0.626–0.857) | 0.698 (0.594–0.785) |
| | NPV | 0.941 (0.87–0.975) | 0.908 (0.813–0.957) | 0.927 (0.873–0.959) |

Values are shown as estimate (95% CI). Abbreviations: MIA, minimally invasive adenocarcinoma; NPV, negative predictive value; pGGO, pure ground-glass opacity; PPV, positive predictive value.

When the same cutoff of 0.50 was applied, the ensemble model achieved an AUC of 0.816 (95% CI: 0.725–0.907) and a sensitivity of 89.6% at 60.6% specificity for distinguishing lung cancer from benign lung nodules (Figure 5G). This suggested that the ensemble model could also help determine the nature of suspicious lung nodules detected by LDCT screening. Moreover, the model score increased significantly with later disease stage, more invasive phenotype, larger tumour diameter and higher consolidation/tumour ratio (Figure 5H). Other baseline clinical characteristics, including gender, age, smoking status and heredity, had no statistically significant effect on model scores (Figure S6A). Taken together, these analyses suggested that the fragmentomics-based ensemble model has clear biological plausibility and is applicable across different clinical scenarios.

In the intended-use population, with a prevalence of malignant nodules of 0.107% among people aged 40–74 years in China,³⁶ the NPV of our model reached 99.9%. This indicated that the ensemble model could not only enhance lung cancer detection but also reduce unnecessary procedures. Furthermore, an adapted interception model was employed to estimate the clinical value of the ensemble model in real-world practice.³⁵ Under the most aggressive dwell time scenario, the interception model showed that annual screening would shift 81% of advanced lung cancers to early stages at initial diagnosis and improve the 5-year overall survival rate from 38.80% to 67.47% (Figure S6B and Table S9).
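How prevalence drives the predictive values can be shown with Bayes' theorem. The sketch assumes the combined-set sensitivity (0.896) and specificity (0.842) from Table 1 as inputs; the text does not say which values were used for the 99.9% figure. At a prevalence as low as 0.107%, almost all negative calls are true negatives, so the NPV approaches 100% while the PPV becomes small.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV at a given disease prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Combined-set values from Table 1 (assumed inputs) at 0.107% prevalence.
ppv, npv = predictive_values(0.896, 0.842, 0.00107)
print(f"PPV = {ppv:.2%}, NPV = {npv:.3%}")
```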

3.6 Epigenetic patterns of MERGEs mirror lung adenocarcinoma progression

The score of the MERGE-based model increased progressively with the aggressiveness of lung adenocarcinoma (LUAD), suggesting that MERGEs may contribute to its development. Considering the evolutionary trajectory from AAH to IAC and the significant impact of epigenetic regulation on tumour development, we further explored the associations between H3K4me3 modification patterns of MERGEs and LUAD subtypes (Figure 6A). Unsupervised clustering of H3K4me3 profiles revealed distinct patterns among LUAD subtypes (Figure 6B). Notably, aberrant H3K4me3 modifications were detectable as early as the AAH stage, potentially indicating a role in early carcinogenesis. As the disease progressed from preneoplastic AAH to preinvasive AIS, microinvasive MIA and finally IAC, we observed a gradual increase in correlation and a concurrent decrease in Euclidean distance across the H3K4me3 profiles of different subtypes (Figure 6C,D). Of these, MIA showed the closest relationship with IAC (correlation: 0.96, distance: 89.88), consistent with the ability of both to breach the basement membrane and acquire metastatic potential.
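The two similarity measures can be illustrated as follows. Assumptions: each subtype is summarised by a mean H3K4me3 profile over the same ordered MERGE genes, similarity is measured against IAC, and the correlation is Pearson's. None of these choices is specified in the text, and the toy profiles are invented for illustration.

```python
import numpy as np


def similarity_to_reference(profiles, reference="IAC"):
    """Pearson correlation and Euclidean distance of each subtype's mean
    H3K4me3 profile (over the same ordered MERGE genes) to a reference subtype."""
    ref = np.asarray(profiles[reference], dtype=float)
    result = {}
    for name, profile in profiles.items():
        if name == reference:
            continue
        p = np.asarray(profile, dtype=float)
        result[name] = (float(np.corrcoef(p, ref)[0, 1]), float(np.linalg.norm(p - ref)))
    return result


toy = {"IAC": [1, 2, 3, 4], "MIA": [1, 2, 3, 5], "AAH": [4, 3, 2, 1]}
print(similarity_to_reference(toy))
```

The two measures are complementary. Correlation captures whether genes rise and fall together regardless of overall signal level, whereas Euclidean distance also reflects differences in magnitude.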

In addition, analysis of sample-to-sample Euclidean distances between LUAD subtypes and healthy controls revealed a progressive increase in epigenetic divergence from AIS to IAC (Figure 6E). This trend may quantitatively demonstrate the cumulative effect of epigenetic alterations during tumour progression. Remarkably, the median Euclidean distance for healthy-AAH comparisons exceeded that of AIS and approached MIA, potentially reflecting the heterogeneity of AAH samples. Examination of H3K4me3 enrichment patterns for five selected MERGEs (KDM4C, OXSR1, RAD17, RUNX1 and NPR3) across pathological stages revealed distinct profiles between early (AAH and AIS) and later (MIA and IAC) stages (Figures 6F and S7). This suggests their potential involvement in tumour progression and invasion.
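A sample-to-sample version of the distance analysis can be sketched with SciPy's `cdist`. This assumes that every healthy-control sample is compared with every sample of a subtype over the same MERGE gene columns and that the median of all pairwise distances is reported. The helper name and toy matrices are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def median_distance_to_healthy(healthy, subtypes):
    """Median Euclidean distance over all healthy-vs-subtype sample pairs.

    healthy: (n_healthy, n_genes) H3K4me3 matrix; subtypes maps a LUAD subtype
    name to its own (n_samples, n_genes) matrix with the same gene columns.
    """
    return {
        name: float(np.median(cdist(healthy, samples, metric="euclidean")))
        for name, samples in subtypes.items()
    }


healthy = np.zeros((3, 4))
groups = {"AIS": np.ones((2, 4)), "IAC": 3 * np.ones((4, 4))}
print(median_distance_to_healthy(healthy, groups))  # {'AIS': 2.0, 'IAC': 6.0}
```

Using the median of pairwise distances rather than the distance between group means keeps within-group heterogeneity visible. This is relevant to the wide spread noted for AAH samples.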

4 DISCUSSION

Our study primarily elucidated the regulatory relationships between epigenetic modifications and their impact on fragmentomic features in real-world clinical cohorts. The identification of epigenetically regulated genes formed the critical foundation for the cfDNA fragmentomics-based machine learning model. More importantly, the ensemble model achieved excellent sensitivity for patients with MIA and stage I disease when validated on an independent external cohort, underscoring its great potential for clinical translation in the early detection of lung cancer.

Lung cancer, particularly early-stage non-small cell lung cancer (NSCLC), often exhibits one of the lowest ratios of ctDNA in cfDNA compared with other cancer types.⁴⁷ While previous studies have shown that ctDNA is enriched in shorter fragments within mono- and di-nucleosomes,^{48, 49} it is important to note that significant enrichment of cfDNA fragments shorter than 150 bp was not observed in our lung cancer cohort. This underscores the necessity for even more detailed analyses to detect subtle differences.

Our findings show that these subtle differences appear at near-nucleosome scale, a much finer resolution than previous genome-level, arm-level or 5-Mb bin analyses.^{12, 50} Recent papers have also explored gene-level fragmentomics and its relationship to tissue gene expression.^{12, 15-17} However, knowledge derived from tissue samples does not always translate seamlessly to blood because of the complex processes of digestion and mixing.⁵¹ Therefore, integrating fragmentomic data with multi-modal epigenomic information provides an innovative strategy for improving lung cancer diagnosis. The combination of cfChIP-seq, cfRRBS and lpWGS in this study was chosen to correspond to the three most extensively studied areas of epigenomics, namely DNA methylation, histone modifications and chromatin accessibility.⁵² These three epigenomic layers target the same molecular entity, the cell-free nucleosome complex, offering a unique opportunity to capture complementary and interconnected aspects of gene regulation and chromatin dynamics in cfDNA biology. This integration enables the exploration of cancer-specific signals with greater depth and precision.

We found that cancer-upregulated genes marked by cell-free epigenomes contribute more significantly to ctDNA fragmentomic identity. These genes exhibited a substantially higher proportion of short fragments, DNase1 origin motifs and cancer-specific motifs compared with downregulated genes, suggesting that they undergo stronger digestion and are closely associated with cancer-related DNase activities. Building on previous research on the stepwise process of cfDNA fragmentation,⁴³ our findings imply that the characteristic features of ctDNA are primarily shaped by extracellularly circulating DNASE1.

We exploited the opposing effects of H3K4me3 with DNA methylation⁵³ and chromatin accessibility¹⁵ on gene expression in our method. Our results confirmed that cfDNA H3K4me3 modifications inversely correlate with DNA methylation and NDR. DNA methylation serves as a stable epigenetic marker, rich in lineage information but limited in capturing transient expression changes.⁵⁴ In contrast, H3K4me3 reflects more dynamic regulatory patterns, not always aligning with methylation or NDR. Thus, our analysis revealed a limited number of genes consistently exhibiting the expected regulatory relationships across all three omics layers in the MERGEs.

The H3K4me3 cfChIP-seq analysis revealed a progressive decline in gene activity across LUAD stages (from AAH to IAC), consistent with reports of increasing promoter methylation abnormalities during precancerous progression.⁵⁵ Early global hypomethylation may drive chromosomal activity and changes in the tumour immune microenvironment.^{56, 57} MERGE analysis also revealed candidate genes, including the novel Nkx-family activator Nkx6.2, which is likely to reinforce NKX motif activity early in tumour progression.⁵⁸ Additionally, significant RUNX1 alterations were observed during progression from AAH to IAC, suggesting that RUNX1/2 activation initiates ECM protein expression and fosters an EMT niche in LUAD.⁵⁸ These findings may help advance our understanding of lung cancer stratification and facilitate the discovery of systemic biomarkers for anti-tumour drugs. Our study is the first to investigate LUAD progression through dynamic H3K4me3 changes in cfDNA, highlighting how these changes, in conjunction with genomic alterations, may drive cancer progression.^{59, 60} Therefore, H3K4me3 profiling holds promise for predicting disease progression, and our technique could complement existing cancer detection methods, particularly for early-stage cancer with low tumour burden.

Despite progress in the early detection of lung cancer through cfDNA mutational analysis, clinical application remains challenging because of limited sensitivity.⁶¹ Several studies using cfDNA methylation- or fragmentation-based approaches have shown only modest improvements in sensitivity, which remain insufficient for clinical use, particularly for stage I cases, which benefit most from early detection.^{18, 19, 50} Both our previous study and several others have suggested that sensitivity typically hovers around 60% when models are built on single-omics data.^{12, 62} A key reason is the inherent noise and fluctuation of single-marker analyses, a problem that is particularly pronounced given the low abundance of ctDNA in early-stage cancer.⁵ cfDNA cleavage and fragmentation are not random processes but are influenced by nucleosome organisation, chromatin accessibility and disease status.^{7, 10, 28, 50} Our study leveraged the synergistic roles of these epigenetic layers in gene regulation: H3K4me3 and DNA methylation have opposing effects on gene expression⁵³ and influence nucleosome stability.^{15, 46} Integrating epigenetic and fragmentomic characteristics can therefore help exclude potential confounding factors and identify cancer-related features. Previous work has shown that cfDNA features from specific genomic regions can provide more precise information about cancer than whole-genome features.⁷ In this study, the superior discrimination of the MERGE-based models and the high sensitivity of the ensemble model in early-stage lung cancer together support the rationale for our modelling strategy.
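The argument that combining several noisy markers stabilises detection can be illustrated with a toy simulation. The sketch assumes, purely for illustration, that each of five features equals a fixed cancer-associated shift plus independent Gaussian noise. Averaging k such features shrinks the noise standard deviation by a factor of √k, so the averaged score separates the classes better than any single feature. Real fragmentomic and epigenomic features are correlated, so the gain in practice would be smaller. Simple averaging is a stand-in here and is not the study's ensemble scheme.

```python
import random


def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the ROC AUC (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def simulate(n_per_class=200, n_features=5, shift=0.5, seed=1):
    """Assumed toy model: feature = shift * label + independent N(0, 1) noise."""
    rng = random.Random(seed)
    samples = {1: [], 0: []}
    for label in (1, 0):
        for _ in range(n_per_class):
            samples[label].append(
                [shift * label + rng.gauss(0.0, 1.0) for _ in range(n_features)]
            )
    return samples


samples = simulate()
# AUC using only the first feature ("single-marker" analysis).
single = auc([s[0] for s in samples[1]], [s[0] for s in samples[0]])
# AUC using the mean of all features (a crude integrated score).
ensemble = auc(
    [sum(s) / len(s) for s in samples[1]],
    [sum(s) / len(s) for s in samples[0]],
)
print(f"single-feature AUC {single:.3f}, averaged AUC {ensemble:.3f}")
```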

Given the insidious onset of lung cancer and the substantial disparity in prognosis between early-stage and advanced disease, the performance of detection models in early-stage lung cancer deserves particular attention. At a specificity of 83.1%, our model achieved a sensitivity of 95.1% for stage I lung cancer and 96.2% for MIA in the validation set, outperforming other reported early detection models.^{63-65} Notably, the validation set included a higher proportion of early-stage disease than the training set (81.8% vs. 61.9% of patients at stages 0 and I), supporting the generalisability of the model. Additionally, AIS and AAH have been excluded from almost all similar studies, likely because of a lack of detectable signals in the blood.¹⁸ In the validation set, 75% of AIS cases were correctly classified and 70% of AAH cases scored above the cutoff. More intriguingly, neither AAH nor AIS cases were included in the training set, so the model was not fitted to these lesions, suggesting that it could offer critical insights into the very early events of tumourigenesis. The estimated 10-year postoperative disease-specific survival rates for both AIS and MIA cases were 100%.⁸ Furthermore, two recently published large randomised controlled trials have provided high-level evidence that sublobar resection is non-inferior to lobectomy for small-sized lung cancer.^{66, 67} Taken together, our model, combined with advances in surgical techniques, offers an avenue to a radical cure for early-stage lung cancer without excessive loss of lung function.
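Reporting sensitivity "at a specificity of 83.1%" means the score cutoff is fixed first from the non-cancer controls and stage-specific sensitivities are then read off at that cutoff. The sketch below shows that bookkeeping on made-up scores. The positive-call rule (score strictly above the cutoff), the toy scores and the subgroup labels are assumptions for illustration. In practice the cutoff would normally be fixed on the training set and applied unchanged to the validation set.

```python
import math


def cutoff_for_specificity(control_scores, target_spec):
    """Threshold t taken from the control scores such that at least target_spec
    of controls score <= t (a sample is called positive if score > t)."""
    ranked = sorted(control_scores)
    # round() guards against floating-point noise such as 7.000000000000001
    k = math.ceil(round(target_spec * len(ranked), 9))
    k = min(max(k, 1), len(ranked))
    return ranked[k - 1]


def sensitivity(case_scores, cutoff):
    """Fraction of cases called positive (score above the cutoff)."""
    return sum(s > cutoff for s in case_scores) / len(case_scores)


def specificity(control_scores, cutoff):
    """Fraction of controls called negative (score at or below the cutoff)."""
    return sum(s <= cutoff for s in control_scores) / len(control_scores)


# Toy scores (made up): 10 controls and two hypothetical case subgroups.
controls = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.41, 0.48, 0.62]
cases = {
    "stage I": [0.55, 0.71, 0.38, 0.90, 0.66],
    "MIA": [0.47, 0.52, 0.30, 0.80],
}

cut = cutoff_for_specificity(controls, target_spec=0.75)
print("cutoff", cut, "specificity", specificity(controls, cut))
for group, scores in cases.items():
    print(group, "sensitivity", sensitivity(scores, cut))
```

With few controls, the achieved specificity can only take coarse values (multiples of 1/n), so it usually lands at or slightly above the target.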

Substantial cost reductions, together with greater accessibility and standardisation, are prerequisites for the wide application of early detection models in clinical practice. Some groups have attempted to improve model performance by incorporating features from different sequencing platforms or methods, at a considerable increase in cost.^{28, 65} Similarly, in several studies, integrating radiological features identified by radiologists on CT images was labour-intensive and experience-dependent.^{68, 69} In contrast, in this study the epigenomic sequencing data were used to establish the rationale of the proposed model, whereas the final fragmentomics-based model used only lpWGS data, for cost-effectiveness and convenience. Our model therefore provides a feasible solution for implementing early detection of lung cancer in clinical practice.

This study has several limitations. Although we identified regulatory relationships between epigenetic modifications and their impact on fragmentation features, the underlying mechanism remains undefined, and in-depth molecular biological studies are required to fully address this question. Because all participants in the current study were Asian, the generalisability of the model to non-Asian populations remains uncertain, given the distinct mutational landscape observed in female non-smoking Asian patients with LUAD. Further studies with more diverse tumour phenotypes and non-Asian populations are needed. Given the profound differences between small cell lung cancer and NSCLC in biological features and clinical management, the role of cfDNA fragmentomics-based machine learning models in their differential diagnosis warrants further investigation to augment their clinical utility. Moreover, sample bias cannot be excluded owing to the relatively small numbers of AAH and AIS cases. These findings should be confirmed in larger studies in this patient population.

In conclusion, we demonstrated that lung cancer-related changes in fragmentomic features were unevenly distributed across the genome, with significant enrichment in epigenetically regulated regions of specific genes. The cfDNA fragmentomics-based model exhibited superior detection capacity, particularly for lung cancer at a very early stage. Our study provides an accurate and cost-effective approach for the early detection of lung cancer and paves the way for improved patient outcomes.

AUTHOR CONTRIBUTIONS

Hefei Li, Xueguang Sun and Naixin Liang designed and supervised the study. Yadong Wang, Qiang Guo, Zhicheng Huang, Haibo Wang, Bowen Li, Daoyun Wang, Bin Zhou, Chao Guo, Yuan Xu, Yang Song, Zhibo Zheng, Zhongxing Bing, Hefei Li, Xiaoqing Yu, Ka Luk Fung, Heqing Xu, Jianhong Shi, Meng Chen and Shanqing Li contributed to clinical information collection. Liyang Song and Chen Zhu performed the experiments. Yadong Wang, Liyang Song, Fei Zhao and Tiantian Gu performed technical development, data analysis and figure preparation. Haoxuan Jin, Shiyuan Tong, Sibo Zhu and Chen Zhu provided technical support. Liyang Song, Fei Zhao, Yadong Wang and Tiantian Gu drafted the manuscript. Yadong Wang, Liyang Song, Fei Zhao, Tiantian Gu, Haoxuan Jin, Shiyuan Tong, Sibo Zhu, Jinlei Song and Jing Liu contributed to the revision. Liyang Song arranged figures and drew illustrations. Tiantian Gu and Shuai Hong contributed to project administration. All authors had full access to all the data in the study, discussed the results and accepted the responsibility to submit the final manuscript for publication. All authors have read and approved the final version of the manuscript.

ACKNOWLEDGEMENTS

We gratefully thank the patients, healthy volunteers and their families for participating in this study. We also sincerely thank the Clinical Biobank (ISO 20387) at Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, for their assistance in sample collection. This study was supported by the National High Level Hospital Clinical Research Funding (2022-PUMCHB-011, 2022-PUMCH-A-188), Chinese Society of Clinical Oncology fund (Y-MSDPU2021-0190) and Shanghai Weihe Medical Laboratory Co., Ltd.

CONFLICT OF INTEREST STATEMENT

L. S., F. Z., T. G., S. H., H. J., S. T., S. Z., C. Z., J. S., J. L. and X. S. are employees of Shanghai Weihe Medical Laboratory Co., Ltd, Shanghai, China. All other authors have declared no conflicts of interest.

CONSENT FOR PUBLICATION

Not applicable.

Open Research

DATA AVAILABILITY STATEMENT

The data supporting the findings of this study have been deposited in Genome Sequence Archive (Genome Sequence Archive for Human in BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences) under accession number PRJCA033843. This study does not employ any new algorithms, and the code used for statistical analysis is available on request from the corresponding author Naixin Liang.

ETHICS STATEMENT AND CONSENT TO PARTICIPATE

The study was conducted in accordance with the Declaration of Helsinki and approved by the ethics committees of the Affiliated Hospital of Hebei University (Approval No. HDFYLL-IIT-023-005) and Peking Union Medical College Hospital (Approval No. I-23PJ1205). Written informed consent was obtained from all enrolled participants prior to participation.

Supporting Information

REFERENCES

1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024; 74(3): 229-263. doi:10.3322/caac.21834
2. Goldstraw P, Chansky K, Crowley J, et al. The IASLC lung cancer staging project: proposals for revision of the TNM stage groupings in the forthcoming (eighth) edition of the TNM classification for lung cancer. J Thorac Oncol. 2016; 11(1): 39-51. doi:10.1016/j.jtho.2015.09.009
3. Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011; 365(5): 395-409. doi:10.1056/NEJMoa1102873
4. Chen K, He Y, Wang W, et al. Development of new techniques and clinical applications of liquid biopsy in lung cancer management. Sci Bull (Beijing). 2024; 69(10): 1556-1568. doi:10.1016/j.scib.2024.03.062
5. Stejskal P, Goodarzi H, Srovnal J, et al. Circulating tumor nucleic acids: biology, release mechanisms, and clinical relevance. Mol Cancer. 2023; 22(1): 15. doi:10.1186/s12943-022-01710-w
6. Zhang K, Fu R, Liu R, et al. Circulating cell-free DNA-based multi-cancer early detection. Trends Cancer. 2024; 10(2): 161-174. doi:10.1016/j.trecan.2023.08.010
7. Snyder MW, Kircher M, Hill AJ, et al. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell. 2016; 164(1-2): 57-68. doi:10.1016/j.cell.2015.11.050
8. Sun K, Jiang P, Cheng SH, et al. Orientation-aware plasma cell-free DNA fragmentation analysis in open chromatin regions informs tissue of origin. Genome Res. 2019; 29(3): 418-427. doi:10.1101/gr.242719.118
9. Serpas L, Chan RWY, Jiang P, et al. Dnase1l3 deletion causes aberrations in length and end-motif frequencies in plasma DNA. Proc Natl Acad Sci USA. 2019; 116(2): 641-649. doi:10.1073/pnas.1815031116
10. Jiang P, Sun K, Peng W, et al. Plasma DNA end-motif profiling as a fragmentomic marker in cancer, pregnancy, and transplantation. Cancer Discov. 2020; 10(5): 664-673. doi:10.1158/2159-8290.CD-19-0622
11. Zhu D, Wang H, Wu W, et al. Circulating cell-free DNA fragmentation is a stepwise and conserved process linked to apoptosis. BMC Biol. 2023; 21(1): 253. doi:10.1186/s12915-023-01752-6
12. Mathios D, Johansen JS, Cristiano S, et al. Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat Commun. 2021; 12(1): 5060. doi:10.1038/s41467-021-24994-w
13. Kim J, Hong SP, Lee S, et al. Multidimensional fragmentomic profiling of cell-free DNA released from patient-derived organoids. Hum Genomics. 2023; 17(1): 96. doi:10.1186/s40246-023-00533-0
14. Yotsukura M, Asamura H, Motoi N, et al. Long-term prognosis of patients with resected adenocarcinoma in situ and minimally invasive adenocarcinoma of the lung. J Thorac Oncol. 2021; 16(8): 1312-1320. doi:10.1016/j.jtho.2021.04.007
15. Esfahani MS, Hamilton EG, Mehrmohamadi M, et al. Inferring gene expression from cell-free DNA fragmentation profiles. Nat Biotechnol. 2022; 40(4): 585-597. doi:10.1038/s41587-022-01222-4
16. Stanley KE, Jatsenko T, Tuveri S, et al. Cell type signatures in cell-free DNA fragmentation profiles reveal disease biology. Nat Commun. 2024; 15(1): 2220. doi:10.1038/s41467-024-46435-0
17. Maansson CT, Thomsen LS, Meldgaard P, et al. Integration of cell-free DNA end motifs and fragment lengths can identify active genes in liquid biopsies. Int J Mol Sci. 2024; 25(2): 1243. doi:10.3390/ijms25021243
18. Liang W, Zhao Y, Huang W, et al. Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA (ctDNA). Theranostics. 2019; 9(7): 2056-2070. doi:10.7150/thno.28119
19. Liu MC, Oxnard GR, Klein EA, et al. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020; 31(6): 745-759. doi:10.1016/j.annonc.2020.02.011
20. Sadeh R, Sharkia I, Fialkoff G, et al. ChIP-seq of plasma cell-free nucleosomes identifies gene expression programs of the cells of origin. Nat Biotechnol. 2021; 39(5): 586-598. doi:10.1038/s41587-020-00775-6
21. Baca SC, Seo JH, Davidsohn MP, et al. Liquid biopsy epigenomic profiling for cancer subtyping. Nat Med. 2023; 29(11): 2737-2741. doi:10.1038/s41591-023-02605-z
22. Zhu G, Guo YA, Ho D, et al. Tissue-specific cell-free DNA degradation quantifies circulating tumor DNA burden. Nat Commun. 2021; 12(1): 2229. doi:10.1038/s41467-021-22463-y
23. Parreno V, Loubiere V, Schuettengruber B, et al. Transient loss of polycomb components induces an epigenetic cancer fate. Nature. 2024; 629(8012): 688-696. doi:10.1038/s41586-024-07328-w
24. Fialkoff G, Takahashi N, Sharkia I, et al. Subtyping of small cell lung cancer using plasma cell-free nucleosomes. bioRxiv. 2022. 2022.06.24.497386.
25. Trier Maansson C, Meldgaard P, Stougaard M, et al. Cell-free chromatin immunoprecipitation can determine tumor gene expression in lung cancer patients. Mol Oncol. 2023; 17(5): 722-736. doi:10.1002/1878-0261.13394
26. Brouwer I, Kerklingh E, van Leeuwen F, et al. Dynamic epistasis analysis reveals how chromatin remodeling regulates transcriptional bursting. Nat Struct Mol Biol. 2023; 30(5): 692-702. doi:10.1038/s41594-023-00981-1
27. Tolstorukov MY, Sansam CG, Lu P, et al. Swi/Snf chromatin remodeling/tumor suppressor complex establishes nucleosome occupancy at target promoters. Proc Natl Acad Sci USA. 2013; 110(25): 10165-10170. doi:10.1073/pnas.1302209110
28. Chen L, Abou-Alfa GK, Zheng B, et al. Genome-scale profiling of circulating cell-free DNA signatures for early detection of hepatocellular carcinoma in cirrhotic patients. Cell Res. 2021; 31(5): 589-592. doi:10.1038/s41422-020-00457-7
29. Pham TMQ, Phan TH, Jasmine TX, et al. Multimodal analysis of genome-wide methylation, copy number aberrations, and end motif signatures enhances detection of early-stage breast cancer. Front Oncol. 2023; 13: 1127086. doi:10.3389/fonc.2023.1127086
30. Kakinuma R, Noguchi M, Ashizawa K, et al. Natural history of pulmonary subsolid nodules: a prospective multicenter study. J Thorac Oncol. 2016; 11(7): 1012-1028. doi:10.1016/j.jtho.2016.04.006
31. Stackpole ML, Zeng W, Li S, et al. Cost-effective methylome sequencing of cell-free DNA for accurately detecting and locating cancer. Nat Commun. 2022; 13(1): 5566. doi:10.1038/s41467-022-32995-6
32. Zhou Z, Ma ML, Chan RWY, et al. Fragmentation landscape of cell-free DNA revealed by deconvolutional analysis of end motifs. Proc Natl Acad Sci USA. 2023; 120(17): e2220982120. doi:10.1073/pnas.2220982120
33. Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation (Camb). 2021; 2(3): 100141. doi:10.1016/j.xinn.2021.100141
34. Guo S, Xu Z, Dong X, et al. GPSAdb: a comprehensive web resource for interactive exploration of genetic perturbation RNA-seq datasets. Nucleic Acids Res. 2023; 51(D1): D964-D968. doi:10.1093/nar/gkac1066
35. Hubbell E, Clarke CA, Aravanis AM, et al. Modeled reductions in late-stage cancer with a multi-cancer early detection test. Cancer Epidemiol Biomarkers Prev. 2021; 30(3): 460-468. doi:10.1158/1055-9965.EPI-20-1134
36. Han B, Zheng R, Zeng H, et al. Cancer incidence and mortality in China, 2022. J Natl Cancer Cent. 2024; 4(1): 47-53. doi:10.1016/j.jncc.2024.01.006
37. Zeng H, Ran X, An L, et al. Disparities in stage at diagnosis for five common cancers in China: a multicentre, hospital-based, observational study. Lancet Public Health. 2021; 6(12): e877-e887. doi:10.1016/S2468-2667(21)00157-2
38. He S, Li H, Cao M, et al. Survival of 7,311 lung cancer patients by pathological stage and histological classification: a multicenter hospital-based study in China. Transl Lung Cancer Res. 2022; 11(8): 1591-1605. doi:10.21037/tlcr-22-240
39. Lo YMD, Han DSC, Jiang P, et al. Epigenetics, fragmentomics, and topology of cell-free DNA in liquid biopsies. Science. 2021; 372(6538). doi:10.1126/science.aaw3616
40. Maluchenko NV, Nilov DK, Pushkarev SV, et al. Mechanisms of nucleosome reorganization by PARP1. Int J Mol Sci. 2021; 22(22). doi:10.3390/ijms222212127
41. Bradner JE, Hnisz D, Young RA. Transcriptional addiction in cancer. Cell. 2017; 168(4): 629-643. doi:10.1016/j.cell.2016.12.013
42. Carter B, Zhao K. The epigenetic basis of cellular heterogeneity. Nat Rev Genet. 2021; 22(4): 235-250. doi:10.1038/s41576-020-00300-0
43. Han DSC, Ni M, Chan RWY, et al. The biology of cell-free DNA fragmentation and the roles of DNASE1, DNASE1L3, and DFFB. Am J Hum Genet. 2020; 106(2): 202-214. doi:10.1016/j.ajhg.2020.01.008
44. Tian XY, Li J, Liu TH, et al. The overexpression of AUF1 in colorectal cancer predicts a poor prognosis and promotes cancer progression by activating ERK and AKT pathways. Cancer Med. 2020; 9(22): 8612-8623. doi:10.1002/cam4.3464
45. Clayton NS, Ridley AJ. Targeting Rho GTPase signaling networks in cancer. Front Cell Dev Biol. 2020; 8: 222. doi:10.3389/fcell.2020.00222
46. Wee P, Wang Z. Epidermal growth factor receptor cell proliferation signaling pathways. Cancers (Basel). 2017; 9(5): 52. doi:10.3390/cancers9050052
47. Fiala C, Diamandis EP. Utility of circulating tumor DNA in cancer diagnostics with emphasis on early detection. BMC Med. 2018; 16(1): 166. doi:10.1186/s12916-018-1157-9
48. Markus H, Chandrananda D, Moore E, et al. Refined characterization of circulating tumor DNA through biological feature integration. Sci Rep. 2022; 12(1): 1928. doi:10.1038/s41598-022-05606-z
49. Sanchez C, Roch B, Mazard T, et al. Circulating nuclear DNA structural features, origins, and complete size profile revealed by fragmentomics. JCI Insight. 2021; 6(7): e144561. doi:10.1172/jci.insight.144561
50. Cristiano S, Leal A, Phallen J, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019; 570(7761): 385-389. doi:10.1038/s41586-019-1272-6
51. Oberhofer A, Bronkhorst AJ, Uhlig C, et al. Tracing the origin of cell-free DNA molecules through tissue-specific epigenetic signatures. Diagnostics (Basel). 2022; 12(8): 1834. doi:10.3390/diagnostics12081834
52. Yang J, Xu J, Wang W, et al. Epigenetic regulation in the tumor microenvironment: molecular mechanisms and therapeutic targets. Signal Transduct Target Ther. 2023; 8(1): 210. doi:10.1038/s41392-023-01480-x
53. Fu K, Bonora G, Pellegrini M. Interactions between core histone marks and DNA methyltransferases predict DNA methylation patterns observed in human cells and tissues. Epigenetics. 2020; 15(3): 272-282. doi:10.1080/15592294.2019.1666649
54. Ginno PA, Gaidatzis D, Feldmann A, et al. A genome-scale map of DNA methylation turnover identifies site-specific dependencies of DNMT and TET activity. Nat Commun. 2020; 11(1): 2680. doi:10.1038/s41467-020-16354-x
55. Hu X, Estecio MR, Chen R, et al. Evolution of DNA methylome from precancerous lesions to invasive lung adenocarcinomas. Nat Commun. 2021; 12(1): 687. doi:10.1038/s41467-021-20907-z
56. Zhang Y, Fu F, Zhang Q, et al. Evolutionary proteogenomic landscape from pre-invasive to invasive lung adenocarcinoma. Cell Rep Med. 2024; 5(1): 101358. doi:10.1016/j.xcrm.2023.101358
57. Dejima H, Hu X, Chen R, et al. Immune evolution from preneoplasia to invasive lung adenocarcinomas and underlying molecular features. Nat Commun. 2021; 12(1): 2722. doi:10.1038/s41467-021-22890-x
58. LaFave LM, Kartha VK, Ma S, et al. Epigenomic state transitions characterize tumor progression in mouse lung adenocarcinoma. Cancer Cell. 2020; 38(2): 212-228.e13. doi:10.1016/j.ccell.2020.06.006
59. Haga Y, Sakamoto Y, Kajiya K, et al. Whole-genome sequencing reveals the molecular implications of the stepwise progression of lung adenocarcinoma. Nat Commun. 2023; 14(1): 8375. doi:10.1038/s41467-023-43732-y
60. Tan T, Shi P, Abbas MN, et al. Epigenetic modification regulates tumor progression and metastasis through EMT (Review). Int J Oncol. 2022; 60(6): 70. doi:10.3892/ijo.2022.5360
61. Chabon JJ, Hamilton EG, Kurtz DM, et al. Integrating genomic features for non-invasive early lung cancer detection. Nature. 2020; 580(7802): 245-251. doi:10.1038/s41586-020-2140-0
62. Liang N, Li B, Jia Z, et al. Ultrasensitive detection of circulating tumour DNA via deep methylation sequencing aided by machine learning. Nat Biomed Eng. 2021; 5(6): 586-599. doi:10.1038/s41551-021-00746-5
63. Mazzone PJ, Bach PB, Carey J, et al. Clinical validation of a cell-free DNA fragmentome assay for augmentation of lung cancer early detection. Cancer Discov. 2024; 14(11): 2224-2242. doi:10.1158/2159-8290.CD-24-0519
64. Li Y, Jiang G, Wu W, et al. Multi-omics integrated circulating cell-free DNA genomic signatures enhanced the diagnostic performance of early-stage lung cancer and postoperative minimal residual disease. EBioMedicine. 2023; 91: 104553. doi:10.1016/j.ebiom.2023.104553
65. Chen K, Sun J, Zhao H, et al. Non-invasive lung cancer diagnosis and prognosis based on multi-analyte liquid biopsy. Mol Cancer. 2021; 20(1): 23. doi:10.1186/s12943-021-01323-9
66. Saji H, Okada M, Tsuboi M, et al. Segmentectomy versus lobectomy in small-sized peripheral non-small-cell lung cancer (JCOG0802/WJOG4607L): a multicentre, open-label, phase 3, randomised, controlled, non-inferiority trial. Lancet. 2022; 399(10335): 1607-1617. doi:10.1016/S0140-6736(21)02333-3
67. Altorki N, Wang X, Kozono D, et al. Lobar or sublobar resection for peripheral stage IA non-small-cell lung cancer. N Engl J Med. 2023; 388(6): 489-498. doi:10.1056/NEJMoa2212083
68. Wang Q, Song X, Zhao F, et al. Noninvasive diagnosis of pulmonary nodules using a circulating tsRNA-based nomogram. Cancer Sci. 2023; 114(12): 4607-4621. doi:10.1111/cas.15971
69. He J, Wang B, Tao J, et al. Accurate classification of pulmonary nodules by a combined model of clinical, imaging, and cell-free DNA methylation biomarkers: a model development and external validation study. Lancet Digit Health. 2023; 5(10): e647-e656. doi:10.1016/S2589-7500(23)00125-5


Volume 15, Issue 2, February 2025, e70225

Filename	Size	Description
ctm270225-sup-0001-SuppMat.docx	1 MB	Supporting Information
ctm270225-sup-0002-tableS1.xlsx	11.6 KB	Supporting Information
ctm270225-sup-0003-tableS2.xlsx	33.6 KB	Supporting Information
ctm270225-sup-0004-tableS3.xlsx	67.3 KB	Supporting Information
ctm270225-sup-0005-tableS4.xlsx	263.3 KB	Supporting Information
ctm270225-sup-0006-tableS5.xlsx	195.7 KB	Supporting Information
ctm270225-sup-0007-tableS6.xlsx	229.7 KB	Supporting Information
ctm270225-sup-0008-tableS7.xlsx	29 KB	Supporting Information
ctm270225-sup-0009-tableS8.xlsx	13.5 KB	Supporting Information
ctm270225-sup-0010-tableS9.xlsx	10.8 KB	Supporting Information
