Cytology plays a crucial role in lung cancer diagnosis. Pulmonary cytology involves characterising the morphology of cells in a specimen and reporting the corresponding findings, both of which are extremely burdensome tasks. In this study, we propose a technique that generates cytologic findings from cytologic images to assist in the reporting of pulmonary cytology.

Methods

For this study, 801 patch images were obtained from cytology specimens collected from 206 patients, and findings were assigned to each image to form a dataset for generating cytologic findings. The proposed method consists of a vision model and dual text decoders. In the vision model, a convolutional neural network (CNN) classifies a given image as benign or malignant, and image features are extracted from an intermediate layer. Independent text decoders for benign and malignant cells are prepared for text generation, and the decoder in use is switched according to the CNN classification result. Each text decoder is a transformer that generates findings from the features obtained from the CNN.
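The switching between decoders described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `VisionOutput`, `DualDecoderReporter`, and the three `toy_*` functions are hypothetical, and the toy functions are simple stand-ins for the trained CNN and the transformer decoders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Features = List[float]


@dataclass
class VisionOutput:
    """Result of the vision model (hypothetical container)."""
    label: str          # "benign" or "malignant"
    features: Features  # stand-in for intermediate-layer CNN features


@dataclass
class DualDecoderReporter:
    """Routes image features to one of two text decoders based on the class."""
    vision_model: Callable[[Sequence[float]], VisionOutput]
    benign_decoder: Callable[[Features], str]
    malignant_decoder: Callable[[Features], str]

    def describe(self, image: Sequence[float]) -> Tuple[str, str]:
        out = self.vision_model(image)
        # Switch decoders according to the classification result.
        if out.label == "malignant":
            text = self.malignant_decoder(out.features)
        else:
            text = self.benign_decoder(out.features)
        return out.label, text


# Toy stand-ins (assumptions): a real system would plug in a trained CNN
# and two transformer decoders here.
def toy_vision_model(image: Sequence[float]) -> VisionOutput:
    mean = sum(image) / len(image)
    label = "malignant" if mean > 0.5 else "benign"
    return VisionOutput(label=label, features=[mean])


def toy_benign_decoder(features: Features) -> str:
    return f"<benign findings from {len(features)} feature(s)>"


def toy_malignant_decoder(features: Features) -> str:
    return f"<malignant findings from {len(features)} feature(s)>"


reporter = DualDecoderReporter(
    toy_vision_model, toy_benign_decoder, toy_malignant_decoder
)
print(reporter.describe([0.9, 0.8]))
print(reporter.describe([0.1, 0.2]))
```

Keeping the two decoders as separate callables mirrors the design choice in the text: each decoder only ever sees features from its own class, and the classifier output alone decides which one runs.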

Results

The sensitivity and specificity were 100% and 96.4%, respectively, for automated benign and malignant case classification, and the saliency map indicated characteristic benign and malignant areas. The grammar and style of the generated texts were confirmed to be correct, and the texts achieved a BLEU-4 score of 0.828, reflecting a high degree of agreement with the gold standard and outperforming existing LLM-based image-captioning methods and a single-text-decoder ablation model.
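For readers unfamiliar with the metrics reported above, the sketch below shows how sensitivity, specificity, and a BLEU-4 score can be computed. It is a simplified illustration under stated assumptions: malignant is treated as the positive class, and `bleu4` is an unsmoothed, single-reference, sentence-level variant that may differ from the evaluation tool used in the study. The function names and the toy inputs are made up for illustration and are not data from the study.

```python
import math
from collections import Counter


def sensitivity_specificity(y_true, y_pred, positive="malignant"):
    """Return (sensitivity, specificity) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 against a single reference."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)


# Toy inputs (made up for illustration).
y_true = ["malignant", "malignant", "benign", "benign"]
y_pred = ["malignant", "malignant", "benign", "malignant"]
print(sensitivity_specificity(y_true, y_pred))
print(bleu4("a b c d e f", "a b c d e g"))
```

A score of 1.0 from `bleu4` means the candidate matches the reference exactly; any sentence with no shared 4-gram scores 0.0 in this unsmoothed form, which is why evaluation toolkits usually apply smoothing or aggregate counts over a whole corpus.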

Conclusion

Experimental results indicate that the proposed method is useful for pulmonary cytology classification and generation of cytologic findings.

Graphical Abstract

Cytological images are captured with a microscope, and the vision model performs feature extraction and classification. The text decoder is selected according to the classification result and generates a description of the cytologic findings.

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Volume 36, Issue 3

May 2025

Pages 240-249