Detail View

Title
Privacy-preserving image captioning using virtual photon-limited imaging and federated learning
Issued Date
2026-02
Citation
Results in Optics, v.23
Type
Article
Author Keywords
Federated learning; Photon counting imaging (PCI); Poisson multinomial distribution-based PCI; Privacy-preserving image captioning; Deep learning
ISSN
2666-9501
Abstract

The growing demand for visual privacy in optical imaging systems has motivated the development of frameworks that can both preserve privacy and maintain utility in downstream tasks. In this work, we propose a privacy-preserving image captioning framework that integrates Poisson Multinomial Distribution-based Photon Counting Imaging (PMD-PCI) with deep learning techniques. PMD-PCI simulates photon-limited imaging conditions by generating highly sparse multispectral images, thereby inherently concealing fine visual details. These sparse representations are used as inputs to two encoder-decoder architectures, ResNet101-Transformer and ViT-Transformer, for automated caption generation. To further enhance privacy and reduce the risk of centralized data exposure, we employ federated learning, allowing model training across distributed clients without direct access to raw images. Experimental evaluations on the Flickr8k and Flickr30k datasets show that accurate captions can be generated from photon-limited images with more than 50,000 incident photons at a resolution of 224 × 224 pixels. On Flickr30k, the proposed ResNet101-Transformer achieves a BLEU-4 score of 17.32 and a CIDEr score of 27.80 at 50,000 photons in a centralized setting, demonstrating that meaningful captions can be produced even under severe optical sparsification. Compared to traditional encryption-based techniques such as Double Random Phase Encoding (DRPE) and AES, our approach provides a better trade-off between privacy and captioning performance. Furthermore, convergence analysis reveals that federated learning achieves near-optimal performance within just 3 communication rounds, significantly reducing the communication overhead required for training. The proposed framework bridges physical-layer image privacy with optical imaging and learning-based caption generation, making it suitable for secure, low-light, or resource-constrained vision systems. © 2026 The Authors.
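As a rough illustration of the photon-limited imaging idea summarized in the abstract (not the authors' PMD-PCI implementation), the following Python sketch spreads a fixed photon budget over a 224 × 224 frame with a single multinomial draw, so only a small fraction of pixels register any counts. The function name simulate_photon_counting, the NumPy-based multinomial allocation, and the 50,000-photon budget mirroring the reported operating point are illustrative assumptions only.

import numpy as np

def simulate_photon_counting(image, total_photons=50_000, rng=None):
    """Illustrative photon-limited rendering of a normalized image.

    Distributes a fixed photon budget over the pixels of `image`
    (values in [0, 1]) in proportion to their relative irradiance,
    yielding a sparse count map instead of the original intensities.
    This is a simplified stand-in for the PMD-PCI simulation described
    in the abstract, not the published method.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = image.astype(np.float64).ravel()
    probs /= probs.sum()                            # per-pixel detection probability
    counts = rng.multinomial(total_photons, probs)  # allocate the photon budget
    return counts.reshape(image.shape)

# Example: a 224 x 224 grayscale frame reduced to ~50,000 detected photons
frame = np.random.rand(224, 224)
sparse = simulate_photon_counting(frame, total_photons=50_000)
print(sparse.sum(), (sparse > 0).mean())  # total photons and fraction of occupied pixels

With roughly 50,000 photons spread over about 50,000 pixels, most pixels receive zero or one count, which conveys why fine visual detail is concealed while coarse scene structure (and hence captioning utility) can survive.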

URI
https://scholar.dgist.ac.kr/handle/20.500.11750/59919
DOI
10.1016/j.rio.2026.100970
Publisher
Elsevier

File Downloads

  • There are no files associated with this item.


Related Researcher

Moon, Inkyu (문인규)

Department of Robotics and Mechatronics Engineering

