Detail View

Title
Privacy-preserving image captioning using virtual photon-limited imaging and federated learning
Issued Date
2026-02
Citation
Results in Optics, v.23
Type
Article
Author Keywords
Federated learning; Photon counting imaging (PCI); Poisson multinomial distribution-based PCI; Privacy-preserving image captioning; Deep learning
ISSN
2666-9501
Abstract

The growing demand for visual privacy in optical imaging systems has motivated the development of frameworks that can both preserve privacy and maintain utility in downstream tasks. In this work, we propose a privacy-preserving image captioning framework that integrates Poisson Multinomial Distribution-based Photon Counting Imaging (PMD-PCI) with deep learning techniques. PMD-PCI simulates photon-limited imaging conditions by generating highly sparse multispectral images, thereby inherently concealing fine visual details. These sparse representations are used as inputs to two encoder-decoder architectures, ResNet101-Transformer and ViT-Transformer, for automated caption generation. To further enhance privacy and reduce the risk of centralized data exposure, we employ federated learning, allowing model training across distributed clients without direct access to raw images. Experimental evaluations on the Flickr8k and Flickr30k datasets show that accurate captions can be generated from photon-limited images with more than 50,000 incident photons at a resolution of 224 × 224 pixels. On Flickr30k, the proposed ResNet101-Transformer achieves a BLEU-4 score of 17.32 and a CIDEr score of 27.80 at 50,000 photons in a centralized setting, demonstrating that meaningful captions can be produced even under severe optical sparsification. Compared to traditional encryption-based techniques such as Double Random Phase Encoding (DRPE) and AES, our approach provides a better trade-off between privacy and captioning performance. Furthermore, convergence analysis reveals that federated learning achieves near-optimal performance within just 3 communication rounds, significantly reducing the communication overhead required for training. The proposed framework bridges physical-layer image privacy with optical imaging and learning-based caption generation, making it suitable for secure, low-light, or resource-constrained vision systems. © 2026 The Authors.
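As a rough illustration of the photon-limited imaging idea summarized in the abstract (not the authors' PMD-PCI implementation), the following Python sketch spreads a fixed photon budget over a 224 × 224 frame with a single multinomial draw, so only a small fraction of pixels register any counts. The function name simulate_photon_counting, the NumPy-based multinomial allocation, and the 50,000-photon budget mirroring the reported operating point are illustrative assumptions only.

import numpy as np

def simulate_photon_counting(image, total_photons=50_000, rng=None):
    """Illustrative photon-limited rendering of a normalized image.

    Distributes a fixed photon budget over the pixels of `image`
    (values in [0, 1]) in proportion to their relative irradiance,
    yielding a sparse count map instead of the original intensities.
    This is a simplified stand-in for the PMD-PCI simulation described
    in the abstract, not the published method.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = image.astype(np.float64).ravel()
    probs /= probs.sum()                            # per-pixel detection probability
    counts = rng.multinomial(total_photons, probs)  # allocate the photon budget
    return counts.reshape(image.shape)

# Example: a 224 x 224 grayscale frame reduced to ~50,000 detected photons
frame = np.random.rand(224, 224)
sparse = simulate_photon_counting(frame, total_photons=50_000)
print(sparse.sum(), (sparse > 0).mean())  # total photons and fraction of occupied pixels

With roughly 50,000 photons spread over about 50,000 pixels, most pixels receive zero or one count, which conveys why fine visual detail is concealed while coarse scene structure (and hence captioning utility) can survive.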

URI
https://scholar.dgist.ac.kr/handle/20.500.11750/59919
DOI
10.1016/j.rio.2026.100970
Publisher
Elsevier

File Downloads

  • There are no files associated with this item.


Related Researcher

Moon, Inkyu (문인규)

Department of Robotics and Mechatronics Engineering

