Detail View
Privacy-preserving image captioning using virtual photon-limited imaging and federated learning
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Martin, Antoinette Deborah | - |
| dc.contributor.author | Moon, Inkyu | - |
| dc.date.accessioned | 2026-02-05T16:10:12Z | - |
| dc.date.available | 2026-02-05T16:10:12Z | - |
| dc.date.created | 2026-01-27 | - |
| dc.date.issued | 2026-02 | - |
| dc.identifier.issn | 2666-9501 | - |
| dc.identifier.uri | https://scholar.dgist.ac.kr/handle/20.500.11750/59919 | - |
| dc.description.abstract | The growing demand for visual privacy in optical imaging systems has motivated the development of frameworks that can both preserve privacy and maintain utility in downstream tasks. In this work, we propose a privacy-preserving image captioning framework that integrates Poisson Multinomial Distribution-based Photon Counting Imaging (PMD-PCI) with deep learning techniques. PMD-PCI simulates photon-limited imaging conditions by generating highly sparse multispectral images, thereby inherently concealing fine visual details. These sparse representations are used as inputs to two encoder-decoder architectures, ResNet101-Transformer and ViT-Transformer, for automated caption generation. To further enhance privacy and reduce the risk of centralized data exposure, we employ federated learning, allowing model training across distributed clients without direct access to raw images. Experimental evaluations on the Flickr8k and Flickr30k datasets show that accurate captions can be generated from photon-limited images with more than 50,000 incident photons at a resolution of 224 × 224 pixels. On Flickr30k, the proposed ResNet101-Transformer achieves a BLEU-4 score of 17.32 and a CIDEr score of 27.80 at 50,000 photons in a centralized setting, demonstrating that meaningful captions can be produced even under severe optical sparsification. Compared to traditional encryption-based techniques such as Double Random Phase Encoding (DRPE) and AES, our approach provides a better trade-off between privacy and captioning performance. Furthermore, convergence analysis reveals that federated learning achieves near-optimal performance within just 3 communication rounds, significantly reducing the communication overhead required for training. The proposed framework bridges physical-layer image privacy with optical imaging and learning-based caption generation, making it suitable for secure, low-light, or resource-constrained vision systems. © 2026 The Authors. | - |
| dc.language | English | - |
| dc.publisher | Elsevier | - |
| dc.title | Privacy-preserving image captioning using virtual photon-limited imaging and federated learning | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1016/j.rio.2026.100970 | - |
| dc.identifier.scopusid | 2-s2.0-105027471714 | - |
| dc.identifier.bibliographicCitation | Results in Optics, v.23 | - |
| dc.description.isOpenAccess | TRUE | - |
| dc.subject.keywordAuthor | Federated learning | - |
| dc.subject.keywordAuthor | Photon counting imaging (PCI) | - |
| dc.subject.keywordAuthor | Poisson multinomial distribution-based PCI | - |
| dc.subject.keywordAuthor | Privacy-preserving image captioning | - |
| dc.subject.keywordAuthor | Deep learning | - |
| dc.citation.title | Results in Optics | - |
| dc.citation.volume | 23 | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.type.docType | Article | - |
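
The abstract above describes the photon-limited imaging step only at a high level. As a rough illustration of the photon-counting idea, the sketch below generates a sparse image by drawing per-pixel Poisson counts whose means are proportional to the normalized scene irradiance under a fixed photon budget. This is a simplified classical photon-counting model, not the authors' PMD-PCI implementation; the function name, the NumPy-based sampling, and the use of a single grayscale channel are assumptions for illustration only.

```python
import numpy as np

def photon_limited_image(image: np.ndarray, n_photons: int = 50_000, seed: int = 0) -> np.ndarray:
    """Simulate a photon-limited (sparse) view of a scene.

    Illustrative sketch only: pixel intensities are normalized into a
    probability map, and each pixel's photon count is drawn from a Poisson
    distribution with mean n_photons * p(pixel), so the expected total
    number of detected photons equals the photon budget.
    """
    rng = np.random.default_rng(seed)
    img = image.astype(np.float64)
    prob = img / max(img.sum(), 1e-12)        # normalized irradiance per pixel
    counts = rng.poisson(lam=n_photons * prob)  # sparse photon-count image
    return counts

# Example: a 224 x 224 scene reduced to roughly 50,000 photon events,
# matching the photon budget and resolution reported in the abstract.
scene = np.random.rand(224, 224)
sparse = photon_limited_image(scene, n_photons=50_000)
print(int(sparse.sum()), float((sparse > 0).mean()))  # total photons, fraction of pixels hit
```

In the paper's setting, such sparse images (one per spectral channel for multispectral data) would replace the raw photographs as encoder inputs, so fine visual detail is never exposed to the captioning model or, under federated training, to the central server.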
File Downloads
- There are no files associated with this item.
