Objective: This study aimed to develop a novel multi-stage self-supervised learning model tailored for the accurate classification of optical coherence tomography (OCT) images in ophthalmology, reducing reliance on costly labeled datasets while maintaining high diagnostic accuracy.

Materials and Methods: A private dataset of 2719 OCT images from 493 patients was employed, along with 3 public datasets comprising 84 484 images from 4686 patients, 3231 images from 45 patients, and 572 images. Extensive internal, external, and clinical validations were performed to assess model performance. Grad-CAM was employed for qualitative analysis, interpreting the model's decisions by highlighting the image regions most relevant to each prediction. Subsampling analyses evaluated the model's robustness under varying amounts of labeled data.

Results: The proposed model outperformed conventional supervised and self-supervised learning-based models, achieving state-of-the-art results across the 3 public datasets. In clinical validation, the model exhibited up to 17.50% higher accuracy and 17.53% higher macro F1 score than a supervised learning-based model under limited training data.

Discussion: The model's robustness in OCT image classification underscores the potential of multi-stage self-supervised learning to address challenges associated with limited labeled data. The availability of the source code and pre-trained models promotes the use of this model in a variety of clinical settings, facilitating broader adoption.

Conclusion: This model offers a promising solution for advancing OCT image classification, achieving high accuracy while reducing the cost of extensive expert annotation and potentially streamlining clinical workflows, thereby supporting more efficient patient management.
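
For readers less familiar with the two metrics reported in the Results, the following is a minimal sketch of how accuracy and the macro F1 score are conventionally defined for a multi-class classifier. It is not the authors' evaluation code. The function names `accuracy` and `macro_f1` and the class labels "A", "B", and "C" are hypothetical placeholders chosen for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores.

    Every class contributes equally, regardless of how many samples it has,
    which is why macro F1 is often reported alongside accuracy when classes
    are imbalanced.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # true class t was missed

    # Each label appears at least once in y_true or y_pred, so the
    # denominator below is always positive.
    labels = sorted(set(y_true) | set(y_pred))
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(per_class) / len(per_class)


# Hypothetical labels, for illustration only (not data from the study).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

The per-class F1 is written here as 2TP / (2TP + FP + FN), which is algebraically equivalent to the harmonic mean of precision and recall.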