Heart rate is an important physiological signal that reflects a person's physical state and is widely used in medicine, sports, and healthcare applications. Heart rate is usually obtained from the electrocardiogram (ECG) or the photoplethysmogram (PPG), whose acquisition commonly requires contact with the subject's skin and can therefore be inconvenient. Hence, many researchers have proposed remote heart rate (rHR) estimation algorithms that work from face video. In this paper, we introduce an rHR estimation algorithm based on the Video Swin Transformer. The original videos are first preprocessed by cropping the face region to obtain face videos. We then feed sequences of 160 face frames to the Video Swin Transformer to extract spatiotemporal representations. Finally, we estimate rHR using a PPG predictor followed by a linear layer. To evaluate the performance of the proposed algorithm, we train and test it on the public UBFC-rPPG dataset. The experimental results show that the proposed algorithm achieves better accuracy than CNN-based methods.
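The proposed method maps the predicted PPG signal to heart rate with a learned linear layer. To illustrate how a PPG waveform relates to heart rate, the sketch below uses a common non-learned alternative: it reads off the dominant spectral peak of the signal. This is not the proposed method. The function name `hr_from_ppg`, the assumed 30 fps frame rate, the 42 to 240 bpm search band, and the zero-padding length are all illustrative assumptions.

```python
import numpy as np

def hr_from_ppg(ppg, fps=30.0, min_bpm=42.0, max_bpm=240.0, n_fft=2048):
    """Hypothetical helper: estimate heart rate (bpm) as the dominant
    frequency of a PPG-like signal (spectral-peak method, not the
    paper's linear-layer regressor)."""
    x = np.asarray(ppg, dtype=float)
    x = x - x.mean()                      # remove the DC component
    n = max(len(x), n_fft)                # zero-pad for finer frequency resolution
    spectrum = np.abs(np.fft.rfft(x, n=n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)  # bin frequencies in Hz
    # Restrict the search to a plausible heart-rate band (assumed limits).
    band = (freqs >= min_bpm / 60.0) & (freqs <= max_bpm / 60.0)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz                 # Hz -> beats per minute

# Usage: a synthetic 160-frame clip at an assumed 30 fps with a 1.2 Hz pulse.
t = np.arange(160) / 30.0
print(round(hr_from_ppg(np.sin(2 * np.pi * 1.2 * t)), 1))
```

A 160-frame window at 30 fps covers only about 5.3 seconds, so the raw frequency resolution is coarse (about 11 bpm per bin); zero-padding interpolates the spectrum but does not add information, which is one motivation for learning the mapping instead.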