The development of convolutional neural networks (CNNs) has driven significant progress across a variety of computer vision tasks. Among these, stereo matching is an important research area, as it enables the reconstruction of depth and 3D structure that is difficult to obtain from a single camera. However, CNN-based methods are susceptible to domain shift: the performance of state-of-the-art stereo matching networks is known to degrade when the domain of the input data changes. Moreover, collecting real-world ground truth to mitigate this issue is time-consuming and expensive compared to generating synthetic ground truth. To address this problem, this study proposes an end-to-end framework that employs image-to-image translation to bridge the domain gap in stereo matching. Specifically, a horizontal attentive generation (HAG) module is introduced that takes into account the epipolar constraint between the two views when generating target-stylized left and right images. By applying a horizontal attention mechanism during generation, the proposed method alleviates the limited receptive field of convolutions, aggregating information along each epipolar line from both views without attending over the entire feature map. As a result, the network preserves consistency between the left and right views during image generation, making it more robust across different datasets.
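To make the idea of horizontal attention concrete, the sketch below shows one plausible way to attend along each epipolar line between two rectified stereo views: for rectified stereo, corresponding pixels lie on the same image row, so restricting attention to rows captures the epipolar constraint while keeping the cost far below full-feature-map attention. This is a minimal illustration only; the class name `HorizontalAttention`, the 1x1-convolution projections, and the residual connection are assumptions for exposition, not the paper's exact HAG design.

```python
import torch
import torch.nn as nn

class HorizontalAttention(nn.Module):
    """Cross-view attention restricted to image rows (epipolar lines).

    Hypothetical sketch: queries come from one view, keys/values from the
    other, and each pixel only attends across the width of its own row.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, kernel_size=1)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, C, H, W) feature maps from the two views.
        q = self.q(feat_a).permute(0, 2, 3, 1)  # (B, H, W, C)
        k = self.k(feat_b).permute(0, 2, 3, 1)  # (B, H, W, C)
        v = self.v(feat_b).permute(0, 2, 3, 1)  # (B, H, W, C)
        # Attention scores are computed per row: each (B, H) slice yields a
        # (W, W) matrix, so aggregation stays on the horizontal line.
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # (B, H, W, W)
        out = attn @ v  # (B, H, W, C)
        # Residual connection keeps the original view's content.
        return out.permute(0, 3, 1, 2) + feat_a

# Example usage: aggregate right-view context into left-view features.
left = torch.randn(2, 64, 32, 64)
right = torch.randn(2, 64, 32, 64)
fused = HorizontalAttention(64)(left, right)  # shape (2, 64, 32, 64)
```

Because the softmax is taken over a single row of width W rather than over all H*W positions, each pixel gathers long-range context from the other view along exactly the line where its stereo correspondence must lie, which is what lets the generator keep the left and right outputs geometrically consistent.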