We propose a novel approach to infer a high-quality depth map from a set of images with small viewpoint variations. In general, techniques for depth estimation from small motion consist of camera pose estimation and dense reconstruction. In contrast to prior approaches that recover scene geometry and camera motions using pre-calibrated cameras, we introduce a self-calibrating bundle adjustment method tailored for small motion which enables computation of camera poses without the need for camera calibration. For dense depth reconstruction, we present a convolutional neural network called DPSNet (Deep Plane Sweep Network) whose design is inspired by best practices of traditional geometry-based approaches. Rather than directly estimating depth or optical flow correspondence from image pairs as done in many previous deep learning methods, DPSNet takes a plane sweep approach that involves building a cost volume from deep features using the plane sweep algorithm, regularizing the cost volume, and regressing the depth map from the cost volume. The cost volume is constructed using a differentiable warping process that allows for end-to-end training of the network. Through the effective incorporation of conventional multiview stereo concepts within a deep learning framework, the proposed method achieves state-of-the-art results on a variety of challenging datasets. IEEE