In visual surveillance, deep learning-based foreground object detection algorithms outperform classical background subtraction (BGS) algorithms. However, deep learning-based methods are limited in that detection performance deteriorates in a new environment that differs from the training environment. This limitation can be overcome by retraining the model with additional ground-truth labels from the new environment, but generating ground-truth labels for visual surveillance is time-consuming and expensive. This paper proposes a method that requires no foreground labels when adapting to a new environment. To this end, we propose an integrated network that produces two kinds of output: a background model image and a foreground object map. The network can adapt to the new environment by retraining with the background model image. The proposed method consists of one encoder and two decoders, one for foreground object detection and one for background model image generation, and is designed for real-time processing on desktop GPUs. In a new environment different from the training environment, the proposed method improves the F-measure (FM) by 14.46% and achieves an 11.49% higher FM than the latest BGS algorithm.
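The one-encoder/two-decoder layout can be sketched as follows. This is a minimal illustrative toy, not the paper's actual network: the layer operations (average-pool encoder, nearest-neighbor upsampling decoders, sigmoid foreground head) are assumptions chosen only to show how a shared feature map feeds two separate output heads.

```python
import numpy as np

def encoder(x):
    # Shared feature extractor (toy stand-in): 2x2 average-pool downsampling.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decoder_background(f):
    # Head 1 (hypothetical): reconstructs a background model image
    # by nearest-neighbor upsampling of the shared features.
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

def decoder_foreground(f):
    # Head 2 (hypothetical): produces a per-pixel foreground map,
    # squashed to [0, 1] with a sigmoid.
    up = np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)
    return 1.0 / (1.0 + np.exp(-up))

frame = np.random.rand(8, 8)      # toy grayscale input frame
feat = encoder(frame)             # shared features, shape (4, 4)
bg = decoder_background(feat)     # background model image, same size as frame
fg = decoder_foreground(feat)     # foreground map in [0, 1], same size as frame
```

Both heads consume the same encoder features, which is what lets the background-reconstruction branch drive adaptation while the foreground branch is left untouched by labeling.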