Fast processing of graph algorithms on large-scale graphs is becoming increasingly important. There have been many attempts to process graph applications by exploiting the massive parallelism of GPUs. However, most existing methods fail to process large-scale graphs that do not fit in GPU device memory. We propose GStream, a fast and scalable parallel processing method that fully exploits the computational power of GPUs to process large-scale graphs (e.g., billions of vertices) efficiently. It exploits the concept of a nested-loop theta-join and multiple asynchronous GPU streams. Extensive experimental results show that GStream consistently and significantly outperforms the state-of-the-art method.
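
To give a rough intuition for the nested-loop idea, the following is a minimal CPU-only sketch, not GStream's actual implementation. It assumes a breadth-first search in which the outer loop walks over chunks of vertex attributes (standing in for data kept in device memory) and the inner loop walks over fixed-size pages of edges (standing in for topology data streamed to the GPU). The names `paginate`, `bfs_streamed`, `page_size`, and `chunk_size` are hypothetical. As a simplification, the full `level` array stays in host memory and no asynchronous streams are modeled.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source vertex, target vertex)


def paginate(edges: List[Edge], page_size: int) -> List[List[Edge]]:
    """Split the edge list into fixed-size pages (hypothetical stand-in
    for topology pages that would be streamed to the device)."""
    return [edges[i:i + page_size] for i in range(0, len(edges), page_size)]


def bfs_streamed(num_vertices: int, pages: List[List[Edge]],
                 source: int, chunk_size: int) -> List[int]:
    """Level-synchronous BFS written as a nested loop over
    (attribute chunk, topology page) pairs. Unreached vertices keep -1."""
    level = [-1] * num_vertices
    level[source] = 0
    depth = 0
    changed = True
    while changed:
        changed = False
        # Outer loop: one chunk of writable vertex attributes at a time.
        for start in range(0, num_vertices, chunk_size):
            end = min(start + chunk_size, num_vertices)
            # Inner loop: every topology page is scanned for each chunk.
            for page in pages:
                for u, v in page:
                    # Join predicate: the target lies in the current chunk,
                    # the source is on the frontier, the target is unvisited.
                    if start <= v < end and level[u] == depth and level[v] == -1:
                        level[v] = depth + 1
                        changed = True
        depth += 1
    return level


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    pages = paginate(edges, page_size=2)
    print(bfs_streamed(6, pages, source=0, chunk_size=3))
```

In a GPU setting, each iteration of the inner loop would presumably correspond to copying one page and launching a kernel on it, with several such copy-and-compute steps overlapped across streams; this sketch only shows the loop structure.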