In recent years, training robots to learn manipulation skills with deep reinforcement learning (DRL) has attracted wide attention. However, a large search space, low sample quality, and difficulties in network convergence pose great challenges to robot training. This paper addresses assembly-oriented robot grasping training and proposes a DRL algorithm with a new mechanism, the policy guidance mechanism (PGM). PGM can effectively transform useless or low-quality samples into useful or high-quality ones. Building on an improved Deep Q Network algorithm, an end-to-end policy model that takes images as input and outputs actions is established. Through continuous interaction with the environment, the robot learns to grasp objects at the location of the maximum Q value. Experiments are conducted for a number of different scenarios, both in simulation and on physical robots. The results indicate that the proposed DRL algorithm with PGM is effective in increasing the success rate of robot grasping and, moreover, is robust to changes in the environment and objects.
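As a minimal illustration of choosing a grasp at the location of the maximum Q value, the sketch below assumes the policy network outputs a two-dimensional map with one Q value per image location. That output shape, the function name `select_grasp_pixel`, and the toy values are assumptions made for illustration only; the network itself and PGM are not shown.

```python
from typing import List, Tuple


def select_grasp_pixel(q_map: List[List[float]]) -> Tuple[int, int]:
    """Return the (row, col) of the largest value in a 2-D Q map.

    Assumption: q_map[r][c] holds the predicted Q value of grasping
    at image location (r, c). Ties resolve to the first maximum found
    in row-major order.
    """
    if not q_map or not q_map[0]:
        raise ValueError("q_map must be a non-empty 2-D grid")

    best_pos = (0, 0)
    best_q = q_map[0][0]
    for r, row in enumerate(q_map):
        for c, q in enumerate(row):
            if q > best_q:
                best_q = q
                best_pos = (r, c)
    return best_pos


# Toy 3x4 Q map with made-up values, standing in for a network output.
q_map = [
    [0.10, 0.20, 0.05, 0.00],
    [0.30, 0.85, 0.40, 0.10],
    [0.05, 0.25, 0.15, 0.02],
]
grasp_row, grasp_col = select_grasp_pixel(q_map)
```

In a real system the selected image location would still need to be mapped to a gripper pose in the robot's workspace, which this sketch leaves out.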