Monte Carlo simulation for positron emission tomography (PET) is important for designing new prototype devices and reconstruction algorithms. However, most current simulation packages suffer from long simulation times. To address this issue, we developed and validated gPET, an efficient and accurate GPU-based simulation package built on the NVIDIA CUDA platform. The simulation process is modularized into three functional parts: 1) source management, including positron decay, transport, and annihilation; 2) gamma transport inside the voxelized phantom; and 3) signal detection and processing inside the detector, which is described by repeatable parts in a three-level hierarchy: panel, module, and crystal scintillator. A predefined surface can additionally be used to simulate irregularly shaped detectors. GPU shared memory is used to accelerate the program. The performance of gPET was compared with that of GATE8.0 in two cases: 1) ten million positrons from a C-11 point source centered in an 8 cm³ cubic water phantom were simulated, and the gammas generated by annihilation were transported into an eight-panel detector 6.65 cm away from the center; 2) fifteen million gamma pairs were transported directly into a six-panel detector with a cylindrical inner surface. The mean positron ranges were 0.99 mm for gPET and 1.14 mm for GATE/Geant4. The angular distributions of the gammas from annihilated positrons differed by 0.5%. In both cases, the differences in the energy distribution and in the spatial distribution over crystals were below 3% for the final coincidence pairs. The computation times were 0.6 s per million histories for gPET on a single Titan Xp GPU (1.58 GHz) and 300 s per million histories for GATE8.0 on a single Intel i7-6850K CPU (3.6 GHz). In summary, gPET is an efficient and accurate Monte Carlo simulation tool for PET.
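The reported timings can be put in perspective with a short calculation. The sketch below is illustrative only: it uses the two per-million-history rates quoted above and assumes, which the text does not state, that each rate applies uniformly to both test cases. The function names `speedup` and `wall_time_s` are hypothetical helpers, not part of gPET or GATE.

```python
# Rates quoted in the text, in seconds per million histories.
GPET_S_PER_MILLION = 0.6    # gPET on a single Titan Xp GPU
GATE_S_PER_MILLION = 300.0  # GATE8.0 on a single Intel i7-6850K CPU


def speedup(reference_s: float, accelerated_s: float) -> float:
    """Ratio of reference time to accelerated time for the same workload."""
    return reference_s / accelerated_s


def wall_time_s(histories: int, s_per_million: float) -> float:
    """Estimated wall time, assuming the rate is constant over the run."""
    return histories / 1e6 * s_per_million


if __name__ == "__main__":
    print(f"speed-up: {speedup(GATE_S_PER_MILLION, GPET_S_PER_MILLION):.0f}x")
    # Case sizes from the text: 10 million positrons, 15 million gamma pairs.
    for label, n in (("case 1", 10_000_000), ("case 2", 15_000_000)):
        t_gpet = wall_time_s(n, GPET_S_PER_MILLION)
        t_gate = wall_time_s(n, GATE_S_PER_MILLION)
        print(f"{label}: gPET ~{t_gpet:.0f} s, GATE ~{t_gate:.0f} s")
```

Under this assumption the quoted rates correspond to a roughly 500-fold reduction in computation time, that is, seconds on the GPU against the better part of an hour or more on the CPU for runs of this size.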