To achieve lossless, high-throughput coincidence filtering with minimal processing delay and memory usage, we propose a new FPGA-based digital coincidence processor (CP). Similar to modern general-purpose graphics processing unit (GPGPU) architectures, the new CP uses a scatter-gather hierarchy that provides a dedicated coincidence selection processor for each possible pair of detector modules, so that a gamma interaction in one detector module can be compared directly, and simultaneously, with interactions from all the other detector modules. This parallel approach distributes the CP workload across a shared network of coincidence selection processors, which minimizes processing delay and significantly increases throughput. Although it requires more FPGA resources, the new CP can be readily implemented on the latest FPGAs. A prototype CP with 42 coincidence selection processors was implemented on a Stratix IV GX FPGA, and its counting-rate performance was evaluated with a 12-detector-module PET system. The evaluation shows that the CP achieves a peak coincidence processing throughput of 2.1G events/s, while the coincidence selection processors together use about 28% of the FPGA's combinational logic, 44% of its registers, and 9% of its on-chip memory. The overall event loss is less than 0.006% for event rates from 1K/s to 250K/s. With the FPGA's reprogrammability and parameterized design techniques, the CP can be conveniently adapted to different detector and system configurations to maximize counting-rate performance. The new coincidence processor is particularly suited to time-of-flight (TOF) PET and real-time coincidence imaging applications.
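The per-pair selection idea can be illustrated with a minimal software sketch. Everything here is an assumption made for illustration: the function names `module_pairs` and `find_coincidences`, the integer timestamps, the coincidence window, and in particular the ring-separation rule used to choose module pairs. The abstract does not say how the 42 selectors map onto pairs of the 12 modules; the rule below was chosen only because it yields 42 pairs for 12 modules. The loop over pairs runs sequentially here, whereas in the hardware each pair would have its own selector operating concurrently.

```python
from itertools import combinations


def module_pairs(n_modules, min_separation=1):
    """List module pairs (a, b), a < b, whose ring distance is at least
    min_separation. The pairing rule is hypothetical, not from the source."""
    pairs = []
    for a, b in combinations(range(n_modules), 2):
        # Shortest distance between the two modules around the ring.
        ring_distance = min(b - a, n_modules - (b - a))
        if ring_distance >= min_separation:
            pairs.append((a, b))
    return pairs


def find_coincidences(events, pairs, window):
    """Return (a, b, t_a, t_b) for events in paired modules whose timestamps
    differ by at most `window`.

    events: dict mapping module index -> sorted list of integer timestamps.
    Each pair is scanned with a two-pointer merge; a matched event is
    consumed, so it is paired at most once within a given module pair.
    """
    found = []
    for a, b in pairs:
        times_a = events.get(a, [])
        times_b = events.get(b, [])
        i = j = 0
        while i < len(times_a) and j < len(times_b):
            dt = times_a[i] - times_b[j]
            if abs(dt) <= window:
                found.append((a, b, times_a[i], times_b[j]))
                i += 1
                j += 1
            elif dt < 0:
                i += 1  # event in module a is too early; advance a
            else:
                j += 1  # event in module b is too early; advance b
    return found


# Assumed configuration: 12 modules, pairs at least 3 positions apart.
pairs = module_pairs(12, min_separation=3)
events = {0: [100, 500], 1: [101], 6: [102, 900]}
coincidences = find_coincidences(events, pairs, window=5)
```

In this sketch the event at timestamp 102 in module 6 is matched independently against modules 0 and 1, mirroring the idea that one interaction is compared with all other modules at once rather than passing through a single serial comparator.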