Intensity-Modulated Radiation Therapy (IMRT) is a method for treating cancer by directing radiation at a tumor while minimizing the dose delivered to organs-at-risk. Usually, the radiation is delivered from a particle accelerator mounted on a robotic manipulator. Computationally finding the correct treatment plan for a target volume is often an exhaustive combinatorial search problem, and traditional optimization methods have not produced results fast enough for real-time use. To automate the beam orientation and intensity-modulation process, we introduce a novel set of techniques that leverage (i) pattern recognition, (ii) Monte Carlo evaluations, (iii) game theory, and (iv) neuro-dynamic programming. We optimize a deep neural network policy that guides Monte Carlo simulations of promising beamlets. Seeking a saddle-point equilibrium, we let two fictitious neural network players, within a zero-sum Markov game framework, alternately play a best response to their opponent's mixed strategy profile. During inference, the optimized policy predicts feasible beam angles on test target volumes. This work merges the beam orientation and fluence map optimization subproblems of the sequential IMRT treatment planning system into a single pipeline. We formally introduce our approach and present numerical results for coplanar beam angles on prostate cases.
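The idea of two players alternately best-responding to each other's mixed strategy in a zero-sum game is classic fictitious play. The sketch below is a minimal illustration of that idea on a small two-player zero-sum matrix game, assuming a payoff matrix is given explicitly. It is not the authors' method: there are no neural network players, no Markov game state, and no beamlet model here, and the function name `fictitious_play` and its parameters are hypothetical.

```python
import numpy as np


def fictitious_play(payoff, iterations=10000):
    """Alternating fictitious play on a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player receives
    -payoff[i, j]. Each player's mixed strategy is the empirical frequency
    of its past best responses. Returns (row_mix, col_mix).
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial pure actions so both empirical mixes are defined.
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations):
        # Row player (maximizer) best-responds to the column player's mix.
        col_mix = col_counts / col_counts.sum()
        row_counts[np.argmax(payoff @ col_mix)] += 1

        # Column player (minimizer) then best-responds to the updated row mix.
        row_mix = row_counts / row_counts.sum()
        col_counts[np.argmin(row_mix @ payoff)] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


if __name__ == "__main__":
    # Matching pennies: the saddle-point equilibrium is (0.5, 0.5) for both players.
    pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
    row_mix, col_mix = fictitious_play(pennies)
    print("row mix:", row_mix, "col mix:", col_mix)
```

In zero-sum games the empirical mixed strategies of fictitious play converge to a saddle-point equilibrium, which is why it is a natural template for the alternating best-response scheme described above; the paper's approach replaces the explicit payoff matrix and exact best responses with learned neural network players.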