In clinical practice, beam orientations are either selected manually by the planner, which is tedious, or dictated by fixed protocols; both approaches typically yield suboptimal and inefficient solutions. Column generation (CG) has been shown to produce plans superior to those based on human-selected beams, especially for highly non-coplanar plans such as 4π radiotherapy. In this work, we applied AI to explore the decision space of beam orientation selection. First, a supervised deep learning neural network (SL) is trained to mimic a CG-generated policy. Using SL iteratively to predict the next beam yields a complete set of beam orientations, but this procedure alone does not guarantee the plan’s quality. Moreover, although the teacher policy, CG, is efficient, it is a greedy algorithm and still finds suboptimal solutions that leave room for improvement. To address this, we implemented a reinforcement learning application of guided Monte Carlo tree search (GTS), in which SL guides the traversal of the tree and the fitness values of the tree's nodes are updated during the search. To test the feasibility of GTS, 13 test prostate cancer patients were evaluated. Our results show that GTS maintained similar planning target volume (PTV) coverage within a 2% error margin, reduced the organ at risk (OAR) mean dose, and in general improved the objective function value, while decreasing computation time.
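The idea of a policy-guided tree search can be illustrated with a minimal Python sketch. This is not the implementation evaluated in this work: `toy_policy` is a hypothetical stand-in for the trained SL network (here simply uniform over the unselected beams), `toy_cost` is a hypothetical stand-in for the plan objective function, and `N_BEAMS`, `K`, the visit-discounted sampling rule, and all function names are assumptions made only for illustration.

```python
import math
import random

N_BEAMS = 12  # hypothetical number of candidate beam orientations
K = 4         # hypothetical number of beams per plan


def toy_policy(selected):
    """Stand-in for the SL network: probability of each unselected beam being next."""
    remaining = [b for b in range(N_BEAMS) if b not in selected]
    return {b: 1.0 / len(remaining) for b in remaining}


def toy_cost(selected):
    """Stand-in for the plan objective (lower is better): penalises nearby beams."""
    beams = list(selected)
    cost = 0.0
    for i in range(len(beams)):
        for j in range(i + 1, len(beams)):
            d = abs(beams[i] - beams[j])
            d = min(d, N_BEAMS - d)  # circular distance between beam indices
            cost += 1.0 / d
    return cost


def policy_rollout(policy):
    """Policy-only selection: repeatedly take the most probable next beam."""
    selected = ()
    while len(selected) < K:
        probs = policy(selected)
        selected += (max(probs, key=probs.get),)
    return selected


class Node:
    """Tree node holding a visit count and the best cost found beneath it."""

    def __init__(self):
        self.children = {}         # beam index -> Node
        self.visits = 0
        self.best_cost = math.inf  # fitness value of the node


def guided_tree_search(policy, cost_fn, n_iter=200, seed=0):
    rng = random.Random(seed)
    root = Node()
    # Start from the policy-only solution so the search never returns worse.
    best_beams = policy_rollout(policy)
    best_cost = cost_fn(best_beams)
    for _ in range(n_iter):
        node, selected, path = root, (), [root]
        while len(selected) < K:
            probs = policy(selected)
            beams = list(probs)
            weights = []
            for b in beams:
                child = node.children.get(b)
                visits = child.visits if child is not None else 0
                # Policy prior, discounted for branches already visited often.
                weights.append(probs[b] / (1 + visits))
            beam = rng.choices(beams, weights=weights, k=1)[0]
            node = node.children.setdefault(beam, Node())
            selected += (beam,)
            path.append(node)
        cost = cost_fn(selected)
        # Back up the result: update visits and fitness along the traversed path.
        for visited in path:
            visited.visits += 1
            visited.best_cost = min(visited.best_cost, cost)
        if cost < best_cost:
            best_beams, best_cost = selected, cost
    return best_beams, best_cost


if __name__ == "__main__":
    beams, cost = guided_tree_search(toy_policy, toy_cost)
    print("policy-only cost:", toy_cost(policy_rollout(toy_policy)))
    print("tree-search beams:", beams, "cost:", cost)
```

In this sketch the node fitness values are only recorded; a fuller version could also use them to bias traversal toward low-cost branches. Because the search is seeded with the policy-only solution, the returned beam set is never worse than that solution under the toy objective.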