A hierarchical deep reinforcement learning framework for intelligent automatic treatment planning of prostate cancer intensity modulated radiation therapy

Chenyang Shen, Liyuan Chen, Xun Jia

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


Purpose. We have previously proposed an intelligent automatic treatment planning (IATP) framework that builds a virtual treatment planner network (VTPN) to operate a treatment planning system (TPS) and generate high-quality radiation therapy (RT) treatment plans. While the potential of IATP in automating RT treatment planning has been demonstrated, its poor scalability, caused by the almost linear growth of network size with the number of treatment planning parameters (TPPs), is a bottleneck that prevents its application to complicated but clinically relevant treatment planning problems. In addition, the decision-making behavior of the trained network is hard to understand. Motivated by the decision-making process of a human planner, this study proposes a hierarchical IATP framework.

Methods and materials. The hierarchical VTPN (HieVTPN) consists of three networks: Structure-Net, Parameter-Net, and Action-Net. When interacting with a TPS, the networks are applied sequentially in each step to decide, respectively, which structure to adjust, which TPP of the selected structure to adjust, and how to adjust that parameter. We developed an end-to-end hierarchical deep reinforcement learning scheme to train the three networks simultaneously. We then evaluated the effectiveness of the proposed framework on treatment planning problems for prostate cancer intensity modulated RT (IMRT) and stereotactic body RT (SBRT). We benchmarked the performance of our approach by comparing its plans with those made by a VTPN with a parallel architecture and with the human plans submitted for competition in the 2016 American Association of Medical Dosimetrists (AAMD)/Radiosurgery Society (RSS) Plan Study. We analyzed how the network size scales with the number of TPPs. Numerical experiments were also performed to understand the rationale behind the decision-making behaviors of the trained HieVTPN.

Results. Both HieVTPNs, for prostate IMRT and for SBRT, were trained successfully using 10 training patient cases and 5 validation cases. For IMRT, HieVTPN generated high-quality plans for 59 testing patient cases that were not included in the training process, achieving an average plan score of 8.62 (±0.83) out of a maximum of 9. This score was comparable to that of the VTPN, 8.45 (±0.48). For SBRT planning, HieVTPN achieved an average plan score of 139.07 on five testing patient cases, compared with a score of 132.21 averaged over the human plans submitted for competition in the AAMD/RSS Plan Study. Unlike VTPN, whose network size scales linearly with the number of TPPs, the network size of HieVTPN is almost independent of the number of TPPs. It was also observed that the decision-making behaviors of HieVTPN were understandable and generally agreed with human experience.

Conclusions. Owing to its scalability and explainability, the hierarchical IATP framework is preferable to the previous framework for treatment planning problems involving a large number of TPPs.
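The sequential decision described in the methods (structure first, then parameter, then adjustment) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The label sets, the four-feature state, the one-hot conditioning on earlier choices, and the random linear scorers standing in for trained networks are all assumptions made for illustration, and the helper names (`make_linear_scorer`, `build_toy_nets`, `hierarchical_step`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical label sets; the abstract does not enumerate them.
STRUCTURES = ["PTV", "bladder", "rectum"]
PARAMETERS = ["weight", "dose_limit", "volume_limit"]
ADJUSTMENTS = ["increase", "decrease", "keep"]

Scorer = Callable[[Sequence[float]], List[float]]


def make_linear_scorer(n_inputs: int, n_outputs: int, seed: int) -> Scorer:
    """Stand-in for a trained network: a fixed random linear map (assumption)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]

    def scorer(features: Sequence[float]) -> List[float]:
        # One score per output: dot product of a weight row with the input.
        return [sum(w * x for w, x in zip(row, features)) for row in weights]

    return scorer


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(size)]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def build_toy_nets(state_dim: int, seed: int) -> Tuple[Scorer, Scorer, Scorer]:
    """Create three toy scorers; later stages also see the earlier choices."""
    n_s, n_p, n_a = len(STRUCTURES), len(PARAMETERS), len(ADJUSTMENTS)
    structure_net = make_linear_scorer(state_dim, n_s, seed)
    parameter_net = make_linear_scorer(state_dim + n_s, n_p, seed + 1)
    action_net = make_linear_scorer(state_dim + n_s + n_p, n_a, seed + 2)
    return structure_net, parameter_net, action_net


def hierarchical_step(state: Sequence[float], structure_net: Scorer,
                      parameter_net: Scorer, action_net: Scorer
                      ) -> Tuple[str, str, str]:
    """Greedy three-stage choice: structure, then parameter, then adjustment."""
    base = list(state)
    s = argmax(structure_net(base))
    s_vec = one_hot(s, len(STRUCTURES))
    p = argmax(parameter_net(base + s_vec))
    p_vec = one_hot(p, len(PARAMETERS))
    a = argmax(action_net(base + s_vec + p_vec))
    return STRUCTURES[s], PARAMETERS[p], ADJUSTMENTS[a]


if __name__ == "__main__":
    toy_state = [0.2, 0.5, 0.1, 0.9]  # hypothetical plan-quality features
    nets = build_toy_nets(state_dim=len(toy_state), seed=0)
    print(hierarchical_step(toy_state, *nets))
```

In this toy setting the three stages together emit 3 + 3 + 3 = 9 scores per step, whereas a single head enumerating every (structure, parameter, adjustment) combination would need 27. That count is a property of the sketch only, not a measurement of the networks in the study.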

Original language: English (US)
Article number: 134002
Journal: Physics in Medicine and Biology
Issue number: 13
State: Published - Jul 7 2021


Keywords

  • deep reinforcement learning
  • hierarchical learning
  • intelligent automatic treatment planning

ASJC Scopus subject areas

  • Radiological and Ultrasound Technology
  • Radiology, Nuclear Medicine and Imaging

