A hierarchical deep reinforcement learning framework for intelligent automatic treatment planning of prostate cancer intensity modulated radiation therapy

Chenyang Shen; Liyuan Chen; Xun Jia

doi:10.1088/1361-6560/ac09a2

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Purpose. We previously proposed an intelligent automatic treatment planning (IATP) framework that builds a virtual treatment planner network (VTPN) to operate a treatment planning system (TPS) and generate high-quality radiation therapy (RT) treatment plans. While the potential of IATP in automating RT treatment planning has been demonstrated, its poor scalability, caused by an almost linear growth of network size with the number of treatment planning parameters (TPPs), is a bottleneck that prevents its application to complicated but clinically relevant treatment planning problems. In addition, the decision-making behavior of the trained network is hard to understand. Motivated by the decision-making process of a human planner, this study proposes a hierarchical IATP framework.

Methods and materials. The hierarchical VTPN (HieVTPN) consists of three networks: Structure-Net, Parameter-Net, and Action-Net. In each step of interaction with a TPS, the networks are applied sequentially to decide the structure to adjust, the TPP to adjust for the selected structure, and how to adjust that parameter, respectively. We developed an end-to-end hierarchical deep reinforcement learning scheme to train the three networks simultaneously. We then evaluated the effectiveness of the proposed framework on treatment planning problems for prostate cancer intensity modulated RT (IMRT) and stereotactic body RT (SBRT). We benchmarked our approach against plans made by a VTPN with a parallel architecture and against the human plans submitted for competition in the 2016 American Association of Medical Dosimetrists (AAMD)/Radiosurgery Society (RSS) Plan Study. We analyzed the scalability of the network size with respect to the number of TPPs. Numerical experiments were also performed to understand the rationale behind the decision-making behaviors of the trained HieVTPN.

Results. Both HieVTPNs, for prostate IMRT and SBRT, were trained successfully using 10 training patient cases and 5 validation cases. For IMRT, HieVTPN generated high-quality plans for 59 testing patient cases that were not included in the training process, achieving an average plan score of 8.62 (±0.83), with 9 being the maximal score. This score was comparable to that of the VTPN, 8.45 (±0.48). For SBRT planning, HieVTPN achieved an average plan score of 139.07 on five testing patient cases, compared with 132.21 averaged over the human plans submitted for competition in the AAMD/RSS Plan Study. Unlike VTPN, whose network size scales linearly with the number of TPPs, the network size of HieVTPN is almost independent of the number of TPPs. It was also observed that the decision-making behaviors of HieVTPN were understandable and generally agreed with human experience.

Conclusions. With its scalability and explainability, the hierarchical IATP framework is more favorable than the previous framework for handling treatment planning problems involving a large number of TPPs.
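The three-level decision described in the abstract (structure, then parameter, then adjustment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the option lists, the function names, the stub scoring functions standing in for Structure-Net, Parameter-Net, and Action-Net, and the greedy (highest score) selection rule are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical option lists; the abstract does not enumerate them.
STRUCTURES = ["PTV", "bladder", "rectum"]
PARAMETERS = ["weight", "dose_limit", "volume_limit"]
ACTIONS = ["increase", "decrease", "keep"]


def pick_best(scores: Sequence[float], options: List[str]) -> str:
    """Return the option with the highest score (greedy choice, an assumption)."""
    best_index = max(range(len(options)), key=lambda i: scores[i])
    return options[best_index]


def hierarchical_step(
    state: Sequence[float],
    structure_net: Callable[..., Sequence[float]],
    parameter_net: Callable[..., Sequence[float]],
    action_net: Callable[..., Sequence[float]],
) -> Tuple[str, str, str]:
    """One planning step: each level is conditioned on the choices above it."""
    structure = pick_best(structure_net(state), STRUCTURES)
    parameter = pick_best(parameter_net(state, structure), PARAMETERS)
    action = pick_best(action_net(state, structure, parameter), ACTIONS)
    return structure, parameter, action


# Stub scorers standing in for trained networks (fixed, made-up values).
def structure_net(state):
    return [0.1, 0.7, 0.2]


def parameter_net(state, structure):
    return [0.3, 0.6, 0.1] if structure == "bladder" else [0.5, 0.3, 0.2]


def action_net(state, structure, parameter):
    return [0.2, 0.9, 0.1]


decision = hierarchical_step([0.0], structure_net, parameter_net, action_net)
print(decision)
```

In this sketch each level produces one score per option at that level only, so adding a new structure lengthens a single option list rather than adding a separate network per parameter.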

Original language	English (US)
Article number	134002
Journal	Physics in Medicine and Biology
Volume	66
Issue number	13
DOIs	https://doi.org/10.1088/1361-6560/ac09a2
State	Published - Jul 7 2021

Keywords

deep reinforcement learning
hierarchical learning
intelligent automatic treatment planning

ASJC Scopus subject areas

Radiological and Ultrasound Technology
Radiology, Nuclear Medicine and Imaging

Cite this

@article{ded4b8da81fc40cb8b077c351fee6440,

title = "A hierarchical deep reinforcement learning framework for intelligent automatic treatment planning of prostate cancer intensity modulated radiation therapy",

keywords = "deep reinforcement learning, hierarchical learning, intelligent automatic treatment planning",

author = "Chenyang Shen and Liyuan Chen and Xun Jia",

note = "Funding Information: This work was supported by the National Institutes of Health grant number R01CA237269 and Cancer Prevention and Research Institute of Texas grant number RP160661. Publisher Copyright: {\textcopyright} 2021 Institute of Physics and Engineering in Medicine.",

year = "2021",

month = jul,

day = "7",

doi = "10.1088/1361-6560/ac09a2",

language = "English (US)",

volume = "66",

journal = "Physics in Medicine and Biology",

issn = "0031-9155",

publisher = "IOP Publishing Ltd.",

number = "13",

}

TY - JOUR

T1 - A hierarchical deep reinforcement learning framework for intelligent automatic treatment planning of prostate cancer intensity modulated radiation therapy

AU - Shen, Chenyang

AU - Chen, Liyuan

AU - Jia, Xun

N1 - Funding Information: This work was supported by the National Institutes of Health grant number R01CA237269 and Cancer Prevention and Research Institute of Texas grant number RP160661. Publisher Copyright: © 2021 Institute of Physics and Engineering in Medicine.

PY - 2021/7/7

Y1 - 2021/7/7

KW - deep reinforcement learning

KW - hierarchical learning

KW - intelligent automatic treatment planning

UR - http://www.scopus.com/inward/record.url?scp=85109079185&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85109079185&partnerID=8YFLogxK

U2 - 10.1088/1361-6560/ac09a2

DO - 10.1088/1361-6560/ac09a2

M3 - Article

C2 - 34107460

AN - SCOPUS:85109079185

SN - 0031-9155

VL - 66

JO - Physics in Medicine and Biology

JF - Physics in Medicine and Biology

IS - 13

M1 - 134002

ER -
