We developed a computational method to infer the complementarity-determining region 3 (CDR3) sequences of tumor-infiltrating T cells from 9,142 RNA-seq samples across 29 cancer types. We identified over 600,000 CDR3 sequences, 15% of which were full length. In many tumors, with the exception of brain and kidney cancers, the CDR3 length distribution, amino acid conservation and variable gene usage of infiltrating T cells resembled those of peripheral blood cells from healthy donors. We observed a strong association between T cell diversity and tumor mutation load, and we predicted SPAG5 and TSSK6 to be putative immunogenic cancer/testis antigens in multiple cancers. Finally, we identified three potentially immunogenic somatic mutations on the basis of their co-occurrence with CDR3 sequences. One of these, a PRAMEF4 mutation encoding p.Phe300Val, was predicted to yield a peptide that binds strongly to both MHC class I and class II molecules, with matching HLA types in its carriers. Our analyses have the potential to identify immunogenic neoantigens and tumor-reactive T cell clonotypes simultaneously.