Prognostic implications of immune-related eight-gene signature in pediatric brain tumors

Genomic studies have provided insights into molecular subgroups and oncogenic drivers of pediatric brain tumors (PBT) that may lead to novel therapeutic strategies. Participants of the cohort Pediatric Brain Tumor Atlas: CBTTC (CBTTC cohort), were randomly divided into training and validation cohorts. In the training cohort, Kaplan-Meier analysis and univariate Cox regression model were applied to preliminary screening of prognostic genes. The LASSO Cox regression model was implemented to build a multi-gene signature, which was then validated in the validation and CBTTC cohorts through Kaplan-Meier, Cox, and receiver operating characteristic curve (ROC) analyses. Also, gene set enrichment analysis (GSEA) and immune infiltrating analyses were conducted to understand function annotation and the role of the signature in the tumor microenvironment. An eight-gene signature was built, which was examined by Kaplan-Meier analysis, revealing that a significant overall survival difference was seen, either in the training or validation cohorts. The eight-gene signature was further proven to be independent of other clinic-pathologic parameters via the Cox regression analyses. Moreover, ROC analysis demonstrated that this signature owned a better predictive power of PBT prognosis. Furthermore, GSEA and immune infiltrating analyses showed that the signature had close interactions with immune-related pathways and was closely related to CD8 T cells and monocytes in the tumor environment. Identifying the eight-gene signature (CBX7, JADE2, IGF2BP3, OR2W6P, PRAME, TICRR, KIF4A, and PIMREG) could accurately identify patients' prognosis and the signature had close interactions with the immunodominant tumor environment, which may provide insight into personalized prognosis prediction and new therapies for PBT patients.


Introduction
Brain tumors are the most common solid tumor in pediatrics, accounting for 23.7% of new cancer diagnoses in children (1), and the second most common pediatric malignancy after leukemia (2,3). The life-saving treatments these children receive may result in impaired brain structure and function, leading to long-term major cognitive deficits (1,(4)(5)(6)(7)(8). Current treatment options include surgical resection, cranial radiation, and chemotherapy. Survivors treated with high-load cranial radiation likely experience cognitive dysfunction, including difficulties in controlled attention, such as response inhibition and slower information processing (1,(4)(5)(6)(7)(8). The distribution, pathology, molecular characteristics, and treatment strategies for pediatric brain tumors (PBT) have essential differences compared to those of the adult population (9).
Although the cure rate of PBT has increased in the past two decades of the 20th century, which was primarily due to advances in imaging, neurosurgery, and radiation oncology technologies and the introduction of combined chemotherapy, unfortunately, the overall survival has remained static (10). The lack of advances in PBT treatment was hindered by our lack of knowledge about the molecular pathogenesis of brain tumors (10). Advanced genomic analysis of the entire spectrum of PBT heralds an era in which this defect can be overcome by new technologies that will help us understand the genome pattern of PBT (10). Genomic studies have provided insights into molecular subgroups and oncogenic PBT drivers that may lead to novel therapeutic strategies (11).
The Gabriella Miller Kids First Data Resource Center (Kids First DRC; ohttps://kidsfirstdrc.org4) is a new, collaborative, pediatric research effort to understand the genetic causes and links between childhood cancer and structural congenital disabilities. This is a brand-new public database that has only been launched in recent years. It is vital that this data is expressly set up for children's tumor research, and few researchers have begun to mine this database.
Here, we conducted a comprehensive mining of the Kids First database to determine the minimum number of potentially robust genes that can be used to predict PBT patients' prognosis. Importantly, we used the LASSO algorithm, which can effectively analyze high-dimensional sequencing data (12). Furthermore, we evaluated the accuracy of this eight-gene signature and validated it in a validation cohort. Moreover, gene set enrichment analysis (GSEA) and immune infiltrating analyses were conducted to explore the role of the signature in the tumor microenvironment.

Data mining from the Kids First program
The gene expression profiles of PBT from 973 patients and their clinical and survival data were downloaded from Kids First Xena Hub (ohttps://kidsfirst.xenahubs.net4) with the cohort name: Pediatric Brain Tumor Atlas: CBTTC (CBTTC cohort). The study was conducted following the Declaration of Helsinki, and the Ethical Committee of Zhumadian Central Hospital approved the study.

Identification and validation of the prognostic gene signature
First, all patients in the CBTTC cohort were randomly divided into training cohort (486 cases, 49.9%) and validation cohort (487 cases, 50.1%). In the training cohort, Kaplan-Meier analysis was used to screen the prognostic genes with a cutoff of P value o10.0E-15 in the log-rank test. Furthermore, univariate Cox regression analysis was performed on the training cohort to find prognostic genes with P values o10.0E-15. The intersected genes identified in Kaplan-Meier and univariate Cox analyses were then entered into the LASSO Cox regression model analysis, which was implemented in the training cohort utilizing R software (owww.r-project.org4) and the ''glmnet'' package (ocran.r-project.org4). Ten-time cross-validations were applied to detect the best penalty parameter lambda (12)(13)(14)(15). According to the best lambda value, a list of prognostic genes with coefficients was obtained from the gene expression and patients' survival data.
Moreover, each patient's risk score was calculated based on the expression level of each prognostic gene and its corresponding coefficient. Using the median risk score as the cutoff point, the patients in the training cohort were distributed into high-risk or low-risk groups. Kaplan-Meier analysis was applied to evaluate the survival difference between the two groups. Cox and ROC analyses were conducted to further assess the gene signature's prognostic value in the training cohort. Furthermore, the prognostic gene signature was validated in the validation cohort. The same formula was conducted to compute risk scores like those in the training cohort. Kaplan-Meier, Cox, and ROC analyses were carried out as described earlier.

Correlation of risk score with the proportion of tumorinfiltrating immune cells (TICs)
The CIBERSORT (16) and MCP-counter (17) methods were used to estimate all tumor samples' TIC abundance distribution in the CBTTC cohort. The correlation was examined by the Spearman method.

Statistical analysis
All statistical calculations were performed in the R software. Kaplan-Meier analysis was conducted to determine the overall survival differences between groups. Univariate and multivariate Cox proportional hazard regression analyses were conducted to assess the association between risk score and overall survival. The ROC analysis was applied to examine the sensitivity and specificity of survival prediction using the gene signature risk score. An area under the ROC curve (AUC) served as a pointer of prognostic accuracy. The R package ''pROC'' was used for ROC analysis, and the ''Delong'' method was used to study the significant differences among ROC curves. In all analyses, a P value o0.05 was considered statistically significant.

Clinical characteristics
The flowchart of the present research is shown in Figure 1. A total of 973 cases in the CBTTC cohort were randomly distributed to a training cohort (N=486) and a validation cohort (N=487). The detailed clinical characteristics and grouping of these patients are summarized in Table 1.

Construction of prognostic signature from the training cohort
Ninety-two genes were extracted from the Kaplan-Meier analysis (Table S1), while 136 genes were identified as significant in the Cox regression analysis (Table S2). Taken together, 72 genes in the intersection of the two results are defined as prognostic genes for subsequent analyses (Table S3). These prognostic genes were then subjected to LASSO Cox regression analysis, and regression coefficients were calculated. The coefficient of each gene is illustrated in Figure 2A. When 8 genes were included, the model achieved the best performance ( Figure 2B). These genes and corresponding coefficients are shown in Table 2.
Prognostic value of the eight-gene signature in the cohorts The distribution of risk scores and survival status and the expression profiles of the eight-gene signature of the patients in the training cohort were plotted and are shown in Figure 3A. As shown in the figure, there were more deaths in the high-risk patient group, and the survival time was shorter than that of the low-risk patient group. The heatmap indicates that CBX7 and JADE2 were underexpressed in high-risk patients, while IGF2BP3, OR2W6P, PRAME, TICRR, KIF4A, and PIMREG were highly expressed in high-risk patients. We also verified the performance of this eight-gene signature in the validation cohort. As shown in Figure 3B and C, the pattern was consistent with that in the training cohort. Furthermore, we examined this eight-gene signature in subtypes in the CBTTC cohort and the results were similar ( Figure S1).
Kaplan-Meier survival analysis showed that patients in the high-risk group were associated with a poor prognosis trend in the training cohort (P value o0.0001, Figure 4A). To confirm the efficacy of the eight-gene signature in predicting overall survival in PBT patients, it was examined in  the validation cohort. According to the median risk score, patients were divided into high-risk and low-risk groups using the same classification method. Consistent with previous results, patients in the high-risk group showed significantly worse overall survival than patients in the low-risk group (P value o0.0001, Figure 4B). In the entire CBTTC cohort, which was the sum of the training and validation cohorts, the eight-gene signature also had similar predictive ability (P value o0.0001, Figure 4C). We tested the capacity of each of the eight genes via Kaplan-Meier analysis and found CBX7 and JADE2 predicted favorable outcomes, while the remaining genes had poor effects on the prognosis ( Figure S2). Moreover, the subtypes of PBT in the CBTTC cohort were also examined by Kaplan-Meier analysis, which showed that the eight-gene signature risk score predicted the survival of PBT subtypes ( Figure S3).
Univariate and multivariate Cox analyses were performed in the training, validation, and CBTTC cohorts, using the available co-variables including risk score, age, gender, race, and ethnicity to confirm whether the prognostic capacity of our eight-gene signature was independent from the clinic-pathologic characteristics. In the training cohort, both univariate and multivariate Cox Data are reported as number and percent.  Figure 5A). Consistent with that in the training cohort, the eight-gene signature displayed a similar ability in the validation and CBTTC cohorts ( Figure 5B and C). These results proved that the eight-gene signature was a strong and independent variable. Subsequently, we conducted ROC analyses to assess how the eight-gene signature would behave in predicting prognosis. As shown in Figure 6A, the area under the ROC curve (AUC) of the eight-gene risk score model performed in the training cohort was 0.808, which was superior to those of age, gender, race, and ethnicity (0.492, 0.494, 0.482, and 0.514, respectively). This finding was also confirmed in the validation cohort (AUC=0.844, Figure 6B) and CBTTC cohort (AUC= 0.827, Figure 6C).

Gene set enrichment analysis with the eight-gene signature
Given the negative correlation between the eight-gene signature risk score and the overall survival of PBT patients, GSEA was conducted between the high-and low-risk groups. As displayed in Figure 7A and Table S4, genes in the high-risk group were mostly enriched in immune-related functions and pathways. They were relating to regulatory T cells, macrophages, CD4 T cells, TGF beta, IL6, and naive B cells. As to the low-risk score group, the genes were enriched in pathways involved in macrophages, CD4 T cells, T helper type 2 cells, and FOXP3+ regulatory T cells, which were also closely immune-related ( Figure 7A and Table S5). For HALL-MARK collection defined by the Molecular Signatures Database, multiple immune functional gene sets like HALL MARK_MITOTIC_SPINDLE, HALLMARK_G2M_CHECK POINT, and HALLMARK_E2F_TARGETS were significantly   Table S6); whereas only one gene set HALLMARK_ADIPOGEN-ESIS was enriched in the low-risk group ( Figure 7B and Table S7). These findings indicated that the risk score was potentially closely related to the status of tumor microenvironment.

Correlation of risk score with the proportion of TICs
To further confirm the correlation between the risk score and the immune microenvironment, as shown in Figure S4 and Figure S5, we used the CIBERSORT and MCP-counter algorithm to analyze the proportion of TICs subpopulations and constructed immune cell profiles in PBT samples. Combining the results of correlation analysis ( Figure 8A) and differential analysis ( Figure 8B), a total of 11 TICs were associated with the eight-gene signature risk score ( Figure 8C). In the result of the MCPcounter algorithm (Figure 9), a list containing 7 TICs identified closely with the signature. In summary, by adopting these methods, CD8 T cells and monocytes were overlapped in the two results and seen as critical cells that affected the eight-gene signature in the tumor environment of PBT.

Discussion
At present, diagnosis, prognosis, and treatment of PBT are highly dependent on the histopathological  characteristics of the tumor (3,18). However, more importantly, given the current development of precision medicine and genetic research of tumors, in the past decade, significant changes have taken place in pediatric neuro-oncology, and exploring optimized tumor biomarkers will be the trend of future development (19,20). In this study, we built a PBT prognostic signature by comprehensively analyzing the Kids First database, designed to understand the genetic causes and connections of childhood cancer and congenital structural disabilities.
After we constructed the eight-gene signature, we firstly confirmed its capacity to distinguish the survival time and survival status of patients effectively. As shown in Figure 3A, the high-risk zone not only counted more deaths, but also the patients in it presented a shorter survival time than those in the low-risk zone. Moreover, the heatmap indicated that each of these eight genes had a differential expression pattern between the low-and high-risk groups. Importantly, this eight-gene signature had the same or similar performance in the validation cohort ( Figure 3B) and the CBTTC cohort ( Figure 3C).
In addition, we examined the prognostic value of the eight-gene signature by Kaplan-Meier analysis in training, validation, and CBTTC cohorts, finding its predicting ability in PBT patients significant ( Figure 4). Furthermore, univariate and multivariate analyses were performed in the three cohorts to confirm whether our eight-gene signature could be independent from other variables in predicting PBT overall survival. As plotted in Figure 5, no matter in which cohort, whether it was univariate or multivariate Cox regression analysis, the variable of risk score was always statistically significant, and the results confirmed the predictive ability of the risk score, and its independence.
To further assess the predictive power of this eight-gene signature, ROC analysis was conducted. In diagnostic tests, AUC is used to check accuracy and determine the predictive capacity of biomarkers (21). ROC analysis indicated that the AUC of the eight-gene signature stayed above 0.8 in these three cohorts and was superior to age, gender, race, and ethnicity. These ROC results again suggested that our signature strengthened the predictive accuracy of prognosis in PBT.   The shade around the blue line represents the 95% confidence interval. Spearman coefficient was used for the correlation test. B, Violin plot showing the ratio differentiation of 10 types of immune cells between PBT samples from low-and high-risk groups to the median risk score. Wilcoxon rank sum test was used to assess for significance. C, Venn plot displays 7 TICs correlated with risk score co-determined by difference and correlation tests shown in violin and scatter plots, respectively. P value o0.05 was the cutoff. TIC: tumor-infiltrating immune cell; PBT: pediatric brain tumor.
Our signature was composed of eight genes, which are CBX7, JADE2, IGF2BP3, OR2W6P, PRAME, TICRR, KIF4A, and PIMREG. In the signature model, CBX7 and JADE2 were protective genes, whereas other genes were unfavorable on the overall survival of PBT patients. IGF2BP3 is an oncofetal protein that binds RNA, thereby influencing the fate of target transcripts, and it is up-expressed in a variety of malignant tumors and represents a promising cancer biomarker (22). PRAME is a tumorassociated antigen that was first identified through analysis of the specificity of tumor-reactive T-cell clones derived from a patient with metastatic cutaneous melanoma (23). Subsequently, it was found that PRAME is not only expressed in cutaneous melanoma, but also in ocular melanoma and various non-melanoma cell malignancies (24). PRAME expression can be detected in 82% of medulloblastoma samples, regardless of molecular and histopathological subgroups. The high expression of PRAME is also related to poor outcomes of patients with medulloblastoma. Studies have shown that adoptive immunotherapy that redirects T cells to PRAME antigen may represent an innovative treatment for medulloblastoma (25). TICRR is one of the important replication initiation factors. The knockout of TICRR significantly inhibits tumor cell growth, migration, and colony formation in vitro, and inhibits tumor growth in xenograft models (26). A recent study demonstrated that TICRR is a major carcinogen, which can accelerate the proliferation of cancer cells by promoting the initiation and progression of DNA replication (27). KIF4A was found to be implicated in the regulation of chromosome condensation and segregation during mitotic cell division, which is essential for eukaryotic cell proliferation (28). KIF4A is aberrantly expressed in a variety of cancers, and it is overexpressed in most tumors but also low-expressed in a few tumors, suggesting distinct functions and mechanisms for different tumors (29,30). PIMREG plays a key role in regulating cell proliferation and is induced by mitogens, and its protein level is related to the cell cycle (31). Jiang et al. found that PIMREG played a crucial role in the promotion of breast cancer aggressiveness via constitutive activation of the NF-kB signaling pathway (32). There are relatively few studies related to these genes and PBT. However, the eight-gene signature had a significant role in predicting and diagnosing PBT in our research. The eight-gene signature or each of them may indicate specific directions for future research on PBT.
The findings of the GSEA analysis demonstrated that the eight-gene signature might potentially participate in the immune-dominant tumor microenvironment. The analysis based on CIBERSORT algorithm for the proportion of TICs showed that half of TICs (11/22) were correlated with the eight-gene signature risk score in PBT patients, further supporting that the signature interacted closely with the tumor environment. Combining the CIBERSORT and MCPcounter methods, we found CD8 T cells and monocytes were in a close relationship with the eight-gene signature in the tumor environment of PBT. Strategies targeting the tumor microenvironment of pediatric brain cancers have the potential to improve the efficacy of standard and genomebased molecular therapeutics and to help resolve many of the challenges associated with developing new drugs and running clinical trials for a relatively small PBT population (33). The specific pathways and TICs revealed in our analysis have potential for tumor microenvironment targets in further studies.
Our research also had some limitations. For the study of PBT, currently available public databases are very limited. The datasets in GEO and TCGA are not eligible for validation purposes because of the population's age distribution. Our eight-gene signature came from retrospective data, and more prospective data is needed for proving the clinical utility of it. In addition, due to the very limited clinical characteristics of patients included in the CBTTC cohort, we could not perform certain clinical subgroup analyses. Furthermore, there are currently no wet experimental data explaining the relationship between these eight genes and their mechanism in PBT samples. Therefore, more research is needed to clarify the potential relationship.
In conclusion, our research defined a robust eightgene signature in PBT. It was a comprehensive analysis of the new Kids First database. This signature was related to PBT's overall survival and accurately identified the prognostic risk of patients. Notably, we assessed the reliability and accuracy of the signature in a validation cohort. In addition, the functions and immune infiltrating analyses showed that the signature had close interactions with CD8 T cells and monocytes in the tumor environment, which may advance the development of new therapies for PBT treatment.

Supplementary Material
Click to view [pdf].