Screening potential prognostic biomarkers of long non-coding RNAs for predicting the risk of chronic kidney disease

Not much is known about the roles of long non-coding RNAs (lncRNAs) for chronic kidney disease (CKD). In this study, we included CKD patient cohorts and normal controls as a discovery cohort to identify putative lncRNA biomarkers associated with CKD. We first compared the lncRNA expression profiles of CKD patients with normal controls, and identified differentially expressed lncRNAs and mRNAs. Co-expression network based on the enriched differentially expressed mRNAs and lncRNAs was constructed using WGCNA to identify important modules related to CKD. A lncRNA-miRNA-mRNA pathway network based on the hub lncRNAs and mRNAs, related miRNAs, and overlapping pathways was further constructed to reveal putative biomarkers. A total of 821 significantly differentially expressed mRNAs and lncRNAs were screened between CKD and control samples, which were enriched in nine modules using weighted correlation network analysis (WGCNA), especially brown and yellow modules. Co-expression network based on the enriched differentially expressed mRNAs and lncRNAs in brown and yellow modules uncovered 7 hub lncRNAs and 53 hub mRNAs. A lncRNA-miRNA-mRNA pathway network further revealed that lncRNAs of HCP5 and NOP14-AS1 and genes of CCND2, COL3A1, COL4A1, and RAC2 were significantly correlated with CKD. The lncRNAs of NOP14-AS1 and HCP5 were potential prognostic biomarkers for predicting the risk of CKD.


Introduction
Arising from many heterogeneous disease pathways, chronic kidney disease (CKD) can alter the function and structure of the kidney irreversibly, over months or years. In 2012, according to World Health Organization global health estimates, 864 226 deaths (or 1.5% of deaths worldwide) were attributable to CKD. Thus, the burden of CKD is substantial. The etiology of CKD is very complex, with hypertension and diabetes being the main causes. Moreover, increased risk of developing CKD and more rapidly progressing CKD are reported to be related to worsening blood pressure control. Recently, investigation of people with genetic causes of CKD has been evolving rapidly. Several loci, genetic polymorphisms, and single nucleotide polymorphisms that might contribute to accelerated progression of CKD have been identified with genome-wide association studies (1)(2)(3)(4).
During past years, the biological roles of various types of non-coding RNAs (ncRNAs) have been highlighted. Long non-coding RNAs (lncRNAs), a newly discovered class of ncRNAs, were defined as RNA molecules longer than 200 nucleotides in length. There is growing evidence that lncRNAs are involved in various biological processes, including maintenance of pluripotency, nuclear organization, development, translational control, and RNA splicing. A growing number of lncRNAs have been characterized as tumor suppressor genes or oncogenes contributing to cancer development, progression, and metastasis (3,5,6). Nevertheless, not much is known about the roles of lncRNAs for chronic human diseases other than cancer until recently, such as CKD.
In this work, CKD patient cohorts and normal controls were included as a discovery cohort to identify putative lncRNA biomarkers. The lncRNA expression profiles of CKD patients were compared with normal controls, and differentially expressed lncRNAs and mRNAs were identified. Coexpression network based on the enriched differentially expressed mRNAs and lncRNAs was constructed using weighted correlation network analysis (WGCNA) to identify important modules related to CKD. A lncRNA-miRNA-mRNA-pathway network based on the hub lncRNAs and mRNAs, related miRNAs, and overlapping pathways was further constructed to reveal putative biomarkers for CKD.

Material and Methods
Microarray data and data preprocessing The microarray data GSE48944 for CKD was downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus database (GEO) on May 7, 2018 (7). The testing platform was Affymetrix Human Genome U133A 2.0 (GPL571: HG-U133A_2). GSE48944 included 13 human CKD samples and 12 control (CTRL) samples. Microarray raw data (.CEL files) of the CKD were processed with oligo Version 1.41.1 (http:// www.bioconductor.org/packages/release/bioc/html/oligo. html) in R language to achieve an approximate normal distribution (8). Subsequently, the data were standardized using the median normalization and quantile methods.

WGCNA analysis to screen specific modules and RNAs
WGCNA, a biology systems method, can be used to construct a scale-free network from gene expression data. WGCNA package Version 1.61 in R (https://cran.r-project. org/web/packages/WGCNA/) was used to screen specific modules and RNAs related to CKD (16)(17)(18). The modules were detected using dynamic tree cut algorithm with a minimum module size of 100 and a minimum cut height of 0.95 (P value o0.05). The significantly differentially expressed mRNAs and lncRNAs were mapped to modules in WGCNA. Fold enrichments and P values of significantly differentially expressed mRNAs and lncRNAs in modules were calculated using the hypergeometric test of f(k,N,M, n) = C(k,M) * C(n-k,N-M) / C(n,N) (19). A fold enrichment 41 and P value o0.05 were used as the cut-off criteria.

Network construction of enriched differentially expressed mRNAs and lncRNAs
Pearson's correlation coefficient (PCC) of the enriched differentially expressed mRNAs and lncRNAs in WGCNA modules was calculated using the cor function in R (http:// 77.66.12.57/R-help/cor.test.html), and the co-expression network was constructed. Cytoscape 3.6.1 (http://www. cytoscape.org/) was used to visualize the co-expression network of enriched differentially expressed mRNAs and lncRNAs in WGCNA modules (20). GO functional and KEGG pathway analyses for enriched differentially expressed mRNAs and lncRNAs in the co-expression network were searched using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://www. david.niaid.nih.gov) software. Three categories of biological process, cellular compartment, and molecular function were included in the GO terms.

Construction of KEGG pathway network related to CKD
KEGG pathways related to CKD were searched with Comparative Toxic Genomics Database 2017 update (CTD, http://ctd.mdibl.org/) using the keyword ''Chronic Kidney Disease''. These pathways were compared with KEGG pathways enriched in the co-expression network of differentially expressed mRNAs and lncRNAs, and then the overlapping pathways were obtained. A KEGG pathway network related to CKD was constructed using these overlapping pathways. We searched the related miRNAs using StarBase Version 2.0 database (http://starbase. sysu.edu.cn/), constructed lncRNA-miRNA-mRNA-pathway network by connecting the hubs, and screened the potential prognostic biomarkers of long non-coding RNAs for predicting the risk of CKD (21).

Overview of differential expression analysis (lncRNAs and mRNAs)
Firstly, the microarray raw data were standardized using the median normalization and quantile methods. Boxplots of microarray raw data before and after normalization are shown in Figure 1.
Next, we performed differentially expressed analysis to identify potential lncRNAs associated with CKD. A total of 130 lncRNAs and 12,126 protein-coding genes were obtained after re-annotation, as shown in Table S1. A total of 821 significantly differentially expressed mRNAs and lncRNAs were screened using Limma (http://www.biocon ductor.org/packages/release/bioc/html/limma.html), including 205 downregulated and 616 upregulated between CKD and CTRL ( Figure 2A and Table S2). The expression level of these significantly differentially expressed mRNAs and lncRNAs are shown in Table S3. Bilateral hierarchical clustering of significantly differentially expressed mRNAs and lncRNAs are shown in Figure 2B. The expression level of RNAs could discriminate CKD and CTRL samples, indicating distinctive features between these samples. GO functional and KEGG pathway analyses for 802 differentially expressed coding RNAs indicated that 32 GO functions (15 biological processes, 12 cellular compartments, and 5 molecular functions) and 15 KEGG pathways were enriched. As shown in Table 1 and Figure 3, these differentially expressed coding RNAs were related to GO functions of wound response, immune response,   and inflammatory response, as well as KEGG pathways of complement and coagulation cascades (hsa04610), allograft rejection (hsa05330), and cell adhesion molecules (hsa04514), etc.
Identification and characterization of CKD-associated modules using WGCNA The underlying molecular mechanisms of CKD could be explored through an approach of gene co-expression network. It allows us to explore a set of interacting mRNAs and lncRNAs measured by modules or subnetworks that are involved in the complex disease of CKD. The WGCNA R package implements a suite of tools, which can be used for the network construction. A weighted adjacency matrix implemented in WGCNA was used to construct a scalefree network and identify important modules related to CKD. In this work, a step-by-step network construction and module detection method was used, and a selected power (power=12) was determined through a soft-threshold approach implemented in WGCNA, as shown in Figure 4. Co-expression modules were defined by a robust dynamic hierarchical tree and sets of tightly co-regulated genes with the measurement of dissimilarity. The minimum module size of 100 and a minimum cut height of 0.25 were set to ensure a qualified number of genes for the further analysis. Nine co-expression modules were obtained by clustering the highly co-expressed genes in the constructed network ( Table 2). With each clustered module showing a different color, they were visualized as shown in Figure 5. Differently expressed mRNAs and lncRNAs were significantly enriched in brown and yellow modules, including 151 and 54 RNAs (containing 7 lncRNAs), respectively. These 205 differently expressed mRNAs and lncRNAs enriched in brown and yellow modules were used for subsequent investigation. The list of these RNAs in brown and yellow modules are provided in Table S4.

Network analysis revealed prognostic lncRNA biomarkers associated with CKD progression
To identify potential prognostic lncRNA biomarkers associated with CKD progression, we further constructed a co-expression network based on the enriched differentially expressed mRNAs and lncRNAs in brown and yellow modules. PCC of 197 differentially expressed mRNAs and 7 differentially expressed lncRNAs were calculated (Table  S5). As shown in Figure 6, lncRNA-mRNA co-expression network contained 462 lines (250 with negative correlation and 212 with positive correlation) and 204 hubs. Among these 7 lncRNAs, 1 lncRNA was upregulated and 6 lncRNAs were downregulated, while 53 mRNAs were downregulated and 144 mRNAs were upregulated.
GO functional and KEGG pathway analyses using DAVID for differentially expressed mRNAs indicated 32 GO functions (15 biological processes, 6 cellular compartments,       and 11 molecular functions) and 10 KEGG pathways were enriched (Table 3 and Figure 7). The differentially expressed mRNAs in the lncRNA-mRNA co-expression network were related to GO functions of wound response, defense response, immune response, and inflammatory response. Moreover, the enriched KEGG pathways included complement and coagulation cascades (hsa04610), ECMreceptor interaction (hsa04512), cytokine-cytokine receptor interaction (hsa04060), and cell adhesion molecules (hsa04514), etc.
KEGG pathway network further revealed the prognostic lncRNA biomarkers related to CKD Using the keyword ''Chronic Kidney Disease'', 175 KEGG pathways related to CKD were searched with CTD, as shown in Table S6. These pathways were compared with KEGG enriched pathways in the co-expression network of differentially expressed mRNAs and lncRNAs, and 7 overlapping pathways were obtained (Table 3, indicated with *). A lncRNA-mRNA-pathway network related to CKD was constructed using these overlapping pathways, as shown in Figure 8A. The miRNAs related to 7 lncRNAs in the network were searched using StarBase Version 2.0 database, and 24 lncRNA-miRNA linking relations were found, involving two lncRNAs of HCP5 and NOP14-AS1 (Table S7). Next, a lncRNA-miRNA-mRNApathway network was constructed by connecting the has-miR-29a/b/c, HCP5, and 4 KEGG pathways. As shown in Figure 8B, the four genes of CCND2, COL3A1, COL4A1, and RAC2, positively related with HCP5, were found to participate in four KEGG pathways directly related to CKD. As shown in Figure 9, CCND2, COL3A1, COL4A1, RAC2, and HCP5 were significantly upregulated in GSE48944 and GSE47184 (besides RAC2 in GSE47184).

Discussion
The results of this study indicated that genes with similar functions within modules could contribute to the risk of CKD in a co-expression manner, and the modules with different functions could be regulated synergistically.
In order to identify potential critical lncRNAs and genes associated with CKD, we focused our attention on hub lncRNAs and genes in the lncRNA-miRNA-mRNA-pathway network. Downregulated has-miR-29a/b/c was reported to be related to CKD directly (22). NOP14-AS1 (NOP14 antisense RNA 1) is affiliated with the non-coding RNA class. Diseases associated with NOP14-AS1 include astrocytoma. Previous research identified that NOP14-AS1 was strongly regulated by DNA damaging agents and significantly negatively correlated with its sense gene NOP14 in a p53dependent manner. Hence, sense-antisense pair of NOP14-AS1/NOP14 might be involved in the progression of CKD upon DNA damage induction. The long non-coding RNA-HLA complex P5 (HCP5) has been found to be overexpressed in follicular thyroid carcinoma, which functions as a competing endogenous RNA and acts as a sponge for several miRNAs (23). The lncRNA-miRNA-mRNA-pathway network constructed indicated that HCP5 might be regulatory genes associated with the progression of CKD via the genes of CCND2, COL3A1, COL4A1, and RAC2. The CCND2 gene encodes a protein G1/S-specific cyclin-D2 in humans, which belongs to the highly conserved cyclin family. Cyclins function as regulators of cyclin-dependent kinases. High level expression of CCND2 gene was observed in ovarian and testicular tumors (24). COL3A1 gene encodes a protein of collagen alpha-1(III) chain, a precursor to collagen III, which is found in extensible connective tissues, frequently in association with type I collagen (25). The COL4A1 gene contains one of 27 SNPs associated with increased risk of coronary artery disease (26). RAC2 gene encodes a small (B21 kDa) signaling G protein, a member of the Rac subfamily of the Rho family of GTPases, which has been shown to interact with nitric oxide synthase 2A (27). Thus, these are important genes for CKD.
In summary, this work revealed that lncRNA of NOP14-AS1 and HCP5 might be potential genetic biomarkers for the progression of CKD. They may function via the genes of CCND2, COL3A1, COL4A1, and RAC2, and the pathways of complement and coagulation cascades (hsa04610), ECM-receptor interaction (hsa04512), cytokine-cytokine receptor interaction (hsa04060), and cell adhesion molecules (hsa04514), etc. Although a comprehensive analysis was performed, there were a number of limitations in the work. Functional validation was not done for the hub lncRNAs and mRNAs obtained. Further investigations are required with substantial experiments. Nevertheless, this work provides novel insights into the occurrence and progression of CKD.

Supplementary Material
Click here to view [xls].