Bioinformatics analysis of biomarkers and transcriptional factor motifs in Down syndrome

In this study, biomarkers and transcriptional factor motifs were identified in order to investigate the etiology and phenotypic severity of Down syndrome. GSE 1281, GSE 1611, and GSE 5390 were downloaded from the gene expression ominibus (GEO). A robust multiarray analysis (RMA) algorithm was applied to detect differentially expressed genes (DEGs). In order to screen for biological pathways and to interrogate the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, the database for annotation, visualization, and integrated discovery (DAVID) was used to carry out a gene ontology (GO) function enrichment for DEGs. Finally, a transcriptional regulatory network was constructed, and a hypergeometric distribution test was applied to select for significantly enriched transcriptional factor motifs. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B, and TTC3 were each up-regulated two-fold in Down syndrome samples compared to normal samples; of these, SON and TTC3 were newly reported. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B, and TTC3 were located on human chromosome 21 (mouse chromosome 16). The DEGs were significantly enriched in macromolecular complex subunit organization and focal adhesion pathways. Eleven significantly enriched transcription factor motifs (PAX5, EGR1, XBP1, SREBP1, OLF1, MZF1, NFY, NFKAPPAB, MYCMAX, NFE2, and RP58) were identified. The DEGs and transcription factor motifs identified in our study provide biomarkers for the understanding of Down syndrome pathogenesis and progression.


Introduction
Down syndrome, the most frequent genetic cause of mental retardation occurring in newborns, results from the presence of three copies of chromosome 21 (trisomy 21) (1). This imbalance of 300 genes causes dysfunctions in developmental and physiological processes, leading to a complex phenotype defined by several clinical features, which are variable in their number and intensity (2). It is typically associated with physical growth delays, a particular set of facial characteristics, and a severe degree of intellectual disability (3). Children with Down syndrome may have severe mental retardation (4) and developmental delay, and they are prone to gastrointestinal malformations (5). At present, there is no effective drug for treatment of the disease, and, because prenatal diagnosis is the most effective way to avoid the birth of children with Down syndrome, it is important to study the pathogenesis of this disease.
A change in gene expression occurs in trisomy 21 (6). Antonarakis et al. showed that some characteristics of the Down syndrome phenotype can be related to an increase in expression of two HSA21 genes: DSCR1-RCAN1 (regulator of calcineurin activity 1) and the protein kinase DYRK1A (dual-specificity tyrosine phosphorylationregulated kinase). In the developing brain, candidate genes would be involved in neurogenesis, neuronal differentiation, myelination, or synaptogenesis (7). In Down syndrome, aberrant expression of CRLF2 is associated with mutated JAK2, suggesting that blocking the CRLF2/JAK2 pathway may be an effective method for Down syndrome therapy (8).
Because the abnormal copy number of chromosome 21 is the main genetic characteristic of the disease, we applied a variety of bioinformatics tools to determine biomarkers of Down syndrome and the transcriptional regulatory network. In the process of identifying differentially expressed genes (DEGs), those genes that showed the greatest up-regulation were selected as biomarkers for Down syndrome. GO (gene ontology) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment for DEGs were used to identify the significantly enriched biological terms and pathways. Finally, a hypergeometric distribution test was applied to select the significantly enriched motifs by transcriptional regulatory network construction.

Data source
Raw gene expression data of Down syndrome samples and normal samples were downloaded from the Gene Expression Omnibus (GEO) (http://www.ncbi. nlm.nih.gov/geo/) of the National Center for Biotechnology Information (NCBI). GSE 1281 (9) and GSE 1611 (10) were from the following GLP81 platform: Affymetrix Murine Genome U74A Version 2 Array. GSE 1281 included six Down syndrome samples and six normal samples, which were from newborn rat brain tissue; GSE 1611 contained 12 samples (Ts1Cje and euploid Down syndrome cerebellum) as follows: two Down syndrome samples at postnatal days 15 and 30, respectively, and two normal samples of newborn at postnatal days 15 and 30, respectively. GSE 5390 (11) was from the following GPL96 platform: [HG-U133A] Affymetrix Human Genome U133A Array, including seven Down syndrome samples from adult humans with Down syndrome and eight normal samples.

Data preprocessing
The robust multiarray analysis (RMA) algorithm (12) in Affymetrix Power Tools (APT; http://www.affymetrix.com/) was applied to perform background correction and standardization for all raw data, aiming to filter falsepositive data. The applied criterion was as follows: at least half the samples had PLIER signal intensity values greater than 100 (13,14).

Screening of DEGs
The screening criteria for DEGs were as follows: signal intensity of Down syndrome samples to normal samples was up-regulated or down-regulated 1.2 times (RMA conversion was 0.26; P,0.05), and there was at least one group in which the PLIER signal intensity value of half the samples was greater than 100. In order to obtain more DEGs, data in GSE 1281 and GSE 1611 were integrated and normalization performed, while differential analysis of data in different tissue and development stages of Down syndrome samples compared with normal samples were performed separately, and then differential results were combined. We screened up-and down-regulated genes according to the sequence conservation (15) in human and rat brains. Finally, the genes showing the greatest degree of up-regulation (such as the genes whose signal intensity increased or decreased 1.5 times and with RMA conversion values greater than 0.58) were selected for study.

Gene function annotation
The database for annotation, visualization, and integrated discovery (DAVID) (16) was applied to perform GO function enrichment for DEGs with signal intensity increased or decreased 1.2 times and with RMA where the conversion value was greater than 0.26, and then the significantly enriched GO terms [P#0.05, false discovery rate (FDR)#10] and KEGG pathways (P#0.05) were identified.

Construction of the transcriptional regulatory network
To construct the transcriptional regulatory network, the sequence from 1 kb upstream to 200 bp downstream at the 59UTR transcription start site of DEGs was extracted, and the JASPAR database (17) was used to search for transcription factor binding sites (score threshold: 0.95). In addition, because the JASPAR database is limited, the transcriptional regulatory network was constructed combined with transcription factor binding sites in the University of California Santa Cruz (USA) genome browser database (18). Finally, a hypergeometric distribution test (19) was applied to select the significantly enriched motifs (P#0.01).

Identification of DEGs and biomarkers
A total of 581 DEGs were identified, in which 227 DEGs were up-regulated and 354 DEGs were downregulated ( Figure 1A). The distribution of DEGs in different development stages of Down syndrome is shown in Figure 1A. A total of 2180 DEGs were identified, in which 2010 were up-regulated and 800 were downregulated ( Figure 1A).
Generally speaking, the relationship between the transcription factors and their regulatory target genes in human and mouse were conserved. A total of 50 upregulated and 23 down-regulated DEGs were identified according to orthologous relationships of human vs mouse. Nine up-regulated genes CDKN1C (cyclin-dependent kinase inhibitor 1C), DCN (decorin), HMGN1 (non-histone chromosomal protein HMG-14), HSD17B11 (estradiol 17beta-dehydrogenase 11), LGALS3BP (galectin-3-binding protein), MT2 (metallothionein-2R), RCAN1, SON [a large Ser/Arg (SR)-related protein], and XIST (X-inactive specific transcript) were up-regulated at least 1.5 times, and the RMA conversion value was greater than 0.58. Figure 2A demonstrates the distribution of 50 upregulated genes and 23 down-regulated genes in both human and mouse in the human chromosome. Eight genes of 50 up-regulated DEGs were distributed in human chromosome 21 (mouse chromosome 16), while no downregulated DEGs were distributed in human chromosome 21 (Table 1). Molecular interactions of the following eight genes are shown in Figure 2B: CBR1 (carbonyl reductase 1), DYRK1A, HMGN1, ITSN1 [intersectin 1 (SH3 domain protein)], RCAN1, SON, TMEM50B (transmembrane protein 50B), and TTC3 (tetratricopeptide repeat protein 3). Information on interactions was selected from the BIOGRID database (20), but TMEM50B had no interaction information.

Gene function and KEGG pathway annotation
The DAVID functional annotation of 73 DEGs is shown in Figure 3 (P#0.05 and FDR#10). The most significantly enriched biological processes were macromolecular complex subunit organization, cellular ion homeostasis, and synaptic vesicle endocytosis (Table 2). A total of 73 KEGG pathways of DEGs is shown in Table 3 (P#0.05). The most significantly enriched KEGG pathways were focal adhesion, natural killer cell mediated cytotoxicity, and Alzheimer's disease.

Discussion
Down syndrome is a genetic condition in which a person has 47 chromosomes instead of the usual 46. In most cases, Down syndrome occurs when there is an extra copy of chromosome 21. In this study, we identified 581 DEGs in Down syndrome compared with normal tissue, and 8 DEGs were distributed in human chromosome 21. Finally, we identified 11 transcription factor motifs.
HMGN1, RCAN1, and SON were distributed in human chromosome 21. HMGN1 and RCAN1 have been reported to be associated with Down syndrome (21,22), but the relationship between SON and Down syndrome has not been identified. HMGN1 could regulate expression of methyl CpG-binding protein 2 (MeCP2), and may lead to neurodevelopmental disorders (23). Up-regulated expression of RCAN1 was reported to cause Down  syndrome-like immune dysfunction (24). The differential expression of SON was significant during developmental stages in human and mouse (except postnatal day 15; others were more than 1.5 times). SON is an SR-type protein involved in mRNA processing and gene expression, which is a splicing cofactor that contributes to an efficient splicing within cell cycle progression (25). Some SR proteins have been reported to influence the selection  of alternative 59 splice sites (26). All results suggested that SON may most likely be a candidate gene. Another five genes (CBR1, DYRK1A, ITSN1, TMEM50, and TTC3) were also distributed in human chromosome 21. A previous study reported that the mRNA level of CBR1 in Down syndrome samples was 1.8-fold greater than that in normal samples. In agerelated neurodegenerative diseases, DYRK1A can regulate the expression of RCAN1 (27). Trisomic mice injected in the hippocampus with short hairpin RNA against DYRK1A showed attenuation of synaptic plasticity defects (28). Recently, De la Torre et al. (29) reported that the DYRK1A inhibitor EGCG (the green tea flavonol, epigallocatechin gallate) could inhibit the activity of DYRK1A in the hippocampus area to prevent cognitive deficits in Down syndrome mouse models. ITSN and TMEM50 have been reported to be related to Down syndrome (30,31). Although there is no literature about TTC3 and Down syndrome, TTC3 is related to another seven genes distributed in human chromosome 21 (32). Therefore, TTC3 is also a very likely candidate gene for Down syndrome.
In GO function annotation, some genes involved with regulation of neurological system processes were enriched in Down syndrome samples. Also, during KEGG pathway annotation, there were some genes related to Alzheimer's disease (a nervous system disease) (33). PRKCZ (potein kinase C, zeta), EDNRB (endothelin receptor type B), CDK5 (cell division protein kinase 5), and CACNA1A (cav2.1 P/Q voltage-dependent calcium channel) were related to nervous system diseases. PRKCZ is thought to be responsible for maintaining long-term memory in the brain (34). EDNRB is a G protein-coupled receptor, which activates a phosphatidylinositol-calcium second messenger system (35). CDK5 is involved in the processes of neuronal maturation and migration, phosphorylating the key intracellular adaptors of the reelin signaling chain (36). CACNA1A is also involved in neurotransmitter release (37). These genes are involved in the nervous system and they are likely related to Down syndrome, and further studies are needed to confirm this.
In addition, 11 significantly enriched transcription factor motifs (PAX5, EGR1, XBP1, SREBP1, OLF1, The nine highest-ranked terms showing significant differential expression are listed. FDR: false discovery rate. The four highest-ranked terms showing significant differential expression are listed. FDR: false discovery rate. MZF1, NFY, NFKAPPAB, MYCMAX, NFE2, RP58) were identified. XBP1, SREBP1, OLF1, NFY, NFKAPPAB, MYCMAX, NFE2, and RP58 were reportedly not associated with nervous system disease, whereas, in children with Down syndrome and acute lymphoblastic leukemia, PAX 5 was found to be missing (38). Combined with our results, we considered that PAX 5 may be involved in Down syndrome. In the hippocampus region of simian immunodeficiency virus encephalitis-induced neural dysfunction, EGR1 is down-regulated (39). MZF1 is predominantly expressed in neuronal tissue, and mutations in this gene are associated with neurological disorders, providing a potential link between this kinase and neurodegeneration (40).