The gene expression profiles of induced pluripotent stem cells (iPSCs) generated by a non-integrating method are more similar to embryonic stem cells than those of iPSCs generated by an integrating method

Induced pluripotent stem cells (iPSCs) obtained by the ectopic expression of defined transcription factors have tremendous promise and therapeutic potential for regenerative medicine. Many studies have highlighted important differences between iPSCs and embryonic stem cells (ESCs). In this work, we used meta-analysis to compare the global transcriptional profiles of human iPSCs from various cellular origins and induced by different methods. The induction strategy affected the quality of iPSCs in terms of transcriptional signatures. The iPSCs generated by non-integrating methods were closer to ESCs in terms of transcriptional distance than iPSCs generated by integrating methods. Several pathways that could be potentially useful for studying the molecular mechanisms underlying transcription factor-mediated reprogramming leading to pluripotency were also identified. These pathways were mostly associated with the maintenance of ESC pluripotency and cancer regulation. Numerous genes that are up-regulated during the induction of reprogramming also have an important role in the success of human preimplantation embryonic development. Our results indicate that hiPSCs maintain their pluripotency through mechanisms similar to those of hESCs.


Introduction
Induced pluripotent stem cells (iPSCs) are derived from somatic cells by transfecting two pluripotency transcription factors, Oct4 (O) and Sox2 (S), and two proto-oncogenes, c-Myc (M) and Klf4 (K). These four transcription factors globally reset the epigenetic and transcriptional state of fibroblasts into that of pluripotent cells (Takahashi et al., 2007). This technology provides alternative pluripotent cells that closely resemble blastocyst-derived embryonic stem cells (ESCs), which are considered the gold standard for stem cells (Takahashi et al., 2007; Kang et al., 2010). The replacement of ESCs with iPSCs in the field of regenerative medicine is based on the assumption that iPSCs are as potent as ESCs in their ability to differentiate and as safe for clinical applications (Boue et al., 2010). Mouse iPSCs have the same functional characteristics as mouse ESCs, as shown by their capacity to generate mice in tetraploid complementation experiments (Boland et al., 2009; Kang et al., 2009; Zhao et al., 2009). In contrast, this stringent pluripotency test is difficult to carry out with human iPSCs (hiPSCs). Genome-wide profiling analyses of gene expression (Ghosh et al., 2010), DNA methylation patterns (Doi et al., 2009) and differentiation properties have detected incomplete reprogramming in hiPSCs. These findings suggest that there are substantial differences between hESCs and hiPSCs.
The advantages and disadvantages of the delivery method for each factor have been discussed elsewhere (Achiwa et al., 2005; Gonzalez et al., 2011). Since the first report on iPSCs produced by retroviral delivery of four factors (OSKM), a substantial number of alternative approaches have been developed to induce pluripotency. In this report, we describe a meta-analysis of gene expression information from multiple independent but related studies (summarized in Table 1). For this, we compared the transcriptional signatures of hiPSCs generated by different methods and transcription factors, with hESCs serving as the gold standard. We also examined the detailed molecular events involved in human cell reprogramming by comparing the transcriptomes of hiPSCs and fibroblasts.
[...] platform (Affymetrix) were obtained online from the Gene Expression Omnibus (GEO), a public repository for a wide range of high-throughput experimental data. The donor cells and different hiPSC lines are summarized in Table 1.

Microarray analysis
We imported datasets from GEO into GeneSpring GX 11.0 using a guided workflow step to identify potential targets that were both statistically and biologically meaningful. Probe sets with gene-level normalized intensities greater than log (base 2) of 5.0 in at least one sample were excluded from ANOVA. The data were then filtered based on their flag values (P, present; A, absent) to remove probe sets for which the signal intensities for all the treatment groups were in the lowest 20th percentile of all intensity values. ANOVA in conjunction with the Benjamini-Hochberg FDR multiple test correction was used to identify genes that were differentially expressed between different groups. The level of significance was set at p < 0.05.
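The two numerical steps in this paragraph, the low-intensity filter and the Benjamini-Hochberg correction, can be illustrated with a minimal Python sketch. This is not the GeneSpring implementation: the function names are hypothetical, the filter is applied per sample rather than per treatment group, and the p-values are assumed to come from a separate ANOVA step.

```python
import numpy as np

def filter_low_intensity(expr, percentile=20.0):
    """Return a boolean mask of probe sets (rows) to keep.

    A probe set is dropped only if its signal lies in the lowest
    `percentile` of all intensity values in every sample (column).
    Assumption: samples stand in for the treatment groups of the text.
    """
    expr = np.asarray(expr, dtype=float)
    cutoff = np.percentile(expr, percentile)
    return (expr > cutoff).any(axis=1)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Toy p-values (made up for illustration only).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = adjusted < 0.05
```

Genes whose adjusted p-value falls below 0.05 would then be carried forward as differentially expressed.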

Gene ontology (GO) annotation and pathway analysis
The functions of up- or down-regulated genes in iPSCs vs. somatic cells were investigated by using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v 6.7 (Huang et al., 2009) based on gene ontology (GO) (Ashburner et al., 2000) annotations. In addition, groups of genes associated with specific pathways (based on the Kyoto Encyclopedia of Genes and Genomes, KEGG) were analyzed together to assess pathway regulation during reprogramming.
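The idea behind this kind of over-representation analysis can be sketched with a plain hypergeometric test. This is an illustration only: DAVID reports an EASE score, a more conservative variant of Fisher's exact test, so the values below would not match DAVID output exactly. The function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    k: genes in the list that carry the annotation
    n: size of the gene list
    K: genes in the background that carry the annotation
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Made-up example: 5 of 5 listed genes are annotated,
# against a background of 10 genes with 5 annotated.
p_value = hypergeom_enrichment_p(5, 5, 5, 10)
```

A small p-value indicates that the annotation occurs in the gene list more often than expected by chance given the background.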

Network analysis
We investigated the possible functional associations between the top 484 significantly up-regulated genes in iPSCs compared with fibroblasts using the STRING database (STRING score of at least 0.5) (von Mering et al., 2007). Gene networks for which there was high confidence as interacting partners were visualized using MEDUSA (Hooper and Bork, 2005).
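The score-based filtering described above can be sketched as follows, assuming the STRING associations have already been exported as (gene, gene, score) triples with scores on a 0 to 1 scale. The function names, gene labels and scores are hypothetical and serve only to illustrate the thresholding.

```python
from collections import defaultdict

def build_network(edges, min_score=0.5):
    """Keep associations with score >= min_score and return an adjacency map."""
    adjacency = defaultdict(set)
    for gene_a, gene_b, score in edges:
        if score >= min_score and gene_a != gene_b:
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)
    return dict(adjacency)

def connected_fraction(adjacency, all_genes):
    """Fraction of the input genes that have at least one retained partner."""
    genes = set(all_genes)
    return len(genes & set(adjacency)) / len(genes)

# Made-up associations for illustration only.
toy_edges = [("A", "B", 0.9), ("B", "C", 0.4), ("C", "D", 0.5)]
network = build_network(toy_edges)
fraction = connected_fraction(network, ["A", "B", "C", "D", "E"])
```

The adjacency map is the kind of edge list that a network viewer would then lay out; the connected fraction corresponds to the share of genes that interact with at least one other gene in the set.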

Results
Comparative global transcriptomic analysis of iPSCs and ESCs
The induction methods were divided into non-integrating methods (including synthetic modified mRNA, episomes, proteins and minicircles) and those involving the integration of exogenous transcription factors (lentiviral and retroviral methods and inducible reprogramming systems). Most (75%) of the iPSCs analyzed in this study used fibroblasts as the donor cell type. ANOVA was used to determine the degree of reprogramming within hiPSCs derived using different methods of induction and transcription factors, and to examine the "distance", i.e., the number of differentially expressed genes (based on cut-off criteria of p < 0.05 and a fold-change of at least 2), among hESCs, hiPSCs and their corresponding donor cells (Figure 1). To eliminate the influence of micro-environmental factors associated with different laboratories and the genetic background of donor cells, the differentially expressed genes were identified by comparing iPSCs and ESCs derived from the same laboratory and donor animals of the same sex (Table 1). Table S1 (Supplementary material) provides a detailed list of the genes that were differentially expressed between iPSCs and ESCs.
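The "distance" defined above is simply a count, which the following Python sketch makes explicit. It assumes log2-scale expression matrices (genes in rows, samples in columns) and per-gene p-values computed beforehand; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def transcriptional_distance(log2_a, log2_b, pvalues,
                             p_cutoff=0.05, fold_cutoff=2.0):
    """Count genes with p < p_cutoff and at least fold_cutoff-fold change.

    log2_a, log2_b: log2 expression matrices for the two cell types.
    pvalues: one p-value per gene, assumed to come from the ANOVA step.
    """
    mean_a = np.asarray(log2_a, dtype=float).mean(axis=1)
    mean_b = np.asarray(log2_b, dtype=float).mean(axis=1)
    log2_fc = mean_a - mean_b
    p = np.asarray(pvalues, dtype=float)
    differential = (p < p_cutoff) & (np.abs(log2_fc) >= np.log2(fold_cutoff))
    return int(differential.sum())

# Made-up values: three genes, two samples per cell type.
ipsc = [[8.0, 8.0], [5.0, 5.0], [7.0, 7.0]]
esc = [[6.0, 6.0], [5.5, 5.5], [9.0, 9.0]]
distance = transcriptional_distance(ipsc, esc, [0.01, 0.01, 0.2])
```

A smaller count means that the two cell populations are closer in terms of their transcriptional signatures.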
We also analyzed the relationship between the "distance" of iPSCs vs. ESCs and the method used to deliver the transcription factor(s). iPSCs generated by integrating viral vectors (Moloney-based retrovirus and HIV-based lentivirus) were not as close to ESC lines as iPSCs generated by non-integrating methods (episomes, synthetic modified mRNA, proteins and minicircle DNA) (Figure 2A). The type of transcription factor used had little impact on the gene expression signature of iPSCs (Figure 2B). No gene was differentially expressed between hESCs and hiPSCs in all of the reprogramming experiments, i.e., there were no consistent differences in global gene expression between human ESCs and iPSCs. These findings supported the idea that reprogramming progresses through a series of stochastic events to produce pluripotency.

Functional analysis of significantly altered genes between iPSCs and donor cells
The detailed molecular events involved in reprogramming to produce iPSCs remain largely unknown. To address this issue, we undertook an in-depth analysis of the biological functions of differentially expressed genes in all 20 iPSC lines vs. donor fibroblasts; the selection criteria were again p < 0.05 (Student's t-test) and at least a two-fold difference in gene expression. Table 1 summarizes the number of differentially expressed genes between the iPSC lines and the original cell lines. Of these, 312 genes were up-regulated in every iPSC line compared with fibroblasts (Table S2). We defined the 312 up-regulated probes as essential for maintaining the pluripotency of hiPSCs (EMP genes). The STRING database was used to visualize all known functional interactions between EMP genes in iPSC lines using the default cutoff suggested by STRING. One hundred and fifty-nine genes in this set (32%) interacted with each other (Figure 3). The functional network of genes with higher expression levels in iPSCs showed a central, highly interconnected area in which common pluripotency regulators such as Pou5f1, Nanog, Lin28, Dnmt3 and Dppa4 were identified. This finding indicated that hiPSCs and hESCs shared a similar core network to maintain pluripotency. The absence of Sox2 in this analysis reflects the fact that Marchetto et al. (2009) used mouse neural stem cells (NSCs), which have a high endogenous expression of Sox2, as the donor cell lines to induce reprogramming. Hence, Sox2 was not included in the 312 genes up-regulated in iPSCs. This protein interaction network for pluripotency provides a model for exploring novel factors that may enhance the induction of reprogramming.
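Requiring a gene to be up-regulated in every iPSC line amounts to a set intersection, which also explains why a gene such as Sox2 drops out when a single donor cell type already expresses it highly. The sketch below uses hypothetical line names and made-up gene lists purely to illustrate this logic.

```python
def shared_upregulated(upregulated_by_line):
    """Return the genes that are up-regulated in every iPSC line."""
    gene_sets = [set(genes) for genes in upregulated_by_line.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)

# Made-up lists: the NSC-derived line lacks SOX2 among its up-regulated
# genes because the donor cells already express it at a high level.
toy_lines = {
    "fibroblast_line_1": ["POU5F1", "NANOG", "SOX2", "LIN28"],
    "fibroblast_line_2": ["POU5F1", "NANOG", "SOX2"],
    "nsc_line": ["POU5F1", "NANOG", "LIN28"],
}
core = shared_upregulated(toy_lines)
```

In this toy example only POU5F1 and NANOG survive the intersection, even though SOX2 is up-regulated in both fibroblast-derived lines.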
We took advantage of a recently published microarray dataset (Xie et al., 2010) to study the dynamic changes in EMP genes during mammalian preimplantation embryonic development (Table S3). One hundred and twenty EMP genes, including Pou5f1, Dppa4 and Lin28, were up-regulated during the transitional phase from the four-cell stage to the eight-cell stage of human early embryonic development, known as the human zygotic genome activation period (Hoffert et al., 1997) (Figure 4). This pluripotent network, which is essential for maintaining the self-renewal of iPSCs, also plays a pivotal role in establishing embryos in vivo. The 101 EMP genes that were downregulated during the process could contribute to the differentiation of stem cells in vivo and in vitro.
The functions associated with genes that were significantly altered in reprogramming were examined by analyzing the over-represented annotations and pathways using DAVID, with a cut-off criterion of p < 0.01. The over-represented GO terms focused on "regulation of transcription" and "regulation of cell proliferation" (Table S4). The results of this analysis supported the idea that an increase in proliferation rate was necessary for full cellular reprogramming.
We also analyzed whether significant pathways in iPSCs were enriched in significantly altered genes. The results showed that hiPSCs were responsive to the TGF-β signaling pathway that regulates the maintenance of pluripotency, self-renewal and proliferation of hESCs (Table S4). These results demonstrated that hiPSCs reprogrammed from somatic or embryonic cells relied on similar signaling pathways to control their pluripotency.

Discussion
The results described herein show that the overall transcriptional profiles of different human iPSC lines shared a common "signature" with hESCs, although there were certain differences. Notably, the transcriptomes of hiPSCs produced by a delivery method that avoided genomic integration shared a greater gene expression signature with hESCs than did iPSCs produced by a virus-based method. Gene-delivery methods can affect the quality of the resulting iPSCs by influencing the amount, balance, continuity and silencing of transgene expression. Potent oncogenes such as myc apparently have little effect on the transcriptional signature of iPSCs. Our findings provide a basis for selecting the most suitable method for clinical or basic applications and a better understanding of the reprogramming process.
Figure 3 - Predicted stem-cell-specific protein-protein interaction network of genes with higher expression levels in iPSCs compared to somatic cells.
This study also improves our understanding of the mechanisms of cellular reprogramming. The transcriptional network that maintains the self-renewal and pluripotency of iPSCs is established primarily during preimplantation development, at the stage of zygotic genome activation. Detailed analysis showed that increased proliferation and the upregulation of genes that drive the cell cycle are necessary events for fibroblast reprogramming. Recent reports have shown that hiPSCs are more tumorigenic than hESCs based on a comparison of protein-coding point mutations (Gore et al., 2011), copy number variations (Hussein et al., 2011) and DNA methylation (Lister et al., 2011). Together, these results stress the link between pluripotency and tumorigenicity. Given that self-renewal is a hallmark of ESCs and cancer cells, the ability to induce tumors during cellular reprogramming implies that there are potential risks involved in the use of iPSCs for regenerative therapy.
In addition, non-coding RNA, including microRNA (miRNA) and large intergenic non-coding RNA (lincRNA), which may represent a distinct layer that fine-tunes the transcriptional network of stem cells, has a role in modulating the induction of reprogramming (Judson et al., 2009; Loewer et al., 2010). Significantly, recent work has shown that a single miRNA cluster rapidly reprogrammed mouse and human fibroblasts into iPSCs and totally avoided the use of transcription factors (Anokye-Danso et al., 2011). The mechanism underlying reprogramming by miRNA differs from that of transcription factor-induced reprogramming in that there is no requirement for protein translation; the former method also targets hundreds of ESC-related mRNAs directly.
In conclusion, we have examined the gene expression profiles of iPSCs obtained by different methods and from donor cells of different origins. iPSCs produced by non-integrative methods more closely resembled the fully reprogrammed pluripotent state than did iPSCs obtained by using integrative delivery systems, although the efficiency and kinetics were lower. Some of the results described here may reflect the markedly different circumstances in which they were generated, e.g., the culture conditions, the passage number at which the cells were used and the age of the donor cells. Another limitation in our analysis was that only the initial state (donor cell) and end state (pluripotent cell) of reprogramming were examined.
Further research on each aspect of reprogramming, e.g., the initial transcriptional response to the induction of reprogramming, the epigenetic roadblocks, the partially pluripotent state and the late events leading to pluripotency, is required in order to understand how reprogramming leads to pluripotency. A comprehensive understanding of the events involved in reprogramming a set of iPSCs can only be reached by examining the changes in the corresponding transcriptome (protein-coding RNA, microRNA and lincRNA expression), epigenome (genomic imprinting, X chromosome activation, histone modifications and DNA methylation), metabolome and proteome.

Supplementary Material
The following online material is available for this article: Table S1 - Genes differentially expressed between iPSC lines and their original donor cells. This material is available as part of the online article from http://www.scielo.br/gmb.

Associate Editor: Carlos F.M. Menck
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.