Differential gene expression profiles of hepatocellular carcinomas associated or not with viral infection

Chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infections are the most important factors associated with hepatocellular carcinoma (HCC), but tumor prognosis remains poor due to the lack of diagnostic biomarkers. In order to identify novel diagnostic markers and therapeutic targets, the gene expression profile associated with viral and non-viral HCC was assessed in 9 tumor samples by oligo-microarrays. The differentially expressed genes were examined using a z-score and KEGG pathways to search for biological process ontologies. We selected a non-redundant set of 15 genes with the lowest P value for clustering samples into three groups using the non-supervised k-means algorithm. Fisher's linear discriminant analysis was then applied in an exhaustive search for trios of genes that could be used to build classifiers for class distinction. Different transcriptional levels of genes were identified in HCC of different etiologies and from different HCC samples. Of the 58 non-redundant genes differentially expressed in the comparisons HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs non-viral (NV)-HCC, HBV-HCC vs NV-HCC, and HCV-HCC vs NV-HCC, only 6 (IKBKβ, CREBBP, WNT10B, PRDX6, ITGAV, and IFNAR1) were found to be associated with hepatic carcinogenesis. By combining trios, classifiers could be generated that correctly classified 100% of the samples. This expression profiling may provide a useful tool for research into the pathophysiology of HCC. A detailed understanding of how these distinct genes are involved in molecular pathways is of fundamental importance to the development of effective HCC chemoprevention and treatment.


Correspondence: M. Bellodi-Privato, Departamento de Gastroenterologia (LIM 37), FM-USP, Av. Dr. Arnaldo, 455, 01246-903 São Paulo, SP, Brasil. Fax: +55-11-3061-7270. E-mail: martaprivato@hotmail.com
Received March 11, 2009. Accepted September 30, 2009. Available online November 9, 2009.

Introduction
Hepatocellular carcinoma (HCC), the most important primary malignant tumor of the liver, is one of the human cancers clearly linked to viral infection (1). The major risk factors for HCC are chronic hepatitis B virus (HBV) infection, chronic hepatitis C virus (HCV) infection, prolonged dietary exposure to aflatoxin, alcoholic cirrhosis, and cirrhosis due to other causes such as hereditary hemochromatosis (2). Some individuals who develop HCC are not infected with HCV or HBV, and do not have cirrhosis in the surrounding parenchyma. Although HCC mortality has significantly decreased with the development of new surgical techniques, about 60-100% of these patients ultimately suffer an HCC recurrence even after curative resection, and this has become the most important factor limiting the long-term survival of HCC patients. Shortage of organs and limited indications make transplantation an infrequently used therapeutic method for HCC (3).
With advances in the understanding of tumor biology, interest in the molecular biomarkers of carcinogenesis has grown, both in terms of their prognostic significance and of their potential use as therapeutic targets (4). Several reports have provided information on multiple genetic changes such as chromosome aberrations, genetic alterations, and gene product abnormalities, which have been suggested to cause carcinoma of the liver (5-7). Considering the complexity of hepatocarcinogenesis, many genes are probably involved in the initiation and progression of this cancer, and comprehensive expression analysis using microarray technology has great potential for the discovery of new genes involved in this process (8). Genome-wide gene expression analysis by microarray offers a systematic approach to gaining comprehensive information regarding transcription profiles (9). Although these genomic approaches have yielded global gene expression profiles in HCC, new biomarkers useful for cancer staging, prediction of prognosis, and treatment selection must now be identified (8).
The present study was designed to identify new biomarkers for HCC by using microarray analyses to find genes differentially expressed in HCV- or HBV-associated HCC and in non-viral HCC.

Material and Methods
The study protocol (#633/06) was approved by the Ethics Committee of the School of Medicine, University of São Paulo.

Patients and tissues
We obtained liver tumor samples from 9 patients subjected to hepatic resection or liver transplantation for HCC. Tumor tissue samples were either flash-frozen in liquid nitrogen or placed in ribonucleic acid (RNA) stabilization fluid (RNAlater®, Invitrogen, USA) and stored at -80°C. Of the 9 patients, 3 were HBs antigen-positive (group HBV-HCC; samples B1, B2 and B3), 3 were HCV antibody-positive (group HCV-HCC; samples C1, C2 and C3), and 3 were double-negative for the HCV antibody and HBs antigen, i.e., non-viral HCC (group NV-HCC; samples N1, N2 and N3).
For NV-HCC patients, aflatoxin exposure, alcoholic cirrhosis, cirrhosis due to other causes such as hereditary hemochromatosis, and nonalcoholic steatohepatitis were excluded as causes of the carcinoma. No patients had other causes of hepatocellular injury, as confirmed by clinical and laboratory findings.

Microarray experiments
At the time of RNA extraction, diagnosis of HCC was confirmed by H&E staining. Total RNA was isolated and purified from frozen liver tissues using the RNeasy mini kit (Qiagen, Germany), according to the manufacturer's protocol. The quality of total RNA samples was analyzed by inspection of 18S and 28S rRNA bands following agarose gel electrophoresis. The concentrations of the RNA samples were quantified by measuring absorbance using a NanoDrop ND-1000 instrument (NanoDrop Technologies, USA). We utilized the CodeLink™ Human Whole Genome Bioarray (GE Healthcare Biosciences, UK) with ~57,000 human transcripts represented in a single bioarray. Briefly, 5 µg total RNA was first reverse transcribed to single-stranded cDNA and cRNA was subsequently synthesized using the CodeLink™ Expression Assay Kit (GE Healthcare Biosciences). The cRNA targets were prepared by in vitro transcription using a single labeled nucleotide, biotin-11-UTP, in the in vitro reaction at a concentration of 1.25 mM. The concentration of unlabeled UTP was 3.75 mM, while the concentrations of GTP, ATP, and CTP were 5 mM in each case. The mixture was incubated at 37°C overnight (14 h). The labeled cRNA was then purified using the RNeasy™ mini kit (Qiagen) and subsequently fragmented in 1X fragmentation buffer (40 mM Tris-acetate, pH 7.9, 100 mM KOAc, and 31.5 mM MgOAc) at 94°C for 20 min.
For hybridization, 10 µg fragmented cRNA in 260 µL hybridization solution was added to each bioarray and incubated for 18 h at 37°C with shaking at 300 rpm in a shaking incubator. Immediately following hybridization, the bioarrays were washed and stained with Cy5™-streptavidin (GE Healthcare Biosciences) and scanned with a GenePix 4000B Array Scanner (Axon Instruments, USA).

Data processing and statistical analysis
After image acquisition, the fluorescence intensity signal of each spot was corrected by subtracting the background fluorescence intensity (spots with a signal level less than or equal to background were identified and excluded from the analysis). Next, background-subtracted spot intensities were normalized by the global mean normalization procedure (10). Replicate spots representing the same gene were identified, and the average signal intensity was determined. Data analysis was performed using R (version 2.4.0), a free software environment for statistical computing and graphics (http://www.r-project.org), adapted to our needs.
We searched our data for differentially expressed genes in the three groups (HCV-HCC, HBV-HCC and NV-HCC) using the Wilcoxon test. The level of significance was set at P < 0.01. Next, the differentially expressed genes were examined using a z-score and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (www.genome.jp/kegg) to search for biological process ontologies. The z-score was calculated by subtracting the expected number of genes meeting the criterion in a specific Gene Ontology term (based on the total number of genes on the array meeting the criterion) from the observed number, and dividing this difference by the standard deviation of the observed number of genes under the hypergeometric distribution. A positive z-score indicates that more genes than expected fulfilled the criterion in a given group or pathway; therefore, the respective group or pathway is likely to be affected (11).
For clustering, we selected, on the basis of the expression profile, the non-redundant set of 15 genes with the lowest P value [6 genes from each HCC group (18 genes), with 3 redundant genes being excluded] and clustered the samples into three groups using the non-supervised k-means algorithm. Once clusters were obtained, samples were organized hierarchically based on their correlation distances (12).
To determine classifiers, we used Fisher's linear discriminant analysis and carried out an exhaustive search of the entire dataset for trios of genes, such that data points representing signal intensity for all 3 genes for each sample were separated by a plane in a three-dimensional space. More precisely, for a given group of genes, this linear classification method searches for linear combinations of their expressions with large ratios of between-group to within-group sum of squares (12). This maximal ratio of sum of squares, or its square root, which is denoted here by singular value decomposition, measures how well separated the three groups are. For the search of trios, the 9-sample dataset was split into three groups, and we performed an exhaustive search for the best classification trios for each of the four comparisons of interest among HCV-HCC, HBV-HCC and NV-HCC samples. Trios were ranked according to their singular value decomposition and only trios with perfect classification were considered.
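The pathway z-score described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name, argument names and the example counts are hypothetical, and the variance term is the standard hypergeometric variance, which is assumed here to be what the text means by "standard deviation under the hypergeometric distribution".

```python
import math


def pathway_z_score(n_array, n_changed, n_pathway, n_pathway_changed):
    """z-score for over-representation of changed genes in one pathway.

    n_array           -- genes measured on the array
    n_changed         -- genes on the array meeting the criterion
    n_pathway         -- genes on the array that belong to the pathway
    n_pathway_changed -- pathway genes meeting the criterion (observed)
    """
    fraction_changed = n_changed / n_array
    # Expected number of changed genes in the pathway if changes were random.
    expected = n_pathway * fraction_changed
    # Hypergeometric variance, including the finite-population correction.
    variance = (
        n_pathway
        * fraction_changed
        * (1 - fraction_changed)
        * (1 - (n_pathway - 1) / (n_array - 1))
    )
    return (n_pathway_changed - expected) / math.sqrt(variance)


# Hypothetical counts: 1000 genes on the array, 100 changed,
# a 50-gene pathway with 10 changed genes (5 expected by chance).
z = pathway_z_score(1000, 100, 50, 10)
```

A positive value, as in this example, means that the pathway contains more changed genes than expected by chance; a pathway with exactly the expected count would score zero.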

Results
To identify differences in gene expression between HCC of different etiologies we compared mRNA samples prepared from the three groups, i.e., HBV-HCC, HCV-HCC and NV-HCC. Using the Wilcoxon test, we identified differentially expressed genes for four comparisons: HBV-HCC vs HCV-HCC (1141 genes), HBV-HCC/HCV-HCC vs NV-HCC (2257 genes), HBV-HCC vs NV-HCC (1671 genes) and HCV-HCC vs NV-HCC (1584 genes). This set of filtered genes was stored and will be considered for further studies. Considering the multiplicity of gene selection, we used a second filtering criterion of at least a 2.0-fold change in expression, the Student t-test (P < 0.05) and the z-score parameter in order to classify the differentially expressed genes into known signaling pathways derived from the KEGG biological process. Genes were classified into several families according to their function after separating each HCC group. Twelve, 60, 76, and 49 genes were differentially expressed while 9, 43, 45, and 31 were non-redundant for the comparisons HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs NV-HCC, HBV-HCC vs NV-HCC, and HCV-HCC vs NV-HCC, respectively, considering all selected pathways. Next, we selected only the non-redundant genes considering all comparisons. The functional categories and up-regulated or down-regulated genes in each comparison are summarized in Table 1.
Note to Table 1: Up- and down-regulation of non-redundant genes are defined as expression in HCC tissues considering HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs NV-HCC, HBV-HCC vs NV-HCC, and HCV-HCC vs NV-HCC comparisons. HBV = hepatitis B virus; HCV = hepatitis C virus; HCC = hepatocellular carcinoma; NV = non-viral.
A non-supervised clustering method was used to determine whether the 15 genes with the lowest P value for each HCC group would be capable of grouping samples based on their expression profiles. Using the k-means algorithm (12), samples were grouped into three clusters on the basis of the expression profile of 15 genes that were non-redundant among the 6 genes with the lowest P value for all comparisons. The unsupervised hierarchical clustering analysis of all HCC samples was based on the similarity of the expression patterns for all genes. As shown in Figure 1, all HCC samples were fully clustered into three distinct groups, HBV-HCC, HCV-HCC and NV-HCC, according to the serological and histological analysis described in Material and Methods, corroborating their unique expression profile.
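The k-means grouping of samples into three clusters can be sketched as follows. This is an illustrative Python version, not the R analysis used in the study: the expression values are invented, only two hypothetical genes are used instead of 15, and the farthest-point initialization is an assumption made here so that the example is deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per sample."""
    # Farthest-point initialization (an assumption of this sketch):
    # start from the first sample, then repeatedly add the sample that is
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented expression values for two hypothetical genes in 9 samples,
# ordered B1-B3, C1-C3, N1-N3.
samples = [
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
    [1.0, 9.0], [1.1, 9.2], [0.9, 8.8],
]
labels = kmeans(samples, 3)
```

With these well-separated toy values, the three samples of each group receive the same label; real expression data would generally need several random restarts, because k-means can converge to different solutions from different starting centroids.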
In order to validate the hierarchical clustering analysis, we next applied another approach, this time using an exhaustive search for trios of genes, to precisely separate all tumor samples, with perfect class distinction on the basis of the expression signature of each individual sample. Using the signal intensity of all genes selected previously by the Wilcoxon test, we then applied Fisher's linear discriminant analysis (12) and identified all possible trios of genes that correctly separate, without misclassifications, tissue samples in each of four possible comparisons: HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs NV-HCC, HBV-HCC vs NV-HCC, and HCV-HCC vs NV-HCC. The numbers of trios found with perfect distinction (100%) of all samples were 9, 108, 19, and 14 for HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs NV-HCC, HBV-HCC vs NV-HCC and HCV-HCC vs NV-HCC, respectively. Figure 2 and Table 2 show examples of trios that can classify HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs NV-HCC, HBV-HCC vs NV-HCC and HCV-HCC vs NV-HCC.
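The exhaustive trio search can be sketched in Python for a single two-group comparison. This is a hedged illustration rather than the authors' implementation: the gene names and expression values are invented, the helper names (`solve3`, `fisher_trio`, `search_trios`) are hypothetical, the small ridge added to the scatter matrix is an assumption made to keep it invertible, and the ranking score is the between-group to within-group ratio along the discriminant rather than the singular value used in the paper.

```python
from itertools import combinations


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting.

    Assumes the matrix is non-singular (guaranteed here by the ridge).
    """
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - rest) / m[r][r]
    return x


def fisher_trio(group_a, group_b, ridge=1e-6):
    """Return (perfectly_separated, score) for two groups of 3-gene samples."""
    mean_a = [sum(col) / len(group_a) for col in zip(*group_a)]
    mean_b = [sum(col) / len(group_b) for col in zip(*group_b)]
    # Pooled within-group scatter matrix plus a small ridge on the diagonal.
    scatter = [[ridge if i == j else 0.0 for j in range(3)] for i in range(3)]
    for group, mean in ((group_a, mean_a), (group_b, mean_b)):
        for sample in group:
            dev = [x - mu for x, mu in zip(sample, mean)]
            for i in range(3):
                for j in range(3):
                    scatter[i][j] += dev[i] * dev[j]
    # Fisher direction: w = Sw^-1 (mean_b - mean_a).
    w = solve3(scatter, [b - a for a, b in zip(mean_a, mean_b)])
    proj_a = [sum(wi * x for wi, x in zip(w, s)) for s in group_a]
    proj_b = [sum(wi * x for wi, x in zip(w, s)) for s in group_b]
    centre_a = sum(proj_a) / len(proj_a)
    centre_b = sum(proj_b) / len(proj_b)
    within = sum((p - centre_a) ** 2 for p in proj_a)
    within += sum((p - centre_b) ** 2 for p in proj_b)
    between = (centre_b - centre_a) ** 2
    score = between / within if within > 0 else float("inf")
    # Perfect separation: a plane orthogonal to w splits the two groups.
    return max(proj_a) < min(proj_b), score


def search_trios(expression, labels, label_a, label_b):
    """Score every gene trio and keep those that separate the two groups."""
    idx_a = [i for i, lab in enumerate(labels) if lab == label_a]
    idx_b = [i for i, lab in enumerate(labels) if lab == label_b]
    hits = []
    for trio in combinations(sorted(expression), 3):
        group_a = [[expression[g][i] for g in trio] for i in idx_a]
        group_b = [[expression[g][i] for g in trio] for i in idx_b]
        separated, score = fisher_trio(group_a, group_b)
        if separated:
            hits.append((score, trio))
    return sorted(hits, reverse=True)  # best-separated trios first


# Invented data: 4 hypothetical genes in 3 HBV-HCC and 3 HCV-HCC samples.
toy = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],  # differs strongly between groups
    "g2": [2.0, 2.1, 1.9, 2.1, 2.2, 1.8],
    "g3": [5.0, 4.0, 6.0, 5.5, 4.5, 5.3],
    "g4": [0.5, 0.7, 0.6, 0.4, 0.8, 0.7],
}
toy_labels = ["HBV"] * 3 + ["HCV"] * 3
hits = search_trios(toy, toy_labels, "HBV", "HCV")
```

In this toy example every trio containing the hypothetical gene `g1` separates the two groups. With only three samples per group, a plane in three dimensions can often separate the groups even for uninformative genes, which is one reason such classifiers need to be checked on additional samples.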

Discussion
Oligo-microarray technology has been extensively applied to cancer research (13), and expression profiling is being increasingly used for the distinction between physiological and disease states, as well as to distinguish between groups of disease samples for which the expression profile can discriminate between clinically or biologically similar entities (14).
Although structural alterations in many cancer-related genes have been found in HCC (15), the high number of genes involved suggests that different etiological factors may affect different gene subsets within hepatocytes. Thus, distinct but related genetic pathways may be altered during hepatocarcinogenesis, possibly due to different initiators and promoters. Multiple studies linking hepatitis viruses and chemical carcinogens to hepatocarcinogenesis have provided clues for the understanding of this molecular system (16). Several reports have identified differentially expressed genes in HCC using oligo-microarrays (17,18). Although genomic approaches have yielded global gene expression profiles in HCC and have identified a number of candidate genes as biomarkers useful for cancer staging, the prediction of prognosis and treatment selection (8) remain unclear across all subsets of HCC (19).
IκB kinase β (IKBKβ) is required for activation of NF-κB, a transcription factor that regulates liver inflammation and protection from injury. Koch et al. (20) found that IKBKβ deletion conferred direct growth advantages to hepatocytes and enhanced cell proliferation. Both advantages implicate a growth-suppressor role of IKBKβ under conditions of induced hepatotoxicity and hepatocarcinogenesis. Therapies may target IKBKβ in Kupffer cells to prevent hepatic inflammation, hepatocyte proliferation, and hepatic carcinogenesis (21).
Cyclic AMP responsive element-binding protein (CREBBP) is a transcriptional co-activator that plays an essential role in the liver by regulating gene expression and different processes such as gluconeogenesis, lipid metabolism, and cell proliferation (22). Abramovitch et al. (23) demonstrated both in vitro and in vivo that CREBBP is involved in resistance to apoptosis and plays an important role in HCC tumor progression.
The WNT10B gene is a member of the Wnt family, which plays crucial roles in normal development and neoplastic transformation. WNT10B expression seems to be a specific event in cancer because normal liver does not show detectable expression of WNT10B. Yoshikawa et al. (24) demonstrated that WNT10B can be silenced by DNA methylation.
PRDX6 is a member of the PRDX family, associated
The expression of integrin αV (ITGAV) and extracellular matrix proteins in the liver has been shown to be closely associated with chronic HBV infection and HBV-infected HCC, as follows: in the injured liver, integrins and collagens are expressed by activated hepatic stellate cells and the increased expression of these genes is a commonly observed histological abnormality in hepatitis B infection (26).
Interferon (IFN)-α exerts its antitumor effect through the interaction of IFN with a multisubunit receptor, the IFN-α receptor (IFNAR; including IFNAR1 and IFNAR2a). Damdinsuren et al. (27) have suggested that the expression of IFNAR1 plays an important role in the anti-proliferative effect of IFN-α in HCC cells. The expression levels of IFNAR1 were closely correlated with the response rates to IFN treatment in patients with chronic hepatitis C. The 5-FU-induced modulation of IFNAR1 expression could play a pivotal role in the therapeutic efficacy of IFN-α combined with 5-FU (28).
In the NV-HCC group, the PRDX6, ITGAV and IFNAR1 genes were up-regulated in relation to HBV-HCC. In contrast, the CREBBP and WNT10B genes were down-regulated in NV-HCC when compared to HBV-HCC/HCV-HCC. The IKBKβ gene was down-regulated in HCV-HCC in relation to HBV-HCC (Table 1). Although many studies have reported differential gene expression profiles in HCC (4,8,9,17,18), the 6 genes previously reported to be associated with HCC had not been identified in the literature comparing viral and non-viral HCC. In a similar study, Kurokawa et al. (18) found a total of 51 genes that were differentially expressed between tumor and non-tumor tissues regardless of the etiology of HCC. It is thought that these genes may play significant roles in the development of cancer independent of hepatitis viruses.
Having compared tumor samples and identified genes whose pattern of expression correlates with HCC groups, we next constructed a cluster where samples were organized hierarchically (Figure 1). Based on their correlation distances, the samples were perfectly separated.
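As a brief illustration of the distance used to order samples, the sketch below computes a correlation distance between two expression profiles in Python. The paper does not define the measure, so the common definition (one minus the Pearson correlation coefficient) is assumed here, and the function name and example profiles are hypothetical.

```python
import math


def correlation_distance(x, y):
    """Correlation distance, assumed here to be 1 - Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical profiles: the second has the same shape as the first at
# twice the intensity, the third has the opposite shape.
same_shape = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_shape = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Under this definition, profiles with the same shape have a distance of 0 regardless of overall intensity, and profiles with opposite shapes have the maximum distance of 2, so samples are grouped by expression pattern rather than by absolute signal level.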
Subsequently, in order to confirm our differentially expressed genes between groups, Fisher's linear discriminant analysis was applied. We performed an exhaustive search of trios of genes that could be used to build classifiers for distinction between viral and non-viral HCC. There are few reports in the literature concerning this approach. Meireles et al. (29) determined the expression profile in tissue samples representing normal gastric mucosa, as well as gastritis, intestinal metaplasia, and adenocarcinoma of the stomach. Using Fisher's linear discriminant analysis, these investigators identified a series of molecular classifiers that could distinguish between cancer and non-cancer samples. They also identified a series of intestinal metaplasias whose gene expression profile resembled that of adenocarcinoma. Stolf et al. (30) searched for expression signatures of individual samples of adenomas and follicular carcinomas that could be used as molecular classifiers for the precise classification of malignant and non-malignant lesions. In our study, we found a strong correlation between data from classifiers and from cluster analysis. All samples that were classified by the trios were grouped into the hierarchical cluster. This is the first study that focuses on a search for genes that could be used for the construction of molecular classifiers. It is now imperative to apply these classifiers to a large set of samples.
We identified differentially expressed genes in HCC of different etiologies, and this expression profiling may provide useful clues for pathophysiological research into HCC. Molecular stratification of individual HCC into genetically homogeneous subclasses can be of help by offering an opportunity for developing optimal therapeutic agents for various HCC based on their distinctive genomic types.

Figure 1. Clustering of the 9 hepatocellular carcinoma (HCC) samples according to the expression profile of 15 genes. Using the k-means algorithm, 9 HCC tissue samples representing the HBV-HCC (B1, B2, B3 samples), HCV-HCC (C1, C2, C3 samples) and NV-HCC (N1, N2, N3 samples) groups were grouped into three clusters on the basis of the expression profile of the non-redundant set of 15 genes representing the 6 genes with the lowest P value for each pair-wise comparison. The lines represent genes ordered according to their hierarchical distances. The red color denotes high expression and the green color denotes low expression compared with average expression among the nine samples. Within each cluster, samples were ordered on the basis of their correlation distances. HBV = hepatitis B virus; HCV = hepatitis C virus; NV = non-viral.

Table 1. Biological process categories and differentially expressed genes identified for all comparisons.