Protein profile of rice (Oryza sativa) seeds

Seeds are the most important plant storage organ and play a central role in the life cycle of plants. Since little is known about the protein composition of rice (Oryza sativa) seeds, in this work we used proteomic methods to obtain a reference map of rice seed proteins and identify important molecules. Overall, 480 reproducible protein spots were detected by two-dimensional electrophoresis on pH 4–7 gels and 302 proteins were identified by MALDI-TOF MS and database searches. Together, these proteins represented 252 gene products and were classified into 12 functional categories, most of which were involved in metabolic pathways. Database searches combined with hydropathy plots and gene ontology analysis showed that most rice seed proteins were hydrophilic and were related to binding, catalytic, cellular or metabolic processes. These results expand our knowledge of the rice proteome and improve our understanding of the cellular biology of rice seeds.


Introduction
Rice (Oryza sativa L.) is the main food source for more than two-thirds of the world's population (Sasaki and Burr, 2000), especially in Southeast Asia (Nwugo and Huerta, 2011; Wang et al., 2011). With the completion of the rice genome sequencing program, rice has become the model organism for molecular biological research on monocotyledons (Agrawal and Rakwal, 2011; Li et al., 2011). The International Rice Genome Sequencing Project (IRGSP) has generated high-quality sequences that cover 95% of the 389 Mb rice genome and has produced a genomic map for this species (Liu and Xue, 2006).
In recent years, many studies have investigated the functional genomics of rice. Traditional functional genomics has mainly examined changes in mRNA abundance in cells and tissues. However, because of post-transcriptional regulation, mRNA levels do not provide a true indication of protein expression levels (Jugran et al., 2010; Ding et al., 2012). In addition, some proteins undergo complex post-translational modifications, such that changes in the level of active protein may be more significant than changes in total protein content. Proteomic analysis was first described by Wilkins and Williams (1994) and seeks to study all proteins expressed in a cell, tissue or organism at a specific time or under specific circumstances by maximizing protein separation and identification (Wilkins et al., 1998). Two-dimensional electrophoresis (2-DE) combined with mass spectrometry (MS) is still the core tool for identifying differentially expressed proteins in proteomics (Yang et al., 2006, 2007a,b; Chitteti and Peng, 2007; Torabi et al., 2009; Chi et al., 2010; Ahrné et al., 2011; Fan et al., 2011; He et al., 2011; Nwugo and Huerta, 2011; Ding et al., 2012; Kalli and Hess, 2012).
Seeds are important plant storage organs that play a central role in the life cycle of plants because they are essential for plant reproduction and the initial stages of offspring formation (Yang et al., 2009). Seed biology is a major subject in plant research, although most studies have focused on seed dormancy and germination mechanisms (Koornneef et al., 2002; Finch-Savage and Leubner-Metzger, 2006; Yang et al., 2007b; Vaughan et al., 2008; He et al., 2011), with little being known about seed protein composition. Since proteomics is a well-established means of assessing global changes in protein profiles (Agrawal et al., 2006; Agrawal and Rakwal, 2011; Fan et al., 2011), in this study we used 2-DE and MALDI-TOF-MS to examine the proteomic profile of rice seeds. Our specific goals were (1) to determine the proteomic profile of rice seeds, (2) to identify the main protein components involved and (3) to understand the functional characteristics of the identified proteins.

Seeds
Seeds of the Nipponbare cultivar of rice (O. sativa L. ssp. japonica, cv. Nipponbare, AA genome) were used in this work.

Protein extraction
The rice seeds were peeled and washed three times with purified water, after which proteins were extracted using a modified version of the protocol described by Shen et al. (2003). Seeds (2 g samples) were homogenized in pre-cooled extraction buffer (20 mM Tris-HCl, pH 7.5, 250 mM sucrose, 10 mM EGTA, 1 mM PMSF, 1 mM DTT and 1% Triton X-100) on ice. The homogenate was transferred to a 2 mL centrifuge tube and centrifuged (15,000 g, 4 °C, 20 min). The supernatant was collected and proteins were precipitated for 30 min in an ice bath by adding cold 50% trichloroacetic acid (TCA) to a final TCA concentration of 10% (Yang et al., 2006). After centrifugation (15,000 g, 4 °C, 20 min), the supernatant was discarded and the pellet was washed four times with cold acetone containing 13 mM DTT. After further centrifugation (15,000 g, 4 °C, 20 min), the pellet was vacuum-dried. The dried powder was dissolved in sample buffer (7 M urea, 2 M thiourea, 4% CHAPS, 2% Bio-Lyte pH 3-10, 1 mM PMSF and 1% DTT; 1 mg dried powder/0.1 mL of buffer) at 4 °C overnight. Following a final centrifugation (15,000 g, 4 °C, 20 min), the supernatant was used for 2-DE. Protein concentrations were determined by a dye-binding method (Bradford, 1976). Since some of the components of the sample buffer interfered with the Bradford assay, an equal volume of sample buffer was added to the protein reagent to compensate for this interference. Bovine serum albumin was used as the standard.
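The TCA precipitation step is a simple dilution. As a rough sketch, assuming that volumes are additive and that the supernatant contains no TCA beforehand, the volume of 50% stock needed to reach a final concentration of 10% can be estimated as shown here. The function name and the 1.6 mL example volume are illustrative and are not taken from the protocol.

```python
def stock_volume_to_add(sample_volume_ml, stock_pct=50.0, final_pct=10.0):
    """Volume of stock solution (mL) needed to bring a sample to final_pct.

    Solves stock_pct * v = final_pct * (sample_volume_ml + v) for v,
    assuming additive volumes and no TCA in the sample beforehand.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must lie between 0 and stock_pct")
    return sample_volume_ml * final_pct / (stock_pct - final_pct)


# Hypothetical example: 1.6 mL of supernatant
tca_ml = stock_volume_to_add(1.6)
print(f"Add {tca_ml:.2f} mL of 50% TCA to 1.6 mL of supernatant")
```

With these concentrations the rule of thumb is one volume of 50% TCA for every four volumes of supernatant.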

Two-dimensional electrophoresis
Isoelectric focusing (IEF) was done using a Bio-Rad PROTEAN electrophoresis system and 17 cm immobilized pH gradient (IPG) dry gel strips with a linear pH range (pH 4-7) (Bio-Rad, USA). Protein samples (~1.5 mg) were loaded during the rehydration step (passive rehydration, room temperature, 12-13 h) and IEF was done at 300, 500 and 1000 V for 1 h, with linear ramping to 8000 V over 2 h and holding at 8000 V until a total of 50 kVh was achieved. Subsequently, the strips were equilibrated for 15 min with buffer I (6 M urea, 50 mM Tris-HCl, pH 6.8, 30% v/v glycerol, 2.5% SDS, 1% w/v DTT) and then for 15 min with buffer II (6 M urea, 50 mM Tris-HCl, pH 6.8, 30% v/v glycerol, 2.5% SDS, 2.5% w/v iodoacetamide). After equilibration, the second dimension SDS-PAGE was done using 12% polyacrylamide gels. Proteins were detected by staining the gels with 0.116% Coomassie brilliant blue R-250.
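The focusing program can be checked by adding up volt-hours. The sketch below estimates how long the final 8000 V hold would need to last to reach 50 kVh. It rests on two assumptions that the text does not state explicitly: each of the 300, 500 and 1000 V steps lasts 1 h, and the linear ramp starts at 1000 V.

```python
def hold_time_hours(target_vh=50_000.0, hold_v=8000.0):
    """Hours to hold at hold_v so that the whole program reaches target_vh."""
    # Constant-voltage steps as (volts, hours); 1 h per step is an assumption
    steps = [(300.0, 1.0), (500.0, 1.0), (1000.0, 1.0)]
    step_vh = sum(volts * hours for volts, hours in steps)

    # Linear ramp: accumulated volt-hours = mean voltage x duration
    ramp_start_v, ramp_hours = 1000.0, 2.0  # start voltage is an assumption
    ramp_vh = 0.5 * (ramp_start_v + hold_v) * ramp_hours

    remaining_vh = target_vh - step_vh - ramp_vh
    if remaining_vh < 0:
        raise ValueError("target is reached before the hold step begins")
    return remaining_vh / hold_v


print(f"Hold at 8000 V for about {hold_time_hours():.1f} h")
```

Under these assumptions the steps and ramp contribute 10.8 kVh, so most of the 50 kVh is accumulated during the final hold.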

Image and data analysis
The 2-DE gels were scanned (resolution: 300 dpi) with an ImageScanner III scanner (GE Healthcare Bio-Sciences) and the gel images were analyzed with PDQuest software (Bio-Rad, USA). Each protein spot in the 2-DE map was assigned a number.

In-gel digestion and MALDI-TOF MS analysis
Protein spots were excised manually from the Coomassie blue-stained gels and each gel fragment was immersed in purified water and sonicated twice (10 min each). Subsequently, the gel pieces were destained with 50 mM ammonium bicarbonate and an equivalent volume of 50% acetonitrile, followed by sequential washing with 25 mM ammonium bicarbonate, 50% acetonitrile and 100% acetonitrile. After lyophilization, the gel fragments were rehydrated in digestion buffer (2 mL) containing 25 mM NH4HCO3 and 10 ng of trypsin/mL (Promega, Madison, WI, USA) at 4 °C. After 30 min, 10-15 mL of 25 mM NH4HCO3 was added and digestion was continued at 37 °C overnight (11-16 h). After digestion, the peptide solution was collected and tryptic peptide masses were determined using a MALDI-TOF mass spectrometer (Ultraflex-TOF-TOF, Bruker, Germany).

Database search and protein identification
All of the acquired peptide mass fingerprint data were used in online searches with the Mascot program against the National Center for Biotechnology Information (NCBI) nonredundant database. The search parameters included trypsin as the selected enzyme (one missed cleavage was permitted), carbamidomethyl as the fixed modification, Gln -> pyro-Glu (N-terminal Q) as the variable modification and a peptide tolerance of ± 0.2 Da. O. sativa was selected as the taxonomic category. Proteins with a MOWSE score > 64 were considered positive identifications.
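To make the search parameters concrete, the following minimal sketch performs an in-silico tryptic digest with up to one missed cleavage, applies carbamidomethylation to every cysteine, considers the optional N-terminal Gln -> pyro-Glu conversion, and matches theoretical [M+H]+ masses to observed peaks within ± 0.2 Da. It illustrates the matching idea only; it is not Mascot's algorithm and does not compute a MOWSE score. The function names, the protein sequence and the peak list are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # fixed modification on each Cys
PYRO_GLU = -17.02655        # variable: N-terminal Gln -> pyro-Glu (loss of NH3)


def tryptic_fragments(seq):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def peptides_with_missed(seq, missed=1):
    """All peptides obtained by joining up to missed + 1 adjacent fragments."""
    frags = tryptic_fragments(seq)
    peptides = set()
    for i in range(len(frags)):
        for j in range(i, min(i + missed, len(frags) - 1) + 1):
            peptides.add("".join(frags[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Theoretical monoisotopic [M+H]+ with carbamidomethylated Cys."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    return mass + CARBAMIDOMETHYL * peptide.count("C")


def match_peaks(observed, seq, tol=0.2, missed=1):
    """Return (observed, peptide, theoretical) for every match within tol Da."""
    hits = []
    for pep in peptides_with_missed(seq, missed):
        candidates = [mh_plus(pep)]
        if pep.startswith("Q"):  # variable modification, so try both forms
            candidates.append(mh_plus(pep) + PYRO_GLU)
        for theo in candidates:
            for obs in observed:
                if abs(obs - theo) <= tol:
                    hits.append((obs, pep, theo))
    return sorted(hits)


protein = "ACDKQGRPEKLM"                  # hypothetical sequence
peaks = [mh_plus("ACDK") + 0.05, 1500.0]  # simulated peak list
print(match_peaks(peaks, protein))
```

A real search engine additionally weighs the number and distribution of matched peptides against the database, which is what the score threshold of 64 reflects.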

Bioinformatics analysis of the identified proteins
The grand average of hydropathicity (GRAVY) of all proteins identified with a high level of confidence (MOWSE score > 64) was calculated as described by Kyte and Doolittle (1982), using the ProtParam tool from the ExPASy site. The resulting GRAVY values were then analyzed with Origin 7.0 software.
The Gene Ontology (GO) identity of each of the identified proteins was obtained by InterProScan searching. The GO classification of these proteins was obtained using the WEGO platform and the annotated data of the identified proteins.
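For reference, GRAVY is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length. The short sketch below re-implements that definition for illustration; the study itself used ProtParam, and the example sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982)
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KD_HYDROPATHY)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KD_HYDROPATHY[aa] for aa in seq) / len(seq)


example = "MKTAYIAKQR"  # hypothetical sequence, not from the data set
score = gravy(example)
label = "hydrophilic" if score < 0 else "hydrophobic"
print(f"GRAVY = {score:.3f} ({label})")
```

A negative value marks a protein as hydrophilic overall and a positive value as hydrophobic, which is the convention used to interpret the results.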

Proteomic profile of rice seeds
The analysis of 2-DE gels with PDQuest software detected 480 reproducible protein spots, most of which were distributed near the center of the gels (Figure 1). For example, the pI of 415 protein spots was between 5 and 7 and accounted for 84.5% of the total number of protein spots. In addition, the molecular mass of ~90% of the proteins was between 15 kDa and 95 kDa.

Protein identification by MALDI-TOF MS
A comprehensive knowledge of rice seed proteins will greatly enhance our understanding and exploration of the functional characteristics of these seeds. The 480 reproducible protein spots were screened by MALDI-TOF-MS to obtain peptide mass fingerprint data. Only 302 proteins (Figure 2) were identified with high confidence (MOWSE scores > 64) (Table S1 - Supplementary Material), of which 52 were proteins of unknown function (Figure 3; Table S2 - Supplementary Material). In some cases, different spots contained the same protein (Table S1), e.g., spots 4, 5, 6 and 7 corresponded to hypothetical protein OsJ_13773, and spots 10 and 11 were putative aconitate hydratase.
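Because several spots can map to the same protein, the number of distinct gene products is smaller than the number of identified spots. The sketch below shows this collapsing step using only the two examples quoted above; the full spot-to-protein mapping is in Table S1 and is not reproduced here, and the function name is illustrative.

```python
from collections import defaultdict

# Spot number -> identified protein (only the examples given in the text)
spot_to_protein = {
    4: "hypothetical protein OsJ_13773",
    5: "hypothetical protein OsJ_13773",
    6: "hypothetical protein OsJ_13773",
    7: "hypothetical protein OsJ_13773",
    10: "putative aconitate hydratase",
    11: "putative aconitate hydratase",
}


def group_spots_by_protein(mapping):
    """Collect the spot numbers that share the same protein identification."""
    groups = defaultdict(list)
    for spot, protein in sorted(mapping.items()):
        groups[protein].append(spot)
    return dict(groups)


groups = group_spots_by_protein(spot_to_protein)
for protein, spots in groups.items():
    print(f"{protein}: spots {spots}")
print(f"{len(spot_to_protein)} spots -> {len(groups)} distinct proteins")
```

Applied to the complete table, the same grouping is what reduces the 302 identified spots to 252 gene products.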

Bioinformatics analysis of identified proteins
Proteins with negative GRAVY scores were hydrophilic and those with positive GRAVY scores were hydrophobic. Figure 5 shows that identified proteins with negative GRAVY scores were significantly more abundant than those with positive GRAVY scores. The GRAVY values of most proteins were between -0.6 and 0, indicating that most of them were hydrophilic.
Figure 6 shows the GO analysis of the identified proteins, all of which were classified in terms of cellular component, molecular function, and physiological and biological processes using appropriate software (Gene Ontology Annotation Plot, WEGO). Most of the identified proteins associated with cellular components were involved in cell, cell parts, envelope, macromolecular complex, organelle and organelle parts, while those associated with molecular functions were involved in antioxidant, binding, catalytic, electron carrier, enzyme regulator, nutrient reservoir, transcription regulator and transporter activities. Biological processes involved biological regulation, cellular component organization, cellular process, establishment of localization, localization, metabolic process, multi-organism process, multicellular organismal process, pigmentation, reproduction, reproductive process and response to stimulus.

Discussion
Proteomic technologies are the most widely applied approach for identifying proteins in rice (Yang et al., 2006, 2007a,b; Chitteti and Peng, 2007; Torabi et al., 2009; Chi et al., 2010; Fan et al., 2011; He et al., 2011; Nwugo and Huerta, 2011; Ding et al., 2012). In this study, we used 2-DE combined with MALDI-TOF-MS to obtain a 2-DE proteomic profile of rice seeds. A total of 480 reproducible protein spots were selected for MALDI-TOF-MS analysis. However, only 302 spots were identified as proteins with a MOWSE score > 64 (see Tables S1 and S2); there were no significant matches for the other 178 protein spots. There are at least two possible explanations for this phenomenon. First, some protein spots with low confidence levels possibly contained more than one protein. Second, some small protein spots could not be identified by MALDI-TOF-MS, or the corresponding proteins were not included in the databases because of a lack of information in the rice database (Woo et al., 2002).
The majority of corn proteins can be divided into three categories: storage proteins, structure- or metabolism-related proteins, and protective proteins (Shewry and Halford, 2002). As shown in Figure 4, 24.8% of the identified proteins were classified in the metabolism group, 16.9% were involved in disease/defense and 15.6% were cell structure proteins. Furthermore, 10.3% of the identified proteins were classified in the energy group. Together, the proteins in these groups accounted for > 67% of the identified proteins. Metabolism is essential for many activities and, not surprisingly, metabolism-related proteins have an important role in maintaining seed vigor. In addition, most metabolism- and energy-related proteins are associated with carbohydrate metabolic pathways (He et al., 2011), including glycolysis and the TCA cycle. In this study, many enzymes involved in glycolysis were identified, including pyruvate orthophosphate dikinase (spots 12 and 13), phosphoglucomutase (spot 21), pyrophosphate-fructose-6-phosphate 1-phosphotransferase (spot 51), pyrophosphate-fructose-6-phosphate 2-phosphotransferase (spot 52), UTP-glucose-1-phosphate uridylyltransferase (spots 56 and 97), fructose bisphosphate aldolase (spot 58), glucose-6-phosphate isomerase (spots 60 and 77), enolase (spots 94 and 95), glyceraldehyde 3-phosphate dehydrogenase (spots 143 and 145), glucose-6-phosphate 1-epimerase (spot 161) and triosephosphate isomerase (spots 250, 254, 256 and 257). Some enzymes involved in the TCA cycle were also identified, such as aconitate hydratase (spots 10 and 11), succinate dehydrogenase (spot 35), isocitrate dehydrogenase (spot 121), succinyl-CoA synthetase (spot 160) and malate dehydrogenase (spots 167, 168 and 173). Similarly, two enzymes involved in the alcoholic fermentation pathway were identified, namely, alcohol dehydrogenase (spot 34) and pyruvate decarboxylase (spot 45). These results indicate that both aerobic and anaerobic respiration occur in stored rice seeds. The energy demand is met primarily by glycolysis and the TCA cycle, although anaerobic fermentation can also provide energy in the absence of oxygen.
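The combined share of the four largest functional groups follows directly from the percentages quoted above. As a small arithmetic check, using only those four values (the shares of the other eight categories are given in Figure 4 and are not reproduced here):

```python
# Shares (%) of identified proteins in the four largest functional groups
category_share = {
    "metabolism": 24.8,
    "disease/defense": 16.9,
    "cell structure": 15.6,
    "energy": 10.3,
}

combined = round(sum(category_share.values()), 1)
remainder = round(100.0 - combined, 1)

print(f"Four largest groups: {combined}% of identified proteins")
print(f"Remaining categories together: {remainder}%")
```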
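We also identified 12 proteins related to amino acid metabolism: five of these have a central role in amino acid metabolism (spots 106, 148, 153, 155 and 156), four are involved in the metabolism of branched chain amino acids (spots 38, 40, 53 and 141) and the remaining three are involved in arginine metabolism (spots 105, 107 and 108).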
Compared with germinating rice seeds, there were fewer proteins associated with amino acid metabolism in stored rice seeds. There are several explanations for this phenomenon. First, dry seeds are used mainly for storage and transport, and a lower metabolic activity favors the preservation of rice seeds. Second, the moisture content of stored seeds is very low, so that the existing metabolism provides only essential energy and many physiological and biochemical reactions are inactive. Third, staining with Coomassie brilliant blue may not be sufficiently sensitive to detect some spots, so more sensitive staining methods such as negative staining and fluorescence staining should be used in future studies. Finally, some strongly basic proteins or proteins with extreme molecular masses may be missed in the 2-DE gels. The presence of the same protein in different spots suggests variations in post-translational modifications or the presence of protein subunits, as also suggested by others (Yang et al., 2006; Chi et al., 2010; Liu and Bennett, 2011).
The hydropathy analysis showed that most of the rice seed proteins were hydrophilic. Rice seeds contain many proteins and enzymes related to metabolism and disease/defense, and these proteins may only be active in physiological processes when in solution, i.e., in a soluble state. The presence of soluble proteins is a further characteristic of rice seed proteins.
In a proteomic survey of metabolic pathways in rice, Koller et al. (2002) identified 2,528 unique proteins, 877 of which were from seeds. Of the 2,528 proteins detected, 189 were expressed in all three tissues examined (leaves, roots and seeds). In addition, there were 512 seed-specific proteins. Koller et al. (2002) collected their seed samples from the entire panicle at 14 days post-anthesis. In contrast, we used mature rice seeds and identified 302 proteins that represented 252 gene products. Our findings therefore expand the results of previous studies.

Conclusion
Seeds are a major food source for humans and are essential for plant reproduction. In this study, we identified 302 proteins in the proteome of rice seeds. These proteins represented 252 gene products and were classified into 12 functional categories. The 302 proteins identified here represent an important contribution to the rice proteome database and shed light on the protein content of rice seeds.

Figure 2 - The protein spots identified by MALDI-TOF-MS. Each protein with a high confidence level (MOWSE score > 64) was assigned a number.

Figure 4 - Functional classifications of the identified proteins. The number of proteins in each category is indicated in parentheses.

Figure 5 - Hydropathic analysis of all proteins identified by 2-DE. Negative and positive GRAVY values indicate hydrophilic and hydrophobic proteins, respectively.

Figure 6 - GO classifications of the identified proteins. All of the proteins were classified into three main categories and 26 subcategories.

Table S1 - The protein spots identified by MALDI-TOF-MS.