Molecular identification, characterization, and expression analysis of a serine protease inhibitor gene from cotton bollworm, Helicoverpa armigera (Lepidoptera: Noctuidae).

Serine protease inhibitors (serpins), a superfamily of protease inhibitors, are known to be involved in several physiological processes, such as development, metamorphosis, and innate immunity. In our study, a full-length serpin cDNA, designated Haserpin1, was isolated from the cotton bollworm Helicoverpa armigera. The cDNA sequence of Haserpin1 is 1176 nt long, with an open reading frame encoding 391 amino acids; there is one exon and no intron. The predicted molecular weight of Haserpin1 is 43.53 kDa, with an isoelectric point of 4.98. InterProScan was employed for functional characterization and revealed that Haserpin1 contains highly conserved signature motifs that are useful for identifying serpins: a reactive center loop (RCL) with a hinge region (E341-N350), the serpin signature (F367-F375), and a predicted P1-P1' cleavage site (L357-S358). Transcripts of Haserpin1 were constitutively expressed in the fat body, suggesting that it is the major site for serpin synthesis. During the developmental stages, a fluctuation in the expression level of Haserpin1 was observed, with low expression detected at the 5th-instar larval stage. In contrast, relatively high expression was detected at the prepupal stage, suggesting that Haserpin1 might play a critical role at the H. armigera wandering stage. Although the detailed function of this serpin (Haserpin1) needs to be elucidated, our study provides a perspective for the functional investigation of serine protease inhibitor genes.


Introduction
Serine proteases, ubiquitous enzymes found in eukaryotes, bacteria, and viruses, constitute almost one-third of all proteases and play a pivotal role in the catalysis of intracellular and extracellular hydrolytic reactions (Yang et al., 2017). Serine proteases are known to be involved in a wide range of essential biological processes (Ross et al., 2003;Zou et al., 2006;Zhao et al., 2010). In addition to their vital role in physiological processes, the activity of these proteases, if not controlled properly, might be hazardous to living organisms (Neurath, 1989). Injury due to excessive protease activity includes tissue damage, melanization, and inappropriate coagulation, among others (Gubb et al., 2010;Eleftherianos and Revenis, 2011). Thus, their activity must be properly and strictly controlled by inhibitors (Krowarsch et al., 2003;Rawlings et al., 2004).
Serpins, a superfamily of proteins, are the most diverse family of serine protease inhibitors found in animals, plants, and microorganisms (Roberts et al., 2004;Reichhart, 2005). Serpins are composed of approximately 350-500 amino acids, with a reactive center loop (RCL) of approximately 20 residues near the carboxyl terminus (Shakeel et al., 2019). The RCL, the region that mediates interaction with the target protease, contains a scissile bond between residues designated P1 and P1' (Reichhart et al., 2011). After cleavage of the scissile bond, the serpin undergoes a dramatic conformational change that covalently traps the target proteinase (Dissanayake et al., 2006;Ulvila et al., 2011;Yang et al., 2017).
Serpins have been documented to play a critical role in various physiological processes, including reproduction, metamorphosis, inflammation, blood coagulation, complement, and innate immune responses (Bayer et al., 1997;Choo et al., 2012;Davie et al., 1991;Gál et al., 2013;Kim et al., 2013;Shigetomi et al., 2010;Shakeel and Zafar, 2020). All of these processes require the existence of serpins to control cellular homeostasis, inhibit inadmissible proteolytic cascades (Jarasrassamee et al., 2005), or prevent exogenous proteases secreted by pathogens that exploit the protease to invade new tissues and penetrate a host (Wang et al., 2009).
The cotton bollworm Helicoverpa armigera (Hübner) is an important polyphagous generalist pest species in Asia, Europe, Africa, and Australia that causes serious damage to cotton, corn, sorghum and many other crops (Tay et al., 2013). Over the past decades, primary control strategies for this pest have relied on pesticides and transgenic crops (Tabashnik et al., 2013). However, these methods not only lead to the development of resistance to insecticides but also affect the environment. Thus, there is an urgent need to develop novel biological control strategies.
Considering the importance of serpins in physiological processes, as exhibited by previous reports, the present study aimed to identify and characterize a serpin gene that might also be involved in biological processes in H. armigera. Furthermore, we sought to elucidate the phylogenetic relationship of H. armigera serpin to serpins from other insects.

Insect culture
The H. armigera population was nurtured on an artificial diet (Abbasi et al., 2007). The rearing conditions were as follows: 70% relative humidity, photoperiod of 16:8-h L:D, and 25±2 °C temperature.

Collection of tissues
Dissection of healthy H. armigera larvae was performed on ice under a stereomicroscope (Zeiss, Jena, Germany). The insect fat body was collected and washed with PBS (140 mM NaCl, 27 mM KCl, 8 mM Na2HPO4, and 1.5 mM KH2PO4, pH 7.4). The fat body tissues were then stored at −80 °C for RNA extraction.

RNA isolation and cDNA synthesis
Total RNA was isolated using TRIzol reagent following the manufacturer's protocol. RNA quality was assessed by 1% (w/v) agarose gel electrophoresis, and RNA concentration was measured with a UV spectrophotometer. DNA contamination was eliminated with DNaseI (Fermentas). cDNA was synthesized by reverse transcription using RevertAid™ Reverse Transcriptase (Fermentas) in a 20-μL reaction. The reaction mix was incubated at 42 °C for 60 min, and the product was stored at −80 °C.

Identification and primer design of H. armigera serpin for PCR
The H. armigera serpin (accession number: HQ615869) sequence was retrieved from the GenBank database at NCBI (National Center for Biotechnology Information) and used as a reference sequence for designing the primers. Sequence-specific primers were designed with Primer Premier 5 from the cDNA sequence of Haserpin1 to amplify the open reading frame (ORF). The primers were sense, 5'-GGATCCATGTCATCAGATTCCGATGAACTTCTCAAGCA-3', and antisense, 5'-GGTACCTTAATGATGATGATGATGATGTGATTGGATGACTCCGCTAAACAGAATGTTATTCCGTT-3', carrying BamHI (GGATCC) and KpnI (GGTACC) restriction sites at their 5' ends, respectively, and a 6×His-tag coding sequence near the 5' end of the antisense primer.
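The layout of the two primers can be checked with a few lines of Python. The following is a minimal sketch, not part of the original workflow: it takes the primer sequences as printed above (with the typesetting spaces removed) and confirms that the sense primer starts with the BamHI site followed by the ATG start codon, and that the reverse complement of the antisense primer ends with six His codons, a TAA stop codon, and the KpnI site. The function and variable names are illustrative.

```python
# Primer sequences as printed in the text, with typesetting spaces removed.
SENSE = "GGATCCATGTCATCAGATTCCGATGAACTTCTCAAGCA"
ANTISENSE = (
    "GGTACCTTAATGATGATGATGATGATGTGATTGGATGACTCCGCTAAACAGAATGTTATTCCGTT"
)

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
KPNI_SITE = "GGTACC"   # KpnI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def describe_primers() -> dict:
    """Report which expected features are present in the two primers."""
    # The antisense primer anneals to the coding strand, so its reverse
    # complement shows what is appended to the 3' end of the ORF.
    coding_end = reverse_complement(ANTISENSE)
    return {
        "sense_has_bamhi_then_atg": SENSE.startswith(BAMHI_SITE + "ATG"),
        "antisense_has_kpni": ANTISENSE.startswith(KPNI_SITE),
        "coding_strand_ends_his6_stop_kpni": coding_end.endswith(
            "CAT" * 6 + "TAA" + KPNI_SITE
        ),
    }


if __name__ == "__main__":
    for feature, present in describe_primers().items():
        print(f"{feature}: {present}")
```

Read on the coding strand, the antisense primer therefore appends six CAT (His) codons and a stop codon to the 3' end of the ORF, upstream of the KpnI site.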

Genomic DNA isolation and PCR
The FGENESH program (Salamov and Solovyev, 2000) was used to predict the number of potential exons and introns in the Haserpin1 genomic DNA. For experimental verification, whole bodies of H. armigera (n = 3) were used to isolate DNA (Wizard Genomic DNA Purification Kit, Promega) for PCR. The oligonucleotide primers used for amplification were the same as those described above, which were designed from the cDNA sequence of Haserpin1. All PCR products were verified by DNA sequence analysis.

H. armigera Haserpin1 cloning
To amplify Haserpin1, cDNA and the gene-specific sense and antisense primers were used in PCR. For subsequent digestion, the sense primer incorporated the BamHI restriction site, whereas the antisense primer incorporated the KpnI restriction site. The PCR amplification was performed as follows: denaturation at 95 °C for 3 min, followed by 35 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 2 min, with a final extension at 72 °C for 5 min. The PCR product was then visually examined on a 1% (w/v) agarose gel stained with ethidium bromide using the BioRad imaging system. A gel extraction kit (Omega) was used to purify the amplified product, which was then ligated to the pGEM-T easy vector (TaKaRa) and transformed into Escherichia coli DH5α. Positive clones were selected on LB agar plates containing 50 μg mL−1 ampicillin after incubation at 37 °C overnight. The resulting clones were sequenced by Shanghai Sunny Biotech Co., Ltd. Sequencing was performed in triplicate, and sequences were determined in both directions for 9-12 clones; only those cloned PCR products showing high similarity with other insect serpin genes were selected.

Biochemical properties and phylogenetic tree construction
NCBI Basic Protein BLAST (Basic Local Alignment Search Tool) was used to identify homologous serpin gene sequences from different insect species with the H. armigera serpin (Haserpin1) sequence used as the query. PEPSTATS was used to compute the primary sequence composition of Haserpin1 (Rice et al., 2000). The physicochemical parameters of the Haserpin1 protein were predicted using the ProtParam tool (Gasteiger et al., 2005). The SignalP 4.1 web server was used to identify the possible secretory signal peptide of Haserpin1 (Petersen et al., 2011). To determine the evolutionary position of Haserpin1 and its phylogenetic similarity to other insect serpin genes, multiple sequence alignment (MSA) of selected amino acid sequences was carried out to produce quality alignments.
To better understand the evolutionary relationship of Haserpin1 with serpins from other insects, serpin sequences of O. furnacalis, A. gambiae, Apis mellifera, B. mori, D. melanogaster, M. sexta, T. molitor, P. xylostella, Mamestra brassicae, and Mamestra configurata were used to construct a phylogenetic tree using the maximum likelihood method in MEGA (Tamura et al., 2013). The maximum likelihood analysis was performed based on the JTT matrix-based model with 1,000 bootstrap replicates. The initial tree for the heuristic search was obtained automatically by applying the neighbor-joining and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model and then selecting the topology with the superior log-likelihood value. The tree is drawn to scale, with branch lengths indicating the number of substitutions per site.

Fluorescence real-time quantitative PCR analysis of gene expression
Real-time qPCR was performed using a Bio-Rad iQ2 cycler. The real-time qPCR reaction was carried out in PCR strips with SYBR Green; first-strand cDNA (in triplicate) for each sample served as the template for RT-qPCR using SsoFast™ EvaGreen Supermix (Bio-Rad, Hercules, CA, USA) with an iQ2 Optical System (Bio-Rad). Every reaction was performed in iQ™ 96-well PCR plates in a 20-μL volume containing cDNA and 100 nM of each primer and covered with adhesive seals (Bio-Rad). Primers for Haserpin1 (accession number: HQ615869; forward primer 5'-CTTCAGAAGGAGATTCTC-3' and reverse primer 5'-CTTCCTCTACCCATGTAT-3') and a housekeeping gene, the ribosomal protein L28 (accession number: DQ875266.1; forward primer 5'-CATCTGAACTGGATGATC-3' and reverse primer 5'-GTACACTACTGTGAAACC-3'), were designed by the Beacon Designer program (Corbett Robotics). Ribosomal protein L28 was used as the internal control for normalization (Shakeel et al., 2015). The reactions were initiated with denaturation for 30 s at 95 °C, followed by 40 cycles of 95 °C for 5 s and 60 °C for 10 s. The amplified product was then subjected to melting curve analysis from 55 to 95 °C to confirm specificity and consistency. The relative expression of genes was calculated using the 2^(−ΔΔCT) method (Livak and Schmittgen, 2001).
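The relative-expression calculation of Livak and Schmittgen (2001) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script used in the study: the function name and the Ct values in the example are hypothetical, the target is Haserpin1 and the reference is RPL28, and the method assumes that both amplicons amplify with roughly 100% efficiency.

```python
def relative_expression(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
) -> float:
    """Fold change of the target gene in a sample relative to a calibrator.

    Step 1: normalize the target Ct to the reference gene (delta Ct).
    Step 2: subtract the calibrator delta Ct (delta-delta Ct).
    Step 3: convert to fold change as 2 ** (-delta-delta Ct).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values chosen only to illustrate the arithmetic.
    fold = relative_expression(
        ct_target_sample=22.0,      # target gene in the stage of interest
        ct_ref_sample=18.0,         # reference gene in the same sample
        ct_target_calibrator=24.0,  # target gene in the calibrator stage
        ct_ref_calibrator=18.0,     # reference gene in the calibrator stage
    )
    print(f"fold change: {fold:.2f}")  # delta-delta Ct = -2, so 4.00
```

In practice, the Ct values entering the calculation would be the means of the technical triplicates for each sample.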

Cloning of H. armigera serpin
The present study reports the identification and characterization of a serpin gene (Haserpin1) in H. armigera. The Haserpin1 gene sequence (accession number: HQ615869) was obtained from NCBI, and sequence-specific primers were used to amplify the Haserpin1 gene. An amplicon of the expected size (1382 bp), containing an ORF of 1174 nucleotides encoding 391 amino acids, was obtained, confirming the presence of the Haserpin1 gene in H. armigera.

Genomic DNA sequence analysis of the H. armigera serpin
Genomic DNA sequence analysis was conducted to determine the number of introns and exons present in the Haserpin1 gene of H. armigera. For this purpose, FGENESH software was used to predict the gene structure of Haserpin1. The prediction indicated that the full-length genomic DNA sequence of Haserpin1 has one exon and no introns. To confirm this result, genomic DNA was isolated from H. armigera and used as a template for PCR. The amplification yielded an amplicon of 1174 nucleotides. The PCR products were verified by DNA sequence analysis, which also showed no intron in Haserpin1.

Biochemical properties and characterization of H. armigera serpin
The biochemical properties of the Haserpin1 sequence predicted by ProtParam indicated Leu (11.5%) as the most abundant amino acid, followed by Ala (9.2%), Ser (9.2%), Lys (8.7%) and Val (8.2%); Trp (0.5%) and His (0.8%) are the least abundant amino acids in the Haserpin1 sequence. The molecular weight of Haserpin1 was computed to be 43.53 kDa, with a pI of approximately 5. The instability index (Ii) of Haserpin1 was computed to be 37.99, suggesting that Haserpin1 is thermally stable with a long half-life. Moreover, the present study obtained a high aliphatic index (93.79) for the Haserpin1 protein.
The extinction coefficient (EC) of the protein at 280 nm in water was 27,390 M−1 cm−1. The grand average of hydropathicity (GRAVY) index for Haserpin1 was calculated to be -0.203. InterProScan analysis confirmed that Haserpin1 belongs to the serpin superfamily. The SignalP 4.1 web server identified the presence of a signal peptide at the N-terminus of Haserpin1. Furthermore, the PANTHER classification system recognized Haserpin1 as a protease inhibitor belonging to the serpin family (PTHR11461).
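Two of the indices reported above follow directly from amino acid composition and can be reproduced by hand. The sketch below is illustrative only: it implements GRAVY as the mean Kyte-Doolittle hydropathy value over all residues and the aliphatic index of Ikai (1980) as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percentage. The function names are hypothetical, and the short peptide in the example is a made-up test string, not the Haserpin1 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index (Ikai, 1980) from mole percentages of A, V, I, L."""
    seq = sequence.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    toy_peptide = "MKLSAVDE"  # hypothetical sequence for illustration
    print(f"GRAVY: {gravy(toy_peptide):.3f}")
    print(f"Aliphatic index: {aliphatic_index(toy_peptide):.2f}")
```

A negative GRAVY value, such as the -0.203 reported for Haserpin1, indicates a net hydrophilic protein, while a high aliphatic index reflects a large share of Ala, Val, Ile, and Leu side chains.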

Phylogenetic tree analysis of H. armigera serpin
A phylogenetic tree was constructed based on Haserpin1 and other insect serpins using maximum likelihood analysis (Figure 2). The results revealed that insect serpins can be grouped into different clusters, with Haserpin1 being closely related to M. brassicae serpin1-A (MbSPI1-A) and M. configurata serpin1-A (McSPI1A) along with two serpins from M. sexta (MsSPI and MsSerpin-1J) and one from B. mori (Bmserpin-1A).

Prediction of H. armigera serpin secondary and tertiary structures
The secondary structure prediction for Haserpin1 conducted by SWISS-MODEL and Phyre2 indicated that Haserpin1 contains 13 α-helices and 10 β-strands (Figure 3). The predicted tertiary structure of Haserpin1 revealed that it has high similarity with M. sexta serpin1 (MsSerpin1; PDB: 1SEK) (Figure 4).

The mRNA expression profile of Haserpin1 in the fat body
In the present study, SYBR Green real-time PCR was employed to analyze the cDNA of H. armigera fat body tissue to determine the tissue-specific mRNA expression of Haserpin1. The RPL28 gene was used as an endogenous control. Our results revealed that Haserpin1 mRNA was constitutively expressed in the fat body at the larval, prepupal, pupal, and adult stages (Figure 5). In addition, the level of Haserpin1 gene expression was relatively elevated at the prepupal stage and on the 1st day of the pupal stage, whereas very low expression was found on other pupal days and at the adult stage.
Figure 3. Predicted sequence of Haserpin1 amino acids indicating the location of conserved helices (alpha) and strands (beta). The bold lines represent alpha helices labelled from "hA" to "hM"; the broken lines represent beta-strands labelled from "s1" to "s10".

Discussion
Serine protease inhibitors, serpins, are the largest and most widely distributed family of protease inhibitors found in all organisms, though rarely in bacteria and fungi (Kanost and Jiang, 1997). Serpins have been reported to play important roles in biological processes such as development, metamorphosis, and immunity (Bayer et al., 1997;Choo et al., 2012;Davie et al., 1991;Gál et al., 2013;Kim et al., 2013;Shigetomi et al., 2010). In the present study, a serpin sequence from H. armigera, designated Haserpin1, was obtained from NCBI. Currently, there are no other reports on the expression and characterization of H. armigera serpin1. Thus, the work presented herein represents the first report of the cloning, characterization, and expression analysis of an H. armigera serpin, which we designated Haserpin1. The amplified Haserpin1 cDNA fragment is 1382 bp long and contains an open reading frame encoding 391 amino acids with a calculated molecular mass of 43.53 kDa. The molecular weight of our protein is in accordance with other serpins reported in insects (Shukla et al., 2015).
The biochemical analysis of Haserpin1 demonstrated that leucine is the most abundant amino acid, comprising approximately 11.5% of its residues. The isoelectric point (pI) of Haserpin1 is approximately 5, indicating that it might be soluble in an acidic buffer. Instability indices (Ii) are used to determine in vivo half-lives, and a protein with an Ii value lower than 40 is predicted to be stable (Rogers et al., 1986). In the present study, the Ii value of Haserpin1 was less than 40, which indicates that it is thermally stable with a long half-life. The aliphatic index of Haserpin1 was also assessed to evaluate the relative volume occupied by aliphatic side chains (Ikai, 1980). The results indicated that Haserpin1 has a high aliphatic index, which indicates that it may be stable at a high temperature (Ikai, 1980). GRAVY analysis was conducted to determine the hydrophobicity of Haserpin1, whereby a positive GRAVY value indicates a hydrophobic nature and negative GRAVY value a hydrophilic nature (Kyte and Doolittle, 1982). Our results showed that Haserpin1 is hydrophilic and can interact favorably with water.
Moreover, sequence alignment analysis of Haserpin1 with other insect serpins revealed that it shares the highest (65%) identity with M. configurata and M. brassicae and the lowest (51%) with Papilio xuthus. These sequence similarity results are in accordance with other reports of insect serpin sequence similarity; for example, O. furnacalis serpin1 (ofserpin1) shares a 60% identity with M. configurata serpin1a and 55% with P. xylostella serpin1a (Zhang et al., 2016).
The deduced amino acid sequence of Haserpin1 shows the presence of a signal peptide at the N-terminus, indicating that Haserpin1 is a secretory protein and probably plays an important role in inhibiting extracellular serine proteinase cascades. The potential signal peptide found in Haserpin1 has also been reported in several other secretory serpins (Babin et al., 1984;Bania et al., 1999;Cierpicki et al., 2000;Huang et al., 1994;Kim et al., 2013). The C-terminus of Haserpin1 is composed of characteristic serpin domains, an RCL with a scissile bond between P1-P1', which may be cleaved by the target protease, a serpin signature, and a hinge region, which are also found in all other insect serpins (Chamankhah et al., 2003;Zheng et al., 2009;Zhang et al., 2016). The P1 residue of the RCL region is considered a determinant of substrate specificity. Previously, it has been reported that the presence of Arg or Lys as a P1 residue indicates that the serpin will target trypsin-like proteases (Gulley et al., 2013). Similarly, serpins containing Pro, Tyr, Phe, Leu, or Met at the P1 residue are predicted to inhibit chymotrypsin and chymotrypsin-like proteases (Laskowski Junior and Kato, 1980). In the present study, a Leu residue at the P1 position of Haserpin1 was observed, suggesting that this serpin might inhibit chymotrypsin-like proteases. A similar Leu residue was found at the P1 position of O. furnacalis serpin (ofserpin1), with inhibitory activities against trypsin and chymotrypsin (Zhang et al., 2016).
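The P1-based rule of thumb described above can be written as a simple lookup. The following sketch encodes only the residue classes named in this paragraph (Arg or Lys for trypsin-like targets; Pro, Tyr, Phe, Leu, or Met for chymotrypsin-like targets); the function name and the "unclassified" fallback are illustrative assumptions. As the ofserpin1 example shows, the prediction is a heuristic and does not replace an inhibition assay.

```python
# Residue classes taken from the rule of thumb cited in the text.
TRYPSIN_LIKE_P1 = {"R", "K"}
CHYMOTRYPSIN_LIKE_P1 = {"P", "Y", "F", "L", "M"}


def predict_target_class(p1_residue: str) -> str:
    """Predict the protease class a serpin may inhibit from its P1 residue."""
    residue = p1_residue.upper()
    if residue in TRYPSIN_LIKE_P1:
        return "trypsin-like"
    if residue in CHYMOTRYPSIN_LIKE_P1:
        return "chymotrypsin-like"
    return "unclassified"


if __name__ == "__main__":
    # Haserpin1 has Leu at the predicted P1 position (L357).
    print(predict_target_class("L"))
```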
To illustrate the biological role of Haserpin1 in H. armigera, the expression pattern of mRNA was investigated in the fat body at different developmental stages by real-time quantitative PCR. In previous studies, it has been reported that the fat body is the major site for the synthesis of serpins, and constitutive and high expression of serpins has been observed in the fat body (Chamankhah et al., 2003;Li et al., 2012;Liu et al., 2015).
Our results also showed that Haserpin1 was constitutively expressed in the fat body, which suggests that Haserpin1 is a serine protease inhibitor derived from the H. armigera fat body. Variable expression of Haserpin1 was observed during development. At the 5th-instar larval stage, low expression was detected; expression was relatively elevated at the prepupal stage and on the 1st day of the pupal stage, whereas very low expression was found on other pupal days and at the adult stage. The variation in the expression of Haserpin1 observed during development in our study is not uncommon, as a similar phenomenon has been observed in previous studies. For example, in Antheraea pernyi, expression of Apserpin1 was lowest at the 4th larval instar and highest at the pupal stage (Yu et al., 2017). Similarly, serpin1 of Choristoneura fumiferana showed high expression at the intermolt phase compared with the molting phase (Zheng et al., 2009). Further evidence that supports the fluctuation of serpin gene expression at different developmental stages has been obtained from M. sexta, in which serpin1 was detected at lower levels during molting and wandering stages (Kanost et al., 1995). These results from different insect species suggest that serpin1 gene expression fluctuates at different developmental stages.

Conclusion
In conclusion, a serpin designated Haserpin1 was identified and characterized for the first time in the fat body at different developmental stages of H. armigera. Constitutive but fluctuating expression of Haserpin1 was observed in the fat body at different stages of development.
Our results indicate that the fat body is the major site of serpin synthesis in H. armigera. Furthermore, higher expression at the prepupal stage suggests that Haserpin1 might play a critical role at the wandering stage of H. armigera. Although the detailed function of this serpin (Haserpin1) still needs to be investigated, our study provides a perspective for the functional investigation of serine protease inhibitor genes.