Homology modeling and epitope prediction of Der f 33

Dermatophagoides farinae (Der f), one of the main species of house dust mites, produces more than 30 allergens. A recently identified allergen belonging to the alpha-tubulin protein family, Der f 33, has not been characterized in detail. In this study, we used bioinformatics tools to construct the secondary and tertiary structures and predict the B and T cell epitopes of Der f 33. First, protein attribution, protein patterns, and physicochemical properties were predicted. Then, a reasonable tertiary structure was constructed by homology modeling. In addition, six B cell epitopes (amino acid positions 34–45, 63–67, 103–108, 224–230, 308–316, and 365–377) and four T cell epitopes (positions 178–186, 241–249, 335–343, and 402–410) were predicted. These results established a theoretical basis for further studies and eventual epitope-based vaccine design against Der f 33.

Allergen-specific immunotherapy (SIT) is one of the most effective treatments for allergic diseases (4). SIT can be improved by using recombinant allergens, which contain most of the IgE-binding epitopes of the source allergens and are purer and better standardized than natural allergen extracts (5). A number of recombinant dust mite allergens have been cloned, expressed, and purified, including Der f groups 1-3, 5-8, 10, 11, 13-18, 22, 24, and 33 allergens (6,7). Allergen extracts of house dust mites (HDM) have been used for diagnosis and treatment of IgE-mediated allergic diseases. However, these crude extracts include some inflammatory molecules, such as kallikreins, ceramides, and endotoxins, which could modify treatment outcomes and efficacy (8). Thus, these extracts have limitations in both their safety and efficacy in SIT (5).
Some SIT approaches have shifted toward epitope-based vaccine design (9,10). In this approach, a recombinant allergen contains multiple B and T cell epitopes. Thus, identifying the major B and T cell epitopes of allergens is critical for effective immunotherapy of allergic diseases via epitope-based vaccine preparation.
To date, 36 groups of mite allergens have been listed in the Allergen Nomenclature Database (www.allergen.org). Der f 33 was identified in 2014 (GenBank accession KM010005), and it was characterized as having a molecular weight of 52 kDa and belonging to the alpha-tubulin protein family. Moreover, Der f 33 reacts with the serum of patients with mite allergy; the positive rate of the skin prick test to Der f 33 was 23.5% (4/17 patients). It can also modulate the functions of dendritic cells (DCs) and induce airway allergy (7). However, the major B and T cell antigen epitopes of Der f 33 have not been reported.
In this study, we used bioinformatics to predict the secondary and tertiary protein structures and identify the B and T cell epitopes of Der f 33. These findings provide theoretical support for mite allergen epitope-based vaccine design.

Sequence retrieval and analyses
The Der f 33 amino acid sequence (accession number AIO08861.1) was obtained from the International Union of Immunological Societies (IUIS) nomenclature database and the protein database of the National Center for Biotechnology Information (NCBI). The family classification of Der f 33 was analyzed with Superfamily v1.75 (11) and InterPro v56.0 (12). TMHMM Server 2.0 (13) was used to predict transmembrane helices in the Der f 33 protein.

Physicochemical analysis and secondary structure prediction
Physicochemical properties of Der f 33, including molecular weight, negatively charged residues, positively charged residues, theoretical pI, aliphatic index, grand average of hydropathicity (GRAVY), and instability index, were predicted with ProtParam (14). Characteristic patterns and functional motifs of Der f 33 were checked using Prosite (15). The secondary structure of Der f 33 was predicted with Jpred 4.0 (16).
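As an illustration of what some of these indices measure, the following minimal sketch recomputes three of them from a sequence string. It is not the ProtParam implementation: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The function names and the toy sequence are hypothetical, and the toy sequence is not Der f 33.

```python
# Kyte-Doolittle hydropathy values (assumed scale for GRAVY).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    def mole_pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def charged_residue_counts(seq):
    """Return (Asp + Glu, Arg + Lys) counts."""
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


toy_sequence = "ADEKRVIL"  # hypothetical example, not Der f 33
print(round(gravy(toy_sequence), 4))
print(round(aliphatic_index(toy_sequence), 2))
print(charged_residue_counts(toy_sequence))
```

A negative GRAVY value indicates an overall hydrophilic sequence, which is how this index is read later in the paper.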

Tertiary structure prediction and evaluation
Homology modeling was used to construct the tertiary structure of Der f 33. A BLASTP search was performed against the Protein Data Bank (PDB) to find suitable templates for Der f 33, which were selected on the basis of high score, low e-value, and maximum sequence identity. The tertiary structure was constructed with MODELLER v9.16 (17) and then imported into Chiron (18) to rectify unfavorable clashes and improve the quality of the stereochemistry.
Estimating the quality of tertiary structure is a vital step. VERIFY_3D (19) was used to determine the compatibility of an atomic model (3D) with its own amino acid sequence (1D) and compare the results to good structures. PROCHECK (20) was used to check the stereochemical quality of Der f 33 structure. ERRAT (21) was used to analyze the statistics of non-bonded interactions between different atom types. ProSA (22) was used to analyze the Z-score, which shows the degree of match between the template protein and Der f 33. QMEAN (23) is a composite scoring function, which was used to derive both global (for the entire structure) and local (per residue) error estimates based on one single model. Visualization of tertiary structure was performed using UCSF Chimera 1.10.2 (24).
In the two programs, high-binding peptides have an IC50 value below 50 nM. The final T cell epitopes were obtained by combining the epitopes predicted for the HLA-DR alleles with those predicted for the HLA-DQ alleles.
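The filtering and combination step can be sketched as follows. This is only an illustration under stated assumptions: "combining" is interpreted here as the union of the high binders from the two allele groups (the text does not say whether a union or an intersection was used), and all peptide positions and IC50 values below are hypothetical placeholders, not results from this study.

```python
IC50_CUTOFF_NM = 50.0  # high binders have IC50 below 50 nM, as stated in the text


def strong_binders(predictions, cutoff_nm=IC50_CUTOFF_NM):
    """Keep the (start, end) positions of peptides with IC50 below the cutoff."""
    return {(start, end) for start, end, ic50 in predictions if ic50 < cutoff_nm}


def combine_epitopes(dr_predictions, dq_predictions):
    """Assumption: combine HLA-DR and HLA-DQ high binders as a union."""
    return sorted(strong_binders(dr_predictions) | strong_binders(dq_predictions))


# Hypothetical (start, end, IC50 in nM) tuples for illustration only.
hla_dr = [(10, 18, 12.5), (40, 48, 230.0)]
hla_dq = [(10, 18, 35.0), (70, 78, 8.1)]

print(combine_epitopes(hla_dr, hla_dq))
```

Replacing the set union `|` with an intersection `&` would give the stricter reading, in which a peptide must be a high binder for both allele groups.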

Amino acid sequence analysis
The ProtParam results showed that the complete amino acid sequence of Der f 33 comprises 461 amino acids and has a molecular weight of 51.6 kDa. The numbers of negatively charged residues (Asp + Glu) and positively charged residues (Arg + Lys) were 62 and 42, respectively. The theoretical pI and aliphatic index of Der f 33 were 5.04 and 79.11, respectively. The GRAVY and instability index were -0.286 and 43.23, respectively.
The results of InterPro v56.0 and Superfamily v1.75 showed that Der f 33 belongs to the alpha-tubulin protein family (InterPro No. IPR002452) and the tubulin protein superfamily (InterPro No. IPR000217). Prosite analysis of Der f 33 revealed that it contains a TUBULIN pattern (PS00227, positions 149-155, GGGTGSG). The TMHMM Server 2.0 results showed that Der f 33 has no transmembrane helices and that the entire protein sequence is located outside the membrane.
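A Prosite pattern hit of this kind can be reproduced with a regular expression. The sketch below converts a simple Prosite-style pattern into a Python regex and reports 1-based match positions. The pattern string `[SAG]-G-G-T-G-[SA]-G` is an assumption about the PS00227 entry and should be checked against the Prosite database; the helper names and the toy sequence are hypothetical, and the toy sequence is not Der f 33.

```python
import re

# One pattern element: a residue, x, [allowed], or {excluded}, with optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern into a regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, low, high = match.groups()
        if token == "x":
            token = "."                      # x means any residue
        elif token.startswith("{"):
            token = "[^" + token[1:-1] + "]"  # {..} means excluded residues
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (start, end, matched text) with 1-based inclusive positions."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


TUBULIN_PATTERN = "[SAG]-G-G-T-G-[SA]-G"  # assumed form of PS00227
toy_sequence = "MKTGGGTGSGLV"             # hypothetical example
print(find_motif(toy_sequence, TUBULIN_PATTERN))
```

Applied to the full Der f 33 sequence, a search of this kind would be expected to report the GGGTGSG match at positions 149-155 given in the text.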

Tertiary structure construction and analysis
The structure chosen as the homology modeling template, Cytotoxic Dolastatin 10 Analogues (PDB accession No. 4X20), has a high sequence identity (82%), a low e-value (0.0), and a high score (761) with Der f 33.
The Ramachandran plot of the tertiary structure showed that 86.3% of the amino acid residues of Der f 33 were within the most favored regions, 12.3% were in the additional allowed regions, 0.5% were in the generously allowed regions, and 1.0% were in the disallowed regions. The ERRAT program gave an overall quality factor of 85.34. The VERIFY_3D program revealed that 88.72% of the residues had an averaged 3D-1D score ≥0.2. As indicated by the ProSA server, the Z-scores of Der f 33 and 4X20 were -8.89 and -8.68, respectively. The QMEAN Z-score of Der f 33 was -0.927 and the Q value was 0.692 (Table 1). The tertiary structure of Der f 33 is shown in Figure 1.
In the secondary structure of Der f 33, the percentages of amino acids located in α-helices, β-sheets, and random coils are 33.41% (14 domains), 9.98% (9 domains), and 56.61%, respectively. The tertiary structure of Der f 33 also contains α-helices, β-sheets, and random coils, and the numbers of amino acids in these three elements differ slightly from those in the predicted secondary structure.
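The selection criteria (high score, low e-value, maximum sequence identity) can be sketched as a simple ranking of BLASTP hits. This is an illustration only: the order in which the three criteria are applied is an assumption, the class and function names are hypothetical, and the entries other than 4X20 are placeholder hits invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    pdb_id: str
    identity_pct: float
    e_value: float
    score: float


def pick_template(hits):
    """Prefer high score, then low e-value, then high identity (assumed order)."""
    return min(hits, key=lambda h: (-h.score, h.e_value, -h.identity_pct))


hits = [
    BlastHit("4X20", 82.0, 0.0, 761.0),     # values reported in the text
    BlastHit("HYP1", 45.0, 1e-120, 380.0),  # hypothetical placeholder hit
    BlastHit("HYP2", 38.0, 2e-90, 295.0),   # hypothetical placeholder hit
]

best = pick_template(hits)
print(best.pdb_id)
```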
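Percentages and segment counts of this kind can be derived from a per-residue secondary structure string. The sketch below assumes a three-state encoding (H for helix, E for strand, - for coil); the function name and the toy string are hypothetical and are not the Jpred output for Der f 33.

```python
from itertools import groupby


def summarize_secondary_structure(states):
    """Return {state: (percent of residues, number of contiguous segments)}."""
    total = len(states)
    summary = {}
    for state in ("H", "E", "-"):
        residues = states.count(state)
        # Each run of identical consecutive states counts as one segment.
        segments = sum(1 for key, _ in groupby(states) if key == state)
        summary[state] = (100.0 * residues / total, segments)
    return summary


toy_states = "--HHHH--EEE--HH-"  # hypothetical per-residue prediction
for state, (percent, segments) in summarize_secondary_structure(toy_states).items():
    print(state, f"{percent:.2f}%", segments)
```

Running the same summary on states assigned from the modeled tertiary structure would make the comparison between the two sets of percentages explicit.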

Discussion
HDM are important sources of inhalant and contact allergens that can cause a variety of allergic diseases (3).
Thus, molecular characterization and epitope identification of HDM allergens will improve understanding of the immune response and support effective epitope-based vaccine design.
To better understand the structure and function of Der f 33, we first analyzed the basic sequence properties. The bioinformatics analyses showed that Der f 33 is a hydrophilic (GRAVY of -0.286) and unstable (instability index of 43.23, above the threshold of 40) protein that has no transmembrane helices, with the entire sequence located outside the membrane. Homology modeling builds a target structure by comparison with data extracted from homologous sequences with suitable templates (33). A total of 98.6% of the amino acid residues of Der f 33 were in the most favored and additional allowed regions of the Ramachandran plot (86.3% + 12.3%), showing that the backbone conformation is reasonable. The VERIFY_3D and ERRAT results indicated that the tertiary structure of Der f 33 was of good quality. The ProSA results showed that the Z-scores of the Der f 33 model and the template protein were similar (-8.89 and -8.68), indicating a high degree of structural match. The absolute value of the QMEAN Z-score (0.927) was less than 1, that is, within one standard deviation, showing that the model deviated little from the expected values, that the overall fold and local structure were both accurate, and that the stereochemistry was reasonable. In addition, the Q value (0.692) was between 0 and 1, showing that the predicted model of Der f 33 was reliable and could be adopted for this study.
The secondary and tertiary structures of Der f 33 both contain three elements (α-helices, β-sheets, and random coils); the amino acid percentages of these three elements in the tertiary structure differed slightly from those in the secondary structure. This difference may be due to the different prediction methods used for the secondary and tertiary structures.
Hydrophobicity, fragment flexibility/mobility, surface accessibility, polarity, exposed surface, and turns are important features for B cell antigenic epitope identification. These antigenic indexes showed the epitope-forming capacity of the Der f 33 amino acid sequence. Moreover, secondary and tertiary structures are important for B cell epitope prediction. The α-helices and β-sheets have higher chemical bond energy, making epitope formation difficult. Random coils are located in surface-exposed regions of a protein, which often contain epitope sequences (34). Integrating the results from the four programs and combining with the secondary and tertiary structures, the final B cell epitopes included six sequences: amino acid positions 34-45, 63-67, 103-108, 224-230, 308-316, and 365-377. The prediction results showed that T cell epitopes contained four sequences: amino acid positions 178-186, 241-249, 335-343, and 402-410.
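One way to integrate the output of several prediction programs is a per-residue vote. The sketch below is a hypothetical illustration, not the procedure used in this study: the voting threshold, the minimum region length, the tool names, and all positions are assumptions made for the example.

```python
def consensus_regions(predictions, min_votes, min_len=5):
    """Return (start, end) runs where at least min_votes programs agree."""
    votes = {}
    for regions in predictions.values():
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        for position in covered:  # each program votes once per residue
            votes[position] = votes.get(position, 0) + 1

    kept = sorted(p for p, count in votes.items() if count >= min_votes)
    regions, run = [], []
    for position in kept:
        if run and position != run[-1] + 1:
            if len(run) >= min_len:
                regions.append((run[0], run[-1]))
            run = []
        run.append(position)
    if len(run) >= min_len:
        regions.append((run[0], run[-1]))
    return regions


# Hypothetical program outputs as lists of (start, end) positions.
predictions = {
    "tool_a": [(10, 20)],
    "tool_b": [(12, 22)],
    "tool_c": [(14, 18), (40, 45)],
    "tool_d": [(30, 33)],
}
print(consensus_regions(predictions, min_votes=3))
print(consensus_regions(predictions, min_votes=2))
```

A further filter on secondary structure, for example keeping only regions that fall mostly in random coils, could be applied to the consensus regions in the same way.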
Finally, allergen epitopes usually contain a high proportion of hydrophobic amino acid residues (Ala, Ser, Asn, Gly, and Lys) (35). The prediction results showed that the B and T cell epitopes of Der f 33 both contain multiple hydrophobic amino acids. However, these predicted epitopes require experimental verification.