Systematic review of reverse vaccinology and immunoinformatics data for non-viral sexually transmitted infections

Sexually Transmitted Infections (STIs) are a rising public health burden in both developed and developing nations. The World Health Organization estimates nearly 374 million new cases of curable STIs yearly. Global efforts to control their spread have been insufficient. As there is no vaccine for many of these infections, these efforts are focused on education and condom distribution. The development of vaccines for STIs is vital for successfully halting their spread. The field of immunoinformatics is a powerful new tool for vaccine development, allowing for the identification of vaccine candidates within a bacterium's genome and for the design of new genome-based vaccine peptides. The goal of this review was to evaluate the usage of immunoinformatics in research focused on non-viral STIs, identifying fields where research efforts are concentrated. Here we describe gaps in applying these techniques, as in the case of Treponema pallidum.


INTRODUCTION
According to the World Health Organization (WHO), an estimated 1 million new cases of sexually transmitted infections (STIs) occur every day ("Global progress report on HIV, viral sexually transmitted infections, 2021," 2021, "Sexually transmitted infections (STIs)," 2022, Fatima et al. 2022). Every year, nearly 374 million cases of curable STIs such as syphilis, chlamydia, and trichomoniasis are acquired ("Global progress report on HIV, viral sexually transmitted infections, 2021," 2021, "Sexually transmitted infections (STIs)," 2022, Fatima et al. 2022). Due to the unavailability of vaccines, prophylactic measures for their prevention are generally based on sexual education and condom distribution (Fatima et al. 2022). While these methods can be effective, they face low adoption rates in populations with limited access to these resources and consistently fail to reach STI-vulnerable populations ("Global health sector strategy on Sexually Transmitted Infections, 2016-2021", Fatima et al. 2022). Thus, the WHO's strategy on STIs highlights the need for developing vaccines against these conditions as essential for their control and eventual eradication ("Global health sector strategy on Sexually Transmitted Infections, 2016-2021").
The application of immunoinformatics can be a significant boon to efforts to develop STI vaccines and tools for diagnostics and treatment. This recent field seeks to apply the repertoire of currently sequenced and understood genomes to the development of vaccines, drugs, and diagnostic methods (Ramana et al. 2020, Oli et al. 2020). These techniques are made possible by extensive databases generated by previous immunological studies and genome sequencing efforts (Oli et al. 2020). Applying computational techniques, such as machine learning, to these databases allows a given bacterium's genome to be explored in silico for sequences of interest.
Computational techniques can be leveraged in vaccine development efforts by identifying vaccine candidates in a microorganism's genome (Jaiswal et al. 2017), as well as by rationalizing vaccine peptide design (Martinelli 2022). This review aims to evaluate the current state of the art in using these methodologies against sexually transmitted infections. Specifically, we examined studies of curable, non-viral STI-causing microorganisms, such as Treponema pallidum, Neisseria gonorrhoeae, Chlamydia trachomatis, Trichomonas vaginalis, Mycoplasma genitalium and Haemophilus ducreyi, identifying fields where research is already being performed and where more investment is required. These pathogens were chosen because the WHO regards them as significant burdens to be eradicated ("Global progress report on HIV, viral sexually transmitted infections, 2021," 2021, "Sexually transmitted infections (STIs)," 2022, Fatima et al. 2022). Research investment in this field may reduce the monetary and labor costs involved in vaccine development, which is slow and costly (Moyle et al. 2013, Martinelli 2022).

Data collection and filtering
The present systematic review was performed per the benchmark work Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Page et al. 2021). Searches were performed amongst all documents in the Scopus, Web of Science, and PubMed databases published until the 22nd of March, 2022. Filtering was performed in steps: [1] identification of documents including selected keywords in the databases; [2] automated document triage; and [3] manual evaluation of eligibility.
Documents were exported as a CSV file through the format_input.py script (Code S2), which extracted the Title, Publication year, Digital Object Identifier (DOI), Document type, Language, and Author(s). Documents lacking a DOI and duplicates were removed. Redundancy between the three databases was removed through the remove_duplicates.py script (Code S3). Articles were labeled according to the databases in which they could be found. The Federal University of Minas Gerais' internal library network access permissions were used to download each article in PDF format through its DOI.
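The deduplication step described above can be illustrated with a minimal sketch. This is not the authors' remove_duplicates.py script (Code S3), whose contents are not reproduced here; the record fields, the DOI normalization rules, and the example DOIs are all assumptions made for illustration.

```python
def normalize_doi(doi):
    """Lower-case a DOI and strip common prefixes so variants compare equal."""
    doi = (doi or "").strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_records(records):
    """Drop records without a DOI and merge duplicates across databases.

    Each surviving record keeps the set of databases it was found in,
    mirroring the labeling step described in the text.
    """
    merged = {}
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        if not doi:
            continue  # documents lacking a DOI are removed
        entry = merged.setdefault(
            doi, {"doi": doi, "title": rec["title"], "databases": set()}
        )
        entry["databases"].add(rec["database"])
    return list(merged.values())


# Hypothetical records with placeholder DOIs, for illustration only.
records = [
    {"title": "Paper A", "doi": "10.1000/abc", "database": "Scopus"},
    {"title": "Paper A", "doi": "https://doi.org/10.1000/ABC", "database": "PubMed"},
    {"title": "Paper B", "doi": "", "database": "Web of Science"},
    {"title": "Paper C", "doi": "10.1000/xyz", "database": "PubMed"},
]
unique = merge_records(records)
```

Keying on a normalized DOI is one simple way to catch the same article exported from several databases with slightly different DOI formatting.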
Three separate reviewers performed document analysis independently, with a fourth being consulted in case of disagreement. Articles were first analyzed based on their titles and abstracts, then reevaluated based on two sets of eligibility and exclusion criteria (Table I).

Bibliographic coupling and word cloud analysis
The VOSviewer software (version 1.6.8) was used for bibliographic coupling between authors and publications (Van Eck et al. 2007, 2010). This analysis builds a citation network from the articles that describes associations between authors and, by combining clustering and visualization techniques, makes those associations easier to analyze. A word cloud was generated based on the PDF files using the NVivo v.20.5.0 software (QSR International Pty Ltd., 2020).
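As an illustration of the underlying idea, two documents are bibliographically coupled when their reference lists overlap, and the raw coupling strength is the number of references they share. The sketch below computes only that raw count; it does not reproduce VOSviewer's normalization, clustering, or layout, and the paper and reference identifiers are hypothetical.

```python
from itertools import combinations


def coupling_strengths(references):
    """Return {(doc_a, doc_b): shared_reference_count} for coupled pairs.

    `references` maps each document to the set of references it cites.
    Pairs with no shared references are omitted.
    """
    strengths = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


# Hypothetical reference lists, for illustration only.
references = {
    "paper_1": {"ref_a", "ref_b", "ref_c"},
    "paper_2": {"ref_b", "ref_c", "ref_d"},
    "paper_3": {"ref_e"},
}
links = coupling_strengths(references)
```

Here paper_1 and paper_2 share two references and are linked, while paper_3 remains isolated, which is the kind of weak connectivity between groups that a coupling network makes visible.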

Data analysis
Data on the following subjects were extracted from the selected documents: (1) publication date; (2) methodology; (3) drug target prediction; (4) vaccine candidate identification; and (5) multi-epitope vaccine design. A map containing information on publications per country and the frequency with which each microorganism was studied was generated using ggplot2 v3.3.6 and scatterpie v0.1.7. The bar graph showing the evolution of scientific production over the years was generated with ggplot2 v3.3.6. A circular visualization generated with the circlize package was used to facilitate interaction data analysis (Gu et al. 2014). The combinations of content covered by each paper were analyzed using UpSetR version 1.4.0 (Conway et al. 2017). Association networks relating drug targets and vaccine candidates to their target microorganisms were generated using Gephi v0.9.2, using the Yifan Hu algorithm to distribute nodes and links.
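The quantity an UpSet plot displays can be sketched as a count of papers per exact combination of content categories. The example below is only an illustration of that tally in Python, not the UpSetR analysis itself; the paper identifiers and their category assignments are hypothetical.

```python
from collections import Counter


def exclusive_intersections(paper_content):
    """Count papers by the exact combination of categories they cover."""
    return Counter(frozenset(categories) for categories in paper_content.values())


# Hypothetical category assignments, for illustration only.
paper_content = {
    "paper_1": {"vaccine candidates"},
    "paper_2": {"vaccine candidates", "drug targets"},
    "paper_3": {"vaccine candidates", "multi-epitope design"},
    "paper_4": {"vaccine candidates", "drug targets"},
}
counts = exclusive_intersections(paper_content)
```

Each key of `counts` corresponds to one column of an UpSet plot, and its value to the height of that column's bar.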

Study characteristics
A diagram of the study selection process for this review is shown in Figure 1. Out of 971 articles, automatic filtering removed 316 documents.
Among the 655 documents that were manually reviewed, seven exclusion reasons were recorded. Reason 3 (Not involving an STI-causing bacteria) was responsible for the most exclusions (207), and reason 4 (Article not in English) for the fewest (2). Only three papers were excluded for reason 6 (Not found). In total, 54 articles fulfilled the inclusion criteria (Supplementary Material - Table SI).
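The selection counts reported above can be cross-checked with simple arithmetic. In the sketch below, the total of documents excluded during manual review is derived from the reported figures rather than stated in the text.

```python
# Counts reported in the text.
identified = 971
removed_automatically = 316
included = 54

# Derived values: documents reaching manual review, and documents
# excluded during manual review (the latter is not reported directly).
manually_reviewed = identified - removed_automatically
excluded_manually = manually_reviewed - included
```

The first derived value matches the 655 manually reviewed documents reported in the text.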
Bibliographic coupling analysis of the 54 studies shows author association based on a citation network (Figure 2). While there was connectivity between the author groups, interplay was limited, suggesting a lack of communication between them.
Word clouds use text mining algorithms to determine the most frequent words in a set of documents. In this work, we display the 100 most frequent terms in the set that were also among the keywords used to perform the database searches. The words "protein/proteins" and "vaccine" were the most frequent. The words "cell" and "gonorrhoeae" were also frequent, underlining a greater interest in gonorrhea research (Figure 3).
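The term-frequency ranking behind a word cloud can be sketched in a few lines. This is an illustration only, not the NVivo procedure used in this work; the tokenization rule, the example sentences, and the keyword list are assumptions.

```python
import re
from collections import Counter


def top_terms(texts, keywords, n=100):
    """Return the n most frequent keyword terms across a set of texts."""
    keywords = {k.lower() for k in keywords}
    counts = Counter()
    for text in texts:
        # Simple tokenization: runs of lowercase ASCII letters.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in keywords:
                counts[token] += 1
    return counts.most_common(n)


# Hypothetical input, for illustration only.
texts = [
    "The vaccine protein binds the cell.",
    "Protein candidates for a gonorrhoeae vaccine; the protein is exposed.",
]
keywords = ["protein", "vaccine", "cell", "gonorrhoeae"]
ranking = top_terms(texts, keywords, n=2)
```

A real analysis would also need to merge singular and plural forms (as in "protein/proteins"), which this sketch does not attempt.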
The selected studies were distributed between six continents: North America (United States of America-21; Mexico-1); South America (Brazil-3; Argentina-1); Asia (China-5; Pakistan-5; India-7; Iran-1); Europe (Sweden-1; Germany-1; Denmark-1; United Kingdom-1); and Oceania (Australia-4) (Figure 4). Research efforts were concentrated in the USA, with more research directed at Neisseria gonorrhoeae. In other countries, research was more frequently directed at Chlamydia trachomatis. Only Sweden, South Africa, Brazil, and Mexico did not perform any study on Chlamydia immunoinformatics. Treponema pallidum was also represented in multiple countries (USA, Brazil, Sweden, and China), with more studies being developed in the USA.
Supplementary Material Figure S1 shows the scientific production in the field of STI immunoinformatics over its history. The earliest studies in this field were performed in 2004, with an increase in production in 2010. In 2006, 2009, and 2011, no research was performed in this field. Booms in production can be seen in 2016 and 2021.

Methodological characteristics
The use of immunoinformatics-based tools in support of drug target and vaccine candidate identification studies is an expanding field. Among these tools, Reverse Vaccinology (R.V.), Virtual Drug Screening (V.D.S.), and the rational design of multi-epitope vaccines (M.V.) stand out.
Of the 54 selected articles, the most commonly used methodology was R.V. (52), followed by M.V.-based studies (13), applied mostly to Neisseria gonorrhoeae and Chlamydia trachomatis research. Virtual Drug Screening analyses were less common (Figure 5) [Table SII].
Although these methodologies are widely applied in the study of some microorganisms, gaps in knowledge remain. In STI research, a wide range of tools still needs to be applied in certain areas, such as research on Haemophilus ducreyi, Trichomonas vaginalis, and Treponema pallidum.

Drug target identification
In this section of the systematic literature review, we assess the current state of the art in using in silico techniques to identify drug targets in the context of STIs. Of the 54 articles analyzed, 14 identified a total of 97 drug targets for STI-causing microorganisms. Of these 14 articles, only 4 provided information on the active site, binding affinity (binding score), and ligand library used for docking analysis [Figure S2, Table SIII]. No standard methodology was observed between articles; various strategies, such as subtractive genomics, essentiality analysis, and subcellular localization analysis, were used for target identification.
Across the 14 articles that performed drug target identification, 97 drug targets were reported for the different STI-causing microorganisms. Figure 6 shows the complex network connecting these drug targets to the microorganisms in which they were identified. Druggable targets were identified in C. trachomatis (Aslam et (Barh et al. 2010, Ragland et al. 2018, El-Rami et al. 2019, Tanwer et al. 2020), and T. pallidum (Dwivedi et al. 2015, Kumar Jaiswal et al. 2022). Of the identified drug targets, two were common to multiple microorganisms. The ddl gene was identified in both N. gonorrhoeae and T. pallidum, and the SecA gene was identified in C. trachomatis, T. pallidum and M. genitalium.
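Finding targets shared between microorganisms amounts to looking for target nodes with more than one neighbor in the target-microorganism network. The sketch below illustrates this with the two shared targets named above; it is not the Gephi workflow used in this work, and the single-organism entry is a hypothetical placeholder added for contrast.

```python
from collections import defaultdict


def shared_targets(edges):
    """Return targets linked to more than one microorganism.

    `edges` is an iterable of (target, microorganism) pairs.
    """
    organisms = defaultdict(set)
    for target, organism in edges:
        organisms[target].add(organism)
    return {t: sorted(o) for t, o in organisms.items() if len(o) > 1}


edges = [
    # Shared targets reported in the text.
    ("ddl", "N. gonorrhoeae"),
    ("ddl", "T. pallidum"),
    ("SecA", "C. trachomatis"),
    ("SecA", "T. pallidum"),
    ("SecA", "M. genitalium"),
    # Hypothetical single-organism target, for illustration only.
    ("hypothetical_target_1", "C. trachomatis"),
]
shared = shared_targets(edges)
```

The same check applies unchanged to a vaccine candidate network, provided that proteins from different articles are mapped to a common identifier first.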

Vaccine candidate identification
Of the 54 articles under analysis, 27 proposed vaccine candidates for the STIs being researched. Of those, eight articles provided only basic information regarding the candidates and their function, without information on subcellular localization and adhesion probability. Most articles included information on the candidate's subcellular localization in the pathogen's cell, predicted with a wide variety of tools [Figure S3]. Of those, nine provided only subcellular localization, six provided the antigenicity per the VaxiJen tool, and three provided adhesion probability as predicted by Vaxign. One article provided antigenicity without predicting subcellular localization [Table SIV]. There was no standardization between articles that performed vaccine candidate prediction, with different protocols being used to select putative candidates and evaluate their vaccine potential.
From the 27 articles that proposed vaccine candidates, 274 proteins were proposed across the different STIs. Four proteins had inactive UniProt entries, and 61 were unavailable in the UniProt database. Figure 7 shows a complex network of vaccine candidates and the microorganisms in which they were identified. The most represented microorganisms were N. gonorrhoeae, M. genitalium, and C. trachomatis, with 88, 55 and 30 proposed candidates, respectively. Other bacteria had between 1 and 16 proposed proteins. An automated search between the annotated proteins found no common candidates between the microorganisms.
Most proteins studied were either membrane, secreted, or otherwise surface exposed. There was no standardization in the terms used to describe the proteins' subcellular localization, with each article using differing terminology to describe the same meaning, and imprecise language in many cases.
EAAAK, AAY, GPGPG, and KK linkers were used as connectors for epitopes. Their primary role is to link the immunogenic epitopes of the recombinant protein. They may be flexible, rigid or cleavable linkers (Chen et al. 2013). The rigid EAAAK linker was used in all three multi-epitope constructs as an epitope separator. Flexible KK and GPGPG linkers play a role in producing a better conformation of the construct (Dong et al. 2020). The AAY linker is the cleavage site for mammalian proteasomes, allowing the epitopes that flank this linker to be separated within cells (Ayyagari et al. 2020). The flexible GPGPG linker is reported to increase construct solubility, providing high accessibility and flexibility for adjacent domains (Tarrahimofrad et al. 2021).
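The way such linkers join epitopes into a single sequence can be sketched as plain string concatenation. The pairing of each linker with an epitope class below (EAAAK after the adjuvant, AAY between cytotoxic T-cell epitopes, GPGPG between helper T-cell epitopes, KK between B-cell epitopes) is an assumption based on common practice, not a layout reported by the reviewed articles, and all sequences are short placeholders rather than real adjuvants or epitopes.

```python
# Assumed linker assignment per epitope class (not taken from the reviewed articles).
LINKERS = {"adjuvant": "EAAAK", "ctl": "AAY", "htl": "GPGPG", "bcell": "KK"}


def build_construct(adjuvant, ctl, htl, bcell):
    """Concatenate an adjuvant and epitopes into one linear sequence.

    The first epitope is separated from the adjuvant by the rigid EAAAK
    linker; every later epitope is preceded by the linker of its own class.
    """
    sequence = adjuvant
    first = True
    for kind, epitopes in (("ctl", ctl), ("htl", htl), ("bcell", bcell)):
        for epitope in epitopes:
            linker = LINKERS["adjuvant"] if first else LINKERS[kind]
            sequence += linker + epitope
            first = False
    return sequence


# Placeholder sequences, for illustration only.
construct = build_construct(
    adjuvant="MKT",
    ctl=["LLDV", "FYRS"],
    htl=["QWER"],
    bcell=["TYHN", "PGSA"],
)
```

A real design would follow assembly with checks such as antigenicity, allergenicity, and structural modeling, which are outside the scope of this sketch.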

Epitope prediction and multi-epitope vaccine design
Seven other works (202, 220, 243, 450, 463, 603, and 643) suggested epitopes as suitable activators of MHC-I, MHC-II, and/or B-cells [Table SV]. These epitopes were predicted to activate the host immune system effectively, but were not included in the design of a multi-epitope construct. These studies were therefore excluded from the multi-epitope vaccine design analysis. These articles correctly indicate the source of their vaccine candidates, but with no standardization of which protein ID code (NCBI, UniProt) is used.

DISCUSSION
Despite being treatable and detectable, bacterial Sexually Transmitted Infections have been in resurgence in highly developed countries ("Global health sector strategy on Sexually Transmitted Infections, 2016-2021", Spiteri et al. 2019). Vital innovations in combating STIs can be achieved with the help of immunoinformatics ("Global health sector strategy on Sexually Transmitted Infections, 2016-2021", Jameie et al. 2021, Martinelli 2022), as it allows for the screening of genome sets and identification of putative candidates (Pizza et al. 2000, Jaiswal et al. 2017). In the case of T. pallidum, where vaccine candidate identification and testing can be made costly by the bacterium's characteristics, in silico methodologies are vital to vaccine development (Cameron et al. 2014, Hook 2017, Singh et al. 2020). In addition to vaccine candidate identification, immunoinformatics also allows for the design of immunogenic peptides based on the sequences of existing vaccine candidates, or multi-epitope vaccines (Moyle et al. 2013, Jameie et al. 2021, Martinelli 2022). Other genome- and proteome-based approaches may also seek to screen for candidate drug targets in the genome, which may open new avenues for STI treatment (Barh et al. 2011).
This review aimed to identify fields in which immunoinformatics has been applied within STI research, with the goal of determining where investment of research time and funding can be beneficial. Immunoinformatics is a recent field; its first application to bacterial STIs was published in 2004, and production remained low for the following eleven years. Since 2016, the field has risen in prominence. Most of this research has been conducted in the research hubs of the USA, China, Pakistan, and India. The bibliographic coupling-based citation network also points to this concentration of research efforts in tightly knit hubs.
Most immunoinformatics-based STI research has centered on C. trachomatis and N. gonorrhoeae. Chlamydia is estimated to have been responsible for 129 million new STI cases, and gonorrhea for 82 million new STI cases in 2020 ("Global progress report on HIV, viral sexually transmitted infections, 2021," 2021, "Sexually transmitted infections (STIs)," 2022, Fatima et al. 2022). Other pathogens such as T. pallidum, M. genitalium, H. ducreyi, and T. vaginalis have received little attention despite their occurrence rates (7.1 and 156 million for T. pallidum and Trichomonas vaginalis, respectively) ("Global progress report on HIV, viral sexually transmitted infections, 2021," 2021, "Sexually transmitted infections (STIs)," 2022, Fatima et al. 2022). This represents a gap in the knowledge we have on these infections. The lack of studies regarding T. vaginalis is of particular concern, considering the occurrence of trichomoniasis.
A gap was also seen in the applied methodologies. While most articles performed R.V. analyses to identify vaccine candidates within the genomes, few used these candidates to design a multi-epitope vaccine. Fewer articles still identified potential drug targets within the genome. A common thread between articles that performed both R.V. and V.D.S. analyses is a lack of consistency between methodological approaches. There was no standardization in the tools, methods, or terminology used to determine and describe good drug targets and vaccine candidates. As for epitope prediction and multi-epitope vaccine design, most works only performed epitope prediction on vaccine candidates without designing a vaccine from the epitopes. Standardization is vital to the quality of research, particularly in a field such as omics, composed of various interacting high-throughput techniques (Holmes et al. 2010).

CONCLUSION
This review sought to evaluate the current literature on the application of immunoinformatics to vaccine and drug development for non-viral Sexually Transmitted Infections. In doing so, we identified gaps in the current state of the art, where diseases like chlamydia and gonorrhea have received more attention than the others. Diseases like trichomoniasis and syphilis still require work in vaccine candidate identification and development. We found that works in the field have not determined drug and vaccine targets common to multiple microorganisms, which could be used in the development of multivalent vaccines. We also point to issues in the methodological standardization of the field, and the need to develop consistent tools and protocols.
Five author groups were determined by this network, each represented by a different node color. The most cited authors were Zielk et al. (2014 and 2016), Semchenko et al. (2019), and Butt et al. (2012).

Figure 2. Citation network. Node size correlates with citation count; colors indicate strongly associated groups.

Figure 3. Word cloud constructed from the selected article set. Word size is proportional to word frequency.

Figure 4. Frequency of STI immunoinformatics studies and their worldwide distribution, as well as the most studied STI-associated bacteria in each country.

Figure 5. Chord-plot associating methodology usage and the microorganisms they are applied to.

Only three papers (21, 602, and 609) designed a multi-epitope construct [Figure S4]. The first two constructs were designed for C. trachomatis and the last for M. genitalium. Beta-defensin and cholera toxin subunit B (CTB) were coupled to the N-terminus of the epitopes as adjuvants in the two C. trachomatis works, respectively. No adjuvant was indicated in the last paper.

Figure 6. Complex network connecting drug targets to the microorganisms in which they were identified. Node size is proportional to the node's degree. Color legend is embedded.

Figure 7. Complex network connecting vaccine candidates to the microorganisms in which they were identified. Node size is proportional to the node's degree. Color legend is embedded.

Table I. Eligibility and exclusion criteria for the inclusion of articles in the review.
2. Method: immunoinformatics, such as the application of Reverse Vaccinology and the development of multi-epitope chimeric vaccines.