Reverse vaccinology and subtractive genomics approaches for identifying common therapeutics against Mycobacterium leprae and Mycobacterium lepromatosis

Abstract Background Mycobacterium leprae and Mycobacterium lepromatosis are gram-positive bacterial pathogens and the causative agents of leprosy in humans across the world. The elimination of leprosy cannot be achieved by multidrug therapy alone, and highlights the need for new tools and drugs to prevent the emergence of new resistant strains. Methods In this study, our contribution includes the prediction of vaccine targets and new putative drugs against leprosy, using reverse vaccinology and subtractive genomics. Six strains of Mycobacterium leprae and Mycobacterium lepromatosis (4 and 2 strains, respectively) were used for comparison taking Mycobacterium leprae strain TN as the reference genome. Briefly, we used a combined reverse vaccinology and subtractive genomics approach. Results As a result, we identified 12 common putative antigenic proteins as vaccine targets and three common drug targets against Mycobacterium leprae and Mycobacterium lepromatosis. Furthermore, the docking analysis using 28 natural compounds with three drug targets was done. Conclusions The bis-naphthoquinone compound Diospyrin (CID 308140) obtained from indigenous plant Diospyros spp. showed the most favored binding affinity against predicted drug targets, which can be a candidate therapeutic target in the future against leprosy.


Conclusions:
The bis-naphthoquinone compound Diospyrin (CID 308140) obtained from indigenous plant Diospyros spp. showed the most favored binding affinity against predicted drug targets, which can be a candidate therapeutic target in the future against leprosy.

Background
Until 2008, the only organism known for causing leprosy was Mycobacterium leprae, but then a new species was identified as the causative agent of diffuse lepromatous leprosy (DLL). The newly identified species was Mycobacterium lepromatosis, obtained from the blood sample of two patients of Mexican origin who passed away because of the disease and identified as a causative agent for atypical leprosy [1][2][3]. This disease may occur at any age, mostly affects the skin, peripheral nerves, mucosal surface of the upper respiratory tract and eyes [4,5]. In terms of microbiology, M. lepromatosis is much similar to M. leprae, and both species are non-cultivable, acid-fast, and have the ability to infect peripheral nerves. Clinically and microbiologically, these two organisms are so similar that they were counseled to represent the "M. leprae complex" Singh et al. [6], like the Mycobacterium species that denotes the tuberculosis complex [7]. The transmission mechanism of leprosy is still uncertain; it is hypothesized to be transmitted by the firm contact between leprosy-infected and healthy individuals [8]. Emerging trends however point out to other possibilities of transmission through insects, which cannot be debarred completely [4,8]. The usual symptoms of the disease are skin lesions, which could be macule (flat), papules (raised) or nodules, and sensory loss [8]. According to WHO, 211,973 new cases of leprosy were reported globally in 2015 (2.9 new cases/ 100,000 people). Global statistics show that 94% of leprosy cases were reported in only fourteen countries and a high number of new cases indicate the degree of unremitting transmission of infections [9,10]. Approximately, 81% of the new cases worldwide are accounted from Brazil, India and Indonesia where it is currently the most endemic [11]. WHO's evaluation of Brazilian cases between 2011 and 2013 reveals ten areas with the highest endemicity, which is located mainly in the states of Bahia, Goiás, Mato Grosso, Maranhão, Pará, Rondônia and Tocantins. These places however represent almost 14% of the Brazilian population [11,12].
None of the laboratory tests is considered sufficient enough to diagnose leprosy. Usually, clinical data accompanied with semiological techniques like evaluation of skin sensitivity and pilocarpine or histamine testing, can conclude the diagnosis [8,13].
Presently, the diagnosis of leprosy is carried out by expert clinicians using well-defined criteria, beside the use of slit-skin smears and biopsies. With the decrease in occurrence of the disease, clinical expertise is also shrinking, leading to prolonged delays between onset of clinical signs and identification of disease, resulting in improper maintenance of transmission of M. leprae. Hence, efforts to eradicate the disease are undermined.
In the absence of impeccable tests to detect all M. leprae infected individuals, a diagnostic test to confirm leprosy at initial stages among symptomatic patients would be an adequate and certainly useful shorter-term conciliation [14].
Although leprosy is curable, the emergence of antibiotics resistant strains is of major concern and highlights the risk of the disease, especially for those that are under secondary prevention (chemotherapy) as the main component of their control strategy [8,13]. The multidrug therapy (MDT) was the major factor to decrease the leprosy burden from 1981 until the year 2005; afterwards, slower reduction was reported as Rifampicin resistance in various endemic areas against leprosy, which was the backbone of multidrug therapy of leprosy [9,15]. Because of this Rifampicin resistance, fluoroquinolones became the preferred category of second-line drugs. Unfortunately, the stains of M. leprae with quinolone-resistance have been reported in several countries [16]. This might be because of the wide use of quinolones for treating numerous types of infections. To meet the problem of containing the disease and responding to an increasing circulation of drug-resistant strains, it is essential to assess drug-sensitivity patterns globally [17][18][19]. These two organisms are microbiologically and clinically very similar but there is a 9% difference at the genome level that was reported in [6]. For the rapid identification of novel vaccine targets, reverse vaccinology is a popular and more conventional approach in the post-genomic era. Approaches like comparative and subtractive genomics and differential genome analyses are extensively used for therapeutic target identification in several human pathogens, including M. leprae [10], M. tuberculosis [20] Treponema pallidum [21], Haempphilis ducrei [22], Mycoplasma pneumonae [23], Corynebacterium diphtheriae [24] and many other pathogenic microorganisms. Here, in this work we applied the integrative in silico approaches of reverse vaccinology and subtractive genomics on M. leprae and Mycobacterium lepromatosis strains to identify common putative therapeutic targets from the genomic information. Furthermore, this study identifies plant-derived lead antimicrobial compounds, with favorable interactions, lowered energy values, and high complementarity with the predicted drug targets.

Data retrieval
The genome sequences of all six strains of M. leprae and M. lepromatosis (4 and 2 strains, respectively) were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/ genome/genomes/903&https://www.ncbi.nlm.nih.gov/genome/ genomes/36766). All genomes were annotated using the RAST server Rapid Annotation using Subsystem Technology] [25] for the homogenization in the functional annotation. The genome of M. smegmatis was used as a non-pathogenic reference where applicable.

Identification of conserved non-host homologous, pathogenicity islands and genomic islands
We compared 6 strains of M. leprae keeping M. leprae TN as the reference genome, using orthoMCL software [26], with an E-value of 1e-50. Proteins shared by all strains were considered to be a part of the core genome. To avoid the autoimmunity, identified candidates for drug and/or vaccine should be nonhomologues to Human [21,24,27]. Therefore, these core genes were subjected to orthoMCL software with default parameter (E-value = 1e-50 and >98% identity over >98% of query sequence length) against human genome for the identification of non-host homolog targets. The identification of pathogenicity islands in the genome of M. leprae TN was performed with GIPSy (Genomic Island Prediction Software) [28] through the detection of regions presenting: deviations in genomic signature, i.e., anomalous G+C and/or codon usage deviation; presence of transposase, virulence or flanking tRNA genes; and absence in the nonpathogenic organism Mycobacterium smegmatis.

Reverse vaccinology approach for the prediction of putative vaccine targets against M. leprae and M. lepromatosis.
The non-host homologous conserved proteome of M. leprae TN was screened using SurfG+ software [29] to identify secreted, membrane and putative surface exposed proteins. We searched cleavage sites and transmembrane helices and functional domains in the protein to identify vaccine candidates by online tools, SignalP predicts the presence of signal peptides and the location of their cleavage sites in proteins in microorganisms [30], TMHMM (predict the trans-membrane helix in protein) [31] and InterProScan (InterPro provides functional analysis of protein sequences by predicting the presence of domains and important sites as well as classify them into families) [32]. Furthermore, the dataset was screened by Vaxign [33], an online web tool based on reverse vaccinology approach, to search for proteins with the following features: Major histocompatibility complex (MHC) I and MHC II binding properties; adhesion probability greater than 0.51; and no similarity to host proteins.

High throughput structural modelling and prioritization of identified drug targets
MHOLline (http://www.mholline.lncc.br) was used to predict the 3D models of a complete set of proteins for the whole conserved core of the non-host homologous proteome. MHOLline utilizes multi-fasta file of amino acids as an input data for model generation using the MODELLER program. The adopted methodology was revised accordingly from the original work published earlier [21,24,[34][35][36]. The final identified candidate drug targets were prioritized based on criteria (i) the target must have no homology with host; (ii) the target must be a core gene of the pathogen; (iii) the target involved in the pathogen's unique pathway or multiple pathways are considered superior; (iv) pathways with multiple targets are superior to those having just a single target; (v) in the case of enzyme, targets in hostpathogen common pathways, should not be of the same class of protein, and the EC. no. (Enzyme Commission number) of the target should not match that of any protein product of the host; and (vi) the targets related to pathogenic island or virulence proteins are considered superior, as described by Barh et al. [27], Hassan et al. [35] and Jamal et al. [24] [21,[34][35][36]. For this, the pathways of identified drug target proteins have been checked using (KEGG Kyoto Encyclopedia of Genes and Genomes) [37], and functionality analysis (Molecular Function and Biological Process) was done using UniProt Universal Protein Resource [38,39]. Furthermore, PAIDB (Pathogenicity island database) [40] was used to cross check the virulence other than GIPSy [28].

Ligand library and docking
The ligand library of 28 natural compounds was used for docking from Tiwari et al. [41] and Jaiswal et al. [21]. The 3D structures of all target proteins were carefully analyzed for structural error in ADT (Auto Dock Tool), and MGLTool (Molecular Graphics Laboratory, version-1.5.4) [42]. Grid box parameters and configuration files were generated separately for all targets. Configuration files for the targets Ml_TN_0449, Ml_TN_1385 and Ml_TN_3807, almost covering the whole proteins, were set as described below. Target  The 3D poses of docked molecules were analyzed in Chimera [44]. Molecular function (MF) and biological process (BP) for each target protein were determined using UniProt [38,39]. The biochemical pathways of these proteins were checked using KEGG (Kyoto Encyclopedia of Genes and Genomes) [37] and SurfG+ software [29], and their virulence was checked using GIPSy [28].

Results
Total number of proteins described in each section and methodologies used in this work are described in the workflow ( Figure 1).

Identification of conserved non-host homologous proteins and pathogenicity island
Comparative genomics approach was carried out in order to cluster orthologous genes to get a framework to incorporate information from multiple genomes, highlighting the conservation and divergence of gene families and biological processes; for pathogens clustering, orthologs can simplify the identification of drug and/or vaccine targets. We compared 6 strains of M. leprae and M. lepromatosis (4 and 2 strains, respectively) ( Table 1) keeping M. leprae TN as reference using the orthoMCL software [26]. A total of 1444 shared proteins by all species were considered to be a part of the core genome. Considering human as a host, a set of 411 conserved non-host homologous proteins were identified. The knowledge about pathogenicity islands, the virulence factors, and their mobility structure is helpful in understanding the bacterial evolution and their interactions with host cells [45]. The prediction of Genomic islands GIs] were subsequently performed using GIPSy. GIs are gene clusters, usually >8 kb in size, likely acquired via horizontal gene transfers (HGT) [28]. GIs considerably influence bacterial evolution and play a role in the environmental or host adaptation of bacterial species [46]. For M. leprae and M. lepromatosis strains, 32 putative GIs were identified through GIPSy using M. smegmatis as a closely related non-pathogenic organism.
Of the 32 GIs, 11 are classified as pathogenicity islands (PAIs), i.e., they present high concentration of virulence factors and are absent in the aforementioned closely related non-pathogenic organism ( Figure 2).

Localization and vaccine target prediction
The possible vaccine target identification, subcellular localization and the secretion of pathogenic proteins are important factors for consideration. The secreted and membrane proteins are the first to be in contact with the host, signaling an immune response. Thus, the prediction of the exo-proteome or secretome, composed of the proteins limited to the extracellular matrix or outer membrane of the organism, is of great importance for reverse vaccinology strategies. Therefore, reverse vaccinology in combination with subtractive genomics can offer more reliable output as compared to screening of the whole dataset without taking into account the prioritizing parameters [27,47]. The subcellular localization of conserved non-host homologous proteins of M. leprae and M. lepromatosis strains were predicted with SurfG+. We identified 141 genes as putative surface-exposed (PSE) proteins, secreted proteins or membrane proteins and 270 cytoplasmic proteins (Table 2, Figure 3). We used 141 proteins to predict vaccine candidates with adhesion probability of 0.51 using Vaxign. We identified 12 proteins in M. leprae strain TN, which are commonly shared with M. lepromatosis and may be considered as potential common vaccine candidates for the leprosy disease.
Some prior computational and experimental studies on M. leprae have identified antigenic targets for the development of    Secreted protein a Putative surface exposed vaccine, where they have shown antigens that are recognized by antibody response of the patients [10,48,49]. However, they have only focused on antigenic targets against M. leprae. In our reverse vaccinology analysis, we have also worked with M. lepromatosis to identify a more global vaccine against all manifestations of the disease (i.e., leprosy and diffuse lepromatous leprosy). We have identified 12 vaccine targets (Table 3) with adhesion probabilities greater than 0.51; interestingly, we found that the protein diacylglycerol acyltransferase/mycolyltransferase (ML0098/ NP_301196.1) was identified as an antigenic protein in the previous in vitro study of Kumar et al. [21], which may validate the importance of our in silico predictions for the identification of common vaccine candidates against the leprosy disease.

High throughput structural modelling
Cytoplasmic proteins are also very important for the physiology of bacteria, as they are involved in many important metabolic functions. The pivotal role of cytoplasmic proteins in the maintenance of cell viability makes them more favorable as drug targets [47,50]. Therefore, the identified 270 cytoplasmic proteins were submitted to the online tool MHOLline for prediction of their 3D structure. The transmembrane regions are detected by the program HMMTOP. The BLAST algorithm was used for the identification of template structures by performing a random search against the Protein Data Bank (PDB) [51]. Blast Automatic Targeting for Structures (BATS) performs the refinement in the template search. BATS selects the best template for 3D model generation and performs automated alignment used by the Modeller program. Moreover, it gathers all the BLAST output files into four distinctive groups, i.e. G0, G1, G2, and G3, according to the following criteria; G0 = Not aligned sequence; G1 = E-value > 10e -5 or Identity < 15%; G2 = E-value ≤ 10e -5 and Identity ≥ 25% AND LVI ≤ 0.7; G3 = E-value ≤ 10e -5 and Identity ≤ 15% and <25% OR LVI> 0.7.
Length Variation Index (LVI) is a concept of coverage calculated by the MHOLline software to identify the number of aligned amino acids between query and subject sequences (LVI ≤ 0.1 is equivalent to a coverage ≥ 90%). Only the first three distinct quality model groups of G2 were taken into consideration in this study; these were: 1-Very High quality model sequences (identity ≥ 75%) (LVI ≤ 0.1), 2-High quality model sequences (identity ≥ 50% and < 75%) (LVI ≤ 0.1), and 3-Good quality model sequences (identity ≥ 50%) (LVI > 0.1 and ≤ 0.3) (http:// www.mholline.lncc.br) [21,24,34]. Therefore, all the considered protein 3D models were constructed from sequences for which the template was available with identity ≥ 50%. We identified 75 proteins (Very High: 42, High: 24 and Good: 9) in the first three distinct quality model groups of G2. After that, out of these 75 proteins, the ones that were present in any identified GIs were reported as candidate drug targets. As a result, we found 3 non-host homologues proteins. Furthermore, these 3 proteins were considered for the drug target prioritization and docking studies (Table 4).

Virtual screening and molecular docking of non-host homologous targets
For each targeted protein (Ml _TN_0449-ML0294 NP_301331.1ThiC), Ml_TN_1385-ML0808 (entC), Ml_TN_3807-ML2123 (NP_302402.1), 28 natural antimicrobial compounds were docked to examine each molecule individually for the selection of the final set of promising molecules that showed favorable interactions with the active residues of the target. We considered the lowest Autodock vina binding affinity for the molecules and interactions with the target residues. The biological importance for each target is described in Table 4 along with an analysis of the predicted protein-ligand interaction(s). The name of molecules, Autodock vina binding affinity scores for the selected ligands and number of predicted hydrogen bonds with the interacting residues involved are shown for each target protein (Tables 5-7).
NP_301331.1 (ThiC, Thiamine biosynthesis Protein) is the only known enzyme in vivo that is required for the conversion of AIR (5-amino-imidazole ribonucleotide) to HMP-P (4-amino-5-hydroxymethyl-2-methylpyrimidine phosphate). Inhibitors of these enzymes are capable of blocking the endogenous Thiamine biosynthesis leading to vitamin deficiency, and hence responsible for damaging the survival and growth of the cell. It has been reported as possible drug target for M. tuberculosis [52]. Essential enzymes of the thiamine biosynthetic pathway are possible targets for antibiotic development [53]. Based on the crystallographic structural comparison of the ThiC template (PDB ID: 4S28), none of the active site residues were identified. The docking analysis was performed to identify the minimum energy binding affinity score. Table 5 and Figure 4 show the set of 3 most interacting ligands according to their minimum affinity and the number of hydrogen bond interactions.  Ml_TN_1385 (entC, Putative phosphoglycerate mutase family protein) is involved in the biosynthesis of enterobactin. Concisely, isochorismate is a common predecessor for the siderophore enterobactin and menaquinone (vitamin K2) biosynthesis in E. coli, which is shaped by shikimate pathway from chorismate by the enzyme isochorismate synthase encoded by the entC gene [54]. A comparison between modeled structure and the template (PDB ID: 2A69) could not identify any active site residue. The docking analysis was performed to identify the high ranked minimum energy binding affinity score. Table 6 and Figure 5 show the set of 3 best interacting ligands according to their minimum affinity and the number of hydrogen bond interactions.
NP_302402.1 (ML2123, Two-component system response regulator, TCS regulator): the TCS is known as a basic mechanism of stimulus-response coupling that helps in sensing and Yes a Molecular weight was determined using the ProtParam tool (http://web.expasy.org/protparam/). b Molecular function (MF) and biological process (BP) for each target protein was determined using UniProt. c Cellular localization of pathogen targets was performed using SurfG+. d KEGG was used to find the role of these targets in different cellular pathways. e PAIDB and GIPSy were used to check if the putative targets are involved in pathogen virulence.     responding to changes in different environmental conditions in microorganisms. This system is found mostly in bacteria, in domains of microorganisms, and accomplishes signal transduction through phosphorylation of its cognate response regulator. This signaling approach for coupling changes in the environment to cellular physiology is abundant throughout the bacteria. These signaling proteins are found in all the sequenced bacterial genomes [55,56]. Their abundance differs in each domain, where His-Asp phosphotransfer system accounts for the mainstream signaling pathways in eubacteria, but are rare in eukaryotes [57]. Based on the crystallographic structural comparison with the template (PDB ID: 1YS7), none of the active site residues were identified. The docking analysis was performed to identify the minimum energy binding affinity score. Table 7 and Figure 6 show the set of 3 best interacting ligands according to their minimum affinity and the number of hydrogen bond interactions. In our docking analysis, the drug molecule Diospyrin (CID 308140) showed good binding affinity with all three drug targets. Diospyrin is a DNA Gyrase inhibitor with a different mechanism of action and it has been reported as a possible therapeutic target against Mycobacterium tuberculosis [58]. The binding strength of our identified molecules with Diospyrin suggests that the latter can be potentially used as a new drug for the treatment of leprosy.

Discussion
Leprosy is an infectious disease that targets skin and peripheral nerves caused by Mycobacterium leprae and Mycobacterium lepromatosis. It is curable if identified on time, however a delay in diagnosis and treatment, could lead to permanent nerve damage that cannot be retreated by antibiotics. Leprosy is an important health concern around the globe and it is prevalent in many regions of the world and also a public health problem in Brazil. Yearly, more than 30,000 new cases of leprosy are diagnosed in Brazil [59]. Moreover, it presents a wide range of clinical manifestations, which are dependent on pathogen and host interaction, and are allied to the degree of immunity to the bacillus. Currently, the main strategy for the prophylaxis of leprosy is to identify the infection at an early stage and treat it, because there is no specific vaccine against M. leprae; although, the BCG vaccine is widely acclaimed and used in endemic countries, with reliable evidence of its protection against leprosy [8,13] The bacteria have developed resistance against several antibiotics, thereby obliging the scientific community to start investigating new therapeutic targets against M. leprae [10,60]. The comparative genomics, subtractive genomics and reverse vaccinology of 6 genomic strains of M. leprae and M. lepromatosis identified new vaccine and drug targets that could be tested in the near future in order to solve this public health problem. We identified 12 non-host homologous proteins, which can be used as vaccine candidates and 3 non-host homologous proteins as drug targets. The molecular docking analysis showed Diospyrin (CID 308140) as the most promising compound with the best interactions with our identified drug targets. Compound Diospyrin obtained from indigenous plant Diospyros spp possessing anti-leishmanial [58,61] was the best drug candidate in our analysis, which has already been reported as a potential therapeutic agent against Mycobacterium tuberculosis and could be considered for antimicrobial chemotherapy in future studies for the development of drugs and vaccines against leprosy disease.

Conclusions
We used bioinformatics approaches in this study for the identification of common potential drug and vaccine candidates against M. leprae and M. lepromatosis. The 6 genomic strains of M. leprae and M. lepromatosis were used. Moreover, reverse vaccinology and subtractive genomics approaches were employed for the prediction of new drugs and vaccine candidates. After the detailed in silico analysis, we present 12 non-host homologous protein targets as vaccine candidates and 3 non-host homologous proteins as drug targets. We hypothesize that these identified therapeutic targets and antimicrobial drugs [bis-naphthoquinone compound Diospirin (CID 308140)] could be considered for prophylaxis of leprosy and hence should be subjected to further experimental validations.