Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome

The population of India harbors one of the world's most diverse gene pools, owing to the influx of successive waves of immigrants over time. Several phylogenetic studies of mitochondrial DNA and Y-chromosomal variation have reported Europeans to have been the first settlers in India. Nevertheless, controversy persists, since other evidence supports the thesis that colonization by the Austro-Asiatic group preceded that by Europeans. The aim of this study was therefore to investigate the prehistoric colonization of India by anatomically modern humans, using the conserved five-amino-acid (EPIYA) sequences in the cagA gene of Helicobacter pylori. In parallel, the pathogenic relevance of tyrosine phosphorylation motifs (TPMs) was explored in 32 H. pylori strains isolated from subjects with several forms of gastric disease. High-resolution sequence analysis of these regions was performed. The nucleotide sequences obtained were translated into amino acids using MEGA (version 4.0) software for EPIYA analysis. A median-joining (MJ) network was constructed with NETWORK (version 4.5) software to obtain TPM haplotypes. The findings suggest that Indian H. pylori strains share a common ancestry with those of Europeans. Additional network analysis of TPMs revealed no specific association of haplotypes with disease outcome.


Introduction
The late 20th century witnessed the discovery of one of the most controversial microorganisms, Helicobacter pylori (H. pylori), which overturned prevailing medical doctrine and forced a change in the global understanding of gastroduodenal disorders (Marshall, 2001). Although there is general consensus that a causal relationship exists, there is still disagreement as to how a single bacterium could cause such a variety of disease conditions (Montecucco and Rappuoli, 2001). This apparent paradox suggests that the mere presence of H. pylori in the stomach is insufficient to cause gastric disease, and that one or more additional conditions are required (Crowe, 2005). Among other requirements, H. pylori must carry an arsenal of specific virulence genes to be potentially toxigenic, such as the cag pathogenicity island (cag-PAI), the vacuolating cytotoxin gene A (vacA), the outer inflammatory protein A gene (oipA), and the blood group antigen binding adhesin gene (babA) (Censini et al., 1996; Cover et al., 1994; Zambon et al., 2003). Infection by cag-PAI-bearing H. pylori is recognized as increasing the risk of overt gastric disorders, such as peptic ulceration, gastric cancer and mucosa-associated lymphoid tissue (MALT) lymphoma. Seven genes of this pathogenicity island (hp0524, hp0525, hp0527, hp0528, hp0530, hp0532 and hp0544) form a typical needle-syringe assembly called the type IV secretion apparatus (T4SS), which translocates 120-145 kDa CagA proteins, either directly into host cells or into the bacterial environment (Bourzac and Guillemin, 2005; Christie and Cascales, 2005). During translocation, the CagA proteins undergo tyrosine phosphorylation by several members of the Src family kinases (SFK), such as c-Src, Fyn, Lyn and Yes (Stein et al., 2002).
This phosphorylation is reported to occur at specific sites in the CagA protein known as tyrosine phosphorylation motifs (TPMs), characterized by the presence of a stretch of conserved nucleotide sequences (CNS). Three predicted motifs (TPM-A, TPM-B and TPM-C) have already been reported, based on these CNSs (Odenbreit et al., 2000; Owen et al., 2003). In addition, phosphorylation is also known to occur at the Glu-Pro-Ile-Tyr-Ala (EPIYA) amino acid sequence in TPMs.
The Indian population is widely known for its unique genetic and cultural diversity. Numerous phylogenetic studies based on mitochondrial DNA (mtDNA) and Y-chromosomal variation have shown that India played a crucial role in the first major colonization by anatomically modern humans (AMH), at least half-a-million years ago (Chaubey et al., 2008). Although these studies have shed some light on the evolutionary history of AMHs, the number of waves and the periods of migration remain uncertain and subject to dogmatic views.
Several investigators have used H. pylori as a biological model when studying waves of immigration, owing to its co-evolution with the human host (Devi et al., 2007). Notably, the phylogenetic analysis of H. pylori housekeeping gene sequences mirrors the migratory path of AMHs. The number and pattern of the five conserved amino acid sequences (EPIYA) present in the repeat region of cagA have been well explored in several phylogenetic studies, with the aim of dissecting the genetic origin of H. pylori strains. These have been broadly divided into Western CagA-specific sequences (WSS) and East-Asian CagA-specific sequences (ESS). WSS usually possess an A-B-C pattern characterized by the presence of flanking conserved amino acids, although several subtypes of this pattern (ABC, ABCC, ABCCC, etc.) are already known (Higashi et al., 2002). In contrast, ESS contain a JSR region, previously defined by Yamaoka et al. (1999), besides possessing an EPIYA motif designated "EPIYA-D", thereby justifying their classification as A-B-D (Azuma et al., 2004). Furthermore, only a few studies (Owen et al., 2003) have focused on the status of CagA tyrosine phosphorylation motifs (TPMs), with no data available from the Indian peninsula. Therefore, the present study attempted to address the prehistoric colonization of India by humans using the H. pylori genome and, additionally, to assess the association of TPM haplotypes with gastroduodenal diseases.
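The Western versus East-Asian distinction described above can be illustrated with a minimal sketch. It assumes that the EPIYA segment pattern of a strain has already been determined and is written as a string of segment letters; the function name and the "unclassified" fallback are hypothetical and are not part of the original study.

```python
def classify_epiya_pattern(pattern: str) -> str:
    """Classify a CagA EPIYA segment pattern such as 'A-B-C' or 'ABD'.

    Illustrative rule only: a C segment marks the Western type and a
    D segment marks the East-Asian type.
    """
    # Keep only segment letters so that 'A-B-C' and 'ABC' are treated alike.
    segments = {ch for ch in pattern.upper() if ch in "ABCD"}
    has_c = "C" in segments
    has_d = "D" in segments
    if has_c and not has_d:
        return "Western"
    if has_d and not has_c:
        return "East Asian"
    # Patterns with neither or both segment types are left undecided here.
    return "unclassified"


for example in ["A-B-C", "ABCC", "ABCCC", "A-B-D", "A-B"]:
    print(example, "->", classify_epiya_pattern(example))
```

Patterns containing neither or both of the C and D segments are deliberately left undecided, since the text does not say how such cases should be handled.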

Materials and Methods
A total of 32 indigenous H. pylori strains, isolated from individuals with various gastrointestinal disorders who were undergoing treatment at the Department of Gastroenterology, Deccan College of Medical Sciences, were included for detailed analysis. Information regarding the clinical status and ethnic origin of the study subjects is given in Table 1. Relevant ethical approval for undertaking the study was obtained, as was written informed consent from the participants prior to inclusion.
PCR-based analysis was applied to the target regions, namely the EPIYA motifs and tyrosine phosphorylation motifs A, B and C, as reported previously, using designated oligonucleotide primers (Karita et al., 2003; Owen et al., 2003). Amplification was performed twice, and the amplified products were sequenced with both forward and reverse primers using version 5.2.0 software from Applied Biosystems (Foster City, USA). For analysis, Quality Values (QVs) were obtained from the software itself. The sequences were edited and assembled using AutoAssembler (version 1.4) software (Applied Biosystems) to obtain consensus sequences. Furthermore, in order to minimize ambiguities in the sequences, assembly was set up with a minimum overlap of 20 bases and 20% error. The sequences were then translated into amino acid sequences using MEGA (version 4.0) software (Tamura et al., 2007). Bootstrap phylogeny tests with 500 replications and 1234 seeds were used for this purpose. Finally, strains were assigned to the Western or the East-Asian specific group based on the presence of C- or D-repeats in the EPIYA motifs, respectively (Figure 1 and Figure S1). Translated amino acid sequences of individual strains were compared against neutral sequences of the J99 H. pylori reference strain. Model parameters were selected using a median-joining network, based on the maximum parsimony method, from a set of splits, optimal realizations and reticulograms derived from a distance matrix. NETWORK (version 4.5) software was used for median-joining network construction (see Figures S2 a-c) (Saitou and Nei, 1987). The sequences pertaining to the three tyrosine phosphorylation motifs were aligned using AE000511 as the reference sequence for TPM A and TPM B, and AF202973 as the reference sequence for TPM C, with the differences being noted and displayed in phylogenetic networks (Figures 2, 3 and 4).
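The translation step described above was carried out in MEGA. As a rough illustration of what it involves, the following sketch translates an in-frame nucleotide sequence with the standard genetic code and reports where EPIYA motifs occur. The function names and the toy sequence are hypothetical; the toy sequence is not a real cagA fragment.

```python
import re
from itertools import product
from typing import List

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_epiya(protein: str) -> List[int]:
    """Return 1-based start positions of every EPIYA motif."""
    return [match.start() + 1 for match in re.finditer("EPIYA", protein)]


# Hypothetical toy sequence, used only to exercise the functions.
toy_dna = "AAAAAT" + "GAACCTATTTATGCT" + "CAAGTT" + "GAACCCATCTACGCA"
toy_protein = translate(toy_dna)
print(toy_protein)
print(find_epiya(toy_protein))
```

The sketch assumes that the reading frame is already known. In practice the frame would be fixed by alignment to a reference sequence, as in the study.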
The cagA nucleotide sequences containing the tyrosine phosphorylation motifs of the 32 Indian isolates were deposited in GenBank (accession numbers FJ599712-FJ599743). A codon selection test was performed using an online non-synonymous to synonymous substitution ratio calculator (HIV Databases).
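The selection test mentioned above yields a ratio of non-synonymous to synonymous substitution rates, often written as omega = dN/dS. The sketch below shows only the conventional reading of such a ratio; it does not reproduce the online calculator used in the study, and the function name and tolerance parameter are hypothetical.

```python
def interpret_omega(omega: float, tolerance: float = 0.0) -> str:
    """Conventional interpretation of omega = dN/dS.

    omega > 1 is read as positive selection, omega < 1 as purifying
    selection, and omega equal to 1 (within tolerance) as neutral evolution.
    """
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if omega > 1.0 + tolerance:
        return "positive selection"
    if omega < 1.0 - tolerance:
        return "purifying selection"
    return "neutral evolution"


# 1.4345 is the value reported in the Results of this study.
print(interpret_omega(1.4345))
```

This reading says nothing about statistical significance, which would require a formal test that is not described in the text.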

Results
We analyzed a total of 32 strains of H. pylori isolated from various clinical backgrounds (Table 1). Sequence analysis of the repeat region in cagA for assessing EPIYA sequences revealed the A-B-C pattern, which is common in Europeans. Neither the A-B-D pattern (East Asian) nor any other A-B-C subtype was present in any of the 32 isolates (Figure 1). The selection test gave a value of 1.4345, thereby signifying positive selection and co-evolution. Amplification followed by deduced amino acid sequence analysis at each of the three phosphorylation sites revealed the TPM-A motif, characterized as KFGDQRY, at site 122 in all the strains analyzed (100%) (Figure S2a). Similarly, the TPM-B motif, originally defined by the amino acid sequence KNS(T/g)EPIY, was found as KNEPIY at site 899 in all the sequences (Figure S2b). In contrast, TPM-C, characterized by KLKDSTKY, was found at site 1029 in only 3.1% of the Indian strains screened (Figure S2c).
High-resolution analysis of the TPM-A motif revealed 16 distinct haplotypes (Figure 2). Distribution of all the cagA tyrosine phosphorylation motifs was clinically irrelevant, as TPM A and TPM B were found to be present in 100% of the strains (Figures 2 and 3), whereas TPM C was observed in only 1 (3.1%) (Figure 4). Similarly, the different haplotypes of TPM A, TPM B and TPM C showed no disease-specific association.
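The notion of distinct haplotypes used above can be sketched as follows: strains with identical aligned motif sequences are collapsed into one haplotype, and pairwise differences between haplotypes are counted. Such differences are the kind of distance information a median-joining network is built from, although the sketch does not construct the network and is not the NETWORK software's algorithm. The strain names and sequences are hypothetical toy data.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def collapse_haplotypes(sequences: Dict[str, str]) -> Dict[str, List[str]]:
    """Group strain names by identical (aligned) sequence."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for strain, seq in sequences.items():
        groups[seq.upper()].append(strain)
    return dict(groups)


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: four strains, three distinct haplotypes.
toy_sequences = {
    "strain_1": "KFGDQRY",
    "strain_2": "KFGDQRY",
    "strain_3": "KFGDQRF",
    "strain_4": "RFGDQRY",
}

haplotypes = collapse_haplotypes(toy_sequences)
for seq, strains in haplotypes.items():
    print(seq, strains)

# Pairwise distances between the distinct haplotypes.
for a, b in combinations(haplotypes, 2):
    print(a, b, hamming(a, b))
```

To look for a disease association, each haplotype's strain list could then be cross-tabulated against clinical status. The study itself did this by inspecting the networks.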

Discussion
The Indian microbial genome is a melting pot for both evolutionary and pathogenic studies, seeing that the country accounts for one of the largest gene pools, with more than 1 billion inhabitants. The A-B-C pattern of EPIYA sequences in Indian strains of H. pylori indicates a common ancestral origin with Europeans, as reported previously (Devi et al., 2007). Although neither the timing nor the number of migrations can be estimated with software, exploration through various approaches, such as anthropological (Mishra, 2001), historical (Kennedy et al., 1987), mitochondrial DNA and Y-chromosomal studies, has demonstrated that the Austro-Asiatic (AA) tribal groups were the first settlers of India (Majumder, 2001; Metspalu et al., 2004; Thangaraj et al., 2005; Kumar et al., 2007; Chaubey et al., 2008). Furthermore, this microbe is also known to have co-evolved with anatomically modern humans (AMH) in Africa (~50-70 kYa) (Linz et al., 2007). In light of the above, one can say that AA language speakers, and not Indo-Europeans as previously reported (Devi et al., 2007), would have been the first to introduce H. pylori into India. In fact, Indo-Europeans played a major role in the later colonization and expansion of this bacterium during the Neolithic era of the Stone Age (Mishra, 2001).
The AA family is broadly divided into two subfamilies, i.e., the Mundari and the Mon-Khmer (Ethnologue website). The former, found exclusively on the Indian subcontinent, are considered to have been traditional hunters, and their consumption of uncooked food was the most probable route by which humans acquired H. pylori (~50-70 kYa) (Mishra, 2001).
Although tyrosine phosphorylation reportedly occurs at any of the three motifs (TPM-A, TPM-B and TPM-C), detailed sequence analysis of their individual distribution proved to be of no prognostic value, as no site-specific mutation in any of the three tyrosine phosphorylation motifs was observed to be directly associated with disease status (Figure S2 a, b, c), thereby implying that the outcome is TPM-independent. The presence of KNEPIY in place of KNS(T/g)EPIY at site 899 in the strains studied requires further investigation in a large number of samples from different geographical locations of the Indian subcontinent (Figure 3). These observations are in conformity with those reported by Owen et al. (2003), who demonstrated that there is no association between the number and type of TPMs present and the severity of the disease. Nonetheless, they reported relatively lower frequencies for all three TPMs (Owen et al., 2003) than was the case in the present study. Another study, from Costa Rica (Occhialini et al., 2001), reported relative frequencies of 100% and 58% for TPM A and B, respectively, but was unable to detect the TPM C motif in any of the strains studied. The reason for such discordance between studies is unclear, thereby warranting detailed investigation of a large number of clinical isolates from several geographical areas. Although, according to the MJ networks, TPM genes may not be pathogenetically relevant, an attempt was made to understand whether haplotypes play any specific role in altering the outcome of disease. Our high-resolution study based on MJ networks in 32 H. pylori strains showed that this was not so.
In conclusion, the results of the present investigation support the dogma of European roots for Indian H. pylori. Nevertheless, its first introduction to the Indian subcontinent by Indo-Europeans remains a highly contentious issue, with sufficient reports favoring Austro-Asiatic speakers as having been the first settlers. Finally, sequence analysis of the cagA tyrosine phosphorylation motifs revealed no association with clinical presentation, as evident from the frequency distribution and MJ-network analysis, thereby implying that the nature and severity of gastroduodenal diseases are independent of tyrosine phosphorylation motifs.

Supplementary Material
The following online material is available for this article: Figure S1 - Deduced amino acid sequences of the 3' repeat region of the cagA gene from 32 Indian strains of H. pylori, showing EPIYA sequences. This material is available as part of the online article from http://www.scielo.br/gmb.

Associate Editor: Luís Carlos de Souza Ferreira
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.