Genomic analysis on Brazilian strains of Anaplasma marginale

Anaplasma marginale is a vector-borne pathogen that causes a disease known as anaplasmosis. No sequenced genomes of Brazilian strains are yet available. The aim of this work was to compare whole genomes of Brazilian strains of A. marginale (Palmeira and Jaboticabal) with genomes of strains from other regions (USA and Australia strains). Genome sequencing of Brazilian strains was performed by means of next-generation sequencing. Reads were mapped using the genome of the Florida strain of A. marginale as a reference sequence. Single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) were identified. The data showed that two Brazilian strains grouped together in one particular clade, which grouped in a larger American group together with North American strains. Moreover, some important differences in surface proteins between the two Brazilian isolates can be discerned. These results shed light on the evolutionary history of A. marginale and provide the first genome information on South American isolates. Assessing the genome sequences of strains from different regions is essential for increasing knowledge of the pan-genome of this bacteria.


Introduction
Anaplasma marginale is an obligate intracellular, vector-borne pathogen belonging to the order Rickettsiales, family Anaplasmataceae, which causes a disease known as anaplasmosis. Bovine anaplasmosis occurs in tropical and subtropical areas throughout the world, and this disease is a major constraint on cattle production in several countries (Kocan et al., 2010(Kocan et al., , 2015. Previous research reported differences in virulence among A. marginale strains (da Silva et al., 2016;Machado et al., 2015;Silva et al., 2015), which indicates that there is a need for better understanding of the genetic mechanisms involved in A. marginale infection and the pathogenesis of anaplasmosis.
Availability of genome sequences obtained through next-generation sequencing (NGS) has revolutionized the field of infectious disease research. The plethora of data not only has enabled advances in fundamental biology, thereby helping to unveil the pathogenesis of infections and the genomic evolution of microorganisms, but also has contributed to advances in clinical microbiology (Fournier et al., 2014).
Members of the order Rickettsiales are small, obligate intracellular bacteria (Dumler et al., 2001) that typically have small genomes, which is attributed to reductive evolution following long-term intracellular parasitism (Andersson et al., 1998;Brayton et al., 2005;Ogata et al., 2001;Wu et al., 2004). The first genome of A. marginale was published in 2005 and showed that the surface coat is composed by many surface proteins, which includes, in addition to several outer membrane proteins, two gene families encoding immunodominant proteins, the major surface protein 1 (msp1) and major surface protein 2 (msp2) superfamilies. Out of the 949 annotated coding sequences, just 62 are predicted to be outer membrane proteins and, of these, 49 belong to one of these two superfamilies.
The genome contains unusual functional pseudogenes that belong to the msp2 superfamily and are involved in surface coat antigenic variation. This differs from pseudogenes that are described as byproducts of reductive evolution in other members of Rickettsiales (Brayton et al., 2005). Comparisons within North American strains have revealed that A. marginale has a closed-core genome with a few highly plastic regions. These include the msp2 and msp3 genes and the appendage-associated protein (aaap) locus (Dark et al., 2009).
Analysis on multiple A. marginale strains has suggested that intracellular bacteria have more variable single nucleotide polymorphism (SNP) retention rates than previously reported. Moreover, they may have closedcore genomes that have developed in response to the host organism environment and/or reductive evolution (Dark et al., 2009). Recently, phylogenetic analyses on two Australian strains revealed that they presented a marked evolutionary distance from North American strains. Single nucleotide polymorphism analysis showed strikingly reduced genetic diversity among the Australian strains, with the smallest number of SNPs detected between two A. marginale strains (Pierlé et al., 2014).
No South American genomes for A. marginale have yet become available. In fact, genomic information is only available for strains from North America and Australia. Regarding Brazilian A. marginale strains, analyses on genetic diversity have been based on molecular markers such as the msp4 and msp1α genes (Baêta et al., 2015;de la Fuente et al., 2002de la Fuente et al., , 2003de la Fuente et al., , 2004Pohl et al., 2013;Silva et al., 2014a,b;Vidotto et al., 2006). Information about A. marginale in southern Brazil is even scarcer. It is noteworthy that cattle tick fever (a popular expression that in Brazil usually refers both to cattle babesiosis and/or anaplamosis) is the most significant cause of cattle death in this region. Indeed, the climate, vector epidemiology and cattle breeding systems of this region are markedly different from those of the rest of the country.
In this context, the aim of the present study was to characterize and analyze the first genome sequences of Brazilian strains of A. marginale, with comparison of two strains from different Brazilian regions (south and southeast). In addition, the sequences obtained from the Brazilian strains were compared with the genomes of strains from other countries in order to contribute to global and local knowledge of bovine anaplasmosis.

Anaplasma strains
Two strains of A. marginale were evaluated in this work. The A. marginale strain Palmeira was isolated in the early 2000s, from Holstein dairy cattle that showed clinical anaplasmosis from the municipality of Palmeira das Missões, state of Rio Grande do Sul, southern Brazil (27° 54' 09" S; 53° 18' 26" W). The Jaboticabal strain was isolated from cattle in the municipality of Jaboticabal, São Paulo state, southeast Brazil (21° 15' 18" S; 48° 19' 20" W). After isolation, both strains were cryopreserved with 10% dimethyl sulfoxide (DMSO) in liquid nitrogen. The Palmeira strain was isolated from a region where there had been massive use of a heterologous live vaccine based on Anaplasma centrale. Indeed, after clinical cases of anaplasmosis caused by the Palmeira strain had been identified, occurrences of novel anaplasmosis outbreaks on the ranch were avoided by vaccinating the cattle with A. centrale. On the other hand, the Jaboticabal strain was originally isolated from a region where the A. centrale live vaccine is rarely used. There is no evidence that any of two studied strains is tick transmissible. It is important to note that in Brazil, the only species with economic and epidemiological importance as tick of cattle is Rhipicephalus microplus. And, in this sense, to date, there is no clear/unquestionable evidence in the literature on A. marginale transovarial transmission in R. microplus ticks (Esteves et al., 2015).
DNA isolation and library construction DNA was isolated from cryopreserved blood using the PureLink genomic DNA mini-kit (Invitrogen™, Carlsbad, CA, USA), following the manufacturer's recommendations. The DNA samples were quantified using NanoDrop (Thermo Scientific, Waltham, MA, USA) and Qubit (Life Technologies™, Carlsbad, CA, USA), and DNA integrity was checked by means of agarose gel electrophoresis. For library synthesis, the DNA was cleaved using transposase (Nextera DNA sample preparation kit; Illumina®, San Diego, CA, USA) and purified using a commercial kit (Qiaquick PCR purification kit; Qiagen, Hilden, Germany). After that, the samples were indexed and ligated to Illumina-specific adapters, which, respectively, allow post-sequencing sample identification and linkage to the flow cell (Nextera index kit; Illumina®, San Diego, CA, USA). The libraries were amplified by means of the polymerase chain reaction (PCR) and then purified (AMPure XP; Beckman Coulter Inc, Brea, CA, USA) and quantified by means of real-time PCR (RT-PCR), using SYBR green and PhiX as the control (PhiX control kit v3; Illumina®, San Diego, CA, USA).

Genome sequencing, assembly and annotation
Genome sequencing was performed in the MiSeq desktop sequencer (Illumina®, San Diego, CA, USA) using the Miseq reagent kit v2 and v3. Raw fastq files were mapped using the Geneious 9.1.2 software (Biomatters Ltd., Auckland, New Zealand) (Kearse et al., 2012;Ripma et al., 2014). The A. marginale Florida strain genome (USA origin) (GenBank reference: NC012026) was used as a reference for assembling the genomes of the Palmeira and Jaboticabal strains. The reads were assembled into a single pseudochromosome against the genome of the reference strain. Coding sequence (CDS) annotation and circular genome maps were made using the Geneious software. Only gene sequences of similarity higher than 95% in relation to the St. Maries reference strain were annotated.

Sequence analyses
In order to allow identification of SNPs and insertions/deletions (INDELs), the A. marginale Palmeira and Jaboticabal strain sequences were aligned with the genome of the Florida strain (Reference) using the LASTZ alignment tool, version 1.02.00 (Harris, 2007;Schwartz et al., 2003), from the Geneious software. SNPs were identified among selected sequences using the Find Variations/SNPs tool from the Geneious software. For the A. marginale Palmeira and Jaboticabal strains, the SNP calling procedure was performed independently for each reference strain. The data generated from the SNP analyses were exported to a Microsoft Excel spreadsheet (Microsoft, Redmond, WA, USA) to build a Venn diagram. The INDEL analyses were also performed using the Find Variations tool from the Geneious software, to search for INDELs. For phylogenomic analysis, genome sequences were analyzed by means of the maximum likelihood algorithm, using the PhyML method with the Seaview 4 software (Gouy et al., 2010).

Results and Discussion
High genome coverages were obtained for the A. marginale Palmeira and Jaboticabal strains, of 28.1 and 49.6-fold, respectively. A total of 317,067 paired-end reads were obtained, and the fragments ranged from 35 to 301 bp (49.8% G/C content) for the Jaboticabal strain. For the Palmeira strain, 141,491 paired-end reads were obtained, and the fragments ranged from 35 to 251 bp (49.8% G/C content). The Palmeira and Jaboticabal strains were aligned using the genome sequence of the Florida strain to produce single contiguous pseudochromosomes of 1,195,100 and 1,195,221 nucleotides, respectively ( Figure 1). The annotation of these genomes resulted in a total of 894 CDs in the Palmeira strain and 888 CDs in the Jaboticabal strain. FASTQ files containing raw sequences and sequence qualities of the Palmeira and Jaboticabal strains were deposited at the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) under the accession number SRP091646, and Bio Project Accession number PRJNA348690. Genome sequences of the Palmeira and Jaboticabal strains were deposited at the genome databank of NCBI under the accession numbers CP023730.1 and CP023731.1, respectively.
Pairwise comparison of the genomic sequences revealed symmetric identity of 98.5% between the two Brazilian strains, using NCBI genome neighbor reports. Table 1 shows a summary of symmetric identity among Brazilian isolates and some available A. marginale genomes. As expected, the Brazilian strains showed higher identity to each other than to isolates from other countries. Phylogenomic analysis grouped the Palmeira and Jaboticabal strains in one particular clade near to the North American Florida strain, and to the North American St. Maries strain. On the other hand, the Brazilian strains were separate from the Australian strains (Gypsy Plains and Dawn) (Figure 2).  Whole-genome comparisons of A. marginale strains identified 27,203 SNPs in Palmeira and/or Jaboticabal, in comparison with the Florida strain. A total of 5,952 SNPs were unique to the Palmeira strain, while 6,714 SNPs were unique to the Jaboticabal strain, and 14,537 SNPs were common to the two Brazilian strains (see Venn diagram in Figure 3). In comparing the Dawn and Gypsy Plains strains from Australia with the North American St. Maries strain, Pierlé et al. (2014) found 9,813 SNPs that were common to the two Australian strains, while only 97 and 98 of these SNPs were unique to Gypsy Plains and Dawn, respectively. Single nucleotide polymorphism comparison of the Florida strain with four other strains from North America (Puerto Rico, Virginia, Mississippi and St. Maries) revealed the presence of a total of 20,028 sites with SNPs in at least one of the strains; there were 9,609 SNPs in common between the Florida and St. Maries strains, comprising 0.8% of the larger Florida genome (Dark et al., 2009).
Regarding INDEL analysis, comparison of the genomes of the Brazilian strains showed the presence of 3,043 INDELs, encompassing 11,899 bp, considering Florida as the reference strain. There was just one large INDEL region in the Palmeira and Jaboticabal strains, compared with the reference strain. Deletions of 141 bp and 19 bp in the omp9 gene sequence were found in the Palmeira and Jaboticabal strains, respectively.
A large deletion in the omp9 gene had previously been reported for the Australian Dawn and Gypsy Plains strains (Pierlé et al., 2014). Indeed, the Australian Dawn strain also showed large deletions in the omp8 gene (327 bp) and AM415 gene (1,194 bp). Pierlé et al. (2014) suggested that the low virulence of the Dawn strain could be related to these genomic deletions. The observation of omp9 deletions in Brazilian strains suggests that omp9 size variability may be more common than previously assumed and that this does not constitute a geographical signature of the Australian strains. Moreover, the distinct size of the omp9 deletion among different strains might be associated with a highly polymorphic region.  Based on gene transcription level, it is plausible to hypothesize that OMP7, OMP8 and OMP9 proteins seem to be related to the stages of A. marginale in its vertebrate-host, mainly in persistently infected cattle (Noh et al., 2006). These genes are seen to have low transcription levels in tick cell cultures, in contrast to cultures performed in vertebrate cells. In addition, knockout of omp10, which also caused a reduction of omp9 expression, did not impair A. marginale viability in the ISE6 tick cell lineage. Indeed, it seems that membrane proteins of A. marginale from erythrocytes form a large complex constituted by OMP7, OMP8 and OMP9, arranged with MSP2, MSP3, MSP4, OMP1, OPAG2, AM779, AM780, AM1011, AM854 and VirB1 (Noh et al., 2008). Meanwhile, the surface protein complex of A. marginale from tick cells was formed only by MSP2, MSP3, MSP4, AM778 and AM854. It is important to state that most of these results were obtained using in vitro model, and may not represent exactly what happens in the naturally infected organisms.
In this regard, this complex of surface proteins seems to play a key role in the antigenic/immunological response against A. marginale. Noh et al. (2013) showed that the antibody response induced by the linked surface complex was much greater than that induced by a pool of the proteins in non-assembled form. Since the host response to membrane antigens is strictly dependent on the composition of the surface complex, it is reasonable to hypothesize that the lack/deletion of one molecule could affect the immunological response of other surface proteins. Thus, as omp9 has been considered a vaccine candidate, and it is highly recognized by immune serum of cattle immunized with outer membrane fraction; it is possible to hypothesize that changes in gene sequence (or deletions) may affect the interaction of A. marginale and host immune system (Deringer et al., 2017). Hence, further studies involving omp9 genotyping in other A. marginale strains seem to be essential for improving the epidemiological data and elucidating the roles of this gene in infection.
The most studied genes/proteins of A. marginale are membrane proteins, particularly MSPs. They belong to a particular group of outer membrane proteins widely studied because their importance as immunodominant antigens. The proteins coded by the msp1 and msp2 superfamilies represent a significant proportion of the molecules expected on the surface of the organism. The MSP1, MSP2 and MSP3 proteins are immunodominant molecules to which most of the host immune response is targeted (Alleman et al., 1997;Barbet et al., 1987;Blouin et al., 2003;Brown et al., 1998Brown et al., , 2001aBrown et al., ,b, 2003Kocan et al., 2001;Oberle et al., 1988;Palmer et al., 1994).
Anaplasmosis persistent infection is a phenomenon in which A. marginale remains viable for long-term in cattle even after its clinical recovery. Carrier cattle maintain low-level of A. marginale, and may serve as a reservoir for vector infection. Also, this asymptomatic form, eventually, could be reverted to a clinical presentation depending on the host immune status (Kieser et al., 1990). It is plausible that antigenic variation on major immunodominant proteins helps the parasite in this process of persistent survival in an immunocompetent host. In this sense, antigenic variation of MSP2, MSP3 and other MSP proteins may be part of this process. Brayton et al. (2003) demonstrated that simultaneous switching of MSP2 and MSP3 variants occurs during infection. Moreover, the ability of these two molecules to work in concert may serve to amplify the antigenic diversity of the surface coat, thus enabling the microorganism to survive the host immune response (Brayton et al., 2005). A summary of the 147 SNPs found in the membrane protein genes, with comparison between the Brazilian strains, is shown in Table 2. Details of the INDELs identified in genes from surface proteins can be found in Table 3.
The msp2 superfamily includes the members of the msp2, msp3 and msp4 gene families. The genome contains one full-length expression site for these genes. In addition, there are seven msp2 and seven msp3 functional pseudogenes in the St. Maries strain (Brayton et al., 2005). The MSP1 protein is a surface-exposed heteromeric complex consisting of MSP1a and MSP1b. msp1α is a single-copy gene and exhibits differences among strains caused by variable numbers and sequences of tandem repeat units of 86-89 bp in length . MSP1b is encoded by a small multigene family of five genes, consisting of two full-length and three partial versions (pg) (Brayton et al., 2005). The genes encoding MSP1a-like proteins 2 and 3 (mlp2 and mlp3) showed four and five SNPs, respectively. Six SNPs were found in the msp2 operon-associated gene 3 (opag3), three SNPs in the membrane protein terC and four in the outer membrane protein tolC ( Table 2). Regarding the omp8 gene, there was a single nucleotide insertion causing a frameshift in the Palmeira strain (Table 3).
Polymorphisms found among strains from different continents can be related to different selection pressures, such as differences in management, treatment, prevention, host genetic constitution and arthropod vectors. While in Brazil the vectors are presumed to be primarily insects of the order Diptera and, also possibly, males of the cattle tick Rhipicephalus microplus (Estrada-Peña et al., 2006); in Australia the vector species is presumed to be Rhipicephalus australis (Burger et al., 2014); and in the United States the major vectors are the ticks Dermacentor andersoni, Dermacentor variabilis, and Dermacentor albipictus (Kocan et al., 2008). Importantly, previous studies have shown that the Jaboticabal strain is not transmitted transovarially in R. microplus ticks (Esteves et al., 2015). Moreover, widespread use of the live attenuated vaccine for A. centrale has been reported in Australia (Herndon et al., 2013) and southern Brazil, but not in the United States or in southeastern Brazil.
This paper presents the first genomes of the South American strains of A. marginale. These results suggest, considering the genomic information available at this moment, that Brazilian strains constitute one particular clade that is grouped in a larger American group together with North American strains, and is more distantly related to the  Australian strains. These data contribute to better understanding of the evolutionary relationships of A. marginale strains. In addition, they show that there is a need for further studies on the genetic variability of other A. marginale strains, both in South America and in other regions, to obtain fuller epidemiological and pan-genome understanding.