Extinction and emergence of genomic haplotypes during the evolution of Avian coronavirus in chicken embryos

Abstract Avian coronavirus (AvCoV) is ubiquitously present on poultry as a multitude of virus lineages. Studies on AvCoV phenotypic traits are dependent on the isolation of field strains in chicken embryonated eggs, but the mutant spectrum on each isolate is not considered. This manuscript reports the previously unknown HTS (high throughput sequencing)-based complete genome haplotyping of AvCoV isolates after passages of two field strains in chicken embryonated eggs. For the first and third passages of strain 23/2013, virus loads were 6.699 log copies/ μL and 6 log copies/ μL and, for 38/2013, 5.699 log copies/μL and 2.699 log copies/μL of reaction, respectively. The first passage of strain 23/2013 contained no variant haplotype, while, for the third passage, five putative variant haplotypes were found, with > 99.9% full genome identity with each other and with the dominant genome. Regarding strain 38/2013, five variant haplotypes were found for the first passage, with > 99.9% full genome identity with each other and with the dominant genome, and a single variant haplotype was found. Extinction and emergence of haplotypes with polymorphisms in genes involved in receptor binding and regulation of RNA synthesis were observed, suggesting that phenotypic traits of AvCoV isolates are a result of their mutant spectrum.


Introduction
The evolution of RNA viruses must be assessed from the perspective of the quasispecies, defined as population of genomes linked through mutation with each other and with the genome of higher frequency, i.e., the dominant sequence, all of the variant and the dominant genomes resulting in a mutant spectrum that interacts at the functional level and works as a selection unit (Lauring and Andino, 2009). Structurally, each individual genome in the spectrum is defined by a collection of alleles in different sites occurring on the same individual or genome, which is known as the haplotype, including the variant and the dominant haplotypes.
The viral species Avian coronavirus or AvCoV (Nidovirales: Coronaviridae: Coronavirinae: Gammacoronavirus) is an enveloped virus of 120 nm in diameter. Its positive single-stranded RNA of circa 27 kb in length is subjected to an evolutionary rate (substitutions/site/year) of a 10 -3 magnitude (Marandino et al., 2015), which supports the radiation of AvCoV in six genotypes and 34 lineages (Valastro et al., 2016;Jiang et al., 2017).
AvCoV lineages show a variable pathogenicity and tropism and are the cause for the acute and chronic presentations of the highly contagious disease of poultry avian infectious bronchitis in its natural host Gallus gallus (Cook et al., 2012). The control of this disease is based on a plethora of vaccine strains produced in embryonated chicken eggs.
The 5' 2/3 of AvCoV genome code for the replicase complex is composed by non-structural proteins 2 to 16 arranged in ORF1ab, followed by genes for the spike envelope glycoprotein S (with the S1 and S2 domains), envelope E, membrane M and nucleocapsid N. Accessory proteins 3a,b genes are located upstream of E and 5a,b, and protein X genes are upstream of N genes, respectively. Untranslated regions (UTRs) are found at both the 5' and 3' ends of the genome (Cavanagh, 2007;Gomaa et al., 2008).
The isolation of field strains of AvCoV in chicken embryonated eggs is a preliminary procedure for a series of downstream applications, such as diagnosis of avian infec-tious bronchitis, virus attenuation for vaccine manufacture, production of challenge virus for vaccine trials, pathogenicity and tropism assays, and complete genome sequencing. Nonetheless, the mutant spectra of isolates are not considered in most studies, and the phenotypic traits of AvCoV strains are explained based solely on the dominant sequences.
AvCoV quasispecies has been tentatively assessed using molecular cloning of partial or complete genes, followed by Sanger sequencing and comparison of polymorphic sites among sequences, or visual checking for multiple peaks in chromatograms without cloning (van Santen, 2008;Gallardo et al., 2010;Saraiva et al., 2018). However, these approaches are less sensitive for the detection of variant states of nucleotide sites when compared to HTS (High Throughput Sequencing) (Gregori et al., 2014) and, more importantly, do not allow for the prediction of alleles that co-occur on a same segment or complete genome, as haplotyping can do.
Considering the absence to date of reports on full genome haplotypes for coronaviruses based on HTS and the need for a more complete characterization of AvCoV isolates in chicken embryos, this study was designed to (a) describe the mutant spectra for AvCoV strains based on full genomes haplotypes assembly, and (b) assess the mutant spectra after three passages in chicken embryos.

Materials and Methods
Origin and isolation of AvCoV strains Strains GammaCoV/AvCoV/Gallus gallus/Brazil/23/2013 and GammaCoV/AvCoV/Gallus gallus/Brazil/38/2013 (from now on referred to simply as 23/2013 and 38/2013) were obtained from kidneys and cecal tonsils, respectively, collected in 2013 from chickens from two different farms in Brazil. AvCoV PCR screening and Sangersequencing spike typing done directly on pools of these organs were performed as in Cavanagh et al. (2002) and Worthington et al. (2008), respectively. The two strains were typed as GI-11 according to the classification in Valastro et al. (2016).
Suspensions of the respective field samples were submitted to isolation in embryonated SPF chicken eggs via inoculation on the allantoic route, and the second and third passages were prepared with the allantoic fluid from the previous passage. All passages were monitored for AvCoV, with the PCR described by Cavanagh et al. (2002).
The first (P1) and third (P3) passages were selected for the downstream analyses because three passages are recommended for AvCoV isolation (OIE, 2018), and the first passage would provide a truer picture of AvCoV sequence data as compared to the original chicken sample to assess the closest-to-wild AvCoV populations.
The study was approved by the Ethics Committee on the Use of Animals of the School of Veterinary Medicine, University of São Paulo under the register #2174/2011.

Determination of virus load
The number of AvCoV genome copies in allantoic fluids harvested up to 72 h post-inoculation of P1 and P3 of strains 23/2013 and 2013/38 was obtained with a quantitative RT-PCR (qRT-PCR) based on SYBR green detection system, with primers reported by Callison et al. (2006) for the 5'UTR, with absolute genome quantifications/ mL cDNA obtained by comparison with a ten-fold dilution standard curve, ranging from 10 1 to 10 7 copies of a plasmid containing the target. All reactions were run in duplicate.

Full genomes High Throughput Sequencing (HTS)
Allantoic fluids (strains 23/2013 and 38/2013, P1 and P3) were clarified by centrifugation at 16,000 x g for 3 min at4 ºC, filtered through 0.45 mm syringe filters, and treated with DNase-free RNase. Total RNA was then extracted with Trizol Reagent (Life Technologies) and RNeasy Mini kit (Qiagen), and used with Superscript III and Klenow exo-DNA polymerase (Life Technologies) to obtain random ds-cDNAs.
Libraries and sequencing kits were Nextera XT Index and Nextera XT DNA (Illumina), and reads were obtained with a NextSeq500 (Illumina) sequencer using the NextSeq500 Mid output v2 kit (2 x 150 bp). Read quality was assessed with FASTQC, and full genomes were assembled with CLC Genomics Workbench v. 11.0.1 (Qiagen), with the reference-mapping approach.

Haplotype assembly
Paired and trimmed reads were clustered by Bayesian inference to assemble the putative global haplotypes for 23/2013 and 38/2013 P1 and P3 using PredictHaplo 1.0 (Prabhakaran et al., 2014), with the respective dominant genome as reference and the following parameters: max reads in window 10,000; entropy threshold 4e -2 ; min mapping qual 30; min read length 64; max gap fraction 0:05 (relative to alignment length); min align score fraction 0:35 (relative to read length); alpha MN local 25; min overlap factor 0.85; local window size factor 0.7; max number of clusters 25; MCMC iterations 501; and deletions inclusion.
The number of polymorphic sites between passages and among the different coding regions was compared with the Fisher's Exact Test at a significance level of 0.05 using the Easy Fisher Exact Test Calculator.

Genealogical analysis
A Maximum Likelihood nucleotide tree with the TN model was built with MEGA X software (Kumar et al., 2018), using 100 bootstrap replicates and 5 gamma categories for the full genomes of variant haplotypes and dominant sequences.

Recombination analysis of haplotypes
Within-passages recombination analysis was run for both P1 and P3 of the two AvCoV strains dominant and variant haplotypes full genomes using RDP4 (Martin et al., 2015) with the methods RDP, Geneconv, Chimaera, MaxChi, BootScan, SiScan, 3Seq, and LARD, with minimum p-value = 0.05.

Dominant complete genomes
P3 complete genomes of 23/2013 and 38/2013 can be found under the GenBank Accession numbers KX258195.1 and MG913342; 23/2013 P3 genome had been published already (Brandão et al., 2016); first passages and variant haplotypes sequences were not submitted to the GenBank to avoid redundancy.
Brandão et al. Dominant  H1  H2  H3  H4 H5 aa 0.00001) in X, IG (intergenic region between M and 5a), 5a and 5b, and with further polymorphic sites in the 5'UTR, nsps3, 14-16, S, 3a, and N. Premature stop codons were in 3a in five and 5a in four variant haplotypes, while only three polymorphic sites, all in the 3'UTR, were found for P3, with a significant (p < 0.00001) drop in the number of variable sites between these two passages. For 23/2013 P3, eight polymorphic positions were found among the five variant haplotypes (nsps 5-6 and 9, S, E and 3'UTR), with a significant higher number in nsp9 only (p < 0.00001), with no premature stop codons.

Position in dominant
A genealogy of variant haplotypes and dominant sequences for 23/2013 and 38/2013 P1 and P3 is displayed in Figure 1, which shows the segregation in two main strainspecific clusters.
No recombination was found for 23/2013 P3 an 38/2013 P1 haplotypes. Recombination analysis for 23/2013 P1 and 38/2013 P3 was not possible, as the number of sequences to be analyzed was only one and two, respectively.

Discussion
For both AvCoV strains used in this study, virus loads decreased from the first to the third passage, but a more marked drop was evident for 38/2013, as a 3-log decrease was found from P1 to P3, while, for 23/2013, the decrease was of 0.699 log.
The genomic nucleotide identities between P1 and P3 dominant sequences was high (> 99.9%) for both strains, with a unique substitution in both cases. While for 38/2013 the substitution occurred in the 3'UTR, for 23/2013 a nonsynonymous substitution led to a change of a N to a K, both hydrophilic amino acids, in residue 244 of the spike gene, within the 253 N-terminal amino acids of the spike glyco-protein. This had already shown to be required for receptor binding (Promkuntod et al., 2014).
A more complex evolutionary pattern was found, however, after haplotype analysis, as, for instance, all of the 13 S gene polymorphisms present in the five 38/2013 P1 variant haplotypes (Table 1) have been extinguished on P3 and a single mutation (A732T/K244N, position 21039 regarding the genome) emerged on variant haplotypes H2-3 and 5 of 23/2013 P3, preserving the state found in P1 dominant sequence.
Mutations on the S1 subunit of spike glycoprotein gene have already been described as being related to adaptation of AvCoV during passages from chickens to em-AvCoV quasispecies evolution 5 Position in dominant  Dominant  H1  H2  H3  H4  H5  aa   26412  T  C  C  C  C  C  syn175   26911  A  T  T  T  T  T  T342S 3'UTR 27333 bryos and back (Cavanagh et al., 2005;Geerligs et al., 2011;Leyson et al., 2016). The a2,3-linked sialic acid has been shown to be the cell receptor for the spike glycoprotein of AvCoV (Winter et al., 2006) and is present on the chorioallantoic membranes (CAM) of chicken embryos, while the a2,6-linked arrangement is not (Ito et al., 1997).
The presence of a2,6-linked sialic, lacking on CAM but co-existing with the a2,3 arrangement in adult chickens, has been suggested to hava a role in the selection of amino acid substitutions during serial passages of AvCoV from embryos to chickens (Leyson et al., 2016). Thus, the low diversity on spike genes at passage 3 might have been a consequence of both purifying (regarding strain 38/2013) and positive selection (in the case of the single variant haplotype and the N244K found on 23/2013 P3), leading to fine tuning of the affinity of the mutant spectra to CAM a2,3-linked sialic acid. As the receptor type is the main difference between embryos and adult chickens, receptordriven variant haplotypes selection might be a major force for AvCoV mutant spectrum evolution during isolation in egg, as attachment is the primary step for the intracellular virus life cycle.
Though the 5' and 3'UTRs have a role on RNA transcription and replication, all SNPs on the variant haplotypes and dominant sequences occurred outside the conserved octamer described for AvCoV (nt positions 27,527-27,534) and of the stem-loop like motif (sm2, nt 27,471-27,511) involved in RNA replication, both on the 3'UTR, and out of the leader sequence region involved in subgenomic RNA synthesis on the5'UTR (Hsue and Masters, 1997;Masters, 2006). This makes any gain or lossof-function unlikely in this case, favoring genetic drift as an evolutionary force, similarly to the observed for the also non-coding intergenic (IG) region between M and 5a that also accumulated several SNPs in P1 only of 38/2013 (n = 17) in all five variant haplotypes.
Variant haplotypes H1-H5 of 38/2013 P1 accumulated SNPs on nsps in ORF1ab that were extinguished in P3, while SNPs in these areas emerged in four of the five variant haplotypes in 23/2013 P3 (Table 1). Considering the role of these nsps in RNA replication and transcription, and that mutations on the replicase genes have already been shown to occur after serial passages in embryos and are related to attenuation of virulent strains of AvCoV (Ziebuhr and Snijder, 2007;Albanese et al., 2018), these mutations could also have resulted in an initial degree of attenuation of these two strains at passage 3.
Proteins 3a and 5a, for which premature stop codons were found for variant haplotypes H2-5 and 1-5, respectively, of 38/2013 P1, were reported as nonessential for AvCoV replication, though a lower virulence has been reported after the deletion of accessory genes 3 and 5 (Laconi et al., 2018). Thus, a truncated 3a and 5a and SNPs in 5b, also found in all variant haplotypes of 38/2013 P1, would not abolish the replication of these haplotypes but could account for the decrease in virus loads. ORF X, also called 4b, has been shown to code for an accessory protein of 11kDa (Bentley et al., 2013), but, as all polymorphisms on this ORF have experienced extinction between P1 and P3 of 38/2013, it is under a selection regime similar to that of a structural protein, like the spike glycoprotein. Hence, an essential role for X on the AvCoV life cycle should be further investigated.
A unique SNP on the E protein gene leading to a V99F amino acid substitution emerged on three variant haplotypes at passage 3 of 23/2013 (Table 1). E is a core protein, as well as M, for the assembly of the virion (Masters et al., 2006), and is thus expected to present low polymorphism due to purifying selection.
The nucleocapsid protein N is subjected to a more intense selection pressure as a result of stronger structural constraints, due to its intricated conformational and charge-based interactions with the genomic RNA on the nucleocapsid and interaction with nsp3 on the cytoplasm during RNA replication (Masters, 2006;Hurst et al., 2013). This evolutionary pattern could explain the low (two nonsynonymous and one synonymous) number of polymorphisms found for this gene, detected exclusively on haplotypes H1-H5 of 38/2013 P1.
The evolution of 23/2013 and 38/2013 AvCoV strains after three passages in chicken embryos did not result in shared nucleotides states that could be considered as attenuation markers. This is also illustrated by the absence of convergent evolution as noticed in Figure 1, as no haplotypes crossed the strain-specific clades limit at passage 3.
Significant AvCoV quasispecies differences occur in a same chicken during infection regarding the diverse organs in which the virus replicates (van Santen and Toro, 2008;Gallardo et al., 2010). Hence, differences in starting mutant spectra populations resulting from different origins for the two strains used in this study, i.e, kidneys (23/2013) and cecal tonsils (38/2013), as well as in virus loads present on the starting inocula, before P1 (not measured in this experiment), could account for the different evolutionary paths, resulting in non-coincident polymorphic sites.
Applying Sanger sequencing to complete or partial genes to assess diversity based on molecular cloning or chromatograms and even NGSs complete genomes variant analyses are deeply reductionist approaches compared to complete genome haplotype analyses, which enable a more accurate characterization the mutant spectra. Understanding the mutant spectra must be considered not only for studies on AvCoV evolution, but also those focused on the virus pathogenesis, vaccinology and diagnosis.
As a conclusion, the evolution of AvCoV haplotypes in chicken embryos after a low passage number, as used in isolation routine, results in a reduced fitness with a lowered diversity essentially in genes that regulate virus RNA synthesis and attachment to cell receptors.