Causal variant loci and protein-coding genes for soybean bacterial pustule resistance in the flowering stage

Bacterial pustule is an important soybean disease caused by Xanthomonas citri pv. glycines, but information about genetic resistance to this pathogen is scarce. This study aimed to investigate soybean genetic resistance to bacterial pustule in the flowering stage through association analysis, characterization of candidate SNP markers, and identification of protein-coding genes potentially regulating defense processes. Therefore, 109 soybean cultivars genotyped with a 6k Illumina platform were assessed for disease severity. A genome-wide analysis revealed a total of 13 SNPs significantly associated. Through protein annotation, we identified three markers located inside the coding regions of uncharacterized protein LOC100779077, histone-lysine Nmethyltransferase SUVR4, and ABC-transporter B family member-9. Nucleotide polymorphism on the first two of these markers produces a non-synonymous polymorphism with polarity shift from hydrophobic to polar amino acid. It is convenient to prioritize these three candidate markers for validation procedures with the purpose of using them in marker-assisted soybean breeding programs.


INTRODUCTION
Bacterial pustule caused by the gram-negative phytobacterium Xanthomonas citri pv. glycines is a common disease infecting soybean cultivars (Goradia et al. 2009). Small yellowish halos are the symptoms produced by its infection, which coalesce with disease progression, leading to necrosis and premature loss of soybean leaves (Park et al. 2008). Severity as high as 70% occurs under certain environmental conditions, such as high temperature and humidity, which favor colonization and pathogen development (Zinsou et al. 2015).
The recessive resistance gene rxp, located in soybean chromosome 17 and linkage group D2 (Narvel 2001), is the main studied source of resistance to Xanthomonas citri pv. glycines. Several studies have shown that resistance to bacterial pustule is a complex trait, for example, the study of glyphosate-resistant soybean and characterization of bacterial pustule aggressiveness conducted by Goradia et al. (2009). Therefore, the identification of genes and the SNP haplotypes controlling the trait is of interest to be introduced in elite breeding lines used in crop breeding. The genome-wide association study (GWAS), based on the linkage disequilibrium of the phenotype with an SNP marker in a great PC Fonseca et al. number of individuals (Uffelmann et al. 2021), is a statistical tool that allows the identification of significant singlebase polymorphisms (SNPs), which can provide information about disequilibrium blocks, proximal genes and QTLs, for example. Numerous studies have been conducted using GWAS for different pathosystems, such as Phytophthora root and stem rot (Ludke et al. 2019) and cyst nematode (Vuong et al. 2015) in soybean.
An increased disease incidence has been observed in cultivars considered resistant, mostly in the flowering stage of the soybean development. Because the plant developmental stage can actively influence specific or broad-spectrum resistance, such a factor is crucial in our understanding of the genetic basis of plant-pathogen interactions and plant defense mechanisms (Develey-Rivière and Galiana 2007). This study aimed to investigate soybean genetic resistance to bacterial pustule in the flowering stage through the phenotyping of cultivars with recommended strains and an association analysis, followed by the characterization of associated SNPs and indication of nearby protein-coding genes of likely great importance for the development of resistant cultivars in a breeding program.

Location and experimental design
A total of 109 soybean elite cultivars from various Brazilian breeding programs were sown in 200 mL plastic cups filled with substrate Tropstrato HT Hortaliças (Vida Verde, Mogi Mirim/SP, Brazil) and placed in a greenhouse located at the Diogo Alves de Melo Experimental Field, Viçosa, MG (lat 20° 45' S, long 42° 52' W, alt 648 m asl). The trial was arranged in a completely randomized design from May to July 2018, with three replications of each soybean genotype for each strain. Greenhouse conditions were kept with about 70% of humidity and 28 °C.

Bacterial strains, inoculation and evaluation
Strains XAG2440 p (pathovar) and XAG2447 of Xanthomonas citri pv. glycines were obtained from the Instituto Biológico de Campinas. They were cultured in 523 medium plates (Kado and Heskett 1970) at 28 °C for 48h and suspended on a 10mM MgCl 2 solution until OD 600 of 0.3 prior to inoculation. Phenotyping trials were independent for each strain and the same inoculation protocols were applied. A humid chamber was simulated inside the greenhouse from 24h prior to inoculation to 24h after inoculation, to achieve higher humidity levels, which favored soybean colonization by the pathogen. Inoculation was conducted in the soybean flowering stage R3-R4 (Fehr and Caviness 1977). A 2.9 cm x 0.65 mm needle was used for the perforation of 28 spots per leaflet, and 3.5 mL of inoculum suspension was pulverized on each plant with the aid of an atomizer. The evaluation of symptoms was proceeded with assignment of scores ranging from 1 to 5 (Resistant-1; Moderately resistant-2; Moderately susceptible-3; Susceptible-4; Highly susceptible-5), as described by Goradia et al. (2009), nine days after inoculation.

Genotyping and association analysis
All 109 cultivars were genotyped by 6,000 SNPs. The data were obtained with the use of SoySNP6K BeadChip (Akond et al. 2013), a high-throughput genotyping platform by Illumina (Illumina Inc., San Diego, USA). The genotyping was run by Deoxi Biotechnology Ltda®, in Araçatuba, São Paulo, Brazil, and the same data set was used in other studies, including GWAS for agronomic traits (Contreras-Soto et al. 2017). Tassel 5 software (Bradbury et al. 2007) was used for data processing, with a maximum of 10% of missing genotypes, and minor allele frequency higher than 5%.
The genotypic values in soybean cultivars used in this study were predicted by the Linear Mixed Model with Software Selegen-REML/BLUP (Resende 2016). Its results were applied in a genome-wide association analysis through the Genome Association Prediction Tool (Lipka et al. 2012), GAPIT package, in R software (R Core Team 2018). The population structure and kinship coefficient matrix used in the association analysis were inferred as described by Contreras-Soto et al. (2017). Manhattan plots were obtained using the qqman 0.1.4 R package (Turner 2018).

Haplotype block and annotation of nearby genes
Statistically significant SNPs were mapped into linkage disequilibrium blocks with the aid of Haploview 4.2 (Barrett et al. 2005), and the extensions of the blocks were estimated. The proteins within each block were annotated with Augustus 3.2.2 (Stanke and Morgenstern 2005) using Arabidopsis genome as a reference, and Gene Ontology was retrieved with Goanna tool from AgBase (McCarthy et al. 2006).
Flanking sequences of each significant SNP associated with bacterial pustule, with 2000 bp downstream and upstream, were compared with the annotation reference (Williams 82 assembly V2.1, NCBI RefSeq assembly accession: GCF_000004515.5) (O'Leary et al. 2016) using translated BLAST (blastx) (Johnson et al. 2008). Then, the protein sequences of Open Reading Frames (ORFs), protein polarity and stability were identified with the use of the web tool Expert Protein Analysis System (ExPASy) (Gasteiger et al. 2003). The sequence alignment was carried out using Clustal Omega Multiple Sequence Alignment (Sievers et al. 2011) with BLOSUM 62 matrix to verify the identity of the sequence compared with database information.

Soybean cultivar screening for bacterial pustule
The set of 109 cultivars was inoculated with XAG2440 p and XAG2447, Xanthomonas citri pv. glycines strains, using the same design and methodology in separated trials. Considering the inoculation of XAG2447, the average score of the cultivars ranged from 1 to 1.67, and 76.15% of the cultivars were classified as resistant. The remaining cultivars showed a few symptoms and were consequently classified as moderately resistant, including FUNDACEP 57 RR, CD 249 RR STS, CD 2630 RR, and CD 233 RR. In contrast, the XAG2440 p inoculation resulted in a wider range of scores, and 17.43% of the cultivars were categorized as resistant or moderately resistant; 35.78%, as moderately susceptible; and 46.79%, as susceptible or highly susceptible, with TMG 7161 RR and TMG115 RR having the least severe symptoms. Also, the reference resistant cultivar of the Brazilian Department of Agriculture, BRS 133, was found to be resistant with XAG2447 screening, but susceptible to XAG2440 p strain inoculation. Random residual effects are normally distributed.
The different responses for XAG2440 p and XAG2447 could be due to some pathogen specificities. For example, the pathogen type III secretion system (TTSS) could be more intense on XAG2440 p . TTSS is encoded by avirulence gene avrBs3 homolog avrXg1 (Athinuwat et al. 2009), associated with the pathogen aggressiveness, and may be the reason for more severe symptoms caused by XAG2440 p infection. A second explanation would be a mutation in the extracellular diffusible factor (DSF). The RpfF gene is responsible for the expression of the DSF signaling molecule in Xanthomonas citri pv. glycines, and it is related to pathogen colonization (Thowthampitak et al. 2008). Thereby, the virulence of the rpfF mutants, as it could be the case of XAG2447, is reduced by the interruption of quorum sensing signaling. Thus, plant scientists should be aware of the variability of symptoms caused by XAG2440 p and XAG2447 strains in the flowering stage in the development of future cultivars.

The genome-wide association (GWA) analysis
After the genotyping data were processed, 3807 polymorphic SNPs that meet MAF > 0.05 and call rate < 0.1 in the population were kept for statistical analysis. The distribution of these SNPs was on average 190.35 markers per chromosome, with a minimum of 140 markers in chromosome 1 and a maximum of 266 in chromosome 13. Markers of greater effects, with a threshold of −log 10 (p-value) ≥ 2.53, were selected, so that we would have a higher number of SNPs to be used in subsequent searches for putative nearby genes and proteins associated with bacterial pustule. Since the set of soybean genotypes inoculated with XAG2447 strain showed only resistant or moderately resistant phenotypes, the GWAS was solely conducted with strain XAG2440 p phenotyping data, which resulted in 13 non-redundant SNP markers across six chromosomes significantly associated with bacterial pustule (Table 1). Out of the 13 SNPs, five are located in chromosome 18, three in chromosome 13, two in chromosome 3, and one in chromosomes 5, 10, and 15, as illustrated in the Manhattan plot (Figure 1).
The two markers most significantly associated with bacterial pustule (Gm18_4188718_A_C and Gm10_48426970_A_G) showed low MAF values, which reflect low allele frequency in the studied cultivars, due to recent mutation events or their low fitness and consequent reduced reproductive success. On the other hand, the Gm03_46592189_A_G and Gm03_46889507_T_C markers had the highest values of MAF. The equality of MAF values of markers in chromosome 3 is not surprising, since those alleles are segregating together in a linkage disequilibrium block of 702 kb. Several genes and transcription factors coding for defense-related proteins are found in this region, which makes it promising and worthy of further investigation.

Candidate genes identification and protein annotation
No significant similarity was found between Gm05_33176582_G_A, Gm10_48426970_A_G, and Gm13_34818193_C_T flanking sequences of 2000 bp upstream and downstream and reference soybean gene annotation at NCBI. Markers Gm03_46889507_T_C and Gm18_122382_A_G are also located in intergenic regions, distant less than 2000 bp from the nearest gene. The Gm18_4188718_A_C, Gm18_3954704_C_T, Gm13_34946643_T_C, Gm18_54979_G_A, and Gm18_4324818_G_T markers are located in intragenic regions (Table 2).
Markers Gm13_898111_T_G, Gm03_46592189_A_G and Gm15_6925513_T_C are located inside protein coding regions of genes. They code, respectively, for uncharacterized protein LOC100779077, histone-lysine N-methyltransferase SUVR4, and ABC transporter B family member 9. Markers Gm13_898111_T_G and Gm03_46592189_A_G, located in ORFs 5'3', frames 2 and 3, second and first bases of the codon, respectively, resulted in an amino acid change with allele variation, in addition to a polarity shift. Marker Gm13_898111_T_G is at 1334 bp from the initiation of the ORF and allele variation T/G modified the amino acid produced to Phenylalanine (F)/Cysteine (C), a shift from hydrophobic to a polar amino acid. Considering marker Gm03_46592189_A_G, located 208 bp from the beginning of the ORF, the nucleotide change A/G altered the amino acid produced to Threonine (T)/Alanine (A), a shift from a polar amino acid to a hydrophobic one, similarly to Gm03_46592189_A_G. Marker Gm15_6925513_T_C is located at 237 bp or 79 codons from the beginning of the ORF, 3'5' frame 1, the third base of the respective codon. Nucleotide base variation T/C did not change the translated amino acid, which continues to be the polar Lysine (K).
Since LOC100779077 is an uncharacterized protein, we cannot infer about its function or connection to a specific plant defense mechanism based on previous studies. Histone-lysine N-methyltransferase SUVR4 is highly related to chromatin structure modulation (Thorstensen et al. 2006), possibly regulating transposons damage with post-translational silencing histone H3K9 through methylation, an epigenetic control (Veiseth et al. 2011). Histone lysine methyl transferases are known for regulating ETI and affecting plant immunity pathways, correlated with lipid, cuticular wax, and carotenoid biosynthesis, for example . ABC transporter B family member 9 is an ATP-binding cassette transporter of molecules across membranes, and a protein that can participate in resistance and parasitism processes (Hwang et al. 2016). During infection, for example, ABC transporters in bacteria can use efflux pumps to carry virulence factors or quorum sensing signaling molecules and overcome plant defenses (Du et al. 2018). Furthermore, ABC transporters play an important role in the transportation of secondary metabolites, often related to plant defense, including alkaloids and phenolic compounds (Yazaki 2006). In potatoes, the ABC transporter subfamily, composed of pleiotropic drug resistance Amino acid residue in the first base of the SNP designation, position in protein sequence, amino acid residue in the subsequent base of the SNP designation.

PC Fonseca et al.
proteins, is related to the pathogen response regulation through foliar secretion exudates with antimicrobial activity (Ruocco et al. 2011).
Five out of 13 SNPs are within intragenic regions, related to protein LOC100797777, protein DJ-1 homolog D, dynamin-related protein 3A isoform X2, pheophytinase (chloroplastic), and KH domain-containing protein SPIN1 isoform X2. Protein DJ1 family is related to the plant stress response in yeast homolog Hsp31 through detoxifying capabilities (Melvin et al. 2017). It probably has the same function in plants, since it is a highly conserved protein (Xu et al. 2010). Dynamin-related protein 3A is associated with peroxisome replication in Arabidopsis (Lingard et al. 2008), an organelle related to molecular signaling, which accumulates endogenous reactive oxygen species (ROS) (Nyathi and Baker 2006). In the event of host cell infection, ROS act on cell response and plant-pathogen interaction, and ozone plays an important role in the triggering of a hypersensitive response (Waszczak et al. 2018). For the remaining markers, no association with plant defense response was found.
Markers Gm13_898111_T_G, Gm03_46592189_A_G, and Gm15_6925513_T_C and other defense-related candidate genes, identified to be associated with soybean bacterial pustule in the flowering stage, should be investigated simultaneously and separately. While focusing on the comprehension of resistance mechanisms and host-pathogen interactions through gene expression, we might be able to identify molecular markers to improve the marker-assisted selection used in molecular breeding programs.
The biological information related to the small effects of single SNPs obtained from the association analysis was important for the identification of candidate genes regulating the soybean response in the flowering stage to Xanthomonas citri pv. glycines complex pathosystem defense process. The study of resistance considering the possible existence of a relationship between the developmental stage and defense mechanisms can improve our knowledge about key resistant genes and their role in plant-pathogen interactions, which is essential for a target-specific and effective molecular breeding program. Since GWAS is based on single tests for each marker, further understanding of gene-gene and gene-protein interactions, post-transcriptional gene regulation, and epigenetics control would allow for unprecedented advancement in GWAS interpretation, and its applications. Therefore, the complementary use of other methodologies, such as a network analysis, may provide biological insights to elucidate the results of soybean pustule resistance association.