SNP markers associated with soybean partial resistance to Phytophthora sojae

Phytophthora root and stem rot is one of the most aggressive diseases in soybean crop. The use of resistant cultivars is the main strategy to reduce losses caused by the pathogen. This study aims to identify SNP markers associated with genes or QTLs that provide soybean with partial resistance to Phytophthora sojae. A total of 169 soybean cultivars were inoculated with Phytophthora sojae and genotyped with 3,807 SNP markers. Genome-wide association analysis was carried out via multiple linear models, followed by multiple linear regression and linkage disequilibrium analysis. Four QTLs associated with the characteristic were identified: two on chromosome 3 and two on chromosome 15. The regions containing these QTLs contain genes already annotated as providers of resistance to pathogens, in plants. The use of those markers in the selection of resistant plants can increase the efficiency of breeding programs in the development of soybean varieties resistant to P. sojae.


INTRODUCTION
Soybean (Glycine max (L.) Merri.) is one of the most outstanding species in the world scenario of agricultural commodities, where it is considered one of the main sources of protein and vegetable oil. In the soybean crop, diseases are responsible for annual production losses of 15 to 20% (EMBRAPA 2013). Phytophthora root and stem rot (PRSR) is caused by the pathogenic oomycete Phytophthora sojae (Kaufmann and Gerdemann) (Schmitthenner 1985). The disease is favored by water availability in the soil, which is common in clayey and compacted soils, and by prolonged periods of moisture saturation and temperatures equal to or higher than 25 ºC. Annual losses generated by PRSR have been estimated as two billion dollars, on a global scale.

WH Ludke et al.
Most published studies on resistance to PRSR involve traditional linkage mapping methods. As an alternative, association mapping is a tool used in the identification of markers associated with a trait, which is based on the linkage disequilibrium between alleles in a population. One of its advantages is not require the development of biparental populations. With the use of high-density genotyping with single nucleotide polymorphism (SNP) markers, association mapping has been successfully employed to analyze complex traits in several agricultural crops ).
The present paper presents a genome-wide association study (GWAS) using 3,807 high-quality SNPs to identify markers and genes associated with resistance to PRSR. Results can help soybean breeders to more easily develop varieties resistant to PRSR, using marker-assisted selection (MAS).

Genetic material and experimental location
A total of 169 soybean cultivars were used in the trials, which took place in a greenhouse at the Department of Phytopathology at the Federal University of Viçosa, located in Viçosa -MG, Brazil.

Phenotypic analysis
For phenotyping, P. sojae inoculum with the virulence formula 1d, 2, 3b, 3c, 4, 5, 6, and 7 was used. The experiment was set up as a completely randomized design with three replicates, and the experimental units consisted of 500 mL plastic cups with eight plants per cup. The substrate used was composed of vermiculite and sand at the 2:1 ratio, autoclaved.
Inoculation was performed occurred at the time of seeding, by the mycelium-inoculum layer method, proposed by Schmitthenner and Bhat (1994). Phenotyping was carried out every 21 days after seeding, by evaluating the roots according to a scale of scores: 1 = no root rot; 2 = up to 25% of root mass rotted; 4 = 50% of root mass rotted; 5 = all roots rotted, up to 20% of seedlings killed; 6 = up to 50% of seedling killed; 7 = up to 75% of seedlings killed; 8 = up to 90% of seedlings killed; 9 = all seedlings killed. Subsequently cultivars with scores of 1.0 to 4.0 are considered to have high partial resistance; 4.1 to 5.0 have moderate partial resistance; 5.1 to 6.0 are moderately susceptible; and 6.1 to 9.0 are highly susceptible to phytophthora root and stem rot (PRSR), as proposed by Dorrance et al. (2003).

Genotyping with SNP markers
DNA samples from the 169 soybean cultivars were extracted according to protocol described by McDonald et al. (1994) with some modifications and genotyped with 6,000 SNPs using the BARCSoySOY6K BeadChip platform (Illumina Inc., San Diego, USA), which corresponds to a subset of the SoySNP50K BeadChip platform (Song et al. 2013). Genotyping was performed by Deoxi Biotechnology Ltd ® (Araçatuba, São Paulo, Brazil).
After the monomorphic markers or markers with 10% of missing data (call rate equal to 90%) or minor allele frequency (MAF; lower than 5%) were eliminated, 3,807 markers were left for the genome-wide association analyses. Loci with a heterozygous genotype for the SNPs were considered missing data.

Population structure
The structure of this variety population was previously evaluated by Contreras-Soto et al. (2017) with the same set of markers. In summary, the population-structure analysis was conducted by the Markov Chain Monte Carlo (MCMC) methodology, with Bayesian estimation algorithm, using InStruct software (Gao et al. 2007). Posterior probabilities were estimated using five independent rounds of the MCMC sampling algorithm for each number of simulated groups (k), with k ranging from 2 to 10, with no preliminary information of the population. A burn-in period of 5,000 iterations and a run length of 50,000 iterations were adopted. The best estimate of the k number of genotypes was defined by the lowest mean likelihood and deviance information criterion values among the nine simulated clusters (Spiegelhalter et al. 2002).
where D̅ is the Bayesian measure of model fit, defined as the a posteriori expectation of deviance (D̅ = E θ/y [-2 . lnf(y/θ)]); and pD is the effective number of parameters, which measures the model complexity.

Genome-wide association (GWA) analysis
The analysis of association of genotypic data with the phenotype was undertaken by the Compressed Mixed Linear Models method (Zhang et al. 2010) for the 169 cultivars, using the Genome Association and Prediction Integrated Tool (GAPIT) package of the R analytical platform. The mixed linear model (MLM) can be described as y = Xβ + Zu + e, where y is the vector of phenotypic observations; β is a vector containing the fixed effects, including the molecular markers and the population structure (Q); u is a vector of random additive genetic effects of the multiple QTLs in the background of the cultivars; X and Z are the incidence matrices; and e is the vector of random residual effects. Vectors u and e have normal distribution, with mean zero and variance shown as follows: where G = σ 2 a K, in which σ 2 a is the additive genetic variance and K is the kinship matrix. Homogeneous variance is assumed for the residual effect, with a mean of = Iσ 2 e , where σ 2 e is the residual variance. Markers with probability (P) values <0.001 were considered significant in the SNP/phenotype association analysis, and the Manhattan and Quantile-quantile (Q-Q) plots were developed from the P-value significance data.
To identify non-redundant markers and consequently the number of QTLs, a multiple regression analysis was carried out with stepwise model selection, using only the significant markers in the analysis of mixed linear models. Multiple regression analysis was performed in the JMP program, at the 5% input and output probability level.
The linkage disequilibrium (LD) measure was calculated using Haploview, by the following formula: , where D' is a linkage disequilibrium measure and pA, qa, pB, and qb are the frequencies of the alleles being compared. The root square of r 2 can be defined as the coefficient of correlation between the corresponding alleles.

Linkage disequilibrium
Linkage disequilibrium blocks were obtained by the Solid Spine method, using Haploview software. A cutoff point of 1% was adopted; i.e., if the addition of a SNP in a block results in allelic recombination greater than 1%, then the SNP is not included in the block.

Identification of candidate genes
The candidate genes were identified through the soybean database of the PlantGDB platform (Duvick et al. 2008). The genes closest to the markers associated with the trait were selected and the protein synthesized by those genes was subsequently identified.

Phenotypic variation
Of the 169 cultivars analyzed for partial resistance to PRSR (P. sojae strains 1d, 2, 5, 6, and 7), 24.85% were highly susceptible, 18.34% were moderately susceptible, 7.69% were moderately resistant, and 49.11% were highly resistant. Cultivar BRS 133, considered a standard for susceptibility, was highly susceptible, with a score of 6.56. 'Resistance standards' CD 206 and CD 201 were highly resistant (1.0 and 2.33, respectively). These results demonstrate the viability and efficiency of the P. sojae inoculum used. Because an inoculum containing more than one strain was used, one can infer that the set of varieties evaluated includes varieties resistant or moderately resistant to more than one strain of P. sojae. Qin et al. (2017) classified 97 accessions of soybean as resistant to one or more strains (1, 3, 7, 17, and 25) of P. sojae. Huang et al. (2016), in turn, identified 168 accessions resistant to 1 -11 P. sojae isolates. The use of these cultivars resistant to multiple strains can contribute to the management of PRSR and to maintain the effectiveness of Rps genes.

WH Ludke et al.
Population structure Contreras-Soto et al. (2017) evaluated the structure of this population of soybean varieties with this same set of markers. The number of populations that met the lowest likelihood mean and lowest deviance information criterion was nine. In this nine-group structure, 43% of the varieties have more than 50% probability of belonging to their respective groups.

Genome-wide association analysis
The 3,807 SNP used in the present study are widely distributed on all chromosomes of the soybean genome. Consequently, few genomic regions were not covered by the markers, thereby allowing for greater reliability in the detection of possible QTLs related to soybean resistance to PRSR. The mixed linear model includes fixed and random effects, and for the evaluation of this model, we used the population structure (Q matrix) and kinship (K) matrices. Considering the Bayesian information criterion, the obtained results showed adequate fit to the model (Q + K). The compressed mixed linear model (CMLM) approach, in turn, reduces the effective size of the dataset, grouping the individuals (Zhang et al. 2010).
Few GWASs of partial resistance to P. sojae can be found in the literature. In the present study, we used 169 soybean cultivars and 3,807 SNP markers. According to the CMLM analysis, eight markers were significantly associated with soybean partial resistance to P. sojae, five of which were located on chromosome 3 and three on chromosome 15 (Table 1 and Figure 1a).
The Q-Q plot indicated the occurrence of divergence between the observed and expected values in the null hypothesis, by the χ 2 test, with 7 points displaced from the diagonal line (Figure 1b). According to Patterson et al. (2006), deviations from the diagonal line indicate true associations between markers and phenotype. The fact that most of the points are positioned on the diagonal line proves that CMLM was adequate in eliminating false positives. The points deviating from the diagonal, positioned above that line, confirm the association of significant SNPs related to partial resistance to P. sojae.
For a more in-depth evaluation of these regions of chromosomes 3 and 15 containing significant markers, an analysis of linkage disequilibrium (LD) between the SNP markers in these intervals was carried out. On chromosome 3, the region comprised between the significant markers includes 1.88 Mb, containing four blocks in LD (Figure 2a). Marker Gm03_3225968 is in a LD block of 247 Mb, which contains two other markers. Markers Gm03_4782127 and Gm03_5106459 are 324 kb apart and are in linkage disequilibrium in a 440-Kb block, which contains four other markers. Marker Gm03_3225968 is in linkage disequilibrium in relation to markers Gm03_4782127 and Gm03_5106459 and 1.56 Mb apart from marker Gm03_4782127.
In the region of chromosome 15 containing the significant markers, there are three blocks in LD (Figure 2b). Marker Gm15_11496274 is in a DL block of 124 Kb, which has two other markers. Marker Gm15_18422604 is in LD Gm15_19326210, and the two are in a LD group of 2.5 Mb, which contains another four markers. Between these two DL blocks there is another LD block with 2.9 Mb, in addition to a 0.93-Mb region in linkage equilibrium.   The eight significant markers in the CMLM analysis were used in multiple regression analysis with the stepwise model selection procedure (Table 2). Markers Gm03_5217414, Gm15_18422604, and Gm03_3225968 were significant in the multiple regression analysis, indicating the existence of three resistance QTLs: two on chromosome 3 and one on chromosome 15. Together, these markers explain 21.1% of the variation observed for soybean resistance to P. sojae.
Based on the distance between markers Gm15_18422604 and Gm15_19326210 in relation to marker Gm15_11496274 (7.8Mb) and considering the linkage disequilibrium between these regions, this last marker would be expected to be associated with another QTL on chromosome 15. In the multiple regression analysis, marker Gm15_11496274 was not input into the model, considering an input probability of 5%, but the probability associated with this marker in the model was 5.5%. The inclusion of this marker in multiple regression analysis elevates the R 2 value from 21.10% to 22.90%. This indicates that there are two QTLs of resistance to P. sojae on chromosome 15: one in the region of marker Gm15_11496274 and another in the region of markers Gm15_18422604 and Gm15_19326210.
Markers Gm03_4782127 and Gm03_5106459 are in linkage disequilibrium with each other and in linkage equilibrium with the other significant markers of chromosome 3. These markers were not input into the multiple regression model, and their association with resistance to P. sojae may be because they are between two QTLs of chromosome 3.
By clustering the varieties per haplotypes observed in the significant markers in the multiple regression, we observe the effect of their association with resistance to P. sojae (Figure 3). The difference in average mortality between the varieties containing the haplotypes with alleles of resistance for markers Gm03_3225968, Gm03_5217414, and Gm15_18422604 (G, C, and T alleles, respectively) and the varieties containing the haplotype with alleles of susceptibility in these markers (A, T, and C, respectively) is 100% (Figure 3a). None of the varieties used in this study has the haplotypes with A, C, and C or G, C, and C alleles in markers Gm03_3225968, Gm03_5217414, and Gm15_18422604. The gradual variation in average mortality of the groups of varieties with the other haplotypes indicates the additive action of the QTLs identified.
Considering also marker Gm15_11496274, the inclusion of the T allele in the haplotypes together with the other three markers slightly reduces mortality, whereas the addition of the C allele slightly increases it. The exception is the haplotype associated with greater susceptibility, for which the difference in average mortality with the inclusion of the C or T allele is 28% (Figure 3b). This finding points to the existence of a QTL associated with resistance to P. sojae in the region of marker Gm15_11496274. This QTL acts by reducing susceptibility in the varieties that do not possess the resistance alleles of the other QTLs, but does not increase resistance when the other QTLs have resistance alleles.

Candidate genes identified
All markers identified as being associated with soybean resistance to P. sojae are in regions containing genes that provide the plant with resistance to pathogens.
Marker Gm03_3225968 is within the coding sequence of gene Glyma03g03480 (Table 3), which belongs to the auxin-responsive family (ARF) protein, which are components that provide specificity of response to auxins . Auxins, in turn, contribute to the susceptibility of plants to pathogen attack, since they provide these pathogens with nutrients (Pieterse et al. 2012).
Marker Gm03_5217414 is located within gene Glyma03g04990 (Table 3), which is an alanine-glyoxylate aminotransferase/beta-alanine-pyruvate aminotransferase, a protein related to photorespiratory activities involved

WH Ludke et al.
in defense mechanisms in the plant-pathogen relationship (Kim et al. 2005). In the same linkage-disequilibrium block containing marker Gm03_5217414 and gene Glyma03g04990 is also gene Glyma03g05070, 5 kb away from marker Gm03_5341695. This gene is related to the family of short-chain dehydrogenase/reductase (SDR) proteins, which are also involved in the defense mechanism (Hwang et al. 2012).
Marker Gm15_11496274 is located at 5.3 kb from gene Glyma15g15030, which does not have gene annotation. Marker Gm15_18422604 is located at 4 kb upstream of gene Glyma15g20550, involved in the expression of the pectinesterase family, which in turn are involved in soybean defense mechanisms against infection by cyst nematodes (Ithal et al. 2007). Marker Gm15_19326210, in linkage disequilibrium with marker Gm15_18422604, is located within gene Glyma15g21130, whose main functions are cell wall expansion and degradation.
The markers identified in this study can be used in soybean breeding programs to increase the frequency of lines with partial resistance to P. sojae. Selection of plants containing the resistance haplotype containing the G, C, and T alleles of markers Gm03_3225968_G_A, Gm03_5217414_C_T, and Gm15_18422604_T_C significantly reduces the average mortality of the selected plants.