Genetic analysis of wheat grains using digital imaging and their relationship to enhance grain weight

Phenomic characterization through digital imaging (DI) can capture the three-dimensional variation in wheat grain size and shape using different image orientations. Digital imaging may help identify genomic regions controlling grain morphology through association mapping with simple sequence repeat (SSR) markers. Accordingly, seed shape phenotypic data of a core collection of 55 wheat genotypes, previously characterized for osmotic and drought tolerance, were produced using the computer-based SmartGrain software. Measured dimensions included seed volume, area, perimeter, length, width, length-to-width ratio, circularity, horizontal deviation from ellipse (HDEV), vertical deviation from ellipse (VDEV), and factor form density (FFD), among others. The thousand grain weight (TGW) was positively correlated with direct measurements of grain size; however, VDEV, FFD and other derived grain attributes showed no or negative correlation with TGW. Digital imaging divided the genotypes correctly into well-defined clusters. The wheat genotypes studied were further grouped into two sub-clusters by Bayesian structure analysis using unlinked SSR markers. A number of loci in various chromosomal regions were found to be associated with grain morphology by genome-wide analysis using the mixed linear model (MLM) approach. A considerable number of marker-trait associations (MTAs) on chromosomes 1D and 2D may carry new alleles with potential to enhance grain weight due to the use of untapped wild accessions of Aegilops tauschii. In conclusion, we demonstrated the application of multiple approaches, including high-throughput phenotyping using DI complemented with genome-wide association studies, to identify candidate genomic regions underlying these traits, which allows a better understanding of the molecular genetics of wheat grain weight.


Introduction
Wheat grain size and shape are important characteristics, and both genetic and environmental factors cause variation in these attributes. Various grading methods that use diverse morphological attributes to sort cereal grains and varieties through image processing techniques have been reported in the literature (Majumdar and Jayas, 2000; Visen et al., 2001). Digital imaging analysis (DIA) is the process of converting digital images into quantitative measurements based on pixel counts. It allows large quantitative datasets to be generated and has been used in many previous works (Kwack et al., 2005; Dana and Ivo, 2008).
Genetic improvement in yield-related attributes has become possible through the introgression of dwarfing genes and, more recently, the use of synthetic hexaploid wheats (Mujeeb-Kazi et al., 2013). From an agronomic perspective, wheat grain yield is the most important trait and is underpinned by two numerical components, grains per square meter and grain weight (Calderini and Reynolds, 2000). In addition, improvement in TGW is considered a promising approach to raise wheat yield potential and an important area of wheat genetic and breeding studies (Peng et al., 2003; Su et al., 2011). Seed quality is mainly determined by interactions between the environment and the genome during seed development and maturation (Groos et al., 2003; Tsilo et al., 2010; Jamil et al., 2017).
Exploring quantitative trait loci (QTLs) for grain weight and its related components is an important step toward deploying favorable alleles through marker-assisted selection. However, the relative disadvantages of linkage mapping compared with linkage disequilibrium (LD) or association mapping (AM) for dissecting the underlying trait control mechanisms suggest that association mapping is more appropriate for diverse germplasm (Breseghello and Sorrells, 2006; Huang and Han, 2013).
Several studies in wheat have reported QTLs for grain size and weight (Breseghello and Sorrells, 2007; Sun et al., 2009; Ramya et al., 2010; Rasheed et al., 2014). However, few studies have reported QTLs for grain shape (Gegas et al., 2010; Williams et al., 2013; Williams and Sorrells, 2014), and studies under drought stress conditions are very rare (Nezhad et al., 2012). This study therefore aimed to i) characterize a core collection of wheat germplasm (Ali et al., 2015; Ali et al., 2017) for grain weight and explore its relationship to size and shape using digital imaging for high-throughput phenotyping, ii) use digital imaging data for genotype discrimination, and iii) identify SSR markers associated with grain phenotypes using association mapping analysis.

Genetics and Plant Breeding
Research Article grain weight Genetic analysis of wheat grains Sci. Agric. v.77, n.6, e20190069, 2020

Materials and Methods
The main population consisted of conventional and synthetic-derived wheat lines developed for drought tolerance. The derived synthetics were produced earlier by crossing primary synthetic hexaploid wheats, obtained from crosses of durum wheat (T. turgidum)/Ae. tauschii, with susceptible bread wheat cultivars, and originated from CIMMYT through the Wheat Wide Crosses and Cytogenetics program in Pakistan. From this main population, a core collection of 55 wheat genotypes was previously characterized for drought stress tolerance under field conditions for three consecutive years and for osmotic stress tolerance under lab conditions. Details of the germplasm, its pedigree, experimental layout and the studied attributes are given in Ali et al. (2015, 2017). Spikes were harvested individually by hand and threshed with a mechanical belt thresher.

Digital imaging analysis of wheat seeds
Twenty-five intact, randomly selected seeds of each genotype were used for the imaging analysis following the procedure described in Tanabata et al. (2012) and Williams et al. (2013). Seeds were photographed vertically and horizontally with a digital camera on a black sheet, arranged in a 5 × 5 grid spaced 1.5 cm apart. The two photographs for each genotype were named the horizontal (H) and vertical (V) images (Figures 1A and 1B). Digital photos were saved as Joint Photographic Experts Group (JPEG) files, renamed according to the genotype concerned, and processed with the software package SmartGrain version 1.1, which was initially designed for rice seed images (Tanabata et al., 2012).
In SmartGrain, quantitative measures were derived directly from the JPEG image by setting a uniform scale in each photograph, as seen in Figures 1A and 1B, to allow the actual distance to be estimated from pixel counts. Seed color and background color were selected after loading the image into the software. The morphology of the 25 selected seeds was then analyzed: outlines were detected, and the longitudinal and horizontal axes were drawn automatically to calculate seed length, seed width, seed area, seed perimeter, length-to-width ratio, circularity and distance from the gravity point (Figures 2A and 2B). The results were exported as a CSV (comma-separated values) file that can be opened in spreadsheet software (Tanabata et al., 2012) for further statistical analysis.
Elliptic Fourier Descriptor (EFD) analysis was then performed to assess the variation in grain shape (Iwata and Ukai, 2002; Iwata et al., 2010). For that purpose, bitmap images of wheat seeds were taken with a digital camera and processed with the software SHAPE, which converts the shapes of objects (wheat seeds) into data suitable for recording the EFDs and calculates the PCA score for each object. The PCA score condenses the large number of EFD coefficients produced for each shape into a few quantitative values (Williams et al., 2013).
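The conversion from pixel counts to physical grain descriptors described above can be sketched as follows. This is a minimal illustration and not SmartGrain's implementation: the function name, the `pixels_per_mm` calibration value and the example numbers are hypothetical, and circularity is assumed to follow the common definition 4πA/P².

```python
import math

def grain_descriptors(area_px, perimeter_px, length_px, width_px, pixels_per_mm):
    """Convert pixel-based grain measurements to millimetres (hypothetical helper)."""
    scale = 1.0 / pixels_per_mm          # mm per pixel, taken from the scale in the photo
    area = area_px * scale ** 2          # areas scale with the square of the length unit
    perimeter = perimeter_px * scale
    length = length_px * scale
    width = width_px * scale
    return {
        "area_mm2": area,
        "perimeter_mm": perimeter,
        "length_mm": length,
        "width_mm": width,
        "aspect_ratio": length / width,  # length-to-width ratio
        # Assumed definition: 1.0 for a perfect circle, smaller for elongated outlines
        "circularity": 4 * math.pi * area / perimeter ** 2,
    }

# Hypothetical grain photographed at 20 pixels per mm
d = grain_descriptors(area_px=6000, perimeter_px=330, length_px=130,
                      width_px=60, pixels_per_mm=20)
print(d)
```

Because the scale enters linearly for lengths and quadratically for areas, ratios such as the aspect ratio and circularity do not depend on the calibration, which is why they are comparable across photographs.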

Molecular analysis
The same germplasm was previously genotyped with 101 simple sequence repeat (SSR) markers. Details of the allele number, polymorphic information content (PIC), gene frequency, SSR genetic diversity, and the population structure analysis carried out with the STRUCTURE software (v. 2.1; Bradbury et al., 2007) were discussed in Ali et al. (2015). The Q matrix observed at the maximum ΔK was used for the association mapping analysis, which was carried out on the phenotypic and genotypic data of the studied germplasm with TASSEL 2.0.1 according to Bradbury et al. (2007). Due to the small population size, the marker-trait association study was based on the mixed linear model (MLM) approach. The MLM approach is relatively strict because it takes into account both the population structure coefficients (Q) and the kinship matrix (K); hence, it is highly recommended in such cases in order to avoid false positives or spurious marker-trait associations. The significance threshold was based on the Bonferroni correction (Bland and Altman, 1995). Associated markers were assigned to wheat chromosomes based on their position on a wheat consensus map (Somers et al., 2004) and data in the GrainGenes database (www.graingenes.org).
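As an illustration of the Bonferroni correction mentioned above, the per-test significance level is the family-wise level divided by the number of tests. The sketch below is a generic example: the family-wise level of 0.05 and the choice of one test per marker are assumptions, since the text does not state the values actually used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# 101 SSR markers as in this study; alpha = 0.05 is an assumed family-wise level
threshold = bonferroni_threshold(0.05, 101)
print(f"{threshold:.2e}")
```

If each marker were tested against several traits, the number of tests (and hence the stringency) would grow accordingly.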

Statistical analysis
Relationships between variables were determined using the Pearson correlation test in the STATISTICA software (StatSoft v. 7.0, 2004). Following the correlation analysis, the data were also subjected to principal component analysis (PCA), using the multivariate option in PAST 2.12 (Hammer et al., 2001), in order to characterize the germplasm more comprehensively. Cluster analysis was performed on the similarity matrix with UPGMA (unweighted pair-group method with arithmetic mean) using the same software.
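For readers unfamiliar with the statistic, the Pearson coefficient used here can be computed as in the minimal sketch below. The data are hypothetical values chosen only for illustration, not measurements from this study, and the function is a plain re-implementation rather than the STATISTICA routine.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical seed lengths (mm) and thousand grain weights (g) for five genotypes
seed_length = [6.1, 6.4, 6.8, 7.0, 7.3]
tgw = [38.0, 41.5, 43.0, 46.2, 47.9]
r = pearson_r(seed_length, tgw)
print(round(r, 2))
```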
Multiple regression analysis was performed with the LISREL 9.1 software package to examine the variation in TGW, taken as the dependent variable, as affected by the seed shape parameters, taken as independent variables. The covariance matrix and the structural equation of the model were also formulated. A path diagram with the appropriate coefficients was drawn with the same software package to show the direct effects, positive or negative, of the independent variables on thousand-grain weight.

Pearson coefficient of correlation and path coefficient analysis for grain size descriptors
The Pearson coefficients of correlation (r) between the studied grain size descriptors are given in Table 2. The maximum positive correlation (r = 0.98) was recorded between horizontal area (HA) and horizontal perimeter (HP), between vertical perimeter (VP) and seed thickness (ST), between vertical area (VA) and seed thickness (ST), and between vertical area (VA) and vertical perimeter (VP). The maximum negative correlation (r = -0.91) was recorded between factor form density (FFD) and seed width (SW). Further, ST, SW, and SL were positively correlated with TGW (r = 0.61, r = 0.46 and r = 0.62, respectively). The FFD and VDEV (vertical deviation from ellipse), both important derived attributes, were negatively correlated with TGW (r = -0.29 and r = -0.31, respectively). Similarly, the aspect ratio (AR) correlated negatively with most of the studied measurements.
To further investigate the effect of individual measurements on grain weight, path coefficient analysis was conducted using TGW as the dependent variable. The grain size descriptors, taken as independent variables, comprised those describing the different aspects of grain size and shape as well as some miscellaneous derivatives (Figure 3). Horizontal and vertical deviations from the ellipse (HDEV and VDEV), which are derivatives of grain length, width, and thickness, had an indirect positive effect on grain weight, whereas both vertical and horizontal perimeters had a direct positive effect. Grain thickness showed the maximum direct effect (0.71) on TGW, followed by vertical area (VA, 0.68), vertical perimeter (VP, 0.61) and seed volume (SV, 0.41), while seed length (SL) exhibited the least direct effect (0.05). However, a negative direct effect (-0.08) on TGW was observed for the derived variable FFD.
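The relationship between the correlations and the direct effects reported above can be made explicit. In standard path analysis, the correlation of a trait with the dependent variable equals its direct effect plus the sum of its indirect effects through the other traits. The sketch below applies that identity to the values reported in the text; it assumes the identity holds for the fitted model, which the source does not state, and the function name is illustrative.

```python
def total_indirect_effect(correlation_with_tgw, direct_effect):
    """Indirect effect implied by r = direct + indirect (assumed path identity)."""
    return correlation_with_tgw - direct_effect

# (correlation with TGW, direct effect on TGW) as reported in the text
reported = {
    "ST": (0.61, 0.71),  # seed thickness
    "SL": (0.62, 0.05),  # seed length
}

indirect = {trait: total_indirect_effect(r, p) for trait, (r, p) in reported.items()}
for trait, value in indirect.items():
    print(trait, round(value, 2))
```

Under this assumption, most of the association between seed length and TGW would act indirectly through correlated traits, whereas seed thickness would act mainly through its direct effect.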

Principal component analysis (PCA)
In order to identify the most appropriate combination of the studied attributes for grain yield, the PCA and biplot analyses were conducted using mean values (Figure 4 and Table 3). Trait vectors separated by angles smaller than 90° are positively associated, while vectors separated by angles greater than 90° are negatively associated. Further, the strength of the correlation increases as the angle approaches 0° or 180°. The vector length shows the extent of variation explained by the respective trait in the PCA. The first two axes, that is, PC1 (eigenvalue = 9.1) and PC2 (eigenvalue = 2.7), explained up to 74 % of the total variability. The attributes with a positive contribution to PC1 included HA (0.322), HP (0.316), SL (0.302), SW (0.318), SV (0.324), ST (0.303), and VP (0.295). Similarly, for PC2, the major contributing attributes were TGW (0.416) and DS (0.236). For PC3, only FFD (0.624) was noted for its prominent contribution. Eigenvectors from the biplot analysis clearly indicated that HDEV, FFD, AR, and VDEV displayed a negative association.
HA = horizontal area; HP = horizontal perimeter; SL = seed length; HC = horizontal circularity; HDEV = horizontal deviation from ellipse; AR = aspect ratio; SW = seed width; SV = seed volume; ST = seed thickness; VP = vertical perimeter; VA = vertical area; VC = vertical circularity; VDEV = vertical deviation from ellipse; FFD = factor form density; DS = distance between IS (intersection of length and width) and CG (center of gravity); TGW = thousand grain weight.
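The 74 % figure reported for the first two principal components can be checked from the eigenvalues. The sketch below assumes that the PCA was run on standardized traits (a correlation matrix), so that the eigenvalues sum to the number of traits, and that 16 traits were included, as counted from the abbreviation list; neither assumption is stated explicitly in the text.

```python
def explained_variance(eigenvalues, n_traits):
    """Share of total variance per component for a PCA on standardized traits."""
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    return [ev / n_traits for ev in eigenvalues]

# Eigenvalues of PC1 and PC2 as reported; 16 traits is an assumption (see above)
shares = explained_variance([9.1, 2.7], n_traits=16)
cumulative = sum(shares)
print(f"PC1: {shares[0]:.1%}, PC2: {shares[1]:.1%}, cumulative: {cumulative:.1%}")
```

Under these assumptions the cumulative share is 11.8/16, which rounds to the reported 74 %.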

Marker Trait Association
The scoring patterns of the SSR loci showed 525 alleles across the 55 wheat accessions. The number of alleles per locus ranged from two to 14, with an average of 5.2, indicating that the diversity between wheat accessions was relatively high. The PIC values also confirmed these results. The population structure analysis was performed using the STRUCTURE software, version 2.1. The Q matrix was recorded by running STRUCTURE at K = 2 and K = 7, where the highest values of ΔK occurred. The data obtained through the structure analysis were further validated by the clustering (UPGMA) method. For the AM analysis, K = 2 from the population structure data was used. Significant markers (p ≤ 0.01) are given in Table 4 along with their chromosomal locations. In total, 26 MTAs were identified for the 101 SSRs used in this study, of which 22 were multi-trait MTAs (markers associated with more than one trait) and four were trait-specific MTAs (markers associated with only one trait). The trait-specific MTAs included Xwmc798-1BS (r² = 0.03) for AR, Xgdm35-2DS (r² = 0.06) for HA, Xgwm372-2AL (r² = 0.05) for AR and Xgwm544-5BS (r² = 0.03) for the distance between the intersection of length and width and the center of gravity (DS). The chromosomes with associations included two from the A genome (2A and 3A), four from the B genome (1B, 3B, 5B and 7B), and two from the D genome (1D and 2D). Chromosome 2D had the largest number of MTAs (nine in total, including MTAs for HA, SL, HP, TGW, VA, VP and SW). Regarding homoeologous chromosomes, group 2 displayed the largest number of MTAs (10), followed by group 3 (7 MTAs), while the smallest number was observed in group 5 (only one MTA, for DS). Homoeologous groups 4 and 6 showed no association with any of the studied grain phenotypes.
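The diversity statistics quoted above can be illustrated with a short sketch. The mean number of alleles per locus follows directly from the reported totals, while the PIC function uses the widely cited formula PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (i < j); the source does not state which formula was applied, so this is an assumption, and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations

def pic(allele_freqs):
    """Polymorphic information content of one marker from its allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    sum_squares = sum(p * p for p in allele_freqs)
    # Correction term over all unordered pairs of distinct alleles
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - sum_squares - pair_term

# Reported totals: 525 alleles scored over 101 SSR markers
mean_alleles = 525 / 101
print(round(mean_alleles, 1))

# Hypothetical marker with three alleles
example_pic = pic([0.5, 0.3, 0.2])
print(round(example_pic, 4))
```

The mean computed from the totals agrees with the reported average of 5.2 alleles per locus.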

Discussion
Seed shape and size are vital agronomic traits because of their considerable effects on yield and market value. Large numbers of measurements are needed to obtain precise seed size data because seeds from the same plant differ only slightly in size. Manual measurement methods have limitations, namely the limited amount of data, the low quality of the measurements, and the narrow range of shape data that can be gleaned. Therefore, a well-organized, consistent, high-throughput grain phenotyping method is needed to support genetic analysis and selection for seed shape in plant breeding (Breseghello and Sorrells, 2007; Gegas et al., 2010; Williams et al., 2013). SmartGrain software was originally used to analyze rice seeds. Here, we report the digital imaging analysis of wheat grains to determine seed shape and size with some precise and updated attributes. The software identifies the seeds in a digital image, recognizes seed outlines, and calculates seed length (SL), seed width (SW), length-to-width ratio (aspect ratio, AR), seed thickness (ST, which is the width in the vertical image), seed area (SA), horizontal perimeter (HP), horizontal circularity (HC), and the distance between the length-width intersection and the center of gravity (DS). As a preliminary examination, we analyzed the seeds with ImageJ software; however, SmartGrain was more user friendly and described seed shape and size with more parameters than ImageJ. Secondly, SmartGrain uses the JPEG format; thus, there is no need to convert the picture file into bitmap, 8-bit or black and white as in ImageJ. Moreover, output files produced by SmartGrain are saved automatically and can be transformed and analyzed with any spreadsheet software. Circularity and the distance between the L/W intersection and the center of gravity are the additional parameters determined by SmartGrain as compared to ImageJ.
In this work, computational methods based on the DI technique enabled the automatic measurement of a large quantitative dataset of robust size descriptors [AR, aspect ratio; DS, distance between IS (intersection of length and width) and CG (center of gravity); FFD, factor form density; HA, horizontal area; HC, horizontal circularity; HDEV, horizontal deviation from ellipse; HP, horizontal perimeter; SL, seed length; TGW, thousand grain weight; ST, seed thickness; VA, vertical area; VDEV, vertical deviation from ellipse; VP, vertical perimeter; SV, seed volume; SW, seed width; TE, total effect].
Significant correlations between various seed morphological attributes provided new insights into the complex composition of grain size and shape components. For instance, vertical (VDEV) and horizontal (HDEV) deviations from ellipse were negatively correlated with grain length and width, meaning that deviation from the ellipse enhances grain length and width, and ultimately TGW. Similarly, the positive correlation between SL and SW indicated the possibility of finding cultivars or genotypes that possess wider and longer grains simultaneously, which may lead to enhanced TGW. The findings are in accordance with those reported recently by Zhang et al. (2013) and Rasheed et al. (2014); however, SL had a more positive impact on TGW than SW. Similar reports of mild to moderate correlations between grain weight, SL and SW, with r = 0.21-0.75, were discussed in Shouche et al. (2001), Okamoto et al. (2013), and Abdipour et al. (2016). Similarly, the studies conducted by Gegas et al. (2010) and Williams et al. (2013) on seed shape variation targeted traits influencing grain size and weight, and their results are comparable to our findings. The positive direct and indirect effects of a trait on grain weight may allow its use in selection under specific conditions (Ramazani et al., 2017).
The principal component and cluster analyses revealed that variability in grain attributes could be captured quantitatively. In the PCA, differences in grain attributes were decomposed into mutually independent quantitative characteristics, that is, principal components (PCs). Using symmetric standardized coefficients for the PCA, 74 % of the variation was attributed to the first two PCs. The trend observed in the PCA paves the way for the possible development of unsupervised classification algorithms for grain type identification and classification. Similar inferences were discussed in detail by Mebatsion et al. (2012). With advances in imaging systems, identification and classification of cereal grains using one or a combination of morphological features have been attempted with different levels of success (Majumdar and Jayas, 2000; Choudhary et al., 2008). In this context, the present work is relevant as it aims at developing a consistent procedure for an objective and quantitative classification of cereal grains. General concerns in plant breeding focus on the quantitative data generated per genotype and on decreasing the duration of the breeding cycle. Both could be addressed with the help of high-throughput phenotyping techniques in breeding programs (Heffner et al., 2010). The morphological data obtained through DIA in this study are consistent with these trends and have potential to allow direct use of quality and agronomic attributes. Thus, this study, which used grain morphometric parameters, could lead to wheat class or variety identification with useful implications for plant breeding. Precise clustering of the studied wheat genotypes based on the diverse grain attributes obtained could further lead to genotype discrimination. This is supported by the fact that synthetic-derived bread wheat (SBW) and conventional bread wheat (CBW) were clearly separated into different clusters. Further inferences could be made for individual genotypes within the sub-clusters based on their pedigree.
Further studies are recommended to develop a universal statistical model that gives more precise results (Zapotoczny, 2011). The stringency of the DIA test also has potential for discerning aspects of the genetic expression of diverse genome accessions introgressed into wheat via interspecific hybridization, which is under exploration.
For a better understanding of the effect of individual measurements on grain weight, path coefficient analysis was conducted with TGW as the dependent variable, which described the phenotypic model with greater precision. Grain thickness showed the maximum direct effect, whereas SL exhibited the least direct effect (0.05). Similarly, HDEV and VDEV, derivatives of seed thickness, width and length, exhibited an indirect positive effect on TGW. However, a negative direct effect on TGW was observed for the derived variable FFD, meaning that the loci controlling it should undergo negative selection in order to obtain genotypes with superior grain weight. The efficiency of indirect selection depends on the heritability of the selected trait as well as on the correlation between the targeted and the selected trait. The results further suggested that genes with pleiotropic effects or closely linked genes are probably responsible for the correlations between these traits (Cooper et al., 2012; Abdipour et al., 2016). Gegas et al. (2010) and Rasheed et al. (2014) previously confirmed that kernel size and shape were largely independent traits in a study of six wheat populations.
The scoring patterns of the SSR loci showed 525 alleles across the 55 wheat accessions. The number of alleles per locus ranged from 2 to 14, with an average of 5.2, indicating that the diversity between wheat accessions was relatively high. The PIC values also confirmed these results. This highlights the importance of such diverse bread wheat germplasm for further improving wheat diversity and productivity under climate change, which has posed serious problems in recent years. Henkrar et al. (2016) also reported similar findings on enhancing genetic diversity and improving wheat productivity.
Since the experimental germplasm comprised synthetic-derived (SBW) and conventional (CBW) bread wheat, along with check cultivars (CCT), a comparison between these groups with respect to marker-trait associations revealed that most MTAs for grain shape and size variation were located in the D genome of allohexaploid wheat. This means that natural variation in Ae. tauschii populations is among the most important sources for enhancing D-genome diversity in bread wheat (Jones et al., 2013). Derived synthetic wheats are thus valuable sources for identifying agriculturally important loci and for investigating how D-genome variation is expressed in different grain phenotypes in the hexaploid genetic background (Rasheed et al., 2014).
Genetic dissection of grain shape related attributes by means of association mapping and QTL mapping, followed by marker-assisted selection (MAS), is an active research area, and several QTLs have already been reported (Reif et al., 2011). Recently, Yan et al. (2017) identified a new QTL, QTgw.cau-2D, controlling grain weight from synthetic allohexaploid wheat, which may play a significant role in the genetic improvement of wheat. Similarly, Chen et al. (2019) reported new QTLs that provided insights into the genetic basis of grain shape as well as additional genetic resources to develop elite rice varieties. In this study, 23 genomic loci were identified as associated with wheat grain weight and other seed phenotypes for the germplasm grown under different water regimes. Jing-Lan et al. (2015) previously reported 208 MTAs in wheat under four different environments. They further identified SSR loci associated with some grain shape related attributes that were common to Ae. tauschii and hexaploid wheat, suggesting that the presence of common alleles may elucidate selection between Ae. tauschii and hexaploid wheats, and the evolutionary process. Koebner and Summers (2003) stated that the identification of favorable alleles could help select parents for breeding programs in order to ensure a high level of favored alleles across different sets of loci. However, the possibility of other genetic effects should not be ignored when considering the additive effects of genes that result from the linear association between grain shape phenotypes (including TGW, SL, SW and ST) and favorable alleles. Wu et al. (2012) reported a QTL related to TGW on chromosome 2D, which might be analogous to the TGW MTA found in this work with the associated marker Xgdm6-2DL. The differences in chromosomal positions may be attributed to the different germplasm sources used, that is, mainly synthetic-derived wheats. The same MTA (Xgdm6-2DL) is also comparable to the QTLs QTkw.ncl-2D.2 (located between Xwmc41 and Xgwm349) and QTkw.ncl-2D.2 (located between Xwmc601 and Xwmc41) reported by Ramya et al. (2010).

Conclusion
The integrated approach of combining genomics with phenomics documented many genomic loci with putative roles in enhancing TGW in wheat. In this study, different SSR alleles were identified in the wheat germplasm studied that showed significant association with grain shape related attributes and TGW. Some of these alleles had a positive effect, meaning that they enhanced the phenotypic value of grain-related attributes. As indirect selection indices, these traits could help deepen the understanding of grain weight components in wheat. Further studies, including comparative genomics approaches, are recommended to investigate the associations and to sort out the variation at loci underlying important grain shape related attributes.