Location: Home >> Detail
This work is licensed under aCreative Commons Attribution 4.0 International License
Crop Breed Genet Genom. 2020;2(3):e200013. https://doi.org/10.20900/cbgg20200013
1 Department of Agronomy, Purdue University, 915 West State Street, West Lafayette, IN 47907, USA
2 US Department of Agriculture, Agricultural Research Services, Small Grains Genotyping Laboratory, Raleigh, NC 27695, USA
3 Ohio Agricultural Research and Development Center, The Ohio State University, Wooster, Ohio 446931, USA
* Correspondence: Mohsen Mohammadi.
Grain yield is the primary target for increasing soft red winter wheat production in the United States. Genome wide association studies were performed for yield-related traits using field-based evaluation of 270 elite breeding lines. Yield ranged from 3.9 to 7.5 tons·ha−1. The effect of genotype by year interaction was significant for most traits. Heritability estimates ranged from 0.21 for biomass to 0.84 for plant height. Biomass, kernel weight, and grain weight per spike were positively correlated with grain yield while days to heading and maturity were negatively correlated with grain yield. We used FarmCPU method and 45K markers to identify marker-trait associations. Fifty-nine MTAs were identified based on two-year phenotypic estimates. An association for grain yield on chromosome 7D showed a −logP value of 16.35 and explained 18% of the phenotypic variation. Associations were identified at −logP values of 6.30 and 4.08 on chromosomes 5A for thousand kernel weight and grain per spike that explained 10% and 6% of phenotypic variations, respectively. Grain yield and days to heading data from previous trials in other states were used to validate and evaluate the transferability of associations. We observed that for grain yield, only 7 out of 28 associations and for days to heading, only 8 out of 47 associations were validated across seemingly homogenous environments, indicating the majority of associations are not stable QTL across environments. This study suggests that achieving genetic gains appears to require utilizing genotype by environment interaction and local field based germplasm testing.
Over 35% of the world’s population relies on wheat as a main source of food  Wheat is widely cultivated around the world for its adaptability to diverse growing regions and environmental conditions. The United States produced 51.3 million metric tons of wheat during the 2018-19 season , where most wheat harvested occurs west of the Mississippi River and in the Great Plains . However, soft red winter (SRW) wheat is grown mainly in the Midwestern and eastern United States, accounting for 15–20% of US wheat production . Specifically, the growing region extends from 30° N to 45° N in latitude and about 73° E to 96° W in longitude . In the Midwest and eastern wheat region, grain yield and resistance to Fusarium head blight disease are the underlying traits for profitability. The soft wheat products require minimal gluten protein, and lower protein levels than hard wheats . Therefore, producing more grain is the first focus of most SRW wheat breeding programs.
Wheat provides approximately 20% of the protein and calories for human consumption worldwide . In order to meet the needs of the growing population, food supplies from major cereals such as maize, rice and wheat will need to increase by 2–3% annually, and wheat has shown the lowest rate of increases . Ray et al.  estimated wheat yields are increasing at 0.9% per year, much less than the 2.4% required to double global production by 2050. With future food security and climate challenges ahead, wheat breeding efficiency and genetic gains must improve significantly to develop stable, adapted, and high-yielding wheat varieties.
Global demand for wheat is growing faster than genetic gains in yield potential . In the Great Plains region, the annual rate of genetic gain was estimated at 0.44%, mainly due to traits contributing to an increase in grain number . The USDA winter wheat regional performance nurseries for the Great Plains region displayed similar results over a 50-year period, with estimated genetic gain for grain yield at 0.79% per year. From 1919 to 2008, the genetic gains in SRW wheat in multiple environments ranged from 0.56% to 1.41% .
Much of hereto forth yield increases were due to increases in the number of spikes per area, the number of seeds per spike and spikelet, and harvest index—producing more grain from increasing yield components but maintaining the same biomass . With harvest index approaching its theoretical maximum biologic limits , increasing biomass can provide an opportunity to increase the photosynthetic tissues for fixing carbon and a productive canopy to capture radiation energy and convert it into dry matter. Reynolds et al.  reported that an increase in radiation use efficiency, grain number, and grain yield were positively associated with an increase in above ground plant biomass.
Breeding methodologies and techniques have changed drastically over the years. Further advances in statistical methodology and molecular markers led to the construction of genetic maps, evaluating complex traits, and associating the phenotypic variation with molecular markers [14,15]. The genetic maps facilitated the identification of quantitative trait loci (QTL)—the genomic region responsible for trait variation . In wheat, QTL mapping has been performed for traits including yield components , plant height , heat tolerance , grain quality , and disease tolerance [21–23], among others. The identification of QTL has led to the use of molecular markers in screening germplasm for trait improvement [24,25]. Bi-parental mapping is a powerful tool. However, the limited number of recombination events in bi-parental populations restricts the phenotypic diversity [16,26] and leads to a low mapping resolution .
The need to dissect complex traits within a large, diverse population led to the development of statistical methods that gave rise to genome-wide association studies (GWAS). Unlike bi-parental mapping, GWAS consists of genetically diverse germplasm that harbor many historical and ancestral recombination events. GWAS is based on the strength of linkage disequilibrium (LD) between the markers and the observable phenotypes in a population [27,28]. The statistical power to detect causal polymorphisms is based on the extension of LD in the population . Wheat, being a self-pollinating species, experiences relatively slow LD decay. Selection on wheat, as is practiced in breeding programs, leads to relatively slower rates of LD decay, as Liu et al.  displayed that the extent of high LD islands is much greater in cultivars (1053 kb) than landraces (785 kb) due to the effect of artificial selection.
GWAS has been used previously to study wheat kernel size and milling quality [31–33], spike traits , root traits , and grain yield and yield components traits [35,36]. These studies implemented the various GWAS mapping approaches such as mixed linear model (MLM)  and compressed mixed linear model (CMLM)  to appropriately account for the underlying population structure and kinship. Recent studies have shown that single locus models, such as MLM and CMLM, generate more false negatives due to overfitting [39,40]. The multi-locus Fixed and Random Model Circulating Probability Unification (FarmCPU) model was shown to better control false positives and false negatives [39,41], improving statistical power to identify true marker trait associations (MTAs).
In this study, our goal was to identify MTAs for yield and yield component traits in an elite SRW wheat population developed by eastern and midwestern public breeding programs. Previous work by Gaire et al.  in this population identified MTAs concerning SRW wheat end use quality traits in this population, but no work to date has explored yield related traits in the context of GWAS. We achieve this goal by field-based phenotyping and high-throughput genotyping.
Experimental design. The Triticeae Coordinated Agricultural Project (TCAP) population consists of lines developed from breeding programs in Illinois, Kentucky, Maryland, Missouri, New York, Ohio, Indiana, and Virginia. The pedigree of lines are detailed in Huang et al. . The germplasm were grown in two growing seasons 2016-17 (WL17) and 2017-18 (WL18) at the Purdue Agronomy Center for Research and Education (ACRE) in West Lafayette, IN (40.43° N, 86.99° W) after a previous soybean crop. Similar field layouts and germplasm were planted in both years. Trials were planted in late September and harvest in late June of the following year. The experiments were planted using a Hege (Wintersteiger, Australia) drill planter and harvested with a Wintersteiger plot harvester at physiological maturity. In each year, two replications were planted. Each replicate was a 13-row × 24-column layout, consisting of eight incomplete blocks, each accommodating 39 plots. Each plot measured 1.22 m × 1.22 m and we planted 20 grams seed per plot, which amounts to approximately plant density of 370–420 seeds per square meter. Before planting, 336 kg·ha−1 of mono-ammonium phosphate (11-52-0) was applied. A spring nitrogen top-dress application of 112 kg·ha−1 in the form of liquid urea ammonium nitrate (28-0-0) was applied as recommended by crop management practices in the region. Trials were rainfed and did not rely on any form of irrigation. Monthly precipitation and temperature obtained from iClimate (2019) are detailed in Supplementary Table S1.
Trait measurements. We measured grain yield (YLD), days to heading (HD), days to maturity (MD), thousand kernel weight (TKW), biomass (BIO), number of spikes per area (NS), number of grains per spike (GPS), grain weight per spike (GWS), and plant height (PH). YLD was measured at harvest, adjusted for 13% seed moisture, and was expressed as kg·ha−1. HD was determined by complete emergence of heads (Feekes 10.5, Zadoks 58) in more than 50% of individual plants in a plot and expressed as the number of days after January 1st. Similarly, MD was determined when more than 50% of plot reached physiological maturity (Feekes 11.3, Zadoks 91) and expressed as the number of days after January 1st. At maturity, PH was recorded by four random measurements per plot, from the ground to the top spikelet, excluding the awns, and expressed in centimeter (cm). Yield components were evaluated by measuring traits from an area of 0.25 m × 0.3048 m that was cut from the ground level after physiological maturity. First aboveground BIO was dried to constant weight, measured and expressed in grams (g). Next effective tiller numbers per unit area were counted from the cut sample and represented as number of spikes (NS). Then, five random spikes were randomly sampled from the total cut area to measure the number of grains per spike (GPS), and grain weight per spikes (GWS)—also expressed in grams. TKW was measured by counting and weighing 1000 kernels, which was expressed in grams.
Description of genotypic data. This population was initially genotyped by using the 90K SNP chip array , and the marker density was later increased by completing genotyping-by-sequencing method, as explained in Poland et al. . Briefly, reduced genomic libraries were created using Pst1-Msp1 restriction enzyme combination consistent with Poland et al. . The samples were pooled together at 96-plex to create libraries and each library was sequenced on a single lane of Illumina Hi-Seq 2500 (Illumina, San Diego, USA). Variant calling was performed using the TASSEL 5 GBSv2 pipeline  with 64 base k-mer length and minimum k-mer count of five. Reads were aligned to the wheat genome sequence assembly IWGSCv1.0 , using aln method of Burrows-Wheeler aligner (BWA) version 0.7.10 . For filtering of both 90K SNP chip array and GBS markers, we excluded any markers missing ≥10% data and those with minor allele frequency less than 0.05. We then used Linkage Disequilibrium K-number neighbor imputation (LD-kNNi) algorithm  implemented in TASSEL 5  to impute the missing markers. Markers that were not mapped to any specific chromosome were excluded from further analysis. The final genotypic dataset that was used in this study consisted of 45K variants of which 13K were produced from the 90K SNP chip array pipeline and 32K were produced from GBS pipeline.
Statistical analysis of phenotypic data. In order to test the significance of genotypes, year, and genotype × year interaction, analysis of variance (ANOVA) was performed in R environment . For each trait, the following ANOVA model was fitted:
Where the response variable is the observed phenotypic value of the ith genotype, in the jth year, in the kth replicate, and the lth incomplete block; is the overall mean, is the effect of the ith genotype, is the effect of the jth year, is the effect of the kth replicate within the jth year, is the interaction effect of the ith genotype by the jth year, and is the effect of the lth incomplete block within the kth replicate and the jth year. The represents the residual error.
To produce phenotypic values of each line for GWAS analysis, the best linear unbiased estimate (BLUE) values were derived by implementing a mixed model  using the “lme4” package  in R environment  in equation , where genotype was considered as fixed effect and other terms were considered as random effects. The Pearson correlation coefficient was calculated by cor function in R by using BLUE values. Path analysis was performed on BLUE values by using the latent variable analysis “lavaan” package  in R environment .
Estimating heritability estimates. Estimation of heritability based on experimental design requires a balanced design where all experimental entries are included in each replicate. Therefore, for producing variance components for estimating the broad sense heritability (H2), we used a reduced model as follows:
Where the response variable is the observed phenotypic value of the ith genotype, in the jth year, in the kth replicate, and the lth incomplete block; is the overall mean, is the effect of the ith genotype, is the effect of the jth year, is the effect of the kth replicate within the jth year, and is the interaction effect of the ith genotype by the jth year. The represents the residual error. In this model all terms were considered as random effect. The broad sense heritability (H2) on an entry-mean basis was estimated following the equation [52,53]:
where is the genetic variance, is the error variance, and y is the number of years (), and is the number of replications per year (.
Genome-wide association studies (GWAS). Principal component analysis (PCA) of marker data was used to visualize the underlying population structure. We used the first three principal components (PCs) to produce a 3D scatter plot. Pair-wise LD estimates between adjacent markers were calculated, as the squared coefficient of correlation (r2), using TASSEL 5  with a sliding window of 1000 markers. The pairwise LD estimates were plotted against the physical distance to determine the decay of LD against physical range on each chromosome, and in particular around the regions, where marker-trait associations were identified in GWAS. LD decay plots generated in R using the Hill and Weir  method and loess regression with assessment at r2 value of 0.2 [55,56].
GWAS was performed using the GAPIT software  in R for each trait using the Fixed and Random Model Circulating Probability Unification (FarmCPU) model  and first 3 PCs were used to control the population structure . We reported MTAs that were identified at −logP > 4.0 (p-value < 0.0001). If a genomic region was identified with multiple MTAs close to each other, we only report a representative MTA. We also identified MTAs that passed a 5% false discovery rate (FDR) for controlling multiple testing . The coefficient of determination (R2) for each identified MTA was determined by fitting a linear model in R environment with the contrasting alleles of the marker and the 3 PCs as the covariates using an ordinary least squares regression.
Transferability and validation of GWAS results. YLD and HD data obtained from trials conducted in previous years and other states during the Triticeae CAP project were used to validate the transferability in other environments of the MTAs we identified in Indiana. This data comes from diverse environments i.e., five different locations and two growing seasons 2011-12 and 2012-13, as described by Huang et al. . These environments are: moderate nitrogen in Kentucky 2011-12 (KYM12), moderate nitrogen in Maryland 2011-12 (MDM12), moderate nitrogen in Missouri 2011-12 (MOM12) and 2012-13 (MOM13), low nitrogen in Ohio 2011-12 (OWL12) and 2012-13 (OWL13), moderate nitrogen in Ohio 2011-12 (OWM12) and 2012-13 (OWM13), low nitrogen in Virginia 2011-12 (VAL12) and in 2012-13 (VAL13), and moderate nitrogen in Virginia 2011-12 (VAM12) and 2012-13 (VAM13). We abbreviated grain yield and heading date we obtained from our 2016-17 and 2017-18 seasons as WL17 and WL18. In total, we assembled data from 14 environments for validation and transferability examination. For WL17 and WL18 environments, we first accounted for incomplete block design and then included the data in the multi-environment data analysis. Multi-dimensional scaling and linear discriminant analysis were used to cluster environments into seemingly homogeneous groups based on YLD or HD data. Then the accuracy of grouping was examined by cross validation. The cmdscale function in R was used to perform multidimensional scaling with Euclidean distances extracted using the dist function. Eigenvalues from three dimensions were extracted and incorporated into the lda function in the MASS package  for linear discriminant analysis and cross validation by setting CV = TRUE. Upon confirmation of groupings, BLUEs were obtained for each homogeneous group following the same model  and GWAS analysis was completed for each homogeneous group. We considered a MTA as validated or transferable if identified with a −logP > 1.3 (p-value < 0.05) in another homogenous group of environments. We chose this threshold because for validation, we are only interested in one specific marker and there is no need to control for testing of multiple hypotheses.
Phenotypic analysis and relationship among traits. We evaluated grain yield and yield components of a soft red winter wheat population in West Lafayette, Indiana for two years. For all traits, the effect of genotype was significant at 0.001, indicating the presence of noticeable genetic variation in the germplasm. In addition, the effect of year, and replicate within years were significant at 0.001 for all traits except for GPS, where the effect of year, and rep within years were significant at 0.01. More importantly, the genotype × year interaction effect was significant at 0.001 for YLD, BIO, PH, HD, and MD, at 0.01 for NS, and at 0.05 for GWS, but not significant for TKW and GPS (Table 1). The significant effect of genotype × year interaction will be further discussed in the GWAS section.
Grain yield ranged from 3900 to 7500 kg·ha−1 with a mean of 5830 kg·ha−1 and heritability of 0.50 (Table 2). The top 10% highest yielding lines in the population averaged at 6940 kg·ha−1, while the 10% lowest yielding lines averaged 4650 kg·ha−1—a 1.5-fold difference. Not all of the 10% highest yielding entries were developed by one breeding program, indicating a potential for achieving genetic gains via germplasm exchange. Among these, high yielding lines developed from public breeding programs at Purdue (10 lines), Illinois (7 lines), Missouri (3 lines), Ohio State (2 lines), Kentucky (2 lines), and Maryland (2 lines) were identified.
The traits with the greatest and significant positive phenotypic correlation to YLD were BIO (r = 0.29 ***), TKW (r = 0.29 ***), and GWS (r = 0.29 ***) (Table 3). BIO had an average of 201 grams per cut area and heritability of 0.21. TKW ranged from 27.8 to 38.8 grams with a mean of 32.3 grams and heritability of 0.49 (Table 2). The three lines with the greatest kernel weight were MD04W249-11-7, 04702A1-18, and MD03W64-10-3 and the three lines with the smallest kernel weight were OH08-178-52, VA09W-188WS, and MO080584. However, looking at the top 10% high yielding entries, the range of thousand kernel weight was narrower (30-35 grams), and around the average value for kernel weight. Total grain number in wheat is the cumulative effect of spike number per unit area and the number of grains per spike. The NS per cut area ranged from 74 to 152 spikes. The lines with greatest number of spikes were OH08-172-42, TRIBUTE, and IL08-12174 and the lines with lowest number of spikes per measured area were INW1021, 0566A1-3-1-67, and 05251A1-1-136-9-5. GPS and GWS had a mean of ~37 grains per spike and 0.98 grams, respectively and similar heritability estimates (Table 2). A significant negative correlation (r = −0.22 ***) was observed between NS and GPS (Table 3). This negative correlation has been observed previously in multiple experiments [61,62]. PH had the highest heritability of 0.84, averaged at 90 cm, and showed a standard deviation of 6.2 cm. The tallest lines were CAYUGA, MO101329, and MO100647 while the shortest lines were 03207A1-7-3-1, 9346A1-2-5-5-2-1, and MD03W665-10-5. The height of the 10% shortest lines averaged 80 cm. Lastly, the HD and MD had a mean of 133 and 171 days, respectively, and were highly correlated with one another (r = 0.68 ***) (Table 3). Lines that headed later (>138 days) and matured later (>173 days) included NY103-208-7263, NY99066-3444, CAYUGA, NY96009-3037, and MEDINA, all varieties developed in New York and adapted to the eastern climate region. Both traits were significant and negatively correlated with YLD (r = −0.19 **, r = −0.18 **) (Table 3) as this relationship has been documented previously .
Path coefficient analysis. The correlation magnitudes were further broken down by using path analysis, following Dewey and Lu . Path analysis parses out the correlation magnitude to direct and indirect components of influence by measuring the direct influence of one variable upon another . In Supplementary Figure S1, the single arrow lines indicate direct influence as measured by path coefficients (PXX) and the indirect effects are the association between variables measured by correlation coefficients (rxx). The indirect effects are the product of the path coefficients and correlation coefficients. The sum of the path coefficients and indirect effects of correlation coefficients equal the phenotypic correlations, thus breaking down the reasoning for positive and negative correlations observed. In the a priori model, grain yield is directly affected by traits with significant phenotypic correlation (Table 3). These traits are thousand kernel weight, grain weight per spike, biomass, heading date, and maturity date. Biomass had the largest direct path coefficient of 0.27, followed by grain weight per spike and thousand kernel weight coefficients of 0.21, and 0.19, respectively. (Supplementary Table S2). The indirect effect of thousand kernel weight on grain weight per spike represents almost one-sixth of the phenotypic correlation (Table 3) and direct path coefficient between grain weight per spike and yield (Supplementary Table S2). Biomass and grain weight per spike are correlated (r = 0.22 ***; Table 3) and positively contribute to correlations with grain yield. Days to heading showed a negative direct effect on grain yield with path coefficient of −0.15 (Supplementary Figure S1), consistent with its negative correlation with grain yield (r = −0.19 **; Table 3). Similar patterns were observed for days to maturity.
Genome-wide association studies. The objectives of this study were to identify MTAs that control grain yield and other agronomic traits in this population in the Indiana environment and examine the transferability of MTA results across other environments. Of the 45K variants used in this study, approximately 17K, 22K, and 5.7K were located on sub-genome A, B, and D, respectively. The first three principal components (PCs) of all marker data explained 6.5%, 5.2%, and 3.8% of the total variation (Supplementary Figure S2). Consistent with the reports of Gaire et al.  and Huang et al. , PCs separated two distinct groups, which were previously attributed to whether germplasm is progeny, close relative, or descendants of the soft red winter wheat variety “Truman” or not . Linkage disequilibrium persisted variably across different chromosomes and the half decay distance (in base pairs) are presented in Supplementary Table S3 for each chromosome. For example, LD persisted the longest physical range on chromosomes 2B (~125 mega base pairs:Mbp) and 7D (109 Mbp). In contrast, chromosomes 5D (0.74 Mbp) and 6D (0.71 Mbp) displayed the fastest LD decay.
We used the first 3 PCs to account for the underlying population structure in GWAS analysis for all traits evaluated in West Lafayette, IN, USA. For GWAS we used estimates of phenotypic data based on two years of study i.e., WL17 and WL18, termed WL1718 throughout, and 45K genome-wide variants. In this study, we reported and discussed MTAs that were identified at –logP > 4.0 (p-value < 0.0001) threshold. A total of 62 MTAs were identified for eight traits in WL1718 except for NS on 20 chromosomes (all excluding 3D). Based on their physical distances and the LD decay, the 62 MTAs were resolved in 59 independent loci (Figure 1). Of the 59 loci, 11 passed the 5% FDR threshold for grain yield, days to heading, days to maturity, and plant height. Chromosome 3B showed the highest number of loci. Regions on chromosome 5A were found to be associated with four phenotypic traits including grain per spike, days to maturity, plant height, and thousand kernel weight (Figure 1; Table 4). Plant height showed maximum number of MTAs among traits. None of the MTAs were associated with multiple traits.
For YLD, eleven MTAs were reported on chromosomes 1A, 3B, 6A, 6B, 7A, 7B, and 7D (Figure 2). The MTA with the largest −logP value of 16.35 on chromosome 7D located at 633,027,374 base pairs (bp) explained 18% of phenotypic variation for grain yield. The next largest signal on chromosome 1A of −logP = 8.27 had allele effect of 174 kg·ha−1 (Table 4).
Five MTA were identified for GPS on 3B, 4D, 5A, 5B, and 7D (Table 4). These marker effects accounted for approximately 2 grain per spike and explained 4–7% of the phenotypic variation. One MTAs for GWS was found on chromosome 3B. Marker gbs_3A_739555657 explained 8% of the phenotypic variation and accounted for an allele effect of 63 milligrams of grain weight per spike (Table 4). Lastly, TKW had 7 MTAs on chromosomes 1D, 2A, 3B, 5A, 6A, and 6B (Table 4). The strongest signal for TKW was identified on 5A at position 685,795,509 bp. This region exerted an effect of 540 mg and covered 10% of total phenotypic variation. The next largest signal was observed at position 206,962,855 bp on chromosome 2B with an effect of 690 mg and phenotypic explanation of 8%.
For BIO, we identified 4 MTAs on chromosomes 1B, 3A, and 5D (Figure 2, Table 4). The largest signal for biomass was identified on chromosome 5D at position 365,732,020 bp with –logP of 5.65 that explained 9% of variation observed in biomass. The next large signal for biomass was −logP of 4.69 on chromosome 3A. Independent MTAs for BIO represented 4–9% of the phenotypic variation with positive allele effects between 8.00 and 13.41 grams.
Ten MTAs were identified for days to heading for WL1718 across nine chromosomes (Figure 2). Two MTAs on 7D had −logP values of 5.77 and 8.38 with allele effects of 0.74 and 0.58 earlier heading date, respectively. Ten MTAs were identified with –logP up to 9.41 for days to maturity (Table 4). The most significant signal was identified at 44,485,665 bp position of chromosome 2D, which explained 16% of variation. Eleven MTAs were identified with –logP up to 9.90 for PH (Figure 2). One marker on 6D explained 16% of the phenotypic variation for plant height and had a minor allele frequency of 0.07 (Table 4).
Transferability of GWAS results. We used existing YLD and HD data that were generated from the same germplasm in other states and seasons. Altogether, we assembled 14-environment datasets, of which WL17 and WL18 are from our field testing in Indiana. Linear discriminant analysis (LDA) on grain yield resulted in three homogeneous groups (Figure 3A) and on heading date resulted in four homogenous groups (Figure 3B). Strikingly, we observed that year-to-year variations resulted in different groupings in some cases (Supplementary Table S4). For example, for grain yield, LDA group 1 included WL17, KYM12, MDM12, MOM12, and MOM13, group 2 included WL18, OWL12, OWM12, VAL12, OWL13, and OWM13, and group 3 consisted of VAL13 and VAM13. We observed that for example, VAL12 and VAL13 are categorized in different groups (Figure 3A). Similar observation was true for WL17 and WL18. In addition, we noticed that groupings were different for grain yield and heading date. LDA for grain yield and heading date had a percent separation above 87% for each discriminant function and cross-validation confirmed successful separation of environments.
We performed GWAS for YLD and HD based on phenotypic observations from four environments: WL1718, Group 1, Group 2, and Group 3. In Group 1, twelve MTAs were identified in chromosomes 1B, 2B, 5A, 5B, 6A, 6B, 7A, 7B, and 7D. Three MTAs were present on chromosome 6B and two MTAs on 7A. For Group 3, eight MTAs were identified on chromosomes 3B, 5B, and 7B, however, applying the same standard for markers in LD as above, resolved to five independent MTAs. No MTAs were identified in Group 2. When we compared YLD signals among the three homogenous groups, there was not any MTA identified in more than one group, indicating that QTL are specific to each group.
A total of 28 independent MTAs were identified across environmental groupings for YLD but we only noticed seven MTAs that were identified in at least two environments which are indicative of transferability across environments. Two of these MTAs are located on chromosomes 6B and 7D. The MTA on chromosome 6B for YLD at position 73,187,805 bp was identified in WL1718, Group 1, and Group 3 environments with −logP of 4.07, 7.75, and 2.38, respectively (Table 5). The marker effect for this validated MTA showed an effect size of 238–250 kg·ha−1 across environments. The MTA on chromosome 7D for YLD is at 633,027,374 bp, and was identified in WL1718, Group 1, and Group 3 environments with −logP of 16.35, 20.87, and 1.64, respectively (Table 5). The marker effect of this MTA was approximately 492 kg·ha−1 in WL1718, 393 kg·ha−1 in Group 1, and 184 kg·ha−1 in Group 3 (Table 5).
For HD, LDA grouping clustered environments into four groups. LDA group 1 included WL17, WL18, MOM13, OWL13, and OWM13, group 2 included MOM12, OWL12, OWM12, VAL13, and VAM13, and group 3 consisted of MDM12, VAL12, and VAM12 (Supplementary Table S4). KYM12 was a singleton Group 4, with no other group member (Figure 3B), and was left out of the analysis. GWAS was performed for these three groups. Group 1 had 35 MTAs that were grouped into 26 independent loci. Eleven of these loci were located on chromosome 7A and five were located on chromosome 7D. For Group 2, eleven MTAs were identified on chromosomes 1A, 1B, 3A, 4B, 5B, 6A, 6B, 7A, 7B, and 7D. Lastly, Group 3 did not show any significant MTAs for HD. When we compared HD signals among the three homogenous groups, only one MTA, marker 7D_301325415 on chromosome 7D, was present in more than one group (Table 5).
A total of 47 MTAs were detected for heading date across environments but we only noticed eight MTAs that were identified in at least two environments which are indicative of transferability across environments. These MTAs were identified on chromosomes 2B, 3A, 3B, 4B, 5B, 7A, and 7D (Table 5). For HD, one marker from the SNP chip array, IWB34502 located at 553,613,770 bp on chromosome 2B was associated with days to heading (Table 5), in WL1718, Group 1, and Group 2 environments with allele effect of 0.44, 0.29, and 0.30 days, respectively. A marker with similar effects in the same environments was identified at 690,860,911 bp on chromosome 7A with a −logP of 4.86, 9.10, and 2.23 in WL1718, Group 1, and Group 2, respectively (Table 5). Chromosome 7D contained two markers significant for days to heading. The positive allele associated with this marker (301,325,415 bp) on 7D showed effect sizes of 0.74, 0.68, and 0.98 days for WL1718, Group 1, and Group 2, respectively (Table 5). The marker at position 58,927,880 bp on chromosome 7D was found to be associated with heading date in environment WL1718, Group 1, and Group 2 with −logP of 8.38, 4.44, and 1.48 (Table 5).
For YLD, seven out of 28 MTAs and for HD, eight out of 47 MTAs were found to be transferable across seemingly homogenous environments. Therefore, we concluded that not all marker-trait associations are transferable and MTAs are often environment specific.
In this study, we analyzed associations between genotypes and phenotypes in a US SRW wheat elite population, consisting of breeding lines that were developed by breeding programs in the Midwest and east. Marker-trait associations for this population have been previously identified for Fusarium head blight , days to heading , and grain quality  from plants grown in Ohio and Virginia. We dissected the genetic architecture of grain yield and related traits based on phenotypes observed in Indiana. In addition, we examined the transferability of SNPs across environments for the traits of YLD and HD.
Phenotypic correlations among traits and deciphering their relationship can give insight into identifying selection criteria for improving traits of interest. Our study showed that grain components including TKW, BIO, and GWS were significantly and positively correlated with YLD. Previous studies have documented positive relationship between TKW and YLD as well [67,68]. In wheat breeding research, biomass is often referred to as the whole aboveground plant parts. The pre-Green Revolution wheat germplasm were tall, and their height was the driver of plant aboveground weight. Therefore, during the Green Revolution the main force that led to increases in harvest index and productivity was only reducing plant height. In this population, although variation in biomass was observed, we think that in this era a “useful biomass” is one that can lead to non-competing multiple well-grown culms (tillers) with the potential to lead to a fertile spike. Increasing tiller numbers or protecting tillers in soft red winter wheat is one approach that can produce useful biomass. Our data showed that NS and BIO were significantly correlated (r = 0.47) and that NS is distributed in a wide range from 74 to 152. For example, the varieties OHO8-172-42, IL08-12174, MD05W1292-11-1, 05264A1-1-3-2, and IL07-20728 showed averages above 240 grams for BIO and 134 NS. Other traits that can lead to useful biomass are smaller leaves with enhanced photosynthetic capacity and the levels of spike fertility, among others.
While TKW, BIO, and GWS showed positive correlation with YLD, the duration of vegetative growth period, indicated by days to heading (and similarly days to maturity) negatively correlated with YLD. Similar negative correlation was reported by Addison et al. . Addison et al.  noticed this trend in a SRW wheat recombinant inbred line (RIL) population across nine environments in the southern US, with the population segregating for photoperiod and vernalization loci. Grain number is the main driver of grain yield but no correlation was observed. This is a population of elite lines therefore; loci influencing traits relating to grain number could be potentially fixed in the population of elite germplasm.
There are reports in the literature that shows positive correlations between days to heading and grain yield, , especially under cooler temperatures for hard red spring wheat . The primary reason for the observed negative correlation between days to heading and yield in this population could be that most of the late heading germplasm were developed by and adapted to the state of New York. Therefore, a hidden G × E interaction works contrary to the yield formation. Path analysis affirms the consequence of heading later is indirectly decreasing grain development as observed in thousand kernel weight and grain yield. Therefore, a practical consideration for future characterization of populations that are mixture of germplasm from multiple crop breeding programs is that the experimenters can use days to heading as biomarkers because a the shift in phenology could mask yield traits. When a drastic change between native germplasm and others is observed, yield differences are likely expected. Tessmann et al.  used QTL markers for plant height, vernalization, and photoperiod genes along with the actual heading date trait as covariates in the GWAS model to account for the latitude differences. This method is also routinely performed for maize but including flowering time (days to anthesis) as covariates [72,73].
One major concern in GWAS discoveries is marker density. Wheat is a self-pollinated crop and the germplasm has been under selection. Therefore, in the beginning of the experiment, 45K markers seemed unnecessarily dense. We found evidence to the contrary. Significant MTAs with SNPs from GBS were 32 for yield components measured in WL1718 and 10 multi-environment MTAs for GY and HD. In contrast, markers from the SNP chip array contributed 8 MTA for yield components and 5 multi-environment MTA for GY and HD. This data indicates that MTAs were identified from both sets of SNP markers. In addition, we examined the inter-marker spaces for SNP chip markers located between two GBS markers. For example, the SNP IWB72708 (identified for biomass at –logP = 4.36) is located 897,950 bp downstream of gbs_1B_472631187 while 296,024 bp upstream of gbs_1B_473,825,161. The SNP IWB51951 (identified for plant height at –logP = 5.01) is located 458,542 bp downstream of gbs_2A_92,338,766 while 1,095,722 bp upstream of gbs_2A_93,893,030. While these distances must be judged based on the local LD decay rates in each region, our conclusion is that the 45K marker set, combined from chip array and GBS methods, is not in excess for this germplasm and the combination of both marker sets can be complementary in GWAS applications.
FarmCPU is a multiple loci linear mixed model that eliminates confounding effects between markers and kinship by iterating between both fixed and random effect models. In the fixed effect model, individual SNPs are tested while using pseudo-QTNs as covariates to control false positives. The FarmCPU model controls false positives, false negatives, and provides greater statistical power than alternative models used for association mapping [39,41]. Based on quantile-quantile (Q-Q) plots, FarmCPU effectively controlled false positives and false negatives based on the population structure and significant associations (Supplementary Figure S3). The Q-Q plot line holds close to the 1:1 line of expected versus observes association probabilities, with a slight upward tail indicating deviation from expected distribution. A deviation in the tail area indicates properly controlling false positives and false negatives, where any inflation line upward would indicate false positives or downward indicate false negatives . Other researchers reported similar claims. Xu et al.  and Vanous et al.  concluded the multi-locus model of FarmCPU provided more statistical power than single locus models with less over or under fitting. The FarmCPU model identifies the most significant single SNP at a specific genomic location instead of a large peak of SNPs with other MLM models .
In the target environment of Indiana, several loci affected yield and yield components traits were identified. Germplasm were also identified that harbor those favorable alleles. The 17 lines that harbored the favorable yield QTL for the region on 7D were all developed by Purdue’s small grains breeding program. It is possible that all 17 of these lines contain a 7E translocation for resistance to barley yellow dwarf and cereal yellow dwarf virus, and are descendants of the Purdue line “P107” . However, we could identify the 7E translocation harboring line in the pedigrees of only 11 out of the 17 lines. This translocation could explain the slow LD decay rate in over 100 Mbp on chromosome 7D.
QTL expressed in one environment may not equally or ever be expressed in other environments. To a large degree, this can be associated with the key environmental clues that are critical regulatory event for the mode of action and expression of traits and QTL. For example, if the mode of action of a growth QTL is via tiller development before winter that are later on sensitive to freezing temperature, then two environments differing in winter temperature would results in different number of tillers that are counted in the spring. Therefore, QTL could go unnoticed in the colder environment. Similar examples can be given for kernel weight QTL expression under two hot and mild grain-fill period temperatures. Such QTL by environment interaction effects can vary depending on the location and specific year. To identify stable QTLs, GWAS on the basis of combined analysis of years and locations is suggested, which is often known as multi-environment GWAS [36,77] for future QTL implementation in marker assisted selection [78,79]. However, our results showed that majority of MTAs are environment specific. Even when we contained GWAS analysis within homogenously environments, the majority of MTAs we identified in WL site for YLD and HD and were not observed in other environments. Even when markers were significant across environments, there was differences for phenotypic variations that was explained by each marker and the size of marker effect. For some traits such as grain yield the magnitude of variance component due to G × Y was 20% greater than the magnitude of variance component due to G. Since winter wheat is grown over nine months, variation in climate and weather can directly impact the year to year variability and effect of the environment. For example, the WL site in 2017 showed significantly higher monthly temperature than WL18 site from February through April (Supplementary Table S1), which is a critical time in winter wheat development. With the increase in temperature, the vernalization period for 2017 was shorter than 2018, resulting in a decrease in yield. This could be one potential reason for the difference in classifying the WL17 and WL18 site into different groups for YLD. Previous work is a mix of success and failure in the transferability of QTL across environments. Guan et al.  identified 226 QTL controlling yield component traits and heat susceptibility in a Chinese elite double haploid winter wheat population. Across the 12 environments in northern China, only 39 of these QTL were deemed “stable” based on detection in at least three individual environments. Further explanation could be the significant source of variance based on effect of environment and effect of genotype by environment on all measured traits. In the United Kingdom, a double haploid population was developed from favorable bread making hexaploid winter wheat cultivars to detect QTL controlling yield variation. The population was evaluated and phenotyped at five field trials across multiple years in England, Scotland, Germany, and France. Two QTL were mapped on chromosome 6B for grain size and yield, Qtgw-jic.6A and Qyld-jic.6A, that were stable across nine of the twelve environments . These favorable QTL validated with near isogenic lines displayed improvements of 5.5% and 5.1% for grain yield and grain weight.
Seeking stable QTLs for yield determining traits may not be the most thoughtful approach to improve stability and genetic gains for wheat breeding. QTL transferability is challenging, and we suggest proceeding with caution to identify QTLs across multiple environments. In our case, detecting MTAs in homogenous environments showed minimal opportunities for making progress across regions or even or for developing biomarkers for marker assisted selection. We suggest performing GWAS and evaluating MTAs in the targeted breeding environment. The ability to utilize past data is powerful for predictability and examining transferability, however, the effect of the environment could be the leading issue in non-transferable QTLs controlling significant MTAs.
The following supplementary materials are available online at https://doi.org/10.20900/cbgg20200013:
Supplementary Table S1. Monthly precipitation and temperature in West Lafayette, Indiana, for the two cropping seasons of the study;
Supplementary Table S2. Path coefficient analysis;
Supplementary Table S3. LD decay half distance per chromosome and genome;
Supplementary Table S4. Grouping of environments from linear discriminant analysis for GWAS based on grain yield and days to heading;
Supplementary Figure S1. Path coefficient analysis diagram;
Supplementary Figure S2. Principal component analysis based on all SNPs for 270 lines. 3D scatterplot of first three principal components (PCs);
Supplementary Figure S3. Q-Q plots from Manhattan plots of phenotypic traits from Figure 2.
BR and MM planned and designed the research. BR collected field measurements and analyzed the data. CHS provided seed and additional datasets. GBG performed genotyping. BR and MM wrote, edited, and revised the manuscript. All authors read and approved the manuscript.
The authors declare that there is no conflict of interest. This work is a chapter of Dr. Blake Russell’s PhD dissertation at Purdue University.
Financial support from USDA Hatch grant 1013073 via Purdue College of Agriculture to MM is greatly appreciated.
We thank Purdue’s Agronomy Farm and James Beaty for land preparation, weed control, and Jason Adams for planting assistance. For past data implemented, the authors are thankful to all researchers under the Triticeae CAP project, funded by NIFA AFRI Grant no. 2011-68002-30029 from the USDA National Institute of Food and Agriculture, who created the grain yield and days to heading validation datasets in multiple states. We also thank the Eastern Regional Small Grains Genotyping Lab for genotyping of germplasm.
Russell B, Brown-Guedira G, Sneller CH, Mohammadi M. Transferability of Marker Trait Associations in Wheat Is Disturbed Mainly by Genotype × Year Interaction. Crop Breed Genet Genom. 2020;2(3):e200013. https://doi.org/10.20900/cbgg20200013