Crop Breed Genet Genom. 2020;2(4):e200016.


Potential of Genomic Selection and Integrating “Omics” Data for Disease Evaluation in Wheat

Jemanesh K. Haile 1,* , Amidou N’Diaye 1, Ehsan Sari 1, Sean Walkowiak 2, Jessica E. Rutkoski 3, Hadley R. Kutcher 1, Curtis J. Pozniak 1,*

1 Crop Development Centre, Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada

2 Grain Research Laboratory, Canadian Grain Commission, Winnipeg, MB R3C 3G8, Canada

3 Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

* Correspondence: Jemanesh K. Haile, Tel.: +1-306-966-2430; Curtis J. Pozniak, Tel.: +1-306-966-2361.

Received: 27 May 2020; Accepted: 12 October 2020; Published: 27 October 2020

This article belongs to the Virtual Special Issue "Genomic Selection in Wheat"


Diseases are among the most important limiting factors for wheat production. Breeding for resistance to fungal diseases of wheat, primarily rusts and Fusarium head blight (FHB), is a major resource-consuming activity in most breeding programs and prevents breeders from focusing entirely on improving yield. Breeding for resistance to these diseases is challenging because resistance is inherited mostly in a quantitative fashion and is greatly influenced by weather conditions. Recent advances in genomics, phenomics and big-data analysis provide opportunities for accelerating the development of low-cost and efficient selection methods for such complex traits. Genomic selection (GS) may provide opportunities for reducing the time and cost of making selections. By appropriately integrating GS in the breeding workflow, it is possible to select new parents purely based on genomic estimated breeding values before breeding materials are entered into nurseries and field trials. Due to reduced selection cycle time, annual genetic gain for GS is predicted to be two to threefold greater than for a conventional phenotypic selection program. In this paper, we review recent GS studies focusing on the prediction of resistance to rusts and FHB, including those that benefit from modeling multiple phenological traits correlated with resistance. In addition, we discuss the potential of integrating phenomics and machine learning for evaluating plant disease and the integration of multiple “omics” data in genomic prediction to improve the applicability of GS for disease resistance breeding in wheat.

KEYWORDS: disease resistance; genomic selection; genetic gain; genotyping; machine learning; “omics” data; phenomics; wheat


There is a growing need to invest in crop improvement to ensure food security for the future, which is challenged by an ever-increasing global population, climate change, extreme weather phenomena and the unsustainable use of natural resources. According to United Nations projections, the global human population will reach 9.7 billion by 2050, 10.8 billion by 2080, and 11.2 billion by 2100 [1]. Plant breeders and scientists are under pressure to improve crops to be higher yielding, more nutritious, pest- and disease-resistant and climate-smart [2]. A shift in global temperatures and other climatic conditions will result in various changes in wheat diseases, including changes in pathogen populations, so breeders will need to continuously adapt crops to combat these diseases [3].

Wheat is one of the most important cereals in the world and plays a vital role in addressing food security [4]. Diseases are among the most important limiting factors that affect wheat production. A number of wheat diseases and insects cause significant crop loss and result in increased input costs for farmers. Three rust pathogens, Puccinia triticina (leaf rust), Puccinia striiformis f. sp. tritici (stripe rust), and Puccinia graminis f. sp. tritici (stem rust), are among the most damaging pathogens and have caused massive losses to wheat production in some areas [5–11]. Each of these pathogens can cause yield losses of 50% or more during severe epidemics when environmental conditions are favorable [12,13]. Fusarium spp. that cause Fusarium head blight (FHB) in wheat are also challenging pathogens for wheat production, as they penalize both grain yield and quality, and contaminate grains with mycotoxins such as the trichothecene deoxynivalenol (DON) [14].

Importance of Rust and FHB Resistance

Improving disease resistance in wheat is very important as it also improves yield, quality and even some agronomic traits. Rust pathogens have hindered global wheat production since the domestication of the crop and continue to threaten the world’s wheat supply [15]. Leaf rust is a problematic disease because the pathogen displays high diversity, new races constantly emerge, and the pathogen exhibits high adaptability to a wide range of climates [9,16,17]. Similarly, in recent years, stem rust has re-emerged as a concern as new physiological races have evolved in the Puccinia graminis f. sp. tritici population, demonstrating the vulnerability of broadly grown wheat cultivars that carry a limited number of major rust resistance genes across the globe [18–20]. Stem rust has the capacity to destroy millions of hectares of healthy, high-yielding wheat in less than a month by reducing fields to a mass of bare, broken stalks supporting only small, shriveled grains by harvest time [21]. In addition, fungicide treatment against stem rust is difficult to apply because it requires farmers to drive through their fields after flowering, which can itself damage the crop and reduce yield. Stripe rust, also known as yellow rust, occurs around the world in environments where growing-season conditions are humid and cool, or in high-altitude areas with warm days and cooler nights [5]. However, strains of stripe rust with a broader range of temperature adaptation have recently emerged [22]. The pathogen is highly variable, which affects the durability of resistance.

Breeding resistant cultivars is an important component of an integrated FHB management strategy. Resistance to FHB is quantitative, requiring a quantitative approach for evaluation and analysis. Genetic studies conducted over the last decade have identified over 500 FHB resistance QTL on all wheat chromosomes [23–25]. Fhb1, located on chromosome 3B and derived from the Chinese wheat cultivar Sumai 3, is the most consistently reported QTL used in FHB resistance breeding. The resistance genes within Fhb1 have been cloned [26,27]. Fhb1 has provided by far the strongest level of disease severity reduction, ranging between 20% and 25% [28]. The low frequency of resistance alleles in elite wheat breeding parents and concerns about the detrimental effect of linkage drag have limited the utilization of Fhb1 in breeding programs [29,30]. Recently, another FHB resistance gene transferred from Thinopyrum to wheat, Fhb7, has been cloned and its resistance mechanism has been characterized [27]. Fhb7 resistance differs from Fhb1 resistance, which depends on a reduction of pathogen growth in spikes, although both confer durable resistance [27]. The ability of Fhb7 to detoxify multiple mycotoxins produced by various Fusarium species demonstrates its potential as a source of resistance to the various diseases for which Fusarium trichothecenes are virulence factors [27]. A previous study proposed an additive effect of FHB resistance QTL [31], suggesting that it is feasible to improve FHB resistance by combining minor-effect QTL. Phenotyping over multiple environments is routinely conducted to identify superior FHB-resistant germplasm. Phenotypic selection has been successful in spite of interactions between FHB resistance loci and the environment, and the unfavorable association of FHB resistance with agronomic traits such as plant height (PH) and maturity [32–35].

Selection for Resistance to Rust and FHB

Selection for resistance to rust and FHB in wheat is resource demanding and diverts breeding resources away from other priority traits, including yield. Each breeder needs to make the strategic decision of which disease resistance to target, keeping in mind that each additional trait will ultimately reduce the selection intensity (i.e., the chances of success) for other traits, when assuming fixed population size or limited budget [36]. Currently, phenotyping rusts and FHB requires observation of visible symptoms and screening of hundreds or thousands of lines to identify resistant plants, which is a costly and labor intensive process. The time constraints are also prohibitive if the window of opportunity for phenotyping is narrow. Moreover, conventional phenotyping approaches tend to have high experimental errors due to inaccurate or subjective visual assessments.

Resistance to rusts and FHB is challenging to improve because it is inherited in a quantitative fashion and is greatly influenced by environmental conditions. Current advances in genomics and bioinformatics provide opportunities for accelerating the development of efficient and low-cost genomic selection methods for such complex traits [37–39]. In addition, developing high-throughput phenotyping techniques combined with the power of machine learning (ML) would improve the efficiency of disease assessment in the field and is integral to the success of GS.

Genomic Selection

Genomic selection has been considered one of the key post-1990 technologies utilized in plant improvement, along with transgenic cultivars, QTL mapping, association mapping, phenomics, envirotyping, genome editing, sequencing, and doubled haploid production [40,41]. In GS, a training population is genotyped with genome-wide markers and phenotyped for the trait under selection. GS models are then trained with the marker and phenotype data, and the model is used to predict the breeding value of a new set of individuals (selection candidates) that have been genotyped but not phenotyped. Unlike traditional marker-assisted selection (MAS), which uses a small number of markers associated with major QTL, GS uses genome-wide markers together with phenotypic data from one population to train a model that calculates genomic estimated breeding values (GEBVs), which predict the performance of lines in another population using markers only [42]. This avoids multiple testing and the need to identify marker-trait associations based on an arbitrarily chosen significance threshold. Studies indicate that GS outperforms traditional marker-assisted selection for complex traits with low heritability that are controlled by many minor-effect QTL [43–47]. If adequately integrated into the breeding workflow, GS can partially replace field testing and therefore reduce line development time [46].
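
The train-then-predict workflow described above can be sketched in a few lines. This is a minimal illustration on simulated data, not a method from any study cited here: the marker matrix, effect sizes, population sizes and the shrinkage parameter `lam` are all assumptions chosen for the example, and the model is a plain ridge regression on marker codes (the idea behind rrBLUP). In practice `lam` would be estimated from variance components rather than fixed by hand.

```python
import numpy as np

def fit_marker_effects(Z_train, y_train, lam):
    """Ridge-regression marker effects with a fixed, assumed shrinkage parameter."""
    mu = y_train.mean()
    col_means = Z_train.mean(axis=0)
    Zc = Z_train - col_means                      # centre marker codes on training means
    p = Zc.shape[1]
    # Solve (Zc'Zc + lam * I) u = Zc'(y - mu) for the marker effects u
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y_train - mu))
    return mu, col_means, u

def predict_gebv(Z_new, mu, col_means, u):
    """GEBVs for genotyped but unphenotyped selection candidates."""
    return mu + (Z_new - col_means) @ u

# Simulated data (all values hypothetical)
rng = np.random.default_rng(0)
n_train, n_cand, p = 400, 100, 300
Z = rng.integers(0, 3, size=(n_train + n_cand, p)).astype(float)  # 0/1/2 allele counts
true_effects = rng.normal(0.0, 0.1, size=p)       # many small-effect loci
g = Z @ true_effects                              # true genetic values
y = g + rng.normal(0.0, g.std(), size=g.size)     # heritability of about 0.5

# Train on phenotyped lines, predict candidates from markers only
mu, col_means, u = fit_marker_effects(Z[:n_train], y[:n_train], lam=200.0)
gebv = predict_gebv(Z[n_train:], mu, col_means, u)

# Prediction accuracy: correlation between GEBVs and the simulated true values
accuracy = np.corrcoef(gebv, g[n_train:])[0, 1]
```

With real data the true genetic values are unknown, so accuracy is usually reported as the correlation between GEBVs and observed phenotypes in a validation set.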

Genomic selection has been well established in the field of animal breeding, but many plant breeding programs worldwide are still evaluating the optimal strategy and stage for implementation in a breeding program. Wheat breeding programs typically require 10–15 years to transfer novel genes into elite germplasm. By application of GS, it is possible to select new parents purely based on GEBV before being entered in field trials and nurseries [48–50]. Because of reduced selection cycle time, annual genetic gain for GS is predicted to be two to threefold greater than for a conventional phenotypic selection program [46,51–58]. However, there is still limited information on the application of GS for improving disease resistance in wheat. 
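
The link between cycle time and annual gain can be made explicit with the standard breeder's equation. This is the textbook form, not a formula taken from the studies cited above; the symbols are assumptions introduced here: i is selection intensity, r is selection accuracy, \sigma_A is the additive genetic standard deviation, and L is the length of a selection cycle in years. The subscripts GS and PS denote genomic and phenotypic selection.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}
\qquad\Longrightarrow\qquad
\frac{\Delta G_{\text{GS}}}{\Delta G_{\text{PS}}}
  = \frac{r_{\text{GS}}}{r_{\text{PS}}} \cdot \frac{L_{\text{PS}}}{L_{\text{GS}}}
\quad \text{(for equal } i \text{ and } \sigma_A \text{)}
```

As a purely hypothetical illustration, r_GS = 0.5, r_PS = 0.7, L_PS = 6 years and L_GS = 2 years give a ratio of (0.5/0.7) × 3 ≈ 2.1: GS can deliver more gain per year than phenotypic selection despite lower accuracy, provided the cycle is shortened enough.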

The earliest review, by Rutkoski et al. [59], addressed the implementation of GS for adult plant stem rust resistance in wheat, and Poland and Rutkoski [60] later reviewed GS studies for disease resistance published until 2015. In this review, we therefore discuss methods and studies reported between 2016 and 2020 on (a) GS for resistance to rusts and FHB, (b) GS for multiple correlated traits, which may be useful in breeding for disease resistance, (c) the application of phenomics and ML to evaluate plant disease, and (d) advances in genotyping and the application of other “omics” technologies in GS to predict disease resistance in wheat.


Resistance to wheat rusts generally falls into two categories: (i) all stage resistance, which is often conferred by race-specific resistance genes (R genes) involved in pathogen recognition and associated with a hypersensitive response, and (ii) slow rusting adult plant resistance (APR), which is quantitative resistance often conferred by multiple loci, and is not associated with a hypersensitive response. R genes protect the plant from seedling to adult growth stages whereas APR genes function mainly at the adult stage [61]. Quantitative disease resistance is more durable than qualitative resistance conferred by R genes [59,62,63]. Phenotyping APR in large populations is expensive and labor intensive, as it requires conducting both seedling and adult plant screening. Resistance to FHB in wheat is inherited quantitatively and strongly influenced by the environment [23]. In general, breeding for quantitative disease resistance is a challenge because of the low heritability and high genotype × environment interaction, emphasizing the importance of devising strategies for more effective evaluation and exploitation of this resistance [64].

Marker assisted selection is useful for major effect QTL, but for FHB and rust resistance the individual QTL often have small effects. Additionally, only a few monogenic rust resistances are durable and only a few rust and FHB QTL with large effects have been successfully transferred into elite breeding material [36]. Further constraints like lack of diagnostic markers and the prevalence of QTL–background effects hinder the broad implementation of MAS [36]. GS is a promising approach that can potentially accelerate breeding for quantitative resistance by providing accurate predictions of resistance levels, reducing time to parental selection and increasing genetic gain from selection. GS will also open new avenues for molecular based resistance breeding by capturing more of the variation due to small effect QTL [39,65]. This makes GS well suited for rust and FHB resistance breeding. To achieve even greater gains, multiple traits can be simultaneously targeted for GS [2] including morphological traits correlated with disease resistance. Selection strategies which combine disease resistance with other traits offer efficient use of resources by assaying multiple traits on the same set of plants.

Strategies for Improving GS Prediction Accuracy

Several strategies have been tested and reported to increase GS prediction accuracies. These include combining pedigrees and markers [66], applying GS models that account for interactions between genotype and environment [67], incorporating additional secondary traits [56], and incorporating additional genomic and/or biological information, such as that revealed in a genome-wide association study (GWAS), into the GS model [68], termed GS + de novo GWAS. Combining pedigree with markers for prediction has been shown to improve accuracy compared to prediction based on either pedigree or markers alone. Juliana et al. [64] found that combining marker and pedigree-based relationship matrices led to the highest GS accuracies for APR for all three rusts of wheat. In the GS + de novo GWAS approach, significant markers identified by GWAS are included as fixed effects in the GS model and removed from the matrix of random effects. Besides enhancing prediction accuracy, GS + GWAS does not require additional data because the same phenotypic and genotypic data set is used, and it can be more accessible to breeders as it does not require extensive knowledge of the underlying genetics of a trait of interest [68]. The benefits of integrating GWAS with GS to further improve the accuracy of GS in wheat have been confirmed for rusts [69,70], Septoria tritici blotch [71,72], and yield [73]. In particular, Daetwyler et al. [69] and Rutkoski et al. [70] demonstrated the advantage of including markers linked to large- to moderate-effect genes or loci previously found to affect the traits of interest. On the other hand, according to Arruda et al. [74], treating random SNPs as fixed effects reduced prediction accuracy.
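
The fixed-plus-random structure of GS + de novo GWAS can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in [68]: the single-marker scan is a crude stand-in for a real GWAS (which would normally use a mixed model with correction for population structure), the simulated locus at column 0 and all effect sizes are hypothetical, and `lam` is fixed by hand rather than estimated.

```python
import numpy as np

def single_marker_scan(M, y):
    """Absolute marker-phenotype correlation; a crude stand-in for a GWAS statistic.
    Assumes every marker is polymorphic in the supplied lines."""
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Mc.T @ yc) / np.sqrt((Mc ** 2).sum(axis=0) * (yc ** 2).sum())

def fit_fixed_plus_ridge(M, y, fixed_idx, lam):
    """Selected markers enter as fixed effects and are removed from the random part."""
    n, p = M.shape
    random_idx = np.setdiff1d(np.arange(p), fixed_idx)
    X = np.column_stack([np.ones(n), M[:, fixed_idx]])   # intercept + fixed markers
    Z = M[:, random_idx]                                  # remaining markers, shrunken
    q = X.shape[1]
    # Mixed-model equations with a ridge penalty on the random marker effects only
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(Z.shape[1])]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:q], sol[q:], random_idx

def predict(M_new, fixed_idx, random_idx, b, u):
    X_new = np.column_stack([np.ones(M_new.shape[0]), M_new[:, fixed_idx]])
    return X_new @ b + M_new[:, random_idx] @ u

# Simulated data: one hypothetical large-effect locus plus many small effects
rng = np.random.default_rng(3)
n, p = 300, 200
M = rng.integers(0, 3, size=(n, p)).astype(float)
effects = rng.normal(0.0, 0.05, size=p)
effects[0] = 1.5
y = M @ effects + rng.normal(0.0, 1.0, size=n)

train, test = np.arange(240), np.arange(240, n)
stats = single_marker_scan(M[train], y[train])            # scan uses training lines only
fixed_idx = np.argsort(stats)[-1:]                        # keep the top-ranked marker
b, u, random_idx = fit_fixed_plus_ridge(M[train], y[train], fixed_idx, lam=400.0)
accuracy = np.corrcoef(predict(M[test], fixed_idx, random_idx, b, u), y[test])[0, 1]
```

Leaving the fixed-effect marker unpenalized lets a large-effect locus keep its full estimated effect instead of being shrunk toward zero with the rest.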

Another strategy is the application of GS to landraces stored in genebanks to obtain GEBVs for economically important traits by training models on a subset of phenotyped landraces [75]. Muleta et al. [76] have also shown the feasibility of this approach by using empirical data collected for adult plant resistance to stripe rust from 1163 spring wheat accessions and suggested that genomic prediction can provide a promising global strategy for mining useful alleles from crop germplasm collections. In addition, the results of this study showed promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. The application of GS within selected-bulk, recurrent selection and backcrossing schemes to enhance rust resistance in wheat has been described in detail [59]. Despite the availability of a large number of wheat wild relatives and landraces in genebanks, their utilization has been impeded largely by limited phenotyping data. GS can significantly contribute to mobilizing the genetic variation within non-adapted germplasm through accurate prediction of FHB and rust resistance phenotypes.

As durable resistance requires effective combinations of major and minor genes [77,78], integrating MAS and GS to select for both is a reasonable way to enhance disease-resistant germplasm. Cerrudo et al. [44] proposed the use of QTL-based MAS for forward breeding to enrich the allelic frequency of traits with large additive effect QTL in early selection cycles, while GS could be used in more advanced breeding cycles to capture additional alleles with smaller additive effects. Extensive deployment of large-effect rust resistance genes/QTL in resistant cultivars imposes strong selection pressure on the pathogen population [79], which can lead to pathogen virulence shifts or mutations [80]. Enhancing quantitative rust resistance in wheat using GS is hence highly desirable.

Genomic Selection for Rust Resistance in Wheat

The potential for increased genetic gain for rust resistance in wheat through GS has been recognized [56,64,69,70,76,78,81]. However, there is still limited information on the application of GS to exploit disease resistance from exotic or uncharacterized germplasm from gene banks, because most GS studies have been based on bi-parental and multi-family breeding populations.

Among the few studies that have shown the feasibility of GS to predict rust resistance in wheat, Juliana et al. [64] achieved mean genomic prediction accuracies ranging from 0.12 to 0.56 for leaf rust (LR), 0.31 to 0.65 for stem rust (SR), and 0.34 to 0.71 for stripe rust (YR). They examined adult plant resistance in populations of 333 and 314 advanced lines from the Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT) wheat breeding program. Their results indicated that genome-wide marker-based models gave, on average, a 42% increase in accuracy over the least-squares approach, which involves an initial marker ranking and selection step. This indicated that GS was a promising approach for improvement of quantitative rust resistance in the breeding pipeline.

Using data on quantitative APR to SR from a set of 365 advanced CIMMYT wheat lines, Rutkoski et al. [81] showed how historical data could be used to successfully initiate a GS program for resistance breeding. They used a second population of 503 new selection candidates (SCs), which was generated by two rounds of random mating between 14 founder lines from the historical population, followed by one round of selfing for seed increase. These individuals were evaluated for quantitative APR to stem rust, genotyped using genotyping-by-sequencing, and analyzed using GBLUP. Training populations drawn from the SCs and from the historical population were compared, using a subset of SC lines as the validation population. Their results showed that retaining historical data lowered accuracy, especially when the heritability of the historical data was low, the heritability of the close-relative training data was high, and the observations were not weighted properly according to heritability. This has implications for prediction model updating. In a selection program, it may be better to discard historical data and simply use the most recent data for model training. However, when to discard training data will need to be determined empirically because it will depend on the selection intensity of the breeding program, the availability of data on close relatives, and the quality of the historical data [81].

Muleta et al. [76] used empirical data for APR to YR collected on 1163 spring wheat accessions in multi-environment field trials, together with genotypic data based on the wheat 9K single nucleotide polymorphism (SNP) iSelect assay, to estimate GEBVs for stripe rust resistance under scenarios of different population sizes, degrees of genetic relatedness within a population, and marker densities. According to their results, larger germplasm collections may be efficiently sampled via lower-density genotyping methods, whereas genetic relationships between the training and validation populations remain critical when exploiting GS to select for resistance to YR from germplasm collections. In addition, this study revealed that GS could provide an efficient and cost-effective sampling strategy for unlocking the potential of wheat genetic resources and accelerating the rate of genetic gain in wheat breeding programs. Examples of GS studies reported for SR, YR and LR resistance in wheat after 2015, including information on the training and test populations, the GS models used, and the accuracy of the predictions, are presented in Table 1.

Table 1. Examples of GS studies of rust resistance in wheat.

Genomic Selection for FHB Resistance in Wheat

Phenotyping for FHB is laborious and expensive, requiring the preparation of large amounts of inoculum and the establishment of mist irrigation in addition to general crop management practices. Phenotyping for mycotoxin content is conducted only after harvest, is expensive and labor intensive, and is weakly correlated with visual assessments of FHB resistance [82]. Considering these complexities, developing reliable markers for marker-assisted selection (MAS) is highly desirable. However, the implementation of MAS for FHB is deemed only partially effective due to the complex genetic architecture [83]. GS models can enhance selection capacity at early breeding cycles when FHB phenotyping is impractical due to the large population size and low number of seeds. The application of GS allows the effective utilization of limited FHB nursery capacity for evaluating the most promising breeding materials, hence accelerating the release of resistant cultivars.

Most GS studies of FHB resistance used ridge regression best linear unbiased prediction (rrBLUP). This is an infinitesimal model in which all markers share a common variance and all effects are shrunken toward zero [84]. When major genes are present, this model underestimates the genetic variance. Alternative models that account for the effect of major genes include Bayesian models [85], the least absolute shrinkage and selection operator (LASSO) [86] and the elastic net [87], which combines the strengths of LASSO and rrBLUP in a single model. Both Bayesian and LASSO models were previously used for GS of FHB by Rutkoski et al. [88]; however, neither provided higher prediction accuracy than rrBLUP. Arruda et al. [48] suggested that rrBLUP outperforms LASSO and elastic net in a GS study of FHB in a population consisting of soft red winter wheat lines from the midwestern and eastern United States. Multiple studies also used genomic best linear unbiased prediction (GBLUP), which uses genomic relationships among individuals to predict phenotypes and is the most basic GS model [89]. GBLUP has been successfully used by three independent GS studies of FHB since 2016 [83,90,91] (Table 2). Given the contribution of several minor-effect genes to FHB resistance, rrBLUP is the model most commonly advised for GS of FHB, and models that allow marker-specific effects, such as LASSO and Bayesian models, are less common. An additional drawback of LASSO and Bayesian models is that they are very computationally demanding [74].
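
The relationship between the marker-effect view (rrBLUP) and the relationship-matrix view (GBLUP) can be checked numerically. The sketch below is an illustration on simulated data, under the assumption that both models use the same shrinkage parameter `lam` and that the relationship matrix is the unscaled cross-product of centred marker codes; commonly used scalings of that matrix only rescale `lam`. All data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, lam = 60, 15, 200, 50.0                 # training lines, candidates, markers

Z = rng.integers(0, 3, size=(n + m, p)).astype(float)
Z -= Z[:n].mean(axis=0)                          # centre on training-set marker means
Z_tr, Z_new = Z[:n], Z[n:]
y = rng.normal(size=n)
y = y - y.mean()                                 # centred phenotypes (simulated)

# rrBLUP view: estimate p marker effects, then sum them for each candidate
u_hat = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(p), Z_tr.T @ y)
pred_rrblup = Z_new @ u_hat

# GBLUP view: work with n x n relationships among lines instead of marker effects
G_tt = Z_tr @ Z_tr.T                             # training x training relationships
G_nt = Z_new @ Z_tr.T                            # candidate x training relationships
pred_gblup = G_nt @ np.linalg.solve(G_tt + lam * np.eye(n), y)

same = np.allclose(pred_rrblup, pred_gblup)
```

Under these assumptions the two sets of predictions coincide, so the practical choice between the two formulations is largely computational: GBLUP solves a system whose size is the number of lines, rrBLUP one whose size is the number of markers.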

An alternative approach that is often used to improve rrBLUP prediction is to identify FHB resistance QTL using GWAS and treat them as fixed effects in the model. For example, Arruda et al. [74] reported up to 15% improvement in prediction accuracy after including FHB resistance QTL in the rrBLUP model as fixed effects. Within the training population, it seems beneficial to conduct a GWAS to identify QTL and include them in the rrBLUP model as fixed effects. However, this may introduce an artifact if the entire population, including both the training and validation sets, is used to identify QTL. In a realistic scenario, the validation set does not have phenotypic data and cannot be used for QTL detection. Using data from the validation set to help improve prediction accuracy is an example of “data snooping”. In certain cases, data snooping can make MAS appear more effective than GS, as suggested for some FHB traits measured in 273 soft red winter wheat lines from the US Midwestern and Eastern regions [74].
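
The size of the data-snooping artifact can be illustrated with a small simulation. The sketch below is a hypothetical example, not a re-analysis of [74]: the phenotype is pure noise, so no marker has any true effect, and a simple correlation ranking stands in for QTL detection. Markers are selected either from all lines (snooping) or inside each training fold (honest), and a least-squares model on the selected markers is then cross-validated.

```python
import numpy as np

def top_k_markers(M, y, k):
    """Rank markers by absolute covariance with y, scaled by marker spread.
    Assumes every marker is polymorphic in the supplied lines."""
    Mc = M - M.mean(axis=0)
    score = np.abs(Mc.T @ (y - y.mean())) / np.sqrt((Mc ** 2).sum(axis=0))
    return np.argsort(score)[-k:]

def cv_accuracy(M, y, k, folds, snoop):
    n = len(y)
    fold_id = np.arange(n) % folds
    preds = np.empty(n)
    # Snooping: marker selection sees the phenotypes of the validation lines too
    idx_all = top_k_markers(M, y, k) if snoop else None
    for f in range(folds):
        tr, va = fold_id != f, fold_id == f
        # Honest: marker selection is repeated inside each training fold
        idx = idx_all if snoop else top_k_markers(M[tr], y[tr], k)
        X_tr = np.column_stack([np.ones(tr.sum()), M[tr][:, idx]])
        X_va = np.column_stack([np.ones(va.sum()), M[va][:, idx]])
        beta, *_ = np.linalg.lstsq(X_tr, y[tr], rcond=None)
        preds[va] = X_va @ beta
    return np.corrcoef(preds, y)[0, 1]

rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(200, 2000)).astype(float)
y = rng.normal(size=200)                 # pure noise: there is nothing to predict

acc_snooped = cv_accuracy(M, y, k=10, folds=5, snoop=True)
acc_honest = cv_accuracy(M, y, k=10, folds=5, snoop=False)
```

Because the phenotype carries no genetic signal, any clearly positive value of `acc_snooped` is an artifact of selecting markers with the validation phenotypes in view, while `acc_honest` should stay near zero.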

Several studies have shown that phenotypic selection is more accurate than GS [91,92]. However, according to Steiner et al. [91], the application of GS for FHB resistance led to a 43% selection advantage over a two-stage FHB phenotypic selection. Since GS across cycles (predicting using phenotypic data obtained from previous breeding cycles) generally has a lower prediction accuracy than within-cycle GS, its application in breeding programs requires improvement of across-cycle prediction. The increasing application of GS for yield in wheat breeding programs, along with the availability of skim sequencing at reasonable prices, indirectly provides the opportunity to use GS for FHB resistance at nearly zero additional cost. This reserves the limited capacity in FHB nurseries for testing more advanced elite materials and hence accelerates the release of FHB-resistant cultivars. Awareness of the relatedness between training and validation populations and periodic updating of the selection models are imperative for the reliable application of GS in wheat breeding programs. GS studies addressed in this review are summarized in Table 2.

Genomic Selection for Correlated Disease Resistance Traits

The undesirable association of agronomic traits such as plant height (PH) and heading date (HD) with FHB resistance is a challenge for the application of GS. There is compelling evidence supporting the negative correlation between FHB resistance and PH and HD, which is often reflected in the co-localization of PH and HD QTL with FHB resistance QTL [32,34,35,93]. The dwarfing alleles of Rht-B1 and Rht-D1 have been associated with FHB susceptibility [94,95]. This has motivated the phenotyping of PH and HD along with FHB resistance in most GS-FHB studies conducted since 2016 (Table 2). Interestingly, PH and HD were integrated into the GS models in different ways. Moreno-Amores et al. [83] evaluated three different approaches to combine PH and HD in the GS model: (1) correcting the FHB resistance trait values using PH and HD followed by using the corrected phenotypic data for single-trait GS (STGS), (2) using PH and HD for multi-trait GS (MTGS), and (3) adjusting GS using restriction indices with variable restriction enforced for FHB resistance, PH and HD. They indicated that including PH and HD as fixed effects in the GS model is a reasonable strategy to select moderately resistant lines with lower PH and earlier HD than the population average. In other words, successful selection is attainable by fine-tuning the trade-off between prediction accuracy and an acceptable reduction in unfavorable agronomic traits. Steiner et al. [91] also reported marginal improvement when using an MTGS model that combined PH and flowering date (FD), although it largely inflated the negative trade-off between GEBVs for FHB severity and the undesirable agronomic traits. They then applied a restriction index to compensate for the inflation, which led to only marginal improvement. These results reiterate the trade-off between integrating PH and FD in the multivariate model and the reduction in prediction accuracy of FHB resistance.

Comparing STGS and MTGS for FHB resistance in a population of 1604 wheat hybrids, Schulthess et al. [90] suggested that the application of MTGS is only advantageous for genotypes less related to the training set. They also proposed the concept of “phenotype imputation”, in which indirect selection on a highly heritable trait leads to improvement in a correlated trait of lower heritability [53,99]. By progressively reducing the intensity of FHB resistance phenotyping, and thus trait heritability, they showed that FHB severity could be imputed from PH data. To alleviate the unfavorable increase in PH, a tandem selection strategy or a restricted selection index that discards the extremely tall plants prior to GS was recommended [90]. The low variation for FHB resistance in short and early flowering lines and the pleiotropic effects of PH and HD genes on FHB resistance are some of the impediments to the use of MTGS for GS of FHB resistance.

Table 2. Examples of GS studies of FHB in wheat.

Given the unfavorable association of FHB resistance with PH and HD, genomic prediction indices are expected to minimize the bias toward the undesirable traits and thus allow GS of FHB-resistant, semi-dwarf and early-heading lines. Steiner et al. [91] deployed a GS index by assigning different weights to FHB resistance, PH and FD. Genomic selection with only FHB traits in the model resulted in an undesirable increase in PH and FD, which could be compensated for by the application of the selection index. The resulting reduction in prediction accuracy was mitigated by adjusting the weight of each trait in the selection index. The integration of FHB resistance QTL as fixed effects in STGS, along with MTGS guided by restriction indices, are thus far the most promising strategies for the GS of FHB resistance.
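
How trait weights change which line is selected can be shown with a simple linear index on standardized GEBVs. This is a minimal sketch, not the index used by Steiner et al. [91]: the GEBVs, the weights and the four lines are hypothetical, and a restriction index would additionally constrain the expected change in PH or FD rather than just down-weighting them.

```python
import numpy as np

def selection_index(gebv, weights):
    """Weighted sum of standardised GEBVs; gebv has one row per line, one column per trait."""
    z = (gebv - gebv.mean(axis=0)) / gebv.std(axis=0)
    return z @ weights

# Hypothetical GEBVs; columns: FHB severity (%), plant height (cm), flowering date (day of year)
gebv = np.array([
    [20.0, 110.0, 160.0],   # line 0: most resistant, but tall and late
    [35.0,  85.0, 152.0],   # line 1
    [22.0,  92.0, 155.0],   # line 2: nearly as resistant, average height and date
    [40.0,  80.0, 150.0],   # line 3: short and early, but susceptible
])

# Lower values are desirable for all three traits, hence negative weights
w_fhb_only = np.array([-1.0, 0.0, 0.0])
w_index = np.array([-1.0, -0.5, -0.5])

best_fhb_only = int(np.argmax(selection_index(gebv, w_fhb_only)))
best_index = int(np.argmax(selection_index(gebv, w_index)))
```

Selecting on FHB severity alone picks the tall, late line 0, whereas the weighted index picks line 2, trading a small amount of predicted resistance for acceptable height and flowering date.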

Incorporation of multiple traits into GS models for FHB shows promise, particularly with the advent of high-throughput phenotyping and phenomics. The application of phenomics in plant breeding has recently gained attention. It also enables the discovery of agronomic traits associated with FHB resistance that have not yet been examined. Enhancing high-throughput phenotyping for FHB resistance under field conditions is expected to increase the accuracy and reduce the cost of phenotyping. Significant progress has been made in developing algorithms capable of accurately detecting wheat spikes in images collected using ground mobile imaging units [100]. This will pave the way for the detection of infected areas of spikes using deep learning techniques in the future [100]. Improving phenotyping accuracy and throughput can improve the predictive ability of GS models beyond the levels reported in previous studies. In addition to providing FHB phenotyping data, information on other traits can be collected with minimal effort by analysing high-resolution images captured by mobile imaging units and/or unmanned aerial vehicles. These additional data can be incorporated into multi-trait GS models to improve prediction accuracies for FHB resistance.


Plant breeders are constantly searching for specific traits that help farmers grow crops more efficiently, while using fewer natural resources. They usually phenotype large populations for several traits throughout the crop growth cycle. This tedious task of phenotyping multiple traits and large populations is exacerbated by the necessity of sampling multiple environments and growing replicated trials.

New technologies and tools have emerged to speed up the breeding process for the rapid release of cultivars that meet industry and consumer demands. An example is high-throughput phenotyping and imaging, which enables non-destructive field-based plant phenotyping for a large number of traits, including physiological, biotic (e.g., weeds, insects and diseases caused by fungi, bacteria and viruses) and abiotic (e.g., heat, drought, flood and nutrient deficiency) stress traits [101,102]. The adoption of new phenotyping and genotyping technologies has generated a huge amount of complex data, including sequencing, transcriptomic, metabolomic and imaging data. A challenge that accompanies this exponential growth of data is analysis and interpretation. Machine learning (ML) is set to play a pivotal role in sustainable and precision agriculture. One of the major advantages of ML is the ability to search large datasets to discover patterns and features (traits) by simultaneously looking at a combination of factors instead of analyzing each feature individually. Because ML algorithms can potentially approximate any function, ML may uncover genuine patterns within complex datasets [103,104]. In addition, ML allows algorithms to interpret data by learning patterns through experience [105].

Success stories of ML cover various research fields, including robotics [106], bioinformatics [107], biochemistry [108], medical diagnosis [109,110], meteorology [111] and climatology [112]. In agricultural research, ML techniques have been used for predicting regulatory and non-regulatory regions in the maize genome [113], predicting mRNA expression levels in maize [114], predicting polyadenylation sites in Arabidopsis thaliana [115] and predicting macronutrient deficiencies in tomato [116]. Only a few practical examples related to crop breeding have been reported, mainly for predicting yield in several crops (see [117] for a review), including wheat [118,119] and maize [120]. ML has also been applied to FHB and rust resistance in wheat [100,121,122].

When tracking any plant disease, early and accurate identification is essential. The traditional method of identifying disease is visual examination, which is prone to human error and variability in scoring. For a trained algorithm, diagnosing plant disease is essentially pattern recognition. After going through hundreds of thousands of diseased plant images, ML algorithms can assess disease type and severity. Deep learning techniques, particularly Convolutional Neural Networks (CNN), are quickly becoming the preferred method for automatic plant disease recognition [123]. An exhaustive study including 79 diseases (e.g., powdery mildew and leaf rust) affecting 14 plant species (e.g., soybean, corn and wheat) has confirmed the effectiveness of CNN for plant disease assessment. A further study [125] identified plant disease through images of 25 plant species, with an average accuracy of 99% using CNN. In wheat, deep learning techniques have recently been applied to the detection of FHB with an average accuracy of 92% [100,121]. For the first study by Qiu et al. [100], field trials were divided into 10 regions of China, with a hyperspectral image acquired for each region. Several environmental factors influencing the hyperspectral imagery were considered, including wind, humidity, temperature and the time of acquisition (noon), when the sunbeam angle was optimal. The image data were used to train nine ML algorithms. For the second study by Jin et al. [121], three wheat lines with different levels of susceptibility to FHB were cultivated on the St. Paul campus at the University of Minnesota (USA). After inoculation, data acquisition was performed with a camera imaging pipeline at the milk stage of development. Diseased areas of individual spikes were detected using a deep convolutional neural network.
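
Once a network has labelled spike pixels and diseased pixels, a severity score can be derived from the two masks. The sketch below assumes that severity is expressed as the percentage of spike pixels flagged as diseased, which is an illustrative definition and not necessarily the one used in the cited studies. The masks are small hypothetical binary grids standing in for the output of a trained segmentation model.

```python
def spike_severity(spike_mask, disease_mask):
    """Percentage of spike pixels that are also flagged as diseased.

    Both masks are equally sized grids of 0/1 values. Diseased pixels that
    fall outside the spike are ignored. Returns None if no spike is found.
    """
    spike_pixels = 0
    diseased_pixels = 0
    for spike_row, disease_row in zip(spike_mask, disease_mask):
        for is_spike, is_diseased in zip(spike_row, disease_row):
            if is_spike:
                spike_pixels += 1
                if is_diseased:
                    diseased_pixels += 1
    if spike_pixels == 0:
        return None
    return 100.0 * diseased_pixels / spike_pixels

# Hypothetical 4 x 4 masks: the spike occupies the two middle columns.
spike_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
disease_mask = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],  # outside the spike, so not counted
    [0, 0, 0, 0],
]

severity = spike_severity(spike_mask, disease_mask)
```

Restricting the count to spike pixels keeps background misclassifications from inflating the score, which matters when images are captured under field conditions.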

Machine learning methods are useful for analysing large data sets that are hampered by issues such as a small number of observations and a large number of predictive variables, high dimensionality or highly correlated data structures [126]. Therefore, developing high-throughput phenotyping techniques combined with the power of ML would improve the efficiency of FHB assessment in the field, as ML provides substantial advantages over other analytical approaches for large and diverse datasets such as those generated by photo imaging [127].


Genomic selection has been established on the availability of DNA markers linked with all small-effect loci contributing to the phenotype. In fact, reductions in genotyping cost and the availability of high-density genotyping platforms have been the driving force for the application of GS in plant breeding. Single nucleotide polymorphism (SNP) arrays and genotyping-by-sequencing platforms have been developed for over 25 crop species (reviewed by Rasheed et al. [128]). For the majority, however, an ultra-high-throughput and cost-effective genotyping platform suitable for GS is still not available.
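
How genome-wide markers translate into a genomic estimated breeding value can be sketched with ridge regression, one common way of shrinking many small marker effects. This is a minimal illustration under stated assumptions and not the model of any particular study reviewed here: the genotypes (coded 0/1/2), the phenotypes, the shrinkage parameter `lam` and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_marker_effects(genotypes, phenotypes, lam):
    """Estimate marker effects u from (Z'Z + lam * I) u = Z'y on centred data."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(phenotypes) / n
    y = [v - y_mean for v in phenotypes]
    ztz = [[sum(r[i] * r[j] for r in z) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    zty = [sum(r[i] * yi for r, yi in zip(z, y)) for i in range(p)]
    return solve(ztz, zty), col_means, y_mean

def gebv(line, effects, col_means, y_mean):
    """Predicted value of a genotyped line: mean plus sum of marker effects."""
    return y_mean + sum((g - m) * u for g, m, u in zip(line, col_means, effects))

# Hypothetical training set: six lines, three markers.
genotypes = [[0, 0, 2], [1, 2, 0], [2, 1, 1], [0, 2, 1], [1, 0, 0], [2, 1, 2]]
# Hypothetical phenotypes generated as 10 + 3 * (dosage at the first marker).
phenotypes = [10.0, 13.0, 16.0, 10.0, 13.0, 16.0]

effects, col_means, y_mean = ridge_marker_effects(genotypes, phenotypes, lam=1.0)
# Prediction for a new line that has been genotyped but not phenotyped.
prediction = gebv([2, 0, 0], effects, col_means, y_mean)
```

In this toy data set the first marker has a true effect of 3, and the penalty shrinks its estimate to 2.4. Real applications involve thousands of markers and a shrinkage parameter estimated from the data, typically with dedicated mixed-model software.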

Wheat genomics came of age with the availability of bread, durum and wild emmer wheat reference genome assemblies in the past few years [124,129,130]. The genomes of 15 wheat cultivars assembled through the 10+ Wheat Genomes Project are now publicly available (Walkowiak et al., under review). Leveraging these resources to devise high-throughput and cost-effective genotyping platforms is a significant step toward transferring these investments to breeders and consequently to farmers’ fields.

To the date of this review, five SNP chips have been developed and benchmarked for genotyping wheat. Such efforts were initiated by developing a SNP array with 9000 gene-associated SNPs in a worldwide bread wheat collection of 2994 accessions [131], followed by development of the wheat 90K iSelect array [132] from RNA sequences of a diverse panel of 726 accessions including tetraploid and hexaploid landraces. The wheat 90K iSelect array is by far the most intensively used SNP array in wheat mapping research. High-throughput SNP arrays for wheat have also been developed, i.e., the wheat 660K Axiom array and the Wheat HD genotyping array [133]. The latter harbors 820K SNPs and integrates variation from diploid, tetraploid and hexaploid wheat accessions and wheat relatives, thus enhancing the genotyping capacity beyond the primary gene pool [133]. Among the very few efforts to make these resources accessible to breeders is the generation of the Wheat Breeder’s Genotyping Array, which focuses on 35K mostly co-dominant SNPs discovered through exome sequencing of wheat cultivars. Genotyping by SNP arrays has significantly boosted high-density linkage and QTL mapping and GWAS in wheat; however, the cost of genotyping has impeded intensive application of GS because it requires genotyping several thousand lines per year.

Genotyping through sequencing has been widely applied for de novo discovery of SNPs in model plants. Application of this approach has been slow in wheat, mainly due to the absence of a high-quality reference genome and the high cost of genome sequencing [134]. The high cost of sequencing the large wheat genome has motivated researchers to apply reduced-representation methods such as RNA-seq [135], exome sequencing [136] and genotyping-by-sequencing [137]. In certain cases, reduced-representation methods have been used to obtain the sequence of specific gene families in wheat, e.g., disease resistance genes [138]. As the cost of sequencing is reduced and SNP imputation methods improve, low coverage (skim) sequencing is gaining attention due to its lower error rates and higher genome coverage. The availability of third-generation sequencing based on long reads at a reasonable cost holds potential for further integration of structural variants such as presence/absence variants (PAVs) and copy number variants (CNVs) into QTL mapping and genomic prediction studies. PAVs and CNVs seem to form a significant portion of the variation present between cultivars and wild germplasm. Including these types of variants would be of great value for studies aimed at enhancing genetic variation in wheat. Such an effort has been initiated through an international project dubbed 4D Wheat (Diversity, Domestication, Discovery and Delivery) that targets mobilizing genetic variation in the wheat secondary and tertiary gene pools and its application in de novo re-domestication of wheat (Pozniak and Cloutier, personal communication). Partnerships between 4D Wheat and companies offering third-generation sequencing are expected to lead to the availability of third-generation skim sequencing platforms enabling cost-effective discovery of PAVs and CNVs for genetic and GS studies. Skim sequencing of exotic materials facilitates enhancing the diversity within the wheat breeding gene pool.

Because haplotypes, rather than individual SNPs, are inherited independently, the number of SNPs required to cover all haplotypes is often several times lower than the number discovered through most of the genotyping platforms discussed above [139]. However, GS accuracy is often positively correlated with marker density, as a higher density theoretically increases the odds of a QTL lying in linkage disequilibrium (LD) with at least one marker. For example, genomic prediction accuracy improved by 10% when the number of markers increased from 92 to 1158 for a population of 374 winter wheat advanced-cycle breeding lines [140]. However, prediction accuracy plateaus at a certain marker density, depending on the genetic diversity within the population and the relatedness between the training and validation populations [141]. Prediction accuracy decreases as the number of markers increases beyond this threshold, as a consequence of model over-fitting [142]. In most cases, 1000–1500 SNPs have been recommended for genomic prediction studies in wheat; however, the decision over which markers to include largely depends on the diversity within the training and validation populations and their relatedness. Thus, it seems realistic to develop program-specific breeder SNP chips that capture the available haplotypes at a reasonable cost. The decision on which SNPs to include in the breeder chip could be based on the estimation of LD decay over genetic distances inferred from high-density QTL mapping studies [34,35,143] or comprehensive haplotype mapping of wheat diversity panels [136,144]. Thus, in addition to high-throughput phenotyping, advances in genotyping technologies, including SNP arrays and DNA/RNA sequencing, are also shaping the future of GS.
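
The idea of retaining only as many SNPs as are needed to tag the available haplotypes can be illustrated with a simple LD-based pruning sketch. It assumes that LD is approximated by the squared correlation of marker dosages (coded 0/2 for inbred lines) and that markers are pruned greedily against an arbitrary threshold. The marker names, genotypes and threshold are hypothetical, and a breeding program would normally rely on established genetics software for this step.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two marker dosage vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)

def prune_markers(markers, threshold):
    """Greedy pruning: keep a marker only if its r2 with every marker
    already kept is below the threshold."""
    kept = []
    for name, dosages in markers.items():
        if all(r_squared(dosages, markers[k]) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical dosages for four SNPs scored on six inbred lines.
markers = {
    "snp1": [0, 0, 2, 2, 0, 2],
    "snp2": [0, 0, 2, 2, 0, 2],  # identical to snp1 (same haplotype block)
    "snp3": [2, 2, 0, 0, 2, 0],  # mirror image of snp1, also in complete LD
    "snp4": [0, 2, 0, 2, 2, 0],  # largely independent of snp1
}

kept = prune_markers(markers, threshold=0.8)
```

Here three of the four SNPs describe the same haplotype block, so one of them is sufficient. The appropriate threshold depends on how quickly LD decays in the breeding population concerned.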


To our knowledge, other “omics” have not yet been utilized for GS in wheat. Inclusion of intermediary biological strata in the cascade from genotype to phenotype (endophenotypes) could improve prediction accuracy. This is attributed to the contribution of endophenotypes to the identification of epistatic interactions within and between various gene regulation strata [145]. Most attention has been given to the transcriptome, which reflects and quantifies gene expression. Previously, transcriptomics has been deployed for genomic prediction in maize [146,147]. The metabolome has also garnered attention since it integrates all gene regulation and interaction processes. Metabolomics has been successfully used for phenotypic prediction in maize [147]. In a recent study, transcriptomic data from mRNA and sRNA were combined with metabolomic data to predict the yield performance of maize hybrids [145]. The combination of genomic and mRNA data returned 10% higher prediction accuracy, while including sRNA in the model had a negligible effect on the prediction accuracy. Interestingly, the difference between transcriptomic data alone and combined genomic and transcriptomic data was negligible, suggesting that mRNA data alone could be used to achieve high predictability.
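
One way of combining data layers is to compute a similarity (relationship) matrix among lines from each layer and then merge the matrices before fitting a prediction model. The sketch below assumes linear kernels on standardized features and a fixed, arbitrary weight. It is an illustration of the general idea and not a reconstruction of the models used in the maize studies cited above, and the marker and expression values are hypothetical.

```python
def standardize_columns(matrix):
    """Scale every column (marker or transcript) to zero mean, unit variance."""
    n, p = len(matrix), len(matrix[0])
    scaled = [[0.0] * p for _ in range(n)]
    for j in range(p):
        column = [row[j] for row in matrix]
        mu = sum(column) / n
        sd = (sum((v - mu) ** 2 for v in column) / n) ** 0.5
        for i in range(n):
            scaled[i][j] = (column[i] - mu) / sd if sd > 0 else 0.0
    return scaled

def linear_kernel(matrix):
    """Similarity among lines: K = Z Z' / p, with Z the standardized features."""
    z = standardize_columns(matrix)
    p = len(z[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / p for rj in z] for ri in z]

def combine_kernels(k_a, k_b, weight_a):
    """Weighted average of two kernels computed on the same set of lines."""
    return [[weight_a * a + (1.0 - weight_a) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(k_a, k_b)]

# Hypothetical data for four lines: three SNPs and two transcript abundances.
markers = [[0, 2, 0], [2, 2, 0], [0, 0, 2], [2, 0, 2]]
expression = [[5.1, 0.3], [4.8, 0.9], [2.2, 1.5], [2.0, 1.1]]

G = linear_kernel(markers)      # genomic similarity
T = linear_kernel(expression)   # transcriptomic similarity
K = combine_kernels(G, T, weight_a=0.5)  # the weight is an arbitrary choice
```

The combined matrix `K` could then replace the genomic relationship matrix in a kernel-based prediction model. In practice, the weight given to each layer would be tuned or estimated and not fixed in advance.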

Practically, transcriptomic prediction benefits from both gene expression data and SNP discovery for a combined genomic and transcriptomic prediction platform. Despite these advantages, integration of transcriptome data has been impeded by the higher cost of mRNA sequencing compared with DNA sequencing, the poor correlation between gene expression under controlled conditions and field environments, and the tendency to discover non-heritable variation. Nevertheless, the cost seems to be reasonable in wheat hybrid breeding programs, as the transcriptomes of only a limited number of founder lines are analyzed. Transcriptome data could be generated for a subset of founder lines and used to develop models for imputing the values of others using pedigree and genomic data [148]. Application cost could also be reduced by utilizing 3’Pool-seq, which is claimed to reduce the library preparation cost by up to 90% with a marginal reduction in the accuracy of gene expression quantification [149]. In addition, the BART-seq platform allows the utilization of reduced-representation transcriptome sequencing [150] that could theoretically capture the expression of a certain set of genes relevant to the trait of interest. The validation of both methods in wheat warrants further investigation. Once validated, these techniques could be applied in future “omics” prediction studies in wheat, especially as hybrid breeding is gaining ground as a new strategy for genetic improvement in wheat [151].


The present review tapped into several high-impact GS studies conducted during the last five years for rust and FHB to identify the most effective protocol for implementing GS in breeding programs. Despite significant variability in how GS was implemented in these studies, we identified a few common grounds. A significant common theme was the tendency to integrate several types of data, e.g., pedigree, genotype × environment interaction and QTL identified through mapping studies, into a single model. A reasonable strategy suggested was the application of MAS to increase the frequency of favourable alleles for traits with strong additive QTL in early generations and GS to capture positive alleles with smaller additive effects in later-generation materials. In contrast, others argue for the benefit of using GS in early generations, where population size and shortage of seed impede intensive phenotyping in disease nurseries. The relatively low across-cycle predictability of GS is a hurdle for the application of the latter strategy. The across-cycle predictability of GS could be improved by increasing the size of training populations and by periodically retraining and validating the model. Accurate and high-throughput phenotyping combined with the power of ML is expected to promote the application of GS by reducing the phenotypic error and thus increasing the across-cycle predictability of GS. It could also unveil complex associations of resistance with other phenological traits and supply detailed data for modeling such complex associations.

Multi-trait GS has emerged as a useful strategy for selecting quantitative resistance, especially for FHB, considering the well-established association of FHB with PH and HD. The majority of GS studies on FHB benefit from the integration of PH and HD (or FD) into the prediction models. However, unsupervised integration of PH and HD in the models leads to an undesirable increase in PH and delayed HD. Efforts to mitigate such undesirable effects lead to a reduction in the predictability of the model. Another challenge for the application of MTGS for FHB is the low variation for FHB resistance in short-stature wheat germplasm and the pleiotropic effect of PH and HD on FHB resistance. Despite these challenges, MTGS proved partially useful when GS models were adjusted using restriction indices for PH and HD, allowing some selection gain for FHB-resistant, semi-dwarf and early-heading germplasm. High-throughput phenomics empowered by ML would be of great value for uncovering associations with other agronomic traits not yet considered in previous genetic studies on rust and FHB resistance. Exotic germplasm and landraces hold promise for improving FHB and rust resistance in wheat. The availability of skim sequencing at a reasonable cost has made it possible to discover structural variation across various wheat gene pools. Skim sequencing of a large number of exotic materials facilitates enhancing the diversity within the wheat breeding gene pool.

All in all, we expect GS to be intensively applied in wheat breeding programs given its numerous advantages, such as improving selection gain, reducing the need for labor-intensive and costly phenotyping in disease nurseries and accelerating the utilization of genetic variation. The availability and predictability of GS for wheat breeding could be enhanced by ML-empowered high-throughput and precise phenotyping, the cost-effective application of “omics” for improving GS predictability, and the availability of endophenotypes such as transcriptome and metabolome data in an effort to better model epistatic and genotype × environment interactions. Reducing the cost per sample for such endophenotypes is a prerequisite for their integration in GS studies in inbred and hybrid wheat breeding. On the other hand, ML would allow for a more accurate disease diagnosis, while preserving energy and generating consistent and repeatable data. However, dataset limitations (number and variety of samples) hamper the development of truly efficient platforms for plant disease classification. Fortunately, some efforts towards building and sharing more representative publicly available databases are underway. Thus, future studies could focus on improving across-cycle GS predictability through integrating modern technologies and big data sciences.


The authors declare that they have no conflicts of interest.


We are grateful for funding from the Canadian Triticum Applied Genomics research project (CTAG2) funded by Genome Canada, Genome Prairie, the Western Grains Research Foundation, Saskatchewan Ministry of Agriculture, Saskatchewan Wheat Development Commission, Alberta Wheat Commission, Viterra and Manitoba Wheat and Barley Growers Association.


The authors express their appreciation to anonymous reviewers for their valuable suggestions to improve the manuscript. 

How to Cite This Article

Haile JK, N’Diaye A, Sari E, Walkowiak S, Rutkoski JE, Kutcher HR, Pozniak CJ. Potential of Genomic Selection and Integrating “Omics” Data for Disease Evaluation in Wheat. Crop Breed Genet Genom. 2020;2(4):e200016.

Copyright © 2020 Hapres Co., Ltd.