Implementation of Genomic Selection in the CIMMYT Global Wheat Program, Findings from the Past 10 Years

Susanne Dreisigacker; Jose Crossa; Paulino Pérez-Rodríguez; Osval A. Montesinos-López; Umesh Rosyara; Philomin Juliana; Suchismita Mondal; Leonardo Crespo-Herrera; Velu Govindan; Ravi P. Singh; H.-J. Braun

DOI: https://doi.org/10.20900/cbgg20210005



This work is licensed under a Creative Commons Attribution 4.0 International License.

Crop Breed Genet Genom. 2021;3(2):e210005. https://doi.org/10.20900/cbgg20210005

Review


Susanne Dreisigacker¹,*, Jose Crossa¹,*, Paulino Pérez-Rodríguez², Osval A. Montesinos-López³, Umesh Rosyara¹, Philomin Juliana¹, Suchismita Mondal¹, Leonardo Crespo-Herrera¹, Velu Govindan¹, Ravi P. Singh¹, H.-J. Braun¹

¹ International Maize and Wheat Improvement Center (CIMMYT), Edo. de México 56237, México

² Colegio de Postgraduados, Montecillos, Edo. de México 56230, México

³ Faculty of Telematics, Universidad de Colima, Colima 28040, México

* Correspondence: Susanne Dreisigacker, Tel.: +52-55-5804-2004; Jose Crossa, Tel.: +52-55-5804-2004.

Received: 22 June 2020; Accepted: 25 January 2021; Published: 26 March 2021

ABSTRACT

Wheat is a fundamental crop for global food security, and the International Maize and Wheat Improvement Center (CIMMYT) has been a central pillar in providing high-yielding, nutritious, disease- and climate-resilient wheat varieties to target countries, which is the basis for establishing more resilient agri-food systems, especially in the developing world. Increasing wheat yield potential through plant breeding will play a crucial role in meeting the projected future global demand for wheat. New emerging technologies and breeding strategies must be explored to accelerate the rate of genetic gain in wheat. Genomic selection (GS) is one of these strategies; it has already demonstrated higher rates of genetic gain in animal breeding and is becoming an essential component of many plant breeding programs, including wheat. Over the last decade, the CIMMYT Global Wheat Program has made significant contributions to promoting the implementation of GS in wheat. Several new genome-wide prediction models (e.g., models accounting for genotype × environment interaction or, more recently, deep learning methods) were developed and tested on CIMMYT wheat datasets. GS has been routinely implemented in the CIMMYT spring bread wheat program since 2013. Here we summarize the lessons learned from 10 years of experience with GS in the CIMMYT Global Wheat Program and give a brief outlook on future work.

KEYWORDS: CIMMYT Global Wheat Program; genomic selection; prediction models; wheat

ABBREVIATIONS

AUC, area under the receiver operating characteristic curve; B × E, Band × Environment; BGLR, Bayesian Generalized Linear Regression; BMGF, Bill and Melinda Gates Foundation; CENEB, Campo Experimental Norman E. Borlaug; CIMMYT, International Maize and Wheat Improvement Center; DFID, Department for International Development; DL, Deep learning; EYT, Elite yield trial; G × E, Genotype × Environment; GBLUP, Genomic best linear unbiased prediction; GEBV, Genomic-enabled breeding value; GS, Genomic Selection; GWAS, Genome-wide association study; HTP, High-throughput phenotyping; ICARDA, International Center for Agricultural Research in the Dry Areas; INIA, Instituto Nacional de Investigación Agropecuaria; M × E, Marker × Environment; MLP, multi-layer perceptron; NGS, Next generation sequencing; PNN, probabilistic neural network; QTL, Quantitative trait locus; ReLU, rectified linear activation unit; RKHS, Reproducing Kernel Hilbert Spaces; SNP, single nucleotide polymorphism; USAID, United States Agency for International Development; YT, Yield trials

INTRODUCTION

Wheat (Triticum aestivum L.) ranks as the second most important food crop after rice and is the most widely cultivated cereal in the world. It is one of the foundations of global food security, supplying 20% of total calories and a similar proportion of total protein to the world’s population [1]. Currently, food security is threatened by ongoing climate change, plateauing crop yields in several regions and declining natural resources (FAO, 2018). Increasing crop yield potential and closing the yield gap are two important components of the solutions proposed to achieve global food security sustainably and with a minimal environmental footprint [2,3].

The Global Wheat Program of the International Maize and Wheat Improvement Center (CIMMYT) is one of the most important public sources of high-yielding, nutritious, disease- and climate-resilient wheat varieties for Africa, Asia and Latin America, and is therefore a central pillar for more resilient agri-food systems in those countries. Lantican et al. [4] estimated that nearly 70% of the spring wheat growing regions in developing countries either grow CIMMYT wheat germplasm as a direct release or have used CIMMYT germplasm as a parent in their varieties. The development of wheat germplasm has been the core activity of the Global Wheat Program since its establishment in 1966 following the “Green Revolution”. Since then, CIMMYT’s wheat breeding programs have continuously evolved to meet production challenges and future demands for wheat. Recently, the impact of the semi-dwarf spring wheat breeding programs on grain yield was reassessed over a fifty-year period in optimum, drought- and heat-stressed environments [5]. Indeed, since 1976 the program has periodically estimated genetic gains in grain yield, using various statistical methods, to evaluate the progress in grain yield potential of new germplasm [6–14]. Studies before 1990 reported rates of genetic gain in grain yield of around 1.1%, whereas the most recently published studies reported rates of around 0.6 to 0.7%.

To keep up with future demands for wheat and to adapt to changing environmental conditions, breeders have constantly turned to new and emerging technologies and breeding strategies. Recent advances in highly efficient agri-genomic approaches, such as high-throughput next-generation sequencing (NGS) technologies that provide thousands to millions of data points at constantly decreasing costs, together with advances in statistical methods for exploiting large amounts of genomic data, offer one route to modernizing breeding programs [15,16].

The wheat genome is large and highly complex compared with many other cereal crops. Its estimated size of ~17 Gb is partly attributable to wheat being an allohexaploid, with three distinct but closely related diploid genomes [17]. In addition, the wheat genome has experienced significant proliferation of repetitive elements, resulting in a composition of between 75 and 90% repetitive DNA sequences [18]. Nevertheless, the swift development of NGS during the last two decades has made it possible to produce draft sequences not only for the diploid donors of the wheat A (T. urartu, [19]) and D (Ae. tauschii, [20]) genomes, but also for the tetraploid and hexaploid wheat genomes (T. durum and T. aestivum, [21,22]). At the same time, additional molecular marker platforms (e.g., single nucleotide polymorphism (SNP) arrays) and other resources (e.g., mutant libraries) were developed for wheat, which greatly facilitate genome-wide characterization of germplasm and functional genomics to bridge the gap between genotype and phenotype [23].

Genomic approaches have also been adopted by the CIMMYT wheat breeding programs [24]. For example, rust research has mapped and officially designated 12 genes in the past decade using biparental linkage mapping. Among these genes, the three pleiotropic adult-plant resistance genes Lr34/Yr18/Sr57/Pm38, Lr46/Yr29/Sr58/Pm39 and Lr67/Yr46/Sr55/Pm46 are especially widely used across bread wheat breeding programs as a basis of partial resistance to the three wheat rusts [25]. With the development of high-throughput molecular marker platforms, including NGS, genome-wide marker information has been generated for many different populations of CIMMYT wheat. These datasets have been used in genome-wide association studies (GWAS) to discover new quantitative trait loci (QTL) for various priority traits, including resistance to several diseases, grain quality and grain yield components [26–33], and to understand the genetic basis of new traits [34–36]. The resulting gene- or QTL-associated molecular markers are routinely integrated into the CIMMYT shuttle breeding scheme through marker-assisted selection or marker-assisted backcrossing. Both approaches are applied to increase the response to selection, mainly for simply inherited traits [24].

In contrast to QTL mapping and GWAS, genomic selection (GS) is a breeding tool that uses genome-wide marker information to predict breeding values in a breeding program [37]. The use of genome-wide markers in crops had already been suggested by Bernardo [38]. After the publication of Meuwissen et al. [37], the approach first became popular in animal breeding before being tested and applied in plant breeding. GS uses available genotypic and phenotypic information to develop a prediction model that can subsequently be applied to genotypic data alone, either to estimate the breeding values of potential parents for crossing or to select lines. Thus, progeny can be selected and advanced in a breeding program based on genotype alone, saving the cost, time and effort of phenotypic selection [39,40].

Over the last decade, the number of studies on GS in wheat has grown. CIMMYT started to explore GS more aggressively as a breeding tool in 2010. Since then, CIMMYT has made significant contributions to developing and testing new genome-wide prediction models on CIMMYT wheat datasets. GS became operational in the CIMMYT spring bread wheat program in 2013. Here we summarize the lessons learned from 10 years of experience with GS in the CIMMYT Global Wheat Program.

IMPROVING GENOME-WIDE PREDICTION ABILITY IN CIMMYT WHEAT DATASETS

A large number of genome-wide prediction models have been developed for crops, or adopted from other research fields, to handle the high-dimensional marker datasets that are typical of GS. These models perform differently because they make different assumptions about how the genetic variance of complex traits is distributed. Throughout the last decade, the CIMMYT Biometrics Unit, in partnership with other research institutions, has developed and promoted various novel prediction algorithms and tested them on wheat data. Furthermore, the team released the Bayesian Generalized Linear Regression (BGLR) package for the R computing environment [41], which is the package most commonly used at CIMMYT. The large genotypic datasets accumulated over the years, together with comprehensive phenotypic data supported by environmental information, have often required complex statistical solutions.

Integrating Pedigree Information

Before the application of GS, the genetic merit of genotypes was mainly predicted by estimating breeding values from phenotypic and family data. Family data were usually represented by pedigrees, which were routinely used in animal breeding but less so in crop breeding because complete pedigree information was often lacking [42]. Nevertheless, for decades CIMMYT wheat breeders have used the Purdy nomenclature [43], a standard pedigree notation for crosses, which makes it possible to generate relationship matrices based on fully expanded genealogical information extracted from CIMMYT’s International Wheat Information System [44].

Formally, a breeding value can be partitioned into two components: (1) the parent average (because an individual receives 50% of its genome from each of its two parents) and (2) the Mendelian sampling term, which reflects the random sampling of each parent's genome [42]. Pedigrees capture only the first component (family relationships), whereas molecular markers can capture both family relationships and Mendelian sampling, and are therefore expected to increase the accuracy of breeding values. Very early GS studies using CIMMYT wheat datasets confirmed this expectation, showing that molecular markers increased genome-wide prediction abilities over pedigree-derived models [45,46]. Furthermore, when molecular markers and pedigree information were considered jointly, prediction abilities were slightly but consistently superior to those of the marker-only or pedigree-only models. The additive relationship matrix A, calculated via the coefficient of parentage, is therefore routinely integrated into almost every genome-wide prediction model at CIMMYT.
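To illustrate how a pedigree-based additive relationship matrix can be computed, the sketch below applies Henderson's tabular method to a small hypothetical pedigree. It is a generic textbook method, not CIMMYT's implementation: coefficient-of-parentage calculations for self-pollinated wheat typically assume fully inbred (homozygous) lines, which this version, written for ordinary diploid pedigrees, does not model. The function name `additive_relationship` and the pedigree are illustrative assumptions.

```python
import numpy as np


def additive_relationship(sire, dam):
    """Additive relationship matrix A by Henderson's tabular method (hypothetical helper).

    sire, dam: 0-based parent indices for each individual (-1 = unknown parent).
    Individuals must be ordered so that parents appear before their offspring.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        # Off-diagonals: average relationship of individual j with the two parents of i
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, where the inbreeding coefficient F_i is half the parents' relationship
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


# Hypothetical pedigree: 0 and 1 are unrelated founders, 2 and 3 are full sibs (0 x 1),
# and 4 is the progeny of 2 x 3 (so it is inbred)
sire = [-1, -1, 0, 0, 2]
dam = [-1, -1, 1, 1, 3]
A = additive_relationship(sire, dam)
print(A)
```

Individual 4 gets a diagonal of 1.25 because its parents are full sibs (relationship 0.5, so F = 0.25); this is the kind of information A contributes alongside marker-based relationships.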

Models with Linear and Non-Linear Kernels

Let y_i (i = 1, …, n) be the phenotypic value of the i-th individual and let x_ij (j = 1, …, p) be its genotype at the j-th marker, coded as 0, 1 or 2 (corresponding to aa, Aa and AA, respectively). The response of the i-th individual can be represented as the sum of two quantities, a genetic signal u_i = ∑_{j=1}^p x_ij β_j and a residual (e.g., [37]), that is, y_i = μ + u_i + e_i, where μ is a general mean, β_j is the effect of the j-th marker and e_i represents independent and identically distributed random variables with mean 0 and variance σ_e². The model can be written in matrix notation as:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \tag{1}$$

Usually in genomic applications, the number of phenotypic records (n) is much smaller than the number of markers (p), so the ordinary least-squares estimator of β cannot be obtained; therefore, Bayesian or penalized estimation procedures are used to fit the model. In the standard Bayesian ridge regression model, it is assumed that β | σ_β² ~ MN(0, σ_β² I), where σ_β² is the variance associated with the markers, I is the identity matrix and MN stands for multivariate normal density. If we set u = Xβ, then model (1) can be rewritten as y = 1μ + u + e with u ~ MN(0, σ_g² G), where G is a linear kernel representing the genomic relationship matrix and σ_g² is its associated variance parameter. This model is known in the literature as GBLUP and is probably the most widely used model in GS. Several software packages (e.g., BGLR [41] and rrBLUP [47]) can fit these models. GBLUP has also been widely used for genome-wide prediction at CIMMYT [46,48].
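The equivalence between the marker-effects form of model (1) and GBLUP can be checked numerically. The minimal sketch below uses simulated data and assumes the variance ratio λ = σ_e²/σ_β² is known (in practice it is estimated, e.g., by REML or MCMC); it also replaces μ by the training mean, a simplification of the full mixed-model equations. G is left as the unscaled linear kernel XX' so that the same λ applies to both forms.

```python
import numpy as np

rng = np.random.default_rng(2021)

# Simulated data (illustrative only): n lines genotyped at p SNPs coded 0/1/2
n, p = 400, 200
freqs = rng.uniform(0.1, 0.9, size=p)
X = rng.binomial(2, freqs, size=(n, p)).astype(float)
g_true = X @ rng.normal(0.0, 1.0, size=p)
g_true = (g_true - g_true.mean()) / g_true.std()   # genetic signal with variance 1
y = 10.0 + g_true + rng.normal(0.0, 0.5, size=n)   # mu = 10, residual variance 0.25

train, test = np.arange(300), np.arange(300, n)

# Center markers and phenotypes with training-set means; using the training mean
# for mu is a simplification of solving the full mixed-model equations
Xc = X - X[train].mean(axis=0)
y_mean = y[train].mean()
yc = y[train] - y_mean

lam = 20.0  # assumed sigma_e^2 / sigma_beta^2 (roughly the simulated ratio)

# (a) Marker-effects form (ridge regression): beta_hat = (X'X + lam I)^-1 X'y
Xtr = Xc[train]
beta_hat = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ yc)
pred_rr = y_mean + Xc[test] @ beta_hat

# (b) Genomic-values form (GBLUP) with linear kernel G = XX':
# u_test = G[test, train] (G[train, train] + lam I)^-1 y
G = Xc @ Xc.T
alpha = np.linalg.solve(G[np.ix_(train, train)] + lam * np.eye(len(train)), yc)
pred_gblup = y_mean + G[np.ix_(test, train)] @ alpha

print(np.allclose(pred_rr, pred_gblup))          # True: both forms give the same predictions
print(np.corrcoef(pred_gblup, y[test])[0, 1])    # prediction ability in the held-out lines
```

With a scaled relationship matrix such as G = XX'/k, the same predictions are obtained by dividing λ by k, since σ_g² = kσ_β².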

Some applications of GS have used semi-parametric genomic regression methods to account for non-additive variation [49,50], and these methods have produced very promising practical results when predicting complex traits in wheat [51–54]. The regression function u_i = u(x_i1, …, x_ip) described above can therefore also be represented by semi-parametric Reproducing Kernel Hilbert Spaces (RKHS) regressions [51,54–58] or by neural networks. The RKHS approach uses the markers to build a covariance structure known as the reproducing kernel matrix, which depends on the markers and on a bandwidth parameter (h). A widely used non-linear reproducing kernel is the Gaussian kernel, K_h(x_i, x_i') = exp(−h d_ii'²), where x_i and x_i' are the marker vectors of the i-th and i'-th individuals and d_ii'² = ∑_{j=1}^p (x_ij − x_i'j)² is their squared Euclidean distance [54]. Recently, Pérez-Elizalde et al. [59] proposed an empirical Bayesian method for estimating h, following a simple idea put forth by Gianola and Van Kaam [55], namely, to assign a prior p(h) and obtain a posterior point estimate of h. RKHS regression has been suggested as an alternative model to capture non-linear and complex interactions between genes [56]. RKHS-based models have been used in many studies with CIMMYT wheat data (e.g., [46,54,59]) and have produced good results when deploying GS in early breeding generations (see below).
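The Gaussian kernel defined above can be computed directly from the marker matrix. The following minimal sketch uses simulated markers; the median-distance choice of h is a common heuristic assumed here for illustration, whereas Pérez-Elizalde et al. [59] estimate h by empirical Bayes. The resulting K can take the place of G in a GBLUP-type solve.

```python
import numpy as np


def squared_distances(X):
    """d_ii'^2 = sum_j (x_ij - x_i'j)^2 for every pair of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D2, 0.0)   # clip tiny negative values caused by round-off


def gaussian_kernel(D2, h):
    """K_h(x_i, x_i') = exp(-h * d_ii'^2)."""
    return np.exp(-h * D2)


rng = np.random.default_rng(1)
X = rng.binomial(2, 0.5, size=(50, 300)).astype(float)   # 50 lines x 300 SNPs (simulated)
D2 = squared_distances(X)

# Assumed heuristic: scale h by the median off-diagonal squared distance
h = 1.0 / np.median(D2[np.triu_indices_from(D2, k=1)])
K = gaussian_kernel(D2, h)
K_sharper = gaussian_kernel(D2, 4 * h)   # larger h: lines appear less related
```

Larger bandwidths shrink the off-diagonal entries toward zero, so h controls how local the kernel's notion of genetic similarity is.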

Several studies [60–63] showed that Gaussian kernel methods combined with the multi-environment genomic G × E model of Jarquín et al. [64] gave higher prediction accuracy than linear kernel methods. In the search for other non-linear kernels, Cuevas et al. [65] explored the arc-cosine kernel, which was initially described by Cho and Saul [66] in the context of kernel methods for deep learning. The appeal of the arc-cosine non-linear kernel is that it emulates a deep neural network: each application of its recursive function corresponds to adding a level (hidden layer). Cuevas et al. [65] and Crossa et al. [67] described the arc-cosine kernel method in multi-environment models including G × E. The arc-cosine kernel is computationally much simpler than the Gaussian kernel and performed very similarly to, and sometimes slightly better than, the Gaussian kernel.
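As a minimal sketch of the recursion, the function below implements the degree-1 arc-cosine kernel of Cho and Saul, starting from the linear kernel XX' and applying the recursion once per "hidden layer". Scaling details in the implementations of Cuevas et al. [65] may differ; the marker coding and number of layers here are assumptions.

```python
import numpy as np


def arc_cosine_kernel(X, n_layers=1):
    """Degree-1 arc-cosine kernel (Cho and Saul) applied recursively n_layers times.

    Rows of X must not be all zeros, because their norms appear in a denominator.
    """
    K = X @ X.T                                  # start from the linear kernel
    for _ in range(n_layers):
        norms = np.sqrt(np.diag(K))
        norm_prod = np.outer(norms, norms)
        cos_theta = np.clip(K / norm_prod, -1.0, 1.0)
        theta = np.arccos(cos_theta)             # angle between individuals
        J = np.sin(theta) + (np.pi - theta) * cos_theta
        K = norm_prod * J / np.pi                # one more "hidden layer"
    return K


rng = np.random.default_rng(3)
X = rng.binomial(2, 0.5, size=(40, 200)) - 1.0   # 40 lines x 200 SNPs coded -1/0/1 (simulated)
K1 = arc_cosine_kernel(X, n_layers=1)
K3 = arc_cosine_kernel(X, n_layers=3)
```

Unlike the Gaussian kernel, there is no bandwidth to tune; the only choice is the number of layers, which is one reason the arc-cosine kernel is cheap to use.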

Accounting for Genotype × Environment (G × E) Interactions

Multi-environment trials for assessing G × E interaction are a key component in plant breeding for selecting high performing and stable lines across environments. Multi-environment linear mixed models account for correlated environmental structures within the GBLUP framework and thus can increase accuracy when predicting the performance of unobserved phenotypes using pedigree and molecular markers. Burgueño et al. [68] were the first to use marker and pedigree GBLUP models to assess G × E in GS, while Heslot et al. [69] incorporated crop modeling data for studying genomic G × E.

Reaction norm model

Jarquín et al. [64] proposed a reaction norm model in which the main and interaction effects of markers and environmental covariates are introduced through high-dimensional random variance-covariance structures of markers and environmental covariables; the reaction norm model is an extension of the well-known GBLUP model.

The baseline model for the phenotypes (y_ij) can be described as

$$y_{ij} = \mu + E_i + L_j + EL_{ij} + e_{ij} \tag{2}$$

where μ is the overall mean, E_i (i = 1, …, I) is the random effect of the i-th environment, L_j is the random effect of the j-th line (j = 1, …, J), EL_ij is the interaction between the i-th environment and the j-th line, and e_ij is the random error term. The assumptions are as follows: E_i ~iid N(0, σ_E²), L_j ~iid N(0, σ_L²), EL_ij ~iid N(0, σ_EL²), and e_ij~iid N(0, σ_e²), with N(.,.) denoting a normal density, and iid standing for independent and identically distributed.

Markers can be introduced in (2) such that the effect of line (L_j) can be replaced by u_j defined by the regression on marker covariates (it approximates the genetic value of the j-th line). The vector containing the genomic values is u~MN(0, σ_g²G), where σ_g² is the genomic variance, and G is a genomic relationship matrix [70,71]. Also, the effects of line (L_j) can be replaced by a_j, with a~MN(0, σ_a²A), where A is the additive relationship matrix derived from pedigree and σ_a² is the additive variance.

The reaction norm model has been successfully applied using pedigree and genomic relationships [72,73]. Velu et al. [74] applied it to 330 wheat lines from CIMMYT’s biofortification breeding program whose grain Zn and Fe concentrations were measured in Mexico and India. The authors used nine different reaction norm models that included u_j, a_j or both, as well as their interactions with environment E_i. Models including G × E always had higher genomic-enabled prediction abilities than the main-effects models. This study was the first to reveal the relatively complex genetic architecture of grain Zn and Fe concentration, which led the authors to favor GS over marker-assisted selection for the improvement of biofortified wheat.
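In reaction norm models of this kind, the covariance of the interaction term is commonly formed as the element-wise (Hadamard) product of the genomic and environmental covariance matrices. The minimal sketch below builds these kernels for a small balanced, simulated trial, assuming environments are treated as independent (so the environmental covariance is Z_E Z_E'); with environmental covariables, that matrix would instead be computed from them, as in Jarquín et al. [64]. The simple G scaling is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
n_lines, n_env, p = 6, 3, 50
X = rng.binomial(2, 0.5, size=(n_lines, p)).astype(float)
Xc = X - X.mean(axis=0)
G = Xc @ Xc.T / p                          # genomic relationships among lines (simple scaling)

# One record per line x environment combination (balanced design for simplicity)
line_id = np.tile(np.arange(n_lines), n_env)
env_id = np.repeat(np.arange(n_env), n_lines)
Z_L = np.eye(n_lines)[line_id]             # records -> lines incidence matrix
Z_E = np.eye(n_env)[env_id]                # records -> environments incidence matrix

K_E = Z_E @ Z_E.T                          # environment main effect (independent environments)
K_G = Z_L @ G @ Z_L.T                      # genomic main effect u_j
K_GE = K_G * K_E                           # Hadamard product: G x E interaction covariance
```

Under this construction, records from the same environment share the genomic covariance in the interaction term, while records from different environments are uncorrelated in it; each kernel enters the model with its own variance component.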

The marker × environment (M × E) interaction model

The M × E interaction model proposed by Lopez-Cruz et al. [75] decomposes the marker effects into components that are common across environments (stability) and environment-specific deviations (interaction). The model for the j-th environment can be written as:

$$y_{ij} = \mu_j + \sum_{k=1}^{p} x_{ijk}\, b_{0k} + \sum_{k=1}^{p} x_{ijk}\, b_{jk} + e_{ij} \tag{3}$$

where y_ij represents the response of the i-th line in the j-th environment, μ_j is the intercept of the j-th environment, x_ijk represents the k-th marker for individual i in environment j, b_0k is a marker effect common to all environments and b_jk is a marker effect specific to the j-th environment, with b_0k ~iid N(0, σ₀²), b_jk ~iid N(0, σ_b²) and e_ij ~iid N(0, σ_e²). This model borrows information across environments while allowing marker effects to change across environments. The M × E model of Lopez-Cruz et al. [75] can be implemented with both shrinkage and variable selection methods; it can thus be used to identify genomic regions whose effects are stable across environments and other regions that are responsible for G × E. The M × E model is best suited for the joint analysis of positively correlated environments. Lopez-Cruz et al. [75] used the M × E model to analyze three CIMMYT wheat datasets; its prediction accuracy was substantially higher than that of an across-environment analysis that ignores G × E.
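With Gaussian priors on both sets of marker effects, the M × E model can be written as a ridge-type regression on an augmented marker matrix. The minimal sketch below, on simulated data, uses that formulation with fixed penalties λ0 = σ_e²/σ₀² and λ1 = σ_e²/σ_b² and with environment means as intercepts. These are simplifying assumptions: it is not the Bayesian implementation of Lopez-Cruz et al. [75], which estimates the variances and can also use variable-selection priors.

```python
import numpy as np

rng = np.random.default_rng(3)
n_lines, n_env, p = 100, 3, 40
X = rng.binomial(2, 0.5, size=(n_lines, p)).astype(float)
X -= X.mean(axis=0)

# Simulated effects: b0 common to all environments, bj environment-specific deviations
b0 = rng.normal(0.0, 0.3, p)
bj = rng.normal(0.0, 0.1, (n_env, p))
env_id = np.repeat(np.arange(n_env), n_lines)
X_long = np.tile(X, (n_env, 1))                  # marker covariates x_ijk for every record
y = np.concatenate([X @ (b0 + bj[j]) for j in range(n_env)])
y += rng.normal(0.0, 0.5, y.size)

# Augmented design [X_0 | X_1 | ... | X_J]: X_0 covers all records (stable effects);
# X_j repeats the markers only for records from environment j (interaction effects)
W = np.hstack([X_long] + [X_long * (env_id == j)[:, None] for j in range(n_env)])

# W is rank-deficient (X_0 is the sum of the X_j), so shrinkage is essential.
# Ridge penalties lam0 = sigma_e^2/sigma_0^2 and lam1 = sigma_e^2/sigma_b^2 (assumed known)
lam0, lam1 = 0.25 / 0.09, 0.25 / 0.01
penalty = np.concatenate([np.full(p, lam0), np.full(n_env * p, lam1)])
mu = np.array([y[env_id == j].mean() for j in range(n_env)])   # intercepts mu_j
coef = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ (y - mu[env_id]))
b0_hat, bj_hat = coef[:p], coef[p:].reshape(n_env, p)
```

The relative size of the two penalties decides how much of each marker's effect is attributed to the stable component versus the environment-specific deviations.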

Crossa et al. [73] used the M × E model to predict untested individuals and to identify genomic regions whose effects are stable across environments and others that are environment-specific. Detecting such regions is more complicated for a complex trait such as grain yield, whose M × E interaction patterns are consequently more complex. Nevertheless, the Bayes B version of the M × E interaction model detected marker main effects in chromosome regions that also had important environment-specific effects on grain yield in specific environments.

Single Step G × E interaction model

The single-step method [76–79] combines pedigree and genomic relationships into a single relationship matrix, thereby including information on non-genotyped individuals for which pedigree information is available. The method has been used mainly in animal breeding. Pérez-Rodríguez et al. [80] implemented the single-step model and extended it to include G × E interactions. The authors showed how to use the proposed model to predict grain yield in international environments (sites in India, Pakistan and Bangladesh) using 58,798 CIMMYT wheat lines, and concluded that the prediction abilities of the proposed model were higher than those of models that did not include G × E interactions.
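The combined single-step matrix, often called H, can be assembled from the pedigree matrix A and the genomic matrix G. The minimal sketch below uses the formulation common in the single-step literature, with a small hypothetical pedigree and hypothetical G values. It checks the well-known shortcut for H⁻¹ used in mixed-model equations. The G × E extension of Pérez-Rodríguez et al. [80] is not shown.

```python
import numpy as np

# Pedigree relationships (tabular method) for a hypothetical pedigree: 0 and 1 are
# unrelated founders, 2 and 3 are full sibs from 0 x 1, and 4 is the progeny of 2 x 3
A = np.array([
    [1.00, 0.00, 0.50, 0.50, 0.50],
    [0.00, 1.00, 0.50, 0.50, 0.50],
    [0.50, 0.50, 1.00, 0.50, 0.75],
    [0.50, 0.50, 0.50, 1.00, 0.75],
    [0.50, 0.50, 0.75, 0.75, 1.25],
])
ng, g = [0, 1], [2, 3, 4]            # non-genotyped and genotyped individuals

# Hypothetical genomic relationships among the genotyped individuals
G = np.array([
    [1.02, 0.41, 0.70],
    [0.41, 0.98, 0.80],
    [0.70, 0.80, 1.22],
])

A11, A12, A22 = A[np.ix_(ng, ng)], A[np.ix_(ng, g)], A[np.ix_(g, g)]
A22_inv = np.linalg.inv(A22)
P = A12 @ A22_inv                    # propagates genomic information to non-genotyped relatives

H = np.zeros_like(A)
H[np.ix_(ng, ng)] = A11 + P @ (G - A22) @ P.T
H[np.ix_(ng, g)] = P @ G
H[np.ix_(g, ng)] = (P @ G).T
H[np.ix_(g, g)] = G

# Shortcut for the inverse: H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]]
H_inv = np.linalg.inv(A)
H_inv[np.ix_(g, g)] += np.linalg.inv(G) - A22_inv
```

The genotyped block of H equals G, while the non-genotyped individuals inherit genomic information through their pedigree links to genotyped relatives.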

Combining GS with High-Throughput Phenotyping (HTP)

High-throughput phenotyping (HTP) uses proximal and remote sensing to measure a large number of phenotypes over time and space at low cost and with less labor [81]. The information collected with HTP can be combined with genotypic or pedigree information and included in GS models to predict the trait of interest. Rutkoski et al. [81] used canopy temperature and green and red normalized difference vegetation indices to predict grain yield for 1092 wheat lines evaluated in five environments. The authors concluded that HTP traits can be included in GS models and improve prediction abilities, which can be beneficial in the early stages of selection.
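The vegetation indices mentioned above are simple band ratios: red NDVI = (NIR − Red)/(NIR + Red) and green NDVI = (NIR − Green)/(NIR + Green). The minimal sketch below assumes that plot-level reflectances per band have already been extracted from the imagery; the values are hypothetical. Such indices can then enter a GS model, for example as secondary traits or covariates.

```python
import numpy as np


def normalized_difference(nir, band):
    """Generic normalized difference index: (NIR - band) / (NIR + band)."""
    nir, band = np.asarray(nir, dtype=float), np.asarray(band, dtype=float)
    return (nir - band) / (nir + band)


# Hypothetical plot-level reflectances (fractions) for three plots
nir = np.array([0.45, 0.50, 0.30])
red = np.array([0.05, 0.04, 0.10])
green = np.array([0.08, 0.07, 0.12])

ndvi_red = normalized_difference(nir, red)      # red NDVI
ndvi_green = normalized_difference(nir, green)  # green NDVI
print(ndvi_red)                                 # first and third plots: 0.8 and 0.5
```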

Montesinos-López et al. [82] were the first to propose Bayesian functional regression models that take into account the main effects of environment and genotype, all available reflectance wavelengths of the HTP data and the interaction terms (G × E and band × environment (B × E) interactions) for predicting the primary trait, grain yield. The authors compared the prediction abilities of models with and without interaction terms, and compared the prediction abilities and implementation times of Bayesian functional regression models with those of conventional Bayesian models that are not in the functional regression category. They found that using all bands simultaneously increased prediction accuracy more than using vegetation indices alone, and that the spline and Fourier models had the best prediction accuracy. However, this study did not use genomic or pedigree information in the predictions based on HTP data.

Among the first researchers to link genomic data with high-dimensional HTP data were Montesinos-López et al. [83], who proposed a Bayesian functional regression analysis that takes into account all hyperspectral bands and is implemented using two types of basis functions, B-splines and Fourier. This method resulted in superior prediction abilities for wheat grain yield compared with a range of other options. Montesinos-López et al. [83] further extended this model to incorporate genomic and pedigree information and to accommodate G × E by modeling hyperspectral B × E interactions. Models that included the B × E term had higher prediction accuracies than those that did not, suggesting that hyperspectral reflectance may be a useful phenotype for modeling G × E interactions. The authors also observed that, while the models with B × E interaction terms were the most accurate, the functional regression models (with B-spline and Fourier bases) and the conventional models performed similarly in terms of prediction ability. However, the functional regression models are more parsimonious and computationally more efficient, because the number of coefficients to be estimated equals the number of basis functions, which is smaller than the total number of coefficients required for all bands.
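The parsimony argument can be made concrete. If the band-wise coefficient function is written as β(λ) = ∑_k c_k φ_k(λ) with K basis functions φ_k, then a plot's linear predictor over p bands becomes a product of K derived predictors and K coefficients. The minimal sketch below assumes a Fourier basis with 11 functions and simulated reflectance over a hypothetical wavelength range; the integral over wavelengths is approximated by a sum, with the band-spacing constant absorbed into the coefficients. A B-spline basis would be used in the same way. Fitting, genomic terms and B × E terms are not shown.

```python
import numpy as np


def fourier_basis(wavelengths, n_basis):
    """Fourier basis (a constant, then sine/cosine pairs) evaluated at each band's wavelength."""
    t = (wavelengths - wavelengths.min()) / (wavelengths.max() - wavelengths.min())
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)


rng = np.random.default_rng(5)
wavelengths = np.linspace(398.0, 1000.0, 250)   # 250 hypothetical bands (nm)
R = rng.uniform(0.0, 0.6, size=(80, 250))       # reflectance of 80 plots (simulated)
B = fourier_basis(wavelengths, 11)              # 250 bands x 11 basis functions

# beta(lambda) = B @ c, so sum_b R[i, b] * beta(lambda_b) = (R @ B)[i] @ c:
# only 11 coefficients need to be estimated instead of 250
W = R @ B                                        # 80 plots x 11 derived predictors
c = rng.normal(size=B.shape[1])                  # any coefficient vector
same = np.allclose(R @ (B @ c), W @ c)           # True: the two parameterizations coincide
```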

When hyperspectral data are collected in a multi-environment context, the number of predictors increases in proportion to the number of environments and phenotyping time points, which may come at a computational cost depending on the type of prediction model used [83]. One possible approach that may minimize computation time is to use the hyperspectral bands as a high-dimensional predictor set, similar to SNP markers in GBLUP, that is, to create a relationship matrix between individuals from the hyperspectral bands [83]. In this way, the number of bands could be very large without increasing the complexity of the GBLUP prediction model. Separate genomic/pedigree and hyperspectral reflectance kernels could then be integrated to model the genetic main effects and G × E effects, respectively.

Krause et al. [84] used hyperspectral reflectance data to predict grain yield and lodging in 3771 CIMMYT wheat lines, which were evaluated in four breeding cycles (2013–2014, 2014–2015, 2015–2016 and 2016–2017) under five different environment/management treatments. The authors proposed a multi-kernel GBLUP model for GS that includes the additive relationship matrix derived from pedigree, the additive relationship matrix derived from markers, a relationship matrix derived from the hyperspectral images and the G × E interaction. They concluded that relationship matrices derived from aerial hyperspectral reflectance phenotypes can effectively predict grain yield in wheat within and across management treatments and breeding cycles, and that this approach could potentially be implemented in earlier breeding generations, when genotyping many lines would be too costly.
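A relationship matrix can be built from hyperspectral bands in the same way as from markers, and several such matrices can be combined in a multi-kernel covariance. The minimal sketch below uses simulated data and assumes column-standardized features and fixed, hypothetical variance components. In practice the variance components are estimated when the model is fitted. The pedigree kernel and the G × E term of the model of Krause et al. [84] are omitted.

```python
import numpy as np


def relationship_matrix(F):
    """Relationship matrix FF'/q from features (SNPs or spectral bands) standardized column-wise."""
    F = F[:, F.std(axis=0) > 0]                  # drop constant columns
    Fs = (F - F.mean(axis=0)) / F.std(axis=0)
    return Fs @ Fs.T / Fs.shape[1]


rng = np.random.default_rng(11)
n = 60
X = rng.binomial(2, 0.5, size=(n, 500)).astype(float)   # SNP genotypes (simulated)
R = rng.uniform(0.0, 0.6, size=(n, 250))                # hyperspectral reflectance (simulated)

G = relationship_matrix(X)       # genomic relationships
S = relationship_matrix(R)       # hyperspectral relationships

# Multi-kernel covariance of the line effects: each kernel weighted by its own
# variance component (hypothetical values here)
var_g, var_s = 0.6, 0.4
V = var_g * G + var_s * S
```

Because each kernel is n × n regardless of how many bands or markers were used to build it, adding more bands does not enlarge the prediction model.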

Incorporating GWAS Signal into Genomic Prediction Models

In contrast to the single-marker approach, haplotype-based approaches combine several adjacent markers, turning biallelic SNPs into multi-allelic haplotype loci. Haplotype-based approaches to genome-wide prediction may be favored in cases where QTL are more closely linked to haplotype alleles than to individual SNPs [85]. It has been hypothesized that such approaches can boost genomic-enabled prediction abilities [86–89] because they can capture epistasis between SNPs [90,91]. Another approach proposed for boosting prediction abilities is to fit GWAS-based marker effects as fixed effects in the model [92,93]. In some cases, however, neither approach has shown any benefit, and some have even had a negative impact on prediction abilities. A third approach to enhancing prediction abilities is to use the Gaussian kernel, which can indirectly pick up epistatic effects [45,94]. In a recent study, we derived a GBLUP model that combines one or more of these three approaches: haplotypes, GWAS-based marker effects used as fixed effects, and the Gaussian kernel as a proxy for epistatic effects [95]. The genomic predictions were based on the following GBLUP mixed model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\epsilon} \tag{4}$$

where y is a vector of phenotypes consisting of adjusted means, β is a vector of fixed effects, u is a vector of random genetic values and ϵ is the vector of residuals; X and Z are design matrices. The genetic values were assumed to follow a Gaussian distribution u ~ N(0, Kσ_g²), where K is the genomic relationship matrix and σ_g² is the additive genetic variance, and the residuals were assumed to follow a Gaussian distribution ϵ ~ N(0, Iσ_e²), where I is the identity matrix. In models including GWAS results, the QTL discovered by GWAS are used as fixed effects: the columns of X take the value 1 or 0 depending on the presence or absence of each QTL. K can be calculated as an additive relationship matrix A_M = MMᵀ, where the elements of M take the values 1, 0 or −1 depending on whether the marker genotype is homozygous for the reference allele, heterozygous or homozygous for the alternate allele. Similarly, a haplotype-based relationship matrix can be calculated as A_H = HHᵀ, where the elements of H take the value 1 or 0 depending on the presence or absence of each haplotype at each haplotype locus. In addition to the conventional A_M and A_H matrices, we calculated Gaussian kernel-based matrices for both markers and haplotypes. Although the highest prediction abilities were obtained with the most complex model including all three components, the largest gains in prediction ability came from the Gaussian kernel. Used alone, the GWAS-based marker effects fitted as fixed effects had only a minimal impact on overall prediction ability.
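The construction of A_M and A_H can be sketched as follows. This minimal example assumes fully inbred lines (no heterozygotes, so each block of adjacent SNPs defines an unambiguous haplotype) and uses hypothetical fixed-size blocks of four SNPs; actual studies may define haplotype blocks differently, for example from linkage disequilibrium. The Gaussian kernel versions mentioned above would be obtained by applying a Gaussian kernel to the rows of M or H.

```python
import numpy as np


def haplotype_indicators(M, block_size):
    """0/1 matrix H: one column per haplotype allele observed in each block of adjacent SNPs."""
    cols = []
    for start in range(0, M.shape[1], block_size):
        block = M[:, start:start + block_size]
        _, codes = np.unique(block, axis=0, return_inverse=True)
        codes = np.asarray(codes).ravel()             # haplotype allele carried by each line
        cols.append(np.eye(codes.max() + 1)[codes])   # presence/absence of each allele
    return np.hstack(cols)


rng = np.random.default_rng(4)
# Fully inbred lines (assumed): no heterozygotes, so each SNP is coded -1 or 1
M = rng.choice([-1.0, 1.0], size=(30, 60))
A_M = M @ M.T                      # marker-based relationship matrix MM^T
H = haplotype_indicators(M, 4)     # 15 hypothetical blocks of 4 adjacent SNPs
A_H = H @ H.T                      # number of haplotype alleles shared by each pair of lines
```

Each diagonal entry of A_H equals the number of blocks, because every inbred line carries exactly one haplotype allele per block; off-diagonal entries count the blocks in which two lines share the same haplotype.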

Deep Learning (DL) Methods

Deep learning methods are machine learning methods inspired by the functioning of the human brain; machine learning gives computers the ability to learn without being explicitly programmed [96] and enables them to act and make data-driven decisions to carry out a certain task. Deep learning can be defined as a generalization of artificial neural networks in which more than one hidden layer is used (Figure 1), which implies that more neurons are used to implement the model. The adjective "deep" refers not to the acquired knowledge but to the way in which the knowledge is acquired [97]: it stands for the idea of successive layers of representations, and the depth of a model is the number of layers that contribute to it (Figure 1).

Figure 1. A feedforward deep neural network with one input layer, three hidden layers and one output layer. The input layer has eight neurons, which correspond to the input information; each of the three hidden layers has three neurons; and the output layer has two neurons, which correspond to the traits to be predicted.

The deep neural network shown in Figure 1 is a very popular architecture called a feedforward neural network or multi-layer perceptron (MLP). Its topology contains eight inputs, three hidden layers and an output layer with two neurons. The input is passed to the neurons in the first hidden layer, and each hidden neuron then produces an output that is used as an input for each of the neurons of the second hidden layer. Similarly, the output of each neuron in the second hidden layer is used as an input for each neuron in the third hidden layer. Finally, the output of each neuron in the third hidden layer is used as an input to obtain the predicted values of the traits of interest. In each hidden layer, every neuron computes a weighted sum of its inputs plus an intercept (bias), called the net input, and applies a transformation called the activation function to it to produce its output.

The analytical formulas of the model in Figure 1, generalized to d inputs (not only eight), M₁ hidden neurons (units) in hidden layer 1, M₂ hidden units in hidden layer 2, M₃ hidden units in hidden layer 3 and two output neurons, are given by Equations (5)–(8):

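$$V_{1j} = g_1\left(\sum_{i=1}^{d} W_{ji}^{(1)} x_i + b_{j1}\right), \quad j = 1, \ldots, M_1 \tag{5}$$

$$V_{2k} = g_2\left(\sum_{j=1}^{M_1} W_{kj}^{(2)} V_{1j} + b_{k2}\right), \quad k = 1, \ldots, M_2 \tag{6}$$

$$V_{3l} = g_3\left(\sum_{k=1}^{M_2} W_{lk}^{(3)} V_{2k} + b_{l3}\right), \quad l = 1, \ldots, M_3 \tag{7}$$

$$y_t = g_{4t}\left(\sum_{l=1}^{M_3} W_{tl}^{(4)} V_{3l} + b_{t4}\right), \quad t = 1, 2 \tag{8}$$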

where g₁(⋅), g₂(⋅), g₃(⋅) and g_4t(⋅) are the activation functions of the first, second and third hidden layers and of the output layer, respectively. Equations (5), (6) and (7) produce the outputs of the neurons in the first, second and third hidden layers, respectively, and Equation (8) produces the outputs for the two response variables of interest. The learning process involves updating the weights (W_ji⁽¹⁾, W_kj⁽²⁾, W_lk⁽³⁾, W_tl⁽⁴⁾) and biases (b_j1, b_k2, b_l3, b_t4) to minimize the loss function; these weights and biases belong to the first hidden layer (W_ji⁽¹⁾, b_j1), the second hidden layer (W_kj⁽²⁾, b_k2), the third hidden layer (W_lk⁽³⁾, b_l3) and the output layer (W_tl⁽⁴⁾, b_t4), respectively. To obtain the outputs of the neurons in the three hidden layers, the rectified linear unit (ReLU) or other nonlinear activation functions (sigmoid, hyperbolic tangent, leaky ReLU, etc.) can be used. For the output layer, however, the activation functions (g_4t) are chosen according to the type of response variable (for example, linear for continuous outcomes, sigmoid for binary outcomes, softmax for categorical outcomes and exponential for count data). When the model in Figure 1 has only one outcome, it reduces to a univariate model, whereas with two or more outcomes the DL model is multivariate. According to the universal approximation theorem, a neural network with enough hidden units can approximate any arbitrary functional relationship [98,99]. DL models with univariate or multivariate outcomes can be implemented in a very user-friendly way with the Keras library as front end and TensorFlow as back end [100]. Results obtained in the context of GS in wheat with univariate and multi-trait DL models are summarized below.
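To make Equations (5)–(8) concrete, the following is a minimal sketch in plain Python of one forward pass through the Figure 1 topology (d = 8, M₁ = M₂ = M₃ = 3, two outputs). It assumes ReLU in all hidden layers and a linear output for a continuous trait plus a sigmoid output for a binary trait. The weights are random and untrained, the marker codes are hypothetical, and the helper names (`dense`, `forward`, `init_layers`) are our own. In practice, the weights would be learned with a library such as Keras/TensorFlow, as noted above.

```python
import math
import random

def relu(z):
    """Rectified linear unit (ReLU) for a single net input."""
    return max(0.0, z)

# Output activations g_4t chosen by response type, as listed in the text
# (softmax is omitted: it acts on a vector of outputs, not on one neuron).
OUTPUT_ACTIVATIONS = {
    "continuous": lambda z: z,                       # linear
    "binary": lambda z: 1.0 / (1.0 + math.exp(-z)),  # sigmoid
    "count": math.exp,                               # exponential
}

def dense(inputs, weights, biases, activation):
    """One layer: net input sum_i W[j][i] * x_i + b_j, then the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers, output_types):
    """Eqs. (5)-(7) with ReLU hidden units, then Eq. (8) with one output
    activation per output neuron t."""
    h = x
    for weights, biases in layers[:-1]:   # hidden layers 1..3
        h = dense(h, weights, biases, relu)
    w_out, b_out = layers[-1]             # output layer, Eq. (8)
    return [OUTPUT_ACTIVATIONS[kind](sum(w * v for w, v in zip(row, h)) + b)
            for row, b, kind in zip(w_out, b_out, output_types)]

def init_layers(sizes, seed=0):
    """Random (untrained) weights and zero biases for consecutive layer sizes."""
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Topology of Figure 1: 8 inputs, three hidden layers of 3 units, 2 outputs.
layers = init_layers([8, 3, 3, 3, 2])
x = [0, 1, 2, 1, 0, 2, 1, 1]  # hypothetical marker codes for one line
y = forward(x, layers, ["continuous", "binary"])
print(y)
```

Keeping one activation per output neuron mirrors the subscript t in g_4t: each trait can use a link suited to its response type.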

DL for Univariate Traits and for Multiple Traits and Multiple Environments

Using nine datasets of wheat and maize, Montesinos-López et al. [101] found that when the G × E interaction term was not taken into account, the DL method was better than the GBLUP model in six of the nine datasets. However, when the G × E interaction term was considered, the GBLUP model was best in eight of the nine datasets, and the DL method outperformed GBLUP in only one dataset. In a benchmark study, Montesinos-López et al. [102] compared the univariate DL method with the support vector machine and the conventional Bayesian threshold genomic best linear unbiased prediction (TGBLUP) model. The authors did not observe large differences between the three methods, although in many cases TGBLUP outperformed the other two.

We next compared the prediction performance of the multi-trait DL model with that of the Bayesian multi-trait and multi-environment model in two wheat datasets and one maize dataset [103]. When the G × E interaction term was not taken into account, the best predictions in all three datasets were obtained with the multi-trait DL model, but when the G × E interaction term was taken into account, the Bayesian multi-trait and multi-environment model outperformed the DL model. Montesinos-López et al. [104] also showed that the DL framework is very powerful for implementing multi-trait GS with mixed outcomes (binary, ordinal and continuous). In that publication, they compared the prediction performance of the multi-trait DL model with that of the univariate DL model. With one hidden layer, they found no relevant differences between the models for any trait in the dataset under study when G × E interaction was considered. However, when G × E interaction was ignored, they found statistical differences for grain yield, with better performance under the multi-trait DL model, whose average Pearson’s correlation was 22.44% higher than that of the univariate DL model. With two hidden layers, the authors found the same result for grain yield, with the multi-trait DL model being superior; with three hidden layers, however, no statistical differences between the two models were found.

Instead of predicting the performance of all individuals, using classifiers in GS can be attractive because they are trained to maximize the probability of an individual belonging to the target class, rather than to predict its overall performance [53]. In a recent study, González-Camacho et al. [52] compared two classifiers, the MLP and the probabilistic neural network (PNN), for predicting the probability of an individual belonging to a target phenotypic class, using genomic and phenotypic data. The authors analyzed two traits (days to heading and grain yield) evaluated on 306 CIMMYT wheat lines genotyped with 1717 markers. Grain yield was measured in seven environments and days to heading in 10 environments. The authors used the 15th and 30th percentiles to define the upper and lower classes for selecting the best and poorest performers. The wheat datasets were also used to predict a binary response variable (two classes). The criterion for assessing the prediction ability of MLP and PNN was the area under the receiver operating characteristic curve (AUC), and the parameters of both classifiers were estimated by optimizing the AUC for a specific target class. González-Camacho et al. [52] found that PNN was more accurate than MLP for assigning wheat lines to the correct upper, middle or lower class. For the wheat datasets with continuous traits split into two and three classes, PNN performed better with three classes than with two when classifying individuals into the upper and lower (15% or 30%) categories.
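The AUC criterion used by González-Camacho et al. [52] can be illustrated with a short sketch. Lines in the upper fraction of observed values (for example, the top 30%) are labeled as the target class, and the AUC of a classifier's predicted class-membership scores is computed in its rank (Mann–Whitney) form. The function names and all data below are hypothetical. This only scores predictions; it does not reproduce the MLP or PNN classifiers themselves.

```python
def top_class_labels(values, fraction):
    """1 for lines in the upper `fraction` of observed values, else 0
    (ties at the cutoff are all assigned to the upper class)."""
    k = max(1, round(fraction * len(values)))
    cutoff = sorted(values, reverse=True)[k - 1]
    return [1 if v >= cutoff else 0 for v in values]

def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    target-class line scores higher than a randomly chosen other line
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

observed = [5.1, 6.3, 4.8, 7.0, 5.9, 6.8, 4.5, 6.1]  # hypothetical yields
scores = [0.2, 0.6, 0.1, 0.9, 0.4, 0.5, 0.3, 0.7]    # hypothetical P(upper class)
labels = top_class_labels(observed, 0.30)
print(labels, auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the target class from the rest.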

Although our current studies do not show clear advantages of artificial neural networks and DL models over conventional GS models, DL models look promising: there is empirical evidence that they outperform conventional models when very large datasets are available, and they can efficiently handle large volumes of data in raw form. The latter makes it possible, for example, to incorporate larger amounts of omics data (metabolomics, proteomics, transcriptomics), which are becoming increasingly accessible, into the same model more efficiently. Because genomic data comprise a large number of independent variables and a small number of samples (observations), DL models are also difficult to implement; however, they offer many opportunities to design specific topologies (deep neural networks) that can deal with any type of data better than present GS models.

INTEGRATING GS IN THE CIMMYT GLOBAL WHEAT PROGRAM

Overall, the many cross-validation experiments and studies performed on existing and new datasets, and the comparisons of different statistical and DL models by CIMMYT and many other research groups, have established that GS is a promising approach in wheat. Consequently, strategies for how best to fit GS into a wheat breeding program have more recently become the focus in the literature [105,106]. The optimal way to implement GS in plant breeding programs is not straightforward and depends on the breeding scheme regularly applied and the key traits targeted in each individual breeding program. Because CIMMYT breeding lines are related to each other, as new lines are derived by intercrossing a diverse set of superior lines selected from previous cohorts, the initial aim of the CIMMYT Global Wheat Program was to implement GS without significantly changing the standard breeding method. Therefore, no specific GS set-up (e.g., training population design, optimizing the relatedness of training and test populations) was initially considered.

Genomic Selection at Preliminary and Elite Yield Trial Stages

At present, GS is used most routinely in the CIMMYT spring bread wheat breeding program to increase the accuracy of line selection across breeding cycles, where (1) marker effects are estimated in one generation over one or more years for selection in the same generation in the following year, or (2) genomic estimated breeding values (GEBVs) estimated in a more advanced generation are used to select lines in an earlier generation in the following year (Figure 2).

FIGURE 2

Figure 2. Simplified breeding scheme of the CIMMYT spring bread wheat breeding program, indicating the current implementation of GS. TC: top cross, YT: yield trial, EYT: elite yield trial, NARS: National Agricultural Research Services.

The CIMMYT wheat breeding programs in Mexico use a selected bulk scheme for generation advancement, with two selection cycles per year, as the standard breeding method (Figure 2). The earliest entry point of GS in the CIMMYT spring bread wheat breeding program is currently the first yield trial (YT). Annually, approximately 9000 advanced lines are tested for grain yield in one environment with two replications at the Campo Experimental Norman E. Borlaug (CENEB), CIMMYT’s main wheat research station at Cd. Obregon, northern Mexico. In addition, the same lines are tested for stem and stripe rust resistance in one replication during the off-season in Njoro, Kenya. Figure 3 shows the natural variation in grain yield across YT cycles, which arises from differences in environmental conditions and from the change in evaluated materials from cycle to cycle.

FIGURE 3

Figure 3. Violin plot of adjusted grain yield per PYT cycle.

Since 2013, all entries of the YT have been genotyped using GBS through the USAID Feed the Future ‘Applied Wheat Genomics Innovation Lab’ and ‘Delivering Genetic Gain in Wheat’ projects. Prediction abilities for grain yield in the YT, obtained each year with models including markers, pedigrees and environments and using all available historical data as the training population, varied from 0.20 to 0.42. Prediction abilities generally increased over time, with recent values of 0.42 in cycle 2017–2018 [107] and 0.34 in cycle 2018–2019. We presume that the higher prediction abilities resulted from a larger overall training population, improved genotypic data and improved statistical models. Each year, the GEBVs have been used to assist in selection decisions. Figure 4 shows an example of observed vs. predicted grain yield for 8927 lines in the YT of cycle 2018–2019, obtained with the reaction norm model [64] using markers and pedigree information jointly. The grain yield predictions for the 8927 lines were made using a training set of 43,315 lines evaluated in previous PYTs (2013–2014, 2014–2015, 2015–2016, 2016–2017 and 2017–2018). When the top 20% of lines were selected, 763 entries (42.7%) were common to the top performers based on observed values and those based on predicted values. GS was also able to identify a large percentage of low-performing lines: when the lowest 50% of all entries were culled, 2715 lines (60.8%) were common to the observed and predicted sets.
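The overlap percentages reported for the 2018–2019 YT can be reproduced arithmetically, and the same overlap measure can be computed for any set of observed and predicted values. The sketch below assumes that each percentage is the number of coincident lines divided by the number of lines selected (or culled) out of 8927. This assumption matches the reported figures, since 763/1785 ≈ 42.7% and 2715/4464 ≈ 60.8%. The function names are hypothetical.

```python
def top_set(values, fraction, highest=True):
    """Indices of the top (highest=True) or bottom `fraction` of lines."""
    k = round(fraction * len(values))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest)
    return set(order[:k])

def coincidence(observed, predicted, fraction, highest=True):
    """Share of lines selected (or culled) on observed values that are
    also selected (or culled) on predicted values (GEBVs)."""
    chosen_obs = top_set(observed, fraction, highest)
    chosen_pred = top_set(predicted, fraction, highest)
    return len(chosen_obs & chosen_pred) / len(chosen_obs)

# Arithmetic behind the YT 2018-2019 figures (8927 lines).
n_top = round(0.20 * 8927)   # lines kept at a 20% selection intensity
n_cull = round(0.50 * 8927)  # 4463.5 rounds to 4464; 4463 gives the same 60.8%
print(n_top, round(763 / n_top, 3))
print(n_cull, round(2715 / n_cull, 3))
```

For real data, `coincidence(observed, gebv, 0.20)` gives the selection overlap and `coincidence(observed, gebv, 0.50, highest=False)` gives the culling overlap.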

FIGURE 4

Figure 4. Correlation between observed and predicted values in the PYT 2018–2019 cycle.

The genotyping of all YT entries from 2013 to 2019 has generated a large genetic resource with ~80K reported sequence tags in a total of 62,827 CIMMYT advanced breeding lines. This resource has also been used to evaluate GS in the elite yield trials (EYT), which comprise 1092 entries selected annually from the YTs and are much more extensively phenotyped [108–110]. Phenotypic evaluation routinely includes grain yield and grain yield components (e.g., days to heading, plant height, grain weight), disease resistance (e.g., stem rust, stripe rust, septoria tritici blotch, spot blotch), and traits related to end-product quality (e.g., flour protein content, flour yield, alveograph, test weight, loaf volume). Prediction abilities across breeding cycles in the EYT from 2014 to 2017 were evaluated for each trait using the genomic best linear unbiased prediction (GBLUP) approach, with the lines of each EYT predicted using the other three EYTs as the training population. Prediction abilities were particularly high, up to 0.83, for seedling and field resistance to stem rust and for several end-product quality traits [32,110].
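The across-cycle validation described for the EYT, in which each cycle is predicted from the other three, can be sketched with a kernel form of GBLUP in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used at CIMMYT. The genomic relationship matrix follows VanRaden’s first method. The variance ratio λ = σ²_e/σ²_g is fixed at 1 rather than estimated. The intercept is approximated by the training mean instead of being estimated by generalized least squares. The data are simulated, and all function names are hypothetical.

```python
import random

def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1). Rows of M are lines,
    columns are markers coded 0/1/2 (copies of the counted allele)."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def pearson(a, b):
    """Pearson correlation, used here as the prediction ability."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def gblup_predict(G, y, train, test, lam=1.0):
    """Kernel-form GBLUP: mu + G[test, train] (G[train, train] + lam I)^-1 (y_train - mu).
    lam = sigma_e^2 / sigma_g^2 is assumed known; mu is the training mean."""
    mu = sum(y[i] for i in train) / len(train)
    A = [[G[i][k] + (lam if i == k else 0.0) for k in train] for i in train]
    alpha = solve(A, [y[i] - mu for i in train])
    return [mu + sum(G[t][k] * a for k, a in zip(train, alpha)) for t in test]

def leave_one_cycle_out(G, y, cycle):
    """Predict every line of each cycle from all other cycles; return the
    prediction ability (Pearson r) per held-out cycle."""
    ability = {}
    for c in sorted(set(cycle)):
        test = [i for i, ci in enumerate(cycle) if ci == c]
        train = [i for i, ci in enumerate(cycle) if ci != c]
        predicted = gblup_predict(G, y, train, test)
        ability[c] = pearson(predicted, [y[i] for i in test])
    return ability

# Simulated example: 24 inbred lines (markers 0 or 2), 30 markers, four cycles.
rng = random.Random(1)
M = [[rng.choice([0, 2]) for _ in range(30)] for _ in range(24)]
effects = [rng.gauss(0.0, 0.2) for _ in range(30)]
y = [sum(m * e for m, e in zip(row, effects)) + rng.gauss(0.0, 0.5) for row in M]
cycle = [2014 + i % 4 for i in range(24)]
G = vanraden_g(M)
print(leave_one_cycle_out(G, y, cycle))
```

Holding out whole cycles, rather than random lines, mimics the real situation in which a new cohort has to be predicted from earlier ones.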

Given the vast amount of phenotypic data available for the lines evaluated in the EYT, using GS to further assist selection decisions at this stage is rather inefficient. However, the EYTs across breeding cycles are used as a training population to predict the entries of the earlier PYT generation in the following year. These entries are not tested for end-product quality traits, are evaluated in only one replication for stem rust resistance, and are tested for grain yield in only two replications under irrigated conditions, not under drought or heat stress. Scaling up to earlier generations thus benefits selection efficiency in the YT but adds genotyping costs. The indirect cost savings, however, are significant, given the cost (e.g., USD 65 per sample for all end-product quality traits), labor and resource (e.g., land area, seed quantity) constraints in earlier generations such as the YT.

Despite these good results over the years and the continuously optimized logistics for implementing GS in the YT, CIMMYT wheat breeders are still not fully convinced of using GS. This is probably mainly because GS predicts individual lines imprecisely and, in particular, struggles to identify the top 10% of performers (often called ‘positive outliers’) that breeders aim to find. As outlined by Verges and van Sanford [111], the success of GS is usually measured by the prediction ability, i.e., the correlation between predicted and observed phenotypic values, which only weakly reflects the implications for selection in a breeding program. The principal question for breeders, e.g., in the YT, is: “How many of the top-performing lines are also correctly predicted and confirm my selection decision?” Our results in the YT confirm the suggestion of Bassi et al. [105] and Verges and van Sanford [111]: with a selection intensity of 20% (commonly used by breeders during early stages of testing) and a prediction accuracy of 0.34, 42% of the best lines were correctly selected with GS, a value greater than the prediction accuracy given by Pearson’s correlation. Likewise, the percentage of correctly discarded lines can be high. However, with current prediction abilities for grain yield of 0.3 to 0.5, the very top individuals cannot be reliably predicted for such a complex trait with strong G × E.
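One way to put the 42% coincidence in context is to ask what share of the top 20% would be expected to coincide if observed and predicted values followed a bivariate normal distribution with correlation r. The sketch below estimates this share by Monte Carlo simulation. The normality assumption and the function name are ours, and the simulation is not part of the cited studies.

```python
import random

def expected_coincidence(r, fraction, n=20000, seed=1):
    """Monte Carlo estimate of the share of the top `fraction` on observed
    values that is also in the top `fraction` on predicted values, assuming
    (observed, predicted) is standard bivariate normal with correlation r."""
    rng = random.Random(seed)
    observed, predicted = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        observed.append(z1)
        predicted.append(r * z1 + (1.0 - r * r) ** 0.5 * z2)
    k = round(fraction * n)
    top_obs = set(sorted(range(n), key=observed.__getitem__, reverse=True)[:k])
    top_pred = set(sorted(range(n), key=predicted.__getitem__, reverse=True)[:k])
    return len(top_obs & top_pred) / k

for r in (0.0, 0.34, 0.5):
    print(r, round(expected_coincidence(r, 0.20), 3))
```

With r = 0, the expected coincidence equals the selected fraction itself (about 0.2), which is the baseline that any useful prediction has to exceed.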

Hybrid Wheat Prediction

Hybrid wheat technology has attracted renewed attention as a means of increasing wheat productivity. The ability to predict wheat hybrids from genomic information is of great interest to wheat breeders because it could allow them to select the best males, females or male–female combinations a priori. Such predictions could greatly facilitate hybrid wheat breeding by saving costs, selecting winning cross combinations and defining heterotic pools. A pilot study was therefore performed in the CIMMYT hybrid wheat program [112]. The prediction model was a GBLUP model with G × E interaction, in which similarity between lines was assessed from pedigree and molecular markers, and similarity between environments was accounted for by environmental covariables. Using the reaction norm approach of Jarquín et al. [64], the model was extended with the additional terms uH, uM and uF to represent the genotype (hybrid) × environment interaction and the parent (male and female) × environment interactions, respectively. The models were tested in four cross-validation designs in which (1) both males and females, (2) only males, (3) only females or (4) neither parent had been tested for heterosis. The average grain yield prediction ability was 0.61–0.65 for design (1), 0.53–0.55 for design (2), 0.39–0.51 for design (3) and 0.46–0.61 for design (4). In all designs, the highest prediction accuracies were achieved with the most complex model (GBLUP + Pedigree + Environment + Hybrid × Environment + Parents × Environment). Unfortunately, hybrid wheat research has since been discontinued at CIMMYT.
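In the reaction norm approach of Jarquín et al. [64], an interaction term is given a covariance structure equal to the element-wise (Hadamard) product of a genetic relationship matrix and an environmental similarity matrix built from environmental covariables, both mapped to the individual records. The sketch below builds such kernels for uH, uM and uF from hypothetical hybrid, male and female relationship matrices. The standardization of covariables, the records and the function names are our own assumptions. The exact model specification of [112] and the variance-component estimation are not reproduced here.

```python
def env_similarity(W):
    """Omega = W W' / q, with each of the q environmental covariables
    centred and scaled across environments (rows of W)."""
    n_env, q = len(W), len(W[0])
    cols = []
    for j in range(q):
        col = [W[e][j] for e in range(n_env)]
        mean = sum(col) / n_env
        sd = (sum((v - mean) ** 2 for v in col) / n_env) ** 0.5 or 1.0
        cols.append([(v - mean) / sd for v in col])
    return [[sum(c[a] * c[b] for c in cols) / q for b in range(n_env)]
            for a in range(n_env)]

def gxe_kernel(K, omega, genotype_of, env_of):
    """Interaction kernel for records: entry (a, b) = K[g_a][g_b] * Omega[e_a][e_b],
    i.e. the Hadamard product of genetic and environmental structures."""
    n = len(genotype_of)
    return [[K[genotype_of[a]][genotype_of[b]] * omega[env_of[a]][env_of[b]]
             for b in range(n)] for a in range(n)]

# Hypothetical records: (hybrid, male, female, environment).
records = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 0, 0, 1), (1, 0, 1, 2)]
G_hybrid = [[1.0, 0.5], [0.5, 1.0]]
G_male = [[1.0]]
G_female = [[1.0, 0.1], [0.1, 1.0]]
# Hypothetical covariables per environment (e.g. mean temperature, rainfall).
omega = env_similarity([[25.0, 310.0], [31.0, 220.0], [28.0, 150.0]])
env = [r[3] for r in records]
K_uH = gxe_kernel(G_hybrid, omega, [r[0] for r in records], env)
K_uM = gxe_kernel(G_male, omega, [r[1] for r in records], env)
K_uF = gxe_kernel(G_female, omega, [r[2] for r in records], env)
print(K_uH)
```

Because each kernel reuses the same environmental matrix, the three interaction terms differ only in which relationship matrix (hybrid, male or female) is paired with it.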

Using GS to Harness Gene Bank Accessions

Although the accessions stored in gene banks represent a rich asset for breeders, their alleles need to be moved from the accessions into cultivar development programs. Lengthy pre-breeding programs are required to develop lines that combine favorable alleles from gene banks with good agronomic performance, which can subsequently be used as parents in a breeding program. One possible application of GS is to identify potentially useful lines stored in gene banks that could then be integrated as candidates into pre-breeding programs. Based on the simulation of various pre-breeding options, Gorjanc et al. [113] concluded that germplasm enhancement programs could be initiated directly from landraces, or from landraces crossed with elite testers, using GS.

Crossa et al. [114] examined the genomic-enabled prediction accuracy of 8416 Mexican wheat landrace accessions and 2403 Iranian wheat landrace accessions stored in the CIMMYT gene bank. Two traits were measured in two environments and several other highly heritable traits were measured in a single optimum environment. The authors studied two genomic prediction strategies: (1) random cross-validation schemes where 20% of the accessions form the training set and 80% of the accessions comprise the testing set, and (2) prediction accuracy of reference core sets of sizes 10% and 20% of the total population to predict the remaining 90% and 80% of the accessions, respectively. Genomic predictions were generally of a magnitude (0.18–0.65, with a 20% core as the training population) that could be very useful for predicting the value of other accessions in the gene bank.

Crossa et al. [48] reflect on a question raised by pre-breeders and genetic resource conservationists: can GS be employed to accelerate the flow of favorable alleles from the gene bank into gene pools and advanced breeding populations, taking into account a target trait together with a number of agronomically required traits? This pre-breeding strategy has not yet been directly addressed at CIMMYT; however, several studies exploring the use of GS in pre-breeding are underway, with the aim of increasing the use of gene bank accessions.

REORGANIZING THE CIMMYT WHEAT BREEDING PROGRAM

CIMMYT is piloting and adopting new approaches to increase the current rate of genetic gain for grain yield through projects funded by the Bill & Melinda Gates Foundation (BMGF) and the Department for International Development (DFID). Whereas GS has so far mainly been implemented to assist in selection decisions, these projects envision GS becoming a mainstream process for increasing the rate of genetic gain, mainly by reducing the generation interval through rapid cycling.

Shortening the Generation Interval

Recently, a two-part breeding strategy that uses GS was proposed for developing inbred lines [106,115]. The strategy reorganizes a breeding program into two distinct components: (1) a population improvement component that identifies parents for subsequent breeding cycles and increases the frequency of favorable alleles through rapid recurrent GS, and (2) a product development component that develops advanced breeding lines. The population improvement component relies on recurrent selection in an early breeding generation using GEBVs and is expected to increase population means rapidly. Selected early-generation plants enter the product development component, where superior inbreds are identified. The product development component corresponds to regular breeding schemes for the development of inbred lines, including GS. GS models are routinely updated using the phenotypic and genomic data generated during product development. Using computer simulations comparing the two-part strategy with standard and GS breeding schemes, Gaynor et al. [106] showed that long-term genetic gain from the two-part strategy was up to 1.5 times that of the best-performing GS strategy.

A second approach to turning generations over more quickly is rapid generation advance, or ‘Speed Breeding’ [116]. By growing plants in a temperature-controlled glasshouse under a prolonged photoperiod, plant growth can be accelerated so that a generation is completed in 2 to 3 months. Breeders may also be able to select for some basic traits (e.g., plant height, major disease resistance) under speed breeding conditions, either phenotypically or with gene-associated molecular markers. The two-part strategy and speed breeding could eventually be combined; however, both approaches require substantial reorganization of a wheat breeding program in terms of infrastructure, costs, real-time logistics, the traits to be evaluated at different stages, and the number and size of the populations to be used.
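Why shortening the cycle matters can be seen from the standard breeder’s equation written per unit of time, ΔG = i·r·σ_A / L, where i is the selection intensity, r the selection accuracy, σ_A the additive genetic standard deviation and L the cycle length in years. This equation is textbook quantitative genetics, not a result of the cited studies, and all numbers below are hypothetical. The sketch also ignores the fact that faster cycling can change r and σ_A.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Standardized selection intensity i when the best fraction p of a large,
    normally distributed population is selected: i = phi(z) / p."""
    z = NormalDist().inv_cdf(1.0 - p)
    return NormalDist().pdf(z) / p

def gain_per_year(p, accuracy, sigma_a, cycle_years):
    """Breeder's equation expressed per year: i * r * sigma_A / L."""
    return selection_intensity(p) * accuracy * sigma_a / cycle_years

# Hypothetical values: top 20% selected, accuracy 0.5, sigma_A = 0.4 t/ha.
for years in (4.0, 2.0, 1.0):
    print(years, round(gain_per_year(0.20, 0.5, 0.4, years), 3))
```

Halving L doubles the annual gain only if i, r and σ_A stay the same. In practice, rapid cycling usually trades some accuracy (fewer phenotypes per cycle) for more cycles per year.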

CIMMYT has gained some initial experience in testing the response to selection for grain yield based on genomic predictions in early breeding generations within a BMGF-funded project. In this project, a random subset of around 200 F₂ plants was selected from a larger set of GBS-genotyped F₂ plants derived from 40 different crosses. F₂-derived F₄ bulks were tested in replicated yield trials under optimal conditions for two years at CENEB (Bonnett, unpublished data). The response to selection based on the GEBVs of the F₂ plants was then assessed. Of the three models tested, the RKHS model (described above) showed a correlation with average yield. Lines derived from the F₂ plants with the highest 20% of GEBVs from RKHS yielded on average 7% more than those derived from the F₂ plants with the lowest 20% of GEBVs. These results probably provide some of the first empirical evidence of a possible realized gain in grain yield from applying GS in early breeding generations (Bonnett, in preparation).
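The realized-response comparison above (top 20% versus bottom 20% of F₂ GEBVs) amounts to a simple calculation. The sketch below shows one way to compute it. The function name and the data are hypothetical, and it does not reproduce the unpublished analysis.

```python
def realized_difference(gebv, yields, fraction=0.20):
    """Percent yield advantage of lines derived from the top `fraction` of
    plants ranked by GEBV over lines derived from the bottom `fraction`."""
    k = max(1, round(fraction * len(gebv)))
    order = sorted(range(len(gebv)), key=lambda i: gebv[i])
    low = [yields[i] for i in order[:k]]
    high = [yields[i] for i in order[-k:]]
    mean_low, mean_high = sum(low) / k, sum(high) / k
    return 100.0 * (mean_high - mean_low) / mean_low

# Hypothetical F2 GEBVs and yields (t/ha) of the derived F4 bulks.
gebv = [0.12, -0.30, 0.45, 0.05, -0.10, 0.38, -0.22, 0.20, 0.01, -0.05]
yields = [6.1, 5.6, 6.5, 5.9, 5.8, 6.3, 5.7, 6.0, 6.2, 5.9]
print(round(realized_difference(gebv, yields), 1))
```

Because the comparison uses yields measured later on the derived bulks, a positive difference is evidence of realized, not just predicted, response.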

CIMMYT will pilot a new breeding scheme that combines the two-part strategy and the speed breeding approach, in modified form, with the aim of accelerating the development of yield-competitive, high-Zn wheat varieties. The scheme includes a rapid turnover of generations as bulks through speed breeding, and early-generation GS that incorporates the earliest possible Zn phenotype for selecting parents for the subsequent breeding cycle. Moreover, GEBVs will be used to skip the YT stage and advance lines directly to the EYT in two to three environments, saving one year in generation advancement and an additional year in yield phenotyping.

Using Genome-Wide Predictions for Cross Design

The design of new crosses is one of the most important decisions in a breeding program. In the CIMMYT wheat breeding programs, the selection of crosses balances choosing the best × best parents against maintaining genetic diversity for long-term genetic gains [117]. Because CIMMYT distributes varietal candidates for further selection to identify adapted varieties to be grown in developing countries all over the world, maintaining genetic diversity has historically been of great importance, and diverse genetic materials (e.g., landraces, synthetic hexaploid wheat, wild relatives) are routinely introgressed.

Endelman [47] was the first to propose using genome-wide predictions to design crossing schemes in a plant breeding program. GS has been used extensively to predict the breeding values of animals used to generate progeny. The ability to predict the potential of a cross before it is made allows more efficient use of genetic and financial resources in a standard breeding program, especially in large programs such as CIMMYT’s, where large numbers of potential parents are tested and a relatively high number of crosses are made annually. Lado et al. [118] tested strategies for selecting crosses using genome-wide prediction in the spring bread wheat breeding programs of CIMMYT and INIA (Instituto Nacional de Investigación Agropecuaria), Uruguay. Means and variances of all possible cross combinations were predicted for grain yield and quality traits. While the predicted mean progeny performance was the strongest driver for selecting superior crosses for grain yield, the predicted progeny variance was of greater relative importance for the quality traits. Yao et al. [119] observed the same results and consequently developed a selection index for identifying candidate parents to improve yield and quality traits simultaneously.
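The cited text does not specify how progeny means and variances were predicted. The sketch below therefore uses a common analytical approximation rather than the method of Lado et al. [118]. It assumes additive marker effects, unlinked loci, fully inbred parents coded −1/+1 and fully inbred progeny (e.g., doubled haploids). The two quantities are combined in a usefulness criterion, mean + i·σ, which is our illustrative choice. Parents, effects and function names are hypothetical.

```python
from itertools import combinations

def cross_prediction(p1, p2, effects, intensity=1.40):
    """Predicted progeny mean, genetic variance and usefulness for a cross
    between two inbred parents with markers coded -1/+1.
    Assumes additive, unlinked loci and fully inbred progeny (e.g. DH/RIL):
    each segregating marker contributes +a_j or -a_j with probability 1/2."""
    mean = sum((x1 + x2) / 2 * a for x1, x2, a in zip(p1, p2, effects))
    variance = sum(a * a for x1, x2, a in zip(p1, p2, effects) if x1 != x2)
    usefulness = mean + intensity * variance ** 0.5  # intensity 1.40 ~ top 20%
    return mean, variance, usefulness

# Hypothetical parents and additive marker effects.
parents = {"P1": [1, 1, -1, 1], "P2": [1, -1, 1, -1], "P3": [-1, 1, 1, 1]}
effects = [0.30, 0.10, 0.25, 0.05]
ranked = sorted(((cross_prediction(parents[a], parents[b], effects), a, b)
                 for a, b in combinations(parents, 2)),
                key=lambda item: item[0][2], reverse=True)
for (mean, var, uc), a, b in ranked:
    print(f"{a} x {b}: mean={mean:.3f} var={var:.3f} usefulness={uc:.3f}")
```

Weighting the variance term more heavily (a larger `intensity`) favors crosses with high segregation, which echoes the finding that predicted variance mattered more for quality traits.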

More recently, promising new statistical models have been reported that balance selection gain against the maintenance of genetic diversity, which is important at CIMMYT. Genomic mating [120] and optimum cross selection [121] penalize the selection of individuals that are too closely related, or include information on the complementarity of the parents to be mated. Using genomic prediction to identify promising crosses can be introduced into a breeding program with limited impact on its overall logistics. CIMMYT breeders are therefore interested in exploring the potential of genome-wide prediction for cross design, so that a portion of their yearly crosses can be selected on this basis. However, routine application of these approaches requires a good database. The Enterprise Breeding System (EBS), under development for use by CGIAR and national breeding programs, should help in the routine use of sophisticated prediction tools.
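The relationship penalty idea can be illustrated with a deliberately simplified scoring rule: rank candidate crosses by mid-parent GEBV minus a penalty times the genomic relationship between the two parents. This toy rule is our own illustration. Genomic mating [120] and optimum cross selection [121] use more elaborate formulations that are not reproduced here. All names and values are hypothetical.

```python
from itertools import combinations

def rank_crosses(gebv, relationship, penalty):
    """Score every pair of candidate parents by mid-parent GEBV minus
    `penalty` times their genomic relationship; return best first."""
    scored = [((gebv[a] + gebv[b]) / 2 - penalty * relationship[a][b], a, b)
              for a, b in combinations(sorted(gebv), 2)]
    return sorted(scored, reverse=True)

# Hypothetical GEBVs and (symmetric) genomic relationships.
gebv = {"A": 2.0, "B": 1.9, "C": 1.0}
relationship = {"A": {"B": 0.9, "C": 0.1},
                "B": {"A": 0.9, "C": 0.1},
                "C": {"A": 0.1, "B": 0.1}}
for penalty in (0.0, 2.0):
    print(penalty, rank_crosses(gebv, relationship, penalty)[0])
```

Without a penalty, the two closely related top parents (A × B) are chosen. With a penalty of 2.0, the less related pair A × C ranks first, trading some mean merit for diversity.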

Increasing Family Size

Juliana et al. [108,109] investigated the family structure in CIMMYT yield trials. In both the YT and the EYT, families derived from crosses between two or three parents are relatively small. For example, in the YT of 2018–2019, 25% of the families were represented by only one line, and less than 1% of the families had more than 20 full-sib lines. The same authors also observed that the phenotypic variance among full-sibs is low. In simulations by Hickey et al. [122], the accuracy of GEBVs within bi-parental families increased from 0.4 to 0.6 when 20 and 50 phenotypes, respectively, were used. Verges and van Sanford [111] concluded that a minimum of 25 phenotypes per family would be needed in their study to stabilize prediction accuracies at the preliminary yield trial level. The question of the number versus the size of breeding populations has been discussed and tested in the literature since the 1980s. More recently, Bernardo [123] and Witcombe et al. [124] both concluded that the ability to identify the breeding populations with the highest mean performance before making the crosses (and thus parental selection) was more important than the number and size of the breeding populations. In an applied GS context, there seems to be scope for better parent selection and for increasing family size while reducing the total number of populations.

Sparse Testing in Target Populations of Environments

CIMMYT and ICARDA annually distribute around 1000 genotypes to national research programs in wheat-growing environments worldwide through the International Wheat Improvement Network (IWIN) (Figure 2). The impact of IWIN at the national program and farm levels has been well documented [125,126]. More recent CIMMYT projects, such as the USAID Feed the Future ‘Applied Wheat Genomics Innovation Lab’, have a significant focus on improving line-testing capacity and providing advanced training in national programs across various countries. This increased testing capacity has already allowed larger subsets of CIMMYT advanced breeding lines to be shared one to two generations earlier (e.g., with national programs in India, Pakistan, Bangladesh and Nepal). The phenotypic data generated in these target populations of environments can be included in GS training and validation populations to assist in selection decisions [80]. To expand on this concept, the USAID project, for example, envisions building on this valuable phenotyping and breeding network and establishing a coordinated sparse testing model that can also include test sites with more limited capacity. GEBVs for a larger overall set of potential new high-yielding lines for each target population of environments could then be shared within the network, and seed of promising but non-phenotyped lines could be grown in the next generation.

In this context, we recently studied three general cases of sparse testing allocation designs: (1) complete non-overlapping of lines across environments, (2) complete overlapping, with all lines tested in all environments, and (3) combinations of the two, in which certain numbers of non-overlapping and overlapping lines were distributed across the environments. We also studied several cases in which the size of the testing population was reduced. The study used three extensive wheat datasets. Four prediction models were used to study the effect of sparse testing on genomic-enabled prediction abilities; two models did not include G × E, whereas the other two incorporated two different forms of modeling G × E. The models that included G × E captured more genetic variability than the models with only the main genomic effect (G) in all three datasets. Both G × E models also provided overall higher prediction abilities for the allocation designs comprising different combinations of non-overlapping and overlapping lines. Reducing the size of the testing population decreased prediction accuracy under all allocation designs. Models including G × E made it possible to keep prediction abilities higher in the two extreme situations ((1) all non-overlapping lines and (2) all overlapping lines) while reducing the size of the training set. These initial results for sparse testing on wheat datasets indicate that substantial savings of testing resources can be achieved by using allocation designs and applying prediction models that incorporate G × E.
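The three allocation designs can be made concrete with a small helper that assigns lines to environments. A chosen number of lines is tested everywhere, and the remaining lines are each tested in exactly one environment. Setting the overlap to 0 gives design (1), setting it to the total number of lines gives design (2), and intermediate values give design (3). The round-robin assignment and the function names are our own simplification. The cited study’s actual allocation procedure (e.g., any randomization) is not reproduced here.

```python
def allocate(n_lines, n_envs, n_overlap):
    """Sparse-testing allocation: lines 0..n_overlap-1 are tested in every
    environment; the remaining lines are split round-robin so that each is
    tested in exactly one environment."""
    plan = {e: list(range(n_overlap)) for e in range(n_envs)}
    for i, line in enumerate(range(n_overlap, n_lines)):
        plan[i % n_envs].append(line)
    return plan

def n_plots(plan):
    """Total number of line-environment combinations to phenotype."""
    return sum(len(lines) for lines in plan.values())

# 12 hypothetical lines in 3 environments, from design (1) to design (2).
for overlap in (0, 3, 12):
    plan = allocate(12, 3, overlap)
    print(overlap, n_plots(plan), plan)
```

The plot count shows the resource trade-off: with 12 lines and 3 environments, full overlap needs 36 plots, whereas no overlap needs only 12. GS models that include G × E are then used to predict the untested line-environment combinations.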

CONCLUSIONS

Increasing crop yield potential is essential for future food security. To achieve this, new breeding strategies and technologies are required to boost genetic gains. During the past two decades, successful studies on GS (including in wheat) have been reported and have left little doubt that GS is one of the breeding tools that can pave the way to continued crop improvement in the near future. However, it has also become apparent that, while the methodological framework has been established, optimal strategies for implementing GS depend on the breeding scheme regularly applied and the key traits targeted in each individual breeding program. Overall, GS can be effective in a breeding program when prediction abilities are high, when traits are difficult or impossible to phenotype, or when the cost of phenotyping outweighs the cost of genotyping.

To mainstream GS, CIMMYT is building the computational infrastructure to store genotypic, phenotypic and pedigree data, to combine these data and make genome-wide predictions, and to store the predicted values so that breeders can access them when making selection decisions.

CIMMYT is implementing GS in its applied breeding scheme and is looking to redesign parts of the breeding program to accelerate genetic gains by shortening the breeding cycle. Subsequent empirical studies will be essential to demonstrate the efficiency of these GS-based breeding strategies. We emphasize that GS alone cannot close the worrisome gap between current production trends and the projected future demand for crops; breeding programs must continuously explore further technologies and breeding strategies.
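The rationale for shortening the breeding cycle can be made explicit with the breeder's equation expressed as genetic gain per unit time, the framework used by Heffner et al. (ref. 39). Here i is the selection intensity, r the accuracy of selection (the correlation between the selection criterion and the true breeding value), σ_A the additive genetic standard deviation and L the cycle length in years. The accuracies and cycle lengths in the comparison are hypothetical and serve only to show the arithmetic.

```latex
\Delta G_{\text{year}} = \frac{i\, r\, \sigma_A}{L}
\qquad\text{e.g.}\qquad
\frac{r_{\mathrm{GS}}}{L_{\mathrm{GS}}} = \frac{0.5}{2} = 0.25
\;>\;
\frac{r_{\mathrm{PS}}}{L_{\mathrm{PS}}} = \frac{0.8}{4} = 0.20
```

With i and σ_A held equal, a 2-year GS cycle with accuracy 0.5 would yield 25% more gain per year than a 4-year phenotypic cycle with accuracy 0.8, which is why a less accurate but faster GS cycle can still be worthwhile.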

AUTHOR CONTRIBUTIONS

Susanne Dreisigacker conceptualized the manuscript; Susanne Dreisigacker, Jose Crossa, Paulino Pérez-Rodríguez, Osval A. Montesinos-López and Umesh Rosyara wrote the manuscript; and all authors reviewed the full manuscript.

CONFLICTS OF INTEREST

The authors declare no conflict of interest.

FUNDING

Support for GS research at CIMMYT was provided by the CGIAR Research Program on Wheat; the Feed the Future Innovation Lab for Applied Wheat Genomics project through the US Agency for International Development (USAID), under the terms of contract no. AIDOAA-A-13-00051; the Delivering Genetic Gain in Wheat project through the UK Government’s Department for International Development (DFID) and the Bill & Melinda Gates Foundation (Grant No.: OPP1133199); the Genomic Selection: The Next Frontier for Rapid Gains in Maize and Wheat Improvement project through the Bill & Melinda Gates Foundation (Grant No.: OPP101638); and the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806.

ACKNOWLEDGMENTS

The authors are grateful to their CIMMYT colleagues for intensive discussions on the subject, CIMMYT field and lab technicians who helped to record phenotypic and genotypic data, collaborators worldwide, as well as scientists in national programs who in their countries collected the valuable data used in the various studies. We especially wish to express our sincere gratitude to the innovation lab technicians at Kansas State University, led by Jesse Poland, for their support in implementing GS in the spring bread wheat breeding program and generating the genotyping data (particularly with the support of Sandesh Shrestha). The authors also thank Marcia MacNeil and Alma McNab for editing the revised version of the manuscript.

REFERENCES

1. Braun H-J, Atlin G, Payne T. Multi-location testing as a tool to identify plant response to global climate change. In: Reynolds MP, editor. Climate change and crop production. Wallingford (UK): CABI; 2010. p. 115-38.

2. Foley JA, Ramankutty N, Brauman KA, Cassidy ES, Gerber JS, Johnston M, et al. Solutions for a cultivated planet. Nature. 2011;478(7369):337-42.

3. Godfray HCJ, Crute IR, Haddad L, Muir JF, Nisbett N, Lawrence D, et al. The future of the global food system. Philos Trans R Soc B Biol Sci. 2010;365(1554):2769-77.

4. Lantican MA, Braun H-J, Payne TS, Singh RP, Sonder K, Baum M, et al. Impacts of International Wheat Research 1994-2014. Mexico (Mexico): CIMMYT; 2016.

5. Mondal S, Dutta S, Crespo-Herrera L, Huerta-Espino J, Braun HJ, Singh RP. Fifty years of semi-dwarf spring wheat breeding at CIMMYT: Grain yield progress in optimum, drought and heat stress environments. Field Crops Res. 2020;250:107757.

6. Fischer R, Wall PC. Wheat breeding in Mexico and yield increases. J Aust Inst Agric Sci. 1976;42:139-48.

7. Waddington SR, Ransom JK, Osmanzai M, Saunders DA. Improvement in the Yield Potential of Bread Wheat Adapted to Northwest Mexico. Crop Sci. 1986;26(4):698-703.

8. Sayre KD, Rajaram S, Fischer RA. Yield potential progress in short bread wheats in northwest Mexico. Crop Sci. 1997;37(1):36-42.

9. Lopes MS, Reynolds MP, Jalal-Kamali MR, Moussa M, Feltaous Y, Tahir ISA, et al. The yield correlations of selectable physiological traits in a population of advanced spring wheat lines grown in warm and drought environments. Field Crops Res. 2012;128:129-36.

10. Aisawi KAB, Reynolds MP, Singh RP, Foulkes MJ. The physiological basis of the genetic progress in yield potential of CIMMYT spring wheat cultivars from 1966 to 2009. Crop Sci. 2015;55(4):1749-64.

11. Crespo-Herrera LA, Crossa J, Huerta-Espino J, Vargas M, Mondal S, Velu G, et al. Genetic gains for grain yield in CIMMYT’s semi-arid wheat yield trials grown in suboptimal environments. Crop Sci. 2018;58(5):1890-8.

12. Crespo-Herrera LA, Crossa J, Huerta-Espino J, Autrique E, Mondal S, Velu G, et al. Genetic yield gains in CIMMYT’s international elite spring wheat yield trials by modeling the genotype × environment interaction. Crop Sci. 2017;57(2):789-801.

13. Honsdorf N, Mulvaney MJ, Singh RP, Ammar K, Burgueño J, Govaerts B, et al. Genotype by tillage interaction and performance progress for bread and durum wheat genotypes on irrigated raised beds. Field Crops Res. 2018;216:42-52.

14. Gerard GS, Crespo-Herrera LA, Crossa J, Mondal S, Velu G, Juliana P, et al. Grain yield genetic gains and changes in physiological related traits for CIMMYT’s High Rainfall Wheat Screening Nursery tested across international environments. Field Crops Res. 2020;249:107742.

15. Berkman PJ, Lai K, Lorenc MT, Edwards D. Next-generation sequencing applications for wheat crop improvement. Am J Bot. 2012;99(2):365-71.

16. Gardiner L-J, Joynson R, Hall A. Next-Generation Sequencing Enabled Genetics in Hexaploid Wheat. In: Miedaner T, Korzun V, editors. Applications of Genetic and Genomic Research in Cereals. Cambridge (United Kingdom): Woodhead Publishing; 2019. p. 49-63.

17. Paux E, Roger D, Badaeva E, Gay G, Bernard M, Sourdille P, et al. Characterizing the composition and evolution of homoeologous genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J. 2006;48(3):463-74.

18. Wanjugi H, Coleman-Derr D, Huo N, Kianian SF, Luo MC, Wu J, et al. Rapid development of PCR-based genome-specific repetitive DNA junction markers in wheat. Genome. 2009;52(6):576-87.

19. Ling H-Q, Ma B, Shi X, Liu H, Dong L, Sun H, et al. Genome sequence of the progenitor of wheat A subgenome Triticum urartu. Nature. 2018;557:424-8.

20. Luo MC, Gu YQ, Puiu D, Wang H, Twardziok SO, Deal KR, et al. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature. 2017;551(7681):498-502.

21. Appels R, Eversole K, Feuillet C, Keller B, Rogers J, Stein N, et al. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 2018;361(6403):eaar7191.

22. Maccaferri M, Harris NS, Twardziok SO, Pasam RK, Gundlach H, Spannagl M, et al. Durum wheat genome highlights past domestication signatures and future improvement targets. Nat Genet. 2019;51(5):885-95.

23. Jia M, Guan J, Zhai Z, Geng S, Zhang X, Mao L, et al. Wheat functional genomics in the era of next generation sequencing: An update. Crop J. 2018;6(1):7-14.

24. Dreisigacker S, Sehgal D, Singh RP, Sansaloni C, Braun H-J. Application of genetic and genomic tools in wheat for developing countries. In: Miedaner T, Korzun V, editors. Applications of Genetic and Genomic Research in Cereals. Cambridge (United Kingdom): Woodhead Publishing; 2019. p. 251-71.

25. Lan C, Basnet BR. Overview of Bi-Parental QTL Mapping and Cloning Genes in the Context of Wheat Rust. In: Dreisigacker S, Sehgal D, Reyes Jaimez AE, Luna Garrido B, Muñoz Zavala S, Núñez Ríos C, editors. CIMMYT Wheat Molecular Genetics: Laboratory protocols and applications to wheat breeding. Mexico (Mexico): CIMMYT; 2016. p. 39-46.

26. Crossa J, Burgueño J, Dreisigacker S, Vargas M, Herrera-Foessel SA, Lillemo M, et al. Association Analysis of Historical Bread Wheat Germplasm Using Additive Genetic Covariance of Relatives and Population Structure. Genetics. 2007;177:1-25.

27. Lopes MS, Dreisigacker S, Peña RJ, Sukumaran S, Reynolds MP. Genetic characterization of the wheat association mapping initiative (WAMI) panel for dissection of complex traits in spring wheat. Theor Appl Genet. 2015;128(3):453-64.

28. Sukumaran S, Dreisigacker S, Lopes M, Chavez P, Reynolds MP. Genome-wide association study for grain yield and related traits in an elite spring wheat population grown in temperate irrigated environments. Theor Appl Genet. 2014;128(2):353-63.

29. Singh PK, Crossa J, Duveiller E, Singh RP, Djurle A. Association mapping for resistance to tan spot induced by Pyrenophora tritici-repentis race 1 in CIMMYT’s historical bread wheat set. Euphytica. 2016;207(3):515-25.

30. Sehgal D, Autrique E, Singh R, Ellis M, Singh S, Dreisigacker S. Identification of genomic regions for grain yield and yield stability and their epistatic interactions. Sci Rep. 2017;7:1-12.

31. Sehgal D, Mondal S, Guzman C, Garcia Barrios G, Franco C, Singh R, et al. Validation of Candidate Gene-Based Markers and Identification of Novel Loci for Thousand-Grain Weight in Spring Bread Wheat. Front Plant Sci. 2019;10:1-12.

32. Battenfield SD, Guzmán C, Gaynor RC, Singh RP, Peña RJ, Dreisigacker S, et al. Genomic selection for processing and end-use quality traits in the CIMMYT spring bread wheat breeding program. Plant Genome. 2016;9(2):plantgenome2016-01.

33. Juliana P, Singh RP, Singh PK, Poland JA, Bergstrom GC, Huerta-Espino J, et al. Genome-wide association mapping for resistance to leaf rust, stripe rust and tan spot in wheat reveals potential candidate genes. Theor Appl Genet. 2018:1-18.

34. Gupta V, He X, Kumar N, Fuentes-Davila G, Sharma RK, Dreisigacker S, et al. Genome wide association study of Karnal bunt resistance in a wheat germplasm collection from Afghanistan. Int J Mol Sci. 2019;20(13):3124.

35. Muqaddasi QH, Reif JC, Li Z, Basnet BR, Dreisigacker S, Röder MS. Genome-wide association mapping and genome-wide prediction of anther extrusion in CIMMYT spring wheat. Euphytica. 2017;213(3):73.

36. Velu G, Singh RP, Crespo-Herrera L, Juliana P, Dreisigacker S, Valluru R, et al. Genetic dissection of grain zinc concentration in spring wheat for mainstreaming biofortification in CIMMYT wheat breeding. Sci Rep. 2018;8(1):1-10.

37. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819-29.

38. Bernardo R. Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 1994;34(1):20-5.

39. Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME. Plant breeding with genomic selection: Gain per unit time and cost. Crop Sci. 2010;50(5):1681-90.

40. Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T, Iwata H, et al. Genomic Selection in Plant Breeding: Knowledge and Prospects. Adv Agron. 2011;110:77-123.

41. Pérez P, De Los Campos G. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 2014;198(2):483-95.

42. Crossa J, Pérez P, Hickey J, Burgueño J, Ornella L, Cerón-Rojas J, et al. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity. 2014;112(1):48-60.

43. Purdy LH, Loegering WQ, Konzak CF, Peterson C, Allan RE. A Proposed Standard Method for Illustrating Pedigrees of Small Grain Varieties. Crop Sci. 1968;8:405-6.

44. Payne TS, Skovmand B, Lopez CG, Brandon E, McNab A. The international wheat information system (IWIS). Mexico (Mexico): CIMMYT; 2002.

45. De Los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, et al. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics. 2009;182(1):375-85.

46. Crossa J, De Los Campos G, Pérez P, Gianola D, Burgueño J, Araus JL, et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics. 2010;186(2):713-24.

47. Endelman JB. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome. 2011;4(3):250-5.

48. Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, et al. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives. Trends Plant Sci. 2017;22(11):961-75.

49. Morota G, Gianola D. Kernel-based whole-genome prediction of complex traits: A review. Front Genet. 2014;5:1-13.

50. Gianola D, Weigel KA, Krämer N, Stella A, Schön CC. Enhancing genome-enabled prediction by bagging genomic BLUP. PLoS One. 2014;9(4):e91693.

51. González-Camacho JM, de los Campos G, Pérez P, Gianola D, Cairns JE, Mahuku G, et al. Genome-enabled prediction of genetic values using radial basis function neural networks. Theor Appl Genet. 2012;125(4):759-71.

52. González-Camacho JM, Crossa J, Pérez-Rodríguez P, Ornella L, Gianola D. Genome-enabled prediction using probabilistic neural network classifiers. BMC Genomics. 2016;17(1):1-16.

53. Ornella L, Pérez P, Tapia E, González-Camacho JM, Burgueño J, Zhang X, et al. Genomic-enabled prediction with classification algorithms. Heredity. 2014;112(6):616-26.

54. Pérez-Rodríguez P, Gianola D, González-Camacho JM, Crossa J, Manès Y, Dreisigacker S. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3 (Bethesda). 2012;2(12):1595-605.

55. Gianola D, Van Kaam JBCHM. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics. 2008;178(4):2289-303.

56. Gianola D, Fernando RL, Stella A. Genomic-Assisted Prediction of Genetic Value with Semiparametric Procedures. Genetics. 2006;173(3):1761-76.

57. Gianola D, Okut H, Weigel KA, Rosa GJM. Predicting complex quantitative traits with Bayesian neural networks: A case study with Jersey cows and wheat. BMC Genet. 2011;12:4-7.

58. De Los Campos G, Gianola D, Rosa GJM, Weigel KA, Crossa J. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet Res (Camb). 2010;92(4):295-308.

59. Pérez-Elizalde S, Cuevas J, Pérez-Rodríguez P, Crossa J. Selection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction. J Agric Biol Environ Stat. 2015;20(4):512-32.

60. Cuevas J, Crossa J, Soberanis V, Pérez-Elizalde S, Pérez-Rodríguez P, de los Campos G, et al. Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models. Plant Genome. 2016;9(3):plantgenome2016.03.0024.

61. Cuevas J, Crossa J, Montesinos-López OA, Burgueño J, Pérez-Rodríguez P, de los Campos G. Bayesian genomic prediction with genotype × environment interaction kernel models. G3 (Bethesda). 2017;7(1):41-53.

62. Cuevas J, Granato I, Fritsche-Neto R, Montesinos-Lopez OA, Burgueño J, Sousa MB, et al. Genomic-enabled prediction Kernel models with random intercepts for multi-environment trials. G3 (Bethesda). 2018;8(4):1347-65.

63. Bandeira e Sousa M, Cuevas J, Couto EG de O, Pérez-Rodríguez P, Jarquín D, Fritsche-Neto R, et al. Genomic-enabled prediction in maize using kernel models with genotype × environment interaction. G3 (Bethesda). 2017;7(6):1995-2014.

64. Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, et al. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet. 2014;127(3):595-607.

65. Cuevas J, Montesinos-López O, Juliana P, Guzmán C, Pérez-Rodríguez P, González-Bucio J, et al. Deep Kernel for genomic and near infrared predictions in multi-environment breeding trials. G3 (Bethesda). 2019;9(9):2913-24.

66. Cho Y, Saul LK. Kernel methods for deep learning. In: Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A, editors. Advances in Neural Information Processing Systems 22 (NIPS 2009). Vancouver (Canada): JLMR publisher; 2009. p. 342-50.

67. Crossa J, Martini JWR, Gianola D, Pérez-Rodríguez P, Jarquin D, Juliana P, et al. Deep Kernel and Deep Learning for Genome-Based Prediction of Single Traits in Multienvironment Breeding Trials. Front Genet. 2019;10:1-13.

68. Burgueño J, de los Campos G, Weigel K, Crossa J. Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 2012;52(2):707-19.

69. Heslot N, Akdemir D, Sorrells ME, Jannink JL. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet. 2014;127(2):463-80.

70. VanRaden PM. Genomic measures of relationship and inbreeding. Interbull Bull. 2007;25(37):33.

71. VanRaden PM. Efficient Methods to Compute Genomic Predictions. J Dairy Sci. 2008;91(11):4414-23.

72. Pérez-Rodríguez P, Crossa J, Bondalapati K, De Meyer G, Pita F, De Los Campos G. A pedigree-based reaction norm model for prediction of cotton yield in multienvironment trials. Crop Sci. 2015;55(3):1143-51.

73. Crossa J, De Los Campos G, Maccaferri M, Tuberosa R, Burgueño J, Pérez-Rodríguez P. Extending the Marker × Environment interaction model for genomic-enabled prediction and genome-wide association analysis in durum wheat. Crop Sci. 2016;56(5):2193-209.

74. Velu G, Crossa J, Singh RP, Hao Y, Dreisigacker S, Perez-Rodriguez P, et al. Genomic prediction for grain zinc and iron concentrations in spring wheat. Theor Appl Genet. 2016;129(8).

75. Lopez-Cruz M, Crossa J, Bonnett D, Dreisigacker S, Poland J, Jannink J-L, et al. Increased Prediction Accuracy in Wheat Breeding Trials Using a Marker × Environment Interaction Genomic Selection Model. G3 (Bethesda). 2015;5(4):569-82.

76. Misztal I, Aguilar I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Interbull Bull. 2009;40:240-4.

77. Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92(9):4656-63.

78. Aguilar I, Misztal I, Legarra A, Tsuruta S. Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation. J Anim Breed Genet. 2011;128(6):422-8.

79. Christensen OF. Correction: Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation. Genet Sel Evol. 2012;44:37.

80. Pérez-Rodríguez P, Crossa J, Rutkoski J, Poland J, Singh R, Legarra A, et al. Single-Step Genomic and Pedigree Genotype × Environment Interaction Models for Predicting Wheat Lines in International Environments. Plant Genome. 2017;10(2):plantgenome2016.09.0089.

81. Rutkoski J, Poland J, Mondal S, Autrique E, González-Pérez L, Crossa J, et al. Canopy Temperature and Vegetation Indices from High-Throughput Phenotyping Improve Accuracy of Pedigree and Genomic Selection for Grain Yield in Wheat. G3 (Bethesda). 2016;6:1-36.

82. Montesinos-López A, Montesinos-López OA, Cuevas J, Mata-López WA, Burgueño J, Mondal S, et al. Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods. 2017;13(1):1-29.

83. Montesinos-López OA, Montesinos-López A, Crossa J, de los Campos G, Alvarado G, Mondal S, et al. Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods. 2017;13(1):1-23.

84. Krause MR, González-Pérez L, Crossa J, Pérez-Rodríguez P, Montesinos-López O, Singh RP, et al. Hyperspectral reflectance-derived relationship matrices for genomic prediction of grain yield in wheat. G3 (Bethesda). 2019;9(4):1231-47.

85. Zondervan KT, Cardon LR. The complex interplay among factors that influence allelic association. Nat Rev Genet. 2004;5(2):89-100.

86. Cuyabano BCD, Su G, Rosa GJM, Lund MS, Gianola D. Bootstrap study of genome-enabled prediction reliabilities using haplotype blocks across Nordic Red cattle breeds. J Dairy Sci. 2015;98(10):7351-63.

87. Cuyabano BCD, Su G, Lund MS. Selection of haplotype variables from a high-density marker map for genomic prediction. Genet Sel Evol. 2015;47(1):1-11.

88. Jiang Y, Schmidt RH, Reif JC. Haplotype-based genome-wide prediction models exploit local epistatic interactions among markers. G3 (Bethesda). 2018;8(5):1687-99.

89. Cuyabano BCD, Su G, Lund MS. Genomic prediction of genetic merit using LD-based haplotypes in the Nordic Holstein population. BMC Genomics. 2014;15(1):1-11.

90. Clark AG. The role of haplotypes in candidate gene studies. Genet Epidemiol. 2004;27(4):321-33.

91. Zhang Z, Wang W, Valdar W. Bayesian modeling of haplotype effects in multiparent populations. Genetics. 2014;198(1):139-56.

92. Spindel JE, Begum H, Akdemir D, Collard B, Redoña E, Jannink JL, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity. 2016;116(4):395-408.

93. Bian Y, Holland JB. Enhancing genomic prediction with genome-wide association studies in multiparental maize populations. Heredity. 2017;118(6):585-93.

94. Jiang Y, Reif JC. Modeling epistasis in genomic selection. Genetics. 2015;201(2):759-68.

95. Sehgal D, Rosyara U, Mondal S, Singh R, Poland J, Dreisigacker S. Incorporating Genome-Wide Association Mapping Results Into Genomic Prediction Models for Grain Yield and Yield Stability in CIMMYT Spring Bread Wheat. Front Plant Sci. 2020;11:1-12.

96. Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev. 1959;3(3):207-19.

97. Lewis ND. Deep learning made easy with R. A gentle introduction for data science. Scotts Valley (CA, USA): CreateSpace Independent Publishing Platform; 2016.

98. Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Syst. 1989;2:303-14.

99. Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991;4(2):251-7.

100. Chollet F, Allaire JJ. Deep learning with R. New Delhi (India): Manning Publications, Manning Early Access Program (MEAP); 2017.

101. Montesinos-López OA, Montesinos-López A, Crossa J, Gianola D, Hernández-Suárez CM, Martín-Vallejo J. Multi-trait, multi-environment deep learning modeling for genomic-enabled prediction of plant traits. G3 (Bethesda). 2018;8(12):3829-40.

102. Montesinos-López OA, Martín-Vallejo J, Crossa J, Gianola D, Hernández-Suárez CM, Montesinos-López A, et al. A benchmarking between deep learning, support vector machine and Bayesian threshold best linear unbiased prediction for predicting ordinal traits in plant breeding. G3 (Bethesda). 2019;9(2):601-18.

103. Montesinos-López A, Montesinos-López OA, Gianola D, Crossa J, Hernández-Suárez CM. Multi-environment genomic prediction of plant traits using deep learners with dense architecture. G3 (Bethesda). 2018;8(12):3813-28.

104. Montesinos-López OA, Montesinos-López A, Hernández MV, Ortiz-Monasterio I, Pérez-Rodríguez P, Burgueño J, et al. Multivariate Bayesian analysis of on-farm trials with multiple-trait and multiple-environment data. Agron J. 2019;111(6):2658-69.

105. Bassi FM, Bentley AR, Charmet G, Ortiz R, Crossa J. Breeding schemes for the implementation of genomic selection in wheat (Triticum spp.). Plant Sci. 2016;242:23-36.

106. Gaynor RC, Gorjanc G, Bentley AR, Ober ES, Howell P, Jackson R, et al. A two-part strategy for using genomic selection to develop inbred lines. Crop Sci. 2017;57(5):2372-86.

107. Pérez-Rodríguez P, Burgueño J, Montesinos-López O, Singh RP, Juliana P, Mondal S, et al. Prediction with Big Data in the Genomic and High-throughput Phenotyping Era: A Case Study with Wheat Data. In: Kang MS, editor. Quantitative Genetics, Genomics and Plant Breeding. 2nd ed. Wallingford (United Kingdom): CAB International; 2020.

108. Juliana P, Singh RP, Poland J, Mondal S, Crossa J, Montesinos‐López OA, et al. Prospects and Challenges of Applied Genomic Selection—A New Paradigm in Breeding for Grain Yield in Bread Wheat. Plant Genome. 2018;11(3):1-17.

109. Juliana P, Montesinos-López OA, Crossa J, Mondal S, González Pérez L, Poland J, et al. Integrating genomic-enabled prediction and high-throughput phenotyping in breeding for climate-resilient bread wheat. Theor Appl Genet. 2019;132(1):177-94.

110. Juliana P, Poland J, Huerta-Espino J, Shrestha S, Crossa J, Crespo-Herrera L, et al. Improving grain yield, stress resilience and quality of bread wheat using large-scale genomics. Nat Genet. 2019;51(10):1530-9.

111. Verges VL, van Sanford DA. Genomic selection at preliminary yield trial stage: Training population design to predict untested lines. Agronomy. 2020;10(1):1-16.

112. Basnet BR, Crossa J, Dreisigacker S, Pérez‐Rodríguez P, Manes Y, Singh RP, et al. Hybrid Wheat Prediction Using Genomic, Pedigree, and Environmental Covariables Interaction Models. Plant Genome. 2019;12(1):1-13.

113. Gorjanc G, Jenko J, Hearne SJ, Hickey JM. Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics. 2016;17(1):1-15.

114. Crossa J, Jarquín D, Franco J, Pérez-Rodríguez P, Burgueño J, Saint-Pierre C, et al. Genomic prediction of gene bank wheat landraces. G3 (Bethesda). 2016;6(7):1819-34.

115. Hickey JM, Chiurugwi T, Mackay I, Powell W. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat Genet. 2017;49(9):1297-303.

116. Watson A, Hickey LT, Christopher J, Rutkoski J, Poland J, Hayes BJ. Multivariate genomic selection and potential of rapid indirect selection with speed breeding in spring wheat. Crop Sci. 2019;59(5):1945-59.

117. Mondal S, Rutkoski JE, Velu G, Singh PK, Crespo-Herrera LA, Guzman CG, et al. Harnessing diversity in wheat to enhance grain yield, climate resilience, disease and insect pest resistance and nutrition through conventional and modern breeding approaches. Front Plant Sci. 2016;7:991.

118. Lado B, Battenfield S, Guzmán C, Quincke M, Singh RP, Dreisigacker S, et al. Strategies for Selecting Crosses Using Genomic Prediction in Two Wheat Breeding Programs. Plant Genome. 2017;10(2):plantgenome2016.12.0128.

119. Yao J, Zhao D, Chen X, Zhang Y, Wang J. Use of genomic selection and breeding simulation in cross prediction for improvement of yield and quality in wheat (Triticum aestivum L.). Crop J. 2018;6(4):353-65.

120. Akdemir D, Sánchez JI. Efficient breeding by genomic mating. Front Genet. 2016;7:1-12.

121. Gorjanc G, Gaynor RC, Hickey JM. Optimal cross selection for long-term genetic gain in two-part programs with rapid recurrent genomic selection. Theor Appl Genet. 2018;131(9):1953-66.

122. Hickey JM, Dreisigacker S, Crossa J, Hearne S, Babu R, Prasanna BM, et al. Evaluation of genomic selection training population designs and genotyping strategies in plant breeding programs using simulation. Crop Sci. 2014;54(4):1476-88.

123. Bernardo R. Parental selection, number of breeding populations, and size of each population in inbred development. Theor Appl Genet. 2003;107(7):1252-6.

124. Witcombe JR, Gyawali S, Subedi M, Virk DS, Joshi KD. Plant breeding can be made more efficient by having fewer, better crosses. BMC Plant Biol. 2013;13(1):1-12.

125. Byerlee D, Dubin HJ. Crop improvement in the CGIAR as a global success story of open access and international collaboration. Int J Commons. 2009;4(1):452.

126. Reynolds MP, Braun HJ, Cavalieri AJ, Chapotin S, Davies WJ, Ellul P, et al. Improving global integration of crop research. Science. 2017;357(6349):359-60.

How to cite this article

Dreisigacker S, Crossa J, Pérez-Rodríguez P, Montesinos-López OA, Rosyara U, Juliana P, et al. Implementation of Genomic Selection in the CIMMYT Global Wheat Program, Findings from the Past 10 Years. Crop Breed Genet Genom. 2021;3(2):e210005. https://doi.org/10.20900/cbgg20210005