Application of the GGE Biplot as a Statistical Tool in the Breeding and Testing of Early and Extra-Early Maturing Maize in Sub-Saharan Africa

This paper reviews some aspects of the research conducted in sub-Saharan Africa in which the genotype main effect plus genotype by environment interaction (GGE) biplot was employed for the analysis and interpretation of data. The GGE biplot has been found quite effective in analyzing genotype × environment interaction and genotype × trait (GT) interaction, interpreting diallel and line × tester data, and evaluating the efficiency of testers in hybrid production. Application of the GGE biplot to genotype by environment data from several studies has helped to identify outstanding varieties, inbreds and hybrids of early and extra-early maize in terms of yield performance and stability under stress and non-stress environments. The use of GT biplot analysis has resulted in the identification of ear aspect (EASP), plant aspect (PASP), anthesis-silking interval (ASI), and number of ears per plant (EPP) as the most reliable traits for selection for yield under drought, low-N, high-N and well-watered environments. Studies comparing GT with path-coefficient analyses revealed that both methods identified EASP, plant height (PLHT), and ASI as the most important traits directly contributing to yield under drought stress. The GT biplot identified EASP, EPP, and Striga damage as the most reliable traits for indirect selection for improved grain yield under Striga infestation. The biplot graphical analysis allowed visual display of the general combining ability (GCA) of the parental inbreds and specific combining ability (SCA) of the hybrids used in Griffing's diallel mating design. In addition, information on the best mating partners, identification of proven testers and tester groups, and heterotic groups has been provided graphically. The disadvantages of the GGE biplot include the limited number of entries that can be displayed clearly, the fact that only two heterotic groups can be handled by the method, and the restriction to a fixed statistical model. More attention needs to be focused on tests of hypotheses and QTL analyses.
Open Access Received: 26 February 2020 Accepted: 01 June 2020 Published: 11 June 2020 Copyright © 2020 by the author(s). Licensee Hapres, London, United Kingdom. This is an open access article distributed under the terms and conditions of Creative Commons Attribution 4.0 International License. Crop Breed Genet Genom. 2020;2(3):e200012. https://doi.org/10.20900/cbgg20200012


INTRODUCTION
The climatic, edaphic, and management variability in sub-Saharan Africa (SSA) is great and too formidable to be dealt with by ordinary means. For example, soil variability occurs, as it were, from foot to foot, and crops must be able to cope with this variation. Similarly, crop plants vary a great deal in their response to environmental conditions. Genotype × environment interaction (GEI) has therefore been defined as the degree of variation in the response of a genotype across environments [1]. Genotype × environment interaction has been measured more by [...] provide guidance that is very reliable in selecting the best genotypes or agronomic treatments suitable for planting at new locations and in the coming years [2].
The observable characteristics that result from the interaction between the environment and the genetic make-up are called the phenotype. Phenotypes can be assessed, observed, estimated, and arranged in groups according to the features they have in common. Environmental factors may be regarded as locations, growing seasons, years, nitrogen levels, rainfall, and temperature, all of which could have positive or negative effects on genotypes [3]. Wu and O'Malley [4] described two classes of environmental differences: micro-environmental differences that cannot be clearly forecast, such as yearly differences in drought conditions, rainfall, and level of insect damage; and macro-environmental differences that can be forecast, such as practice. According to the authors, the G × E variance can only be projected for the macro-environmental state.
The performance of varieties evaluated in experiments involving genotypes (G), locations (L), and years (Y) is difficult to determine because the genotype × location × year (G × L × Y) interactions are not easy to classify [5]. The complications resulting from G × E interactions can best be avoided by identifying stable genotypes that are adapted across crop production environments. To ensure the maintenance of broader adaptation and yield stability, superior experimental varieties have been selected based on their performance across contrasting environments. For example, the Regional Drought [...] achieve this, we re-examined target testing environments in West Africa (WA) for their uniqueness, as it was believed that some environments could never provide unique information, because of similarity to some other environments in separating and ranking genotypes without losing valuable information on genotypes. Furthermore, it was felt that stratification of maize evaluation environments could help improve the heritability of measured traits, accelerate the rate of genetic gain from selection, strengthen the potential competitiveness for seed production, and maximize the grain yields of farmers [11]. It was therefore very important to develop an in-depth understanding of the target agroecologies used for the evaluation of drought tolerant cultivars in WA and to determine whether they could be subdivided into different mega-environments to facilitate a more meaningful cultivar evaluation and recommendation. [...] [12]. The seed catalogue contains the list of varieties whose seeds could be produced and commercialized within the territories of the 17 member countries of ECOWAS and is an aggregate of the varieties registered in the national catalogues of the Member States. The catalogue offers a unique opportunity for the movement of good quality seeds of improved maize varieties and hybrids across the borders of the ECOWAS countries for production and marketing.
As a result of these new developments and the implications of global warming, desertification, and recurrent drought in the sub-region, there was a need to re-examine the mega-environments in WA and to identify core testing locations in each mega-environment used for the evaluation of the three different regional trials. A number of studies were therefore conducted to determine the representativeness, discriminating ability and repeatability of the test locations used for the evaluation of the DT Regional Early Variety Trials and to identify core testing sites to facilitate testing, seed production and commercialization of drought tolerant cultivars in WA. Using the GGE biplot statistical tool, Badu-Apraku et al. [13] examined the mega-environments in WA employed for testing the Regional extra-early maturing varieties.
The test locations Zaria, Ilorin, Ikenne, Ejura, Kita, Babile, Ina, and Angaredebou were identified as the core testing sites of the three mega-environments for testing the Regional Uniform Variety Trials-Extra-early.
In another study, involving the testing sites for the Regional Early Trials, test environments were classified into four mega-environments [14].
Four test locations were highly correlated in their ranking of the genotypes in group 1, suggesting that a promising early maturing cultivar selected at one of these locations in one country would also be suitable for production at the other locations within the same mega-environment in different countries [14]. Similarly, eight test locations were highly correlated in their rankings of the genotypes in group 2; therefore, a promising cultivar identified at one of these locations was likely to be adapted to the other locations. It was concluded that selecting a cultivar out of these two locations would likely result in varieties adapted to other locations within the same mega-environment. The identification of the core testing sites was expected to facilitate the selection of high yielding and stable cultivars in the four different regional trials of WA [Regional Uniform Variety Trial (RUVT)-early, RUVT-extra-early, Drought Tolerant (DT) Regional Early and DT Regional Extra-early Variety Trials] as well as seed production and marketing across the countries of WA. [...] [15], only test locations with high discriminating ability were useful, and only those that were also representative could be used in selecting superior genotypes. The repeatability of genotype ranking across years within test locations was also an essential aspect of test location evaluation. Using the GGE biplot method, the GEI of the testing sites of the RUVT early and extra-early varieties in West and Central Africa (WCA) was studied, and the test locations were characterized and stratified into mega-environments and core testing sites to facilitate efficient and less costly testing of varieties [13,14]. On the other hand, the testing sites of the Regional Drought Tolerant Trials, which were confined to the drought-prone locations in the four partner countries of the Drought Tolerant Maize for Africa (DTMA) project, namely Nigeria, Ghana, Benin and Mali, had not been studied.
It was therefore believed that information on the representativeness, discriminating ability and repeatability of the testing sites of the DT Regional Variety Trials in WA would facilitate better understanding of the responses of drought tolerant maize genotypes in target drought environments and would be invaluable in designing an efficient and economical selection strategy for the International Institute of Tropical Agriculture (IITA) Maize Breeding Program. However, there was limited information of this kind for the testing sites of the Regional DT Trials, which were largely in the drought-prone locations of the four partner countries of the DTMA project (Table 1). Consequently, twelve early maturing maize cultivars were evaluated for 3 years at 16 locations in WA to determine the representativeness, discriminating ability and repeatability of the testing sites and to identify core testing sites using the GGE biplot method [16]. The results revealed that [...] and not repeatable and would not be useful for evaluating early maize cultivars for drought tolerance.
Beyond the analysis of multi-environment trial (MET) data, in which significant GEI is partitioned into principal components by singular value decomposition to obtain information on the stability and adaptability of genotypes as well as the discriminativeness and representativeness of the environments, the GGE biplot is appropriate for the analysis of any other data that can be cast into a two-way table. This facilitates the use of the GGE biplot in graphical analysis of trait relationships and of genetic data obtained from factorial mating designs as well as QTL studies [17].
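The two-way-table view can be made concrete with a small numerical sketch. The Python code below is only an illustration under stated assumptions, not the procedure of the GGE biplot software used in the reviewed studies: the yield values are hypothetical, the function name `gge_biplot_scores` is invented here, and symmetric partitioning of the singular values between genotype and environment scores is one of several possible scaling choices.

```python
import numpy as np

def gge_biplot_scores(table):
    """Environment-centred SVD of a genotype x environment table of means.

    Returns genotype scores, environment scores (first two PCs) and the
    proportion of G + GE variation explained by PC1 and PC2.
    """
    y = np.asarray(table, dtype=float)
    # Remove only the environment main effect, so G and GE remain in the data
    centred = y - y.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Assumed scaling: each side receives the square root of the singular value
    geno = u[:, :2] * np.sqrt(s[:2])
    env = vt[:2, :].T * np.sqrt(s[:2])
    return geno, env, explained[:2]

# Hypothetical mean yields (t/ha): 4 genotypes (rows) x 3 environments (columns)
yields = np.array([
    [4.2, 3.1, 5.0],
    [3.8, 3.5, 4.1],
    [5.1, 2.9, 5.6],
    [3.0, 3.3, 3.9],
])
geno_scores, env_scores, explained = gge_biplot_scores(yields)
print("Genotype scores (PC1, PC2):\n", geno_scores)
print("Environment scores (PC1, PC2):\n", env_scores)
print("Proportion of G + GE explained:", explained)
```

Plotting the rows of `geno_scores` and `env_scores` on the same axes gives the biplot; any other two-way table (genotype × trait, entry × tester) can be passed to the same function.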
In the rest of this paper, we address the issue of stability of performance by comparing varieties, using different statistical methods.
Specifically, our objectives are to (i) compare GGE biplot with other statistical analytic methods, (ii) determine the effect of genotype × trait interaction, (iii) test GGE biplot for analysis of genetic data using diallel and line × tester designs, (iv) evaluate the efficiency of testers in hybrid production, (v) discuss the strengths and weaknesses of the GGE biplot statistical tool, and (vi) give future directions.

GGE BIPLOT COMPARED WITH OTHER ANALYTICAL METHODS
Stability studies have allowed researchers to identify broadly adapted cultivars for use in breeding programs and have been helpful in recommending new varieties to farmers [18]. Different concepts leading to different definitions of stability have been proposed over the years [19,20]. Lin et al. [19] identified three types of stability concepts: [...] The methods proposed by Eberhart and Russell [24] and Perkins and Jinks [25] are examples of the Type 3 concept.
Becker and Léon [20] stated that all stability procedures based on quantifying GEI effects belong to the dynamic concept. This includes the procedures for partitioning the GEI, such as Wricke's [26] ecovalence and Shukla's [23] stability variance; procedures using the regression approach, such as those proposed by Finlay and Wilkinson [22], Eberhart and Russell [24], and Perkins and Jinks [25]; and non-parametric stability analyses such as the rank summation index.
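Two of the dynamic-concept statistics named above can be sketched numerically. The code below is a minimal illustration, assuming a complete table of genotype × environment means on the original (untransformed) scale; the data and function names are hypothetical. Ecovalence is computed as the sum over environments of the squared interaction residuals of each genotype, and the regression coefficient is the slope of genotype performance on an environmental index (environment mean minus grand mean), in the spirit of the regression approaches cited.

```python
import numpy as np

def ecovalence(table):
    """Contribution of each genotype to the GEI sum of squares."""
    y = np.asarray(table, dtype=float)
    interaction = (y - y.mean(axis=1, keepdims=True)
                     - y.mean(axis=0, keepdims=True) + y.mean())
    return np.sum(interaction**2, axis=1)

def regression_slopes(table):
    """Slope of each genotype on the environmental index."""
    y = np.asarray(table, dtype=float)
    env_index = y.mean(axis=0) - y.mean()          # sums to zero by construction
    row_centred = y - y.mean(axis=1, keepdims=True)
    return row_centred @ env_index / np.sum(env_index**2)

# Hypothetical mean yields (t/ha): 4 genotypes x 3 environments
yields = np.array([
    [4.2, 3.1, 5.0],
    [3.8, 3.5, 4.1],
    [5.1, 2.9, 5.6],
    [3.0, 3.3, 3.9],
])
print("Ecovalence per genotype:", ecovalence(yields))
print("Regression slope per genotype:", regression_slopes(yields))
```

Under these definitions a genotype with low ecovalence contributes little to the interaction, and the slopes average to one across genotypes, so values above or below one indicate above- or below-average responsiveness to better environments.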
Lin and Binns [19] proposed Type 4 stability concept based on predictable and unpredictable non-genetic variation. The predictable component relates to locations while the unpredictable component relates to years. These researchers suggested the use of a regression approach for the predictable portion and the mean squares for years × location interaction for each genotype as a measure of the unpredictable variation.
Combined analysis of variance is the earliest and most widely used method for detecting the existence of GEI in replicated METs. In recent times, however, a wide range of methods has been proposed to study GEI; these can be broadly divided into four groups: analysis of variance, stability or parametric, qualitative or nonparametric, and multivariate methods. We consider three multivariate methods here: cluster analysis, additive main effects and multiplicative interaction (AMMI), and genotype plus genotype by environment interaction (GGE) effects.

Cluster Analysis.
Cluster analysis is a numerical classification technique that defines groups of similar individuals. There are two types of classification. The first is non-hierarchical classification, which assigns each item to a class. The second type is hierarchical classification, which groups the individuals into clusters and arranges these into hierarchies for the purpose of studying relationships in the data. Comprehensive reviews of the applications of cluster analysis to study GEI can be found in [19]. The report from cluster analyses by Shaibu et al. [27] revealed the genetic diversity among the genotypes and identified genotypes that can be selected for hybridization and improvement of maize.

Additive Main Effects and Multiplicative Interaction (AMMI).
Stability methods have been developed in both univariate and multivariate statistics [19]. Among the multivariate methods, the additive main effects and multiplicative interaction (AMMI) analysis is widely used for GEI investigations. This method has been effective because it captures a large portion of the GEI sum of squares, clearly separates the main and interaction effects, and often provides a meaningful interpretation of data to support a breeding program [2]. The AMMI model combines ANOVA for the genotype and environment main effects with principal component analysis (PCA) of the GEI [28,29]. Based on the first two interaction principal component axes of the AMMI model (IPCA1 and IPCA2), the AMMI stability value (ASV) has been used [30]. The ASV is comparable with the methods of Shukla [23] and Eberhart and Russell [24] for genotype stability [30].
The AMMI method can be used more effectively to analyze METs than ANOVA and PCA. According to Zobel et al. [28], ANOVA fails to detect a significant interaction component, PCA fails to identify and separate the significant genotype and environment main effects, while linear regression models account for only a small portion of the interaction sum of squares. The AMMI method takes care of the flaws in these methods and is used for three main purposes:
a. Model diagnosis. AMMI is more appropriate in the initial statistical analysis of yield trials because it provides an analytical tool for diagnosing other models as subcases when these are better for particular data sets [31].
b. AMMI clarifies the GEI by summarizing patterns and relationships of genotypes and environments [2,28].
c. It improves the accuracy of yield estimates. Gains have been obtained in the accuracy of yield estimates that are equivalent to increasing the number of replicates by a factor of two to five [28]. Such gains may be used to reduce testing cost by reducing the number of replications, increasing the number of treatments (e.g., varieties) in the experiments, or improving efficiency in selecting the best genotypes.
AMMI has proven useful for understanding complex GEI. The results can be graphed in a biplot that shows both main and interaction effects for both the genotypes and environments.
AMMI combines ANOVA and PCA into a single model with additive and multiplicative parameters. The model equation is:
$$Y_{ij} - Y_{j} = \lambda_1 E_{i1} \Upsilon_{j1} + \lambda_2 E_{i2} \Upsilon_{j2} + \varepsilon_{ij}$$
where Yij is the measured mean of the ith genotype in the jth environment; Yj is the grand mean; λ1 and λ2 are the singular values for PC1 and PC2; Ei1 and Ei2 are the PC1 and PC2 scores for genotype i; ϒj1 and ϒj2 are the PC1 and PC2 scores for environment j; and εij is the error term.
The combination of ANOVA and PCA in the AMMI model, along with prediction assessment, is a valuable approach for understanding GEI and obtaining better yield estimates. The interaction is explained in the form of a biplot display where PCA scores are plotted against each other thereby providing a visual inspection and interpretation of the GEI components. Integrating biplot display and genotypic stability statistics enables genotypes to be grouped based on similarity of performance across diverse environments. Yield-stability statistic (YSi) was also used to recommend varieties for commercialization [32]. Kang [32] proposed an improved superior stability index (I) that is free from all the aforesaid drawbacks. A new approach, known as genotype selection index (GSI), was used by taking into consideration the AMMI stability value and mean yield for quantification of stability [33].
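The combination of ANOVA and PCA can be sketched in a few lines. The code below is an illustrative sketch under assumptions, not the analysis used in the cited studies: it fits the additive main effects from the marginal means of a hypothetical table of genotype × environment means, applies a singular value decomposition to the interaction residuals, and then computes the ASV in its commonly cited form, in which the IPCA1 score is weighted by the ratio of the IPCA1 to the IPCA2 sum of squares. The function names and the symmetric scaling of scores are choices made here for illustration.

```python
import numpy as np

def ammi(table, n_axes=2):
    """Additive main effects plus SVD of the interaction residuals."""
    y = np.asarray(table, dtype=float)
    grand = y.mean()
    g_eff = y.mean(axis=1) - grand                 # genotype main effects
    e_eff = y.mean(axis=0) - grand                 # environment main effects
    interaction = y - grand - g_eff[:, None] - e_eff[None, :]
    u, s, vt = np.linalg.svd(interaction, full_matrices=False)
    # Assumed scaling: sqrt of singular value on each side
    ipca_geno = u[:, :n_axes] * np.sqrt(s[:n_axes])
    ipca_env = vt[:n_axes, :].T * np.sqrt(s[:n_axes])
    ss_axes = s[:n_axes] ** 2                      # sum of squares per IPCA axis
    return grand, g_eff, e_eff, ipca_geno, ipca_env, ss_axes

def ammi_stability_value(ipca_geno, ss_axes):
    """ASV from the first two IPCA scores; lower values suggest more stability."""
    weight = ss_axes[0] / ss_axes[1]
    return np.sqrt((weight * ipca_geno[:, 0]) ** 2 + ipca_geno[:, 1] ** 2)

# Hypothetical mean yields (t/ha): 4 genotypes x 3 environments
yields = np.array([
    [4.2, 3.1, 5.0],
    [3.8, 3.5, 4.1],
    [5.1, 2.9, 5.6],
    [3.0, 3.3, 3.9],
])
grand, g_eff, e_eff, ipca_geno, ipca_env, ss_axes = ammi(yields)
asv = ammi_stability_value(ipca_geno, ss_axes)
print("Genotype main effects:", g_eff)
print("ASV per genotype:", asv)
```

With real trial data the number of axes retained would normally be decided by a significance test or cross-validation rather than fixed at two.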

GENOTYPE AND GENOTYPE BY ENVIRONMENT INTERACTION (GGE)
Yan et al. [34] proposed a methodology known as the GGE biplot for graphical display of GEI patterns. It allows visual examination of the relationships among test environments, genotypes and GEI. It is an effective tool for: (i) mega-environment analysis (e.g., the "which-won-where" pattern), where specific genotypes can be recommended for specific mega-environments [35,36]; (ii) genotype evaluation (mean performance and stability); and (iii) environmental evaluation (the power to discriminate among genotypes in target environments) [37]. [...] (ii) it can only identify two distinct heterotic groups in a genetic study even where more exist; (iii) it cannot estimate genetic variances, covariances, and heritability; and (iv) there is limited literature on its application to molecular data.
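The which-won-where idea in use (i) above can be illustrated numerically without drawing the polygon. The sketch below is an assumption-laden simplification: it takes the rank-two approximation of a hypothetical environment-centred table, finds the genotype with the highest approximated value in each environment, and groups environments that share a winner as candidate mega-environments. The data, labels and function name are hypothetical.

```python
import numpy as np

def which_won_where(table, genotypes, environments):
    """Group environments by the winning genotype in the 2-PC approximation."""
    y = np.asarray(table, dtype=float)
    centred = y - y.mean(axis=0, keepdims=True)    # environment-centred (G + GE)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    approx = (u[:, :2] * s[:2]) @ vt[:2, :]        # what a 2-D biplot displays
    winners = approx.argmax(axis=0)                # best genotype per environment
    groups = {}
    for env, winner in zip(environments, winners):
        groups.setdefault(genotypes[winner], []).append(env)
    return groups

# Hypothetical means: 3 genotypes (rows) x 4 environments (columns)
demo = np.array([
    [5.0, 6.0, 2.0, 3.0],
    [3.0, 4.0, 6.0, 7.0],
    [4.0, 3.0, 4.0, 4.0],
])
groups = which_won_where(demo, ["G1", "G2", "G3"], ["E1", "E2", "E3", "E4"])
for genotype, envs in groups.items():
    print(genotype, "won in", envs)
```

A grouping obtained this way from a single year is only suggestive; repeatability across years would still need to be checked before declaring mega-environments.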
One recent study compared 15 methods of stability analysis using 17 varieties of maize evaluated over four years at several locations per year, for a total of 21 environments [41]. Spearman's rank correlation coefficient was used to compare the rankings of the varieties (Table 2). Many of the methods had no significant correlation with each other (Table 2) [...]
Another study was conducted to examine the effect of G × E on the performance and stability of 18 early maize cultivars and to identify core test sites and mega-environments at 15 locations in five countries of WA [14]. The GGE biplot classified the locations into four mega-environments, regardless of country; Kita (KX, lat. 13°05' N, long. 09°25' W) in Mali was identified as the ideal location, and Zaria (lat. 13°05' N, long. 09°25' W) in Nigeria was close to the ideal location (Figure 4). In addition, variety 2004 TZE-W Pop STR C4 was identified in the study as the ideal cultivar because it had the highest grain yield and was the most stable cultivar.

Genotype × Trait Analysis
Genotype-by-trait (GT) analysis presents the results of trait relationships by graphical display of the genetic relationships among traits [42]. It also provides information that helps to detect less important (redundant) traits and identify those that are appropriate for indirect selection for a target trait. The GGE biplot model equation for the genotype-by-trait analysis is as follows:
$$\frac{Y_{ij} - \mu - \beta_j}{d_j} = \lambda_1 g_{i1} e_{1j} + \lambda_2 g_{i2} e_{2j} + \varepsilon_{ij}$$
where Yij is the genetic value of the combination between inbred i and trait j; μ is the mean of all combinations involving trait j; βj is the main effect of trait j; λ1 and λ2 are the singular values for PC1 and PC2; gi1 and gi2 are the PC1 and PC2 eigenvectors, respectively, for inbred i; e1j and e2j are the PC1 and PC2 eigenvectors, respectively, for trait j; dj is the phenotypic standard deviation of trait j (so that the standardized values have a mean of zero and a standard deviation of 1); and εij is the residual of the model associated with the combination of inbred i and trait j. For the GT biplot analysis, the data were not transformed ("Transform = 0") but were standard deviation-standardized ("Scale = 1") and trait-centered ("Centering = 2"). Therefore, the outputs are appropriate for visualizing the relationships among genotypes and traits.
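The standardization and decomposition described for the GT biplot can be sketched as follows. This is a minimal illustration with hypothetical trait values, not output from the GGE biplot software: each trait column is centred on its mean and divided by its standard deviation, the standardized table is decomposed by SVD, and trait vectors are scaled by the singular values (an assumed trait-focused scaling) so that the cosine of the angle between two trait vectors approximates their correlation.

```python
import numpy as np

def gt_biplot_scores(trait_table):
    """Trait-standardized SVD of a genotype x trait table (first two PCs)."""
    t = np.asarray(trait_table, dtype=float)
    z = (t - t.mean(axis=0)) / t.std(axis=0, ddof=1)   # centre and scale each trait
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    geno = u[:, :2]                  # genotype scores
    trait = vt[:2, :].T * s[:2]      # assumed scaling: singular values on the trait side
    return geno, trait

def vector_cosine(a, b):
    """Cosine of the angle between two biplot vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical data: 5 genotypes x 3 traits (grain yield, EASP score, ASI in days)
traits = np.array([
    [4.5, 3.0, 1.5],
    [3.2, 5.0, 3.0],
    [5.1, 2.0, 1.0],
    [2.8, 6.0, 4.0],
    [3.9, 4.0, 2.5],
])
geno_scores, trait_scores = gt_biplot_scores(traits)
print("cos(yield, EASP) in the biplot:", vector_cosine(trait_scores[0], trait_scores[1]))
print("cos(yield, ASI) in the biplot:", vector_cosine(trait_scores[0], trait_scores[2]))
```

Traits whose vectors form an acute angle are positively associated, those at an obtuse angle are negatively associated, and near-right angles indicate little association; the approximation improves as PC1 and PC2 explain more of the total variation.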
In order to validate consistency of the results of GT biplot with other multivariate techniques such as stepwise multiple regression analysis and path coefficient analysis, a study was conducted to compare the results of the GT biplot and path analyses by Badu-Apraku et al. [47].
Results revealed that both methods identified EASP, PLHT, and ASI as important traits directly contributing to yield under drought stress.
Similarly, Oyekunle and Badu-Apraku [48] reported that the two methods [...] The ASI, EPP, EASP, and PASP were identified as the most reliable traits for simultaneous selection of drought and low-N tolerant genotypes.

Diallel Analysis
Adequate knowledge and understanding of genetic variability, modes of inheritance and heterotic response in a germplasm are crucial for determining appropriate methods for improving the genetic resources. Backcrossing, inbreeding, hybridization, and the S1 recurrent [...] (Figure 8). Tester TZEI 3 was the closest to the ideal tester, while entry TZEI 7 had the highest GCA effects across stress environments (Figure 9).
In summary, analysing diallel data using the GGE biplot provides genetic information beyond the combining ability of the parents and hybrids. It gives additional information on the relationships among parents, identifies testers, assesses the efficiency of testers, displays relationships among testers, identifies tester groups, reveals the best mating partners and, most importantly, identifies heterotic groups. This additional information is not readily available from conventional analysis of diallel data.
A major limitation of the diallel mating design is that the number of parents that can be involved is limited. Results for only a few parents can be clearly displayed. As the number of parents increases, the GGE biplot becomes cluttered, entry and tester labels overlap, and the graphical views become difficult to read. In a breeding program where hundreds of inbred lines have to be analyzed, diallel analysis using the GGE biplot becomes impracticable.

Line × Tester Analysis
Because of the shortcomings of the diallel design in handling large numbers of parents, line × tester analysis was proposed by Kempthorne [...] Data were generated from 63 newly developed inbred lines crossed to four extra-early elite testers (TZEEI 13, TZEEI 14, TZEEI 21 and TZEEI 29) and evaluated under multiple stress and stress-free environments. Using the GGE biplot, an ideal tester could not be identified under stress environments.
However, testers TZEEI 13 and TZEEI 14 were the closest to the ideal tester under nonstress environments (Figure 10) [56]. Inbred TZdEEI 34 was identified as outstanding in terms of GCA effects under both stress and nonstress environments. Testers TZEEI 13, TZEEI 21 and TZEEI 29 were found to be very efficient across stress environments based on their discriminating power, while testers TZEEI 21 and TZEEI 29 were the best across nonstress environments (Figure 11).
Challenges encountered with the application of GGE biplot for analysing data from line × tester are similar to those of the diallel.
However, because the number of testers used in line × tester analysis is usually less than in diallel (where number of parents is considered as the number of testers), the graphical display of the results of line × tester study is better than that of diallel. Interpretation of results of line × tester is also easier and simpler than that of the diallel.
North Carolina Design II (NCDII) is the third factorial mating design that could be analysed using GGE biplot analysis. However, there is no report in the literature in which the GGE biplot has been used to analyse data from NCDII. One reason could be that a larger number of parents can be accommodated in NCDII than in the diallel and, for better organization, males nested within sets are considered as a factor in the statistical model rather than the male factor.
We recommend that, for the GGE biplot to have wider application in the analysis of genetic as well as agronomic data, its proponents should consider incorporating features appropriate for the analysis of random and mixed model data and of data from nested mating designs.

Evaluation of the Efficiency of Testers in Hybrid Production
An important prerequisite for the development of high-yielding commercial hybrids is the availability of efficient testers, which could successfully discriminate, classify inbred lines into appropriate heterotic groups, and combine well with other inbred lines, open pollinated varieties or hybrids. An effective tester should be able to rank inbred lines correctly for performance in hybrid combinations and increase the differences between testcrosses for efficient discrimination [57].
Furthermore, such testers must have improved agronomic characteristics, resistance to diseases and tolerance/resistance to prevailing biotic and abiotic stresses such as drought, low-N and Striga. [...] which were adopted for early and extra-early maize germplasm [58].
Over the years, several testers have been developed in the early and extra-early maturity group to facilitate the development of superior hybrids for SSA. This has necessitated identification of a few efficient testers for use in classifying the available inbred lines into heterotic groups as well as inbred lines for the development of outstanding commercial hybrids for production in SSA.
The GGE biplot tool has the potential for identifying efficient testers even though its use for such analysis has not been adequately explored.
Several early maturing inbred lines, including TZEI 10, TZEI 17, TZE 23, TZEI 129 and ENT 13 have been identified as potential testers in the IITA Maize Improvement Program (MIP) using the GGE biplot statistical tool.
The GGE biplot has been used to identify the most efficient testers among the five inbred lines. As described by Akinwale et al. [17] and Yan [59], the efficiency of a tester (with testers taking the place of environments in the biplot) is determined by the relationships among the testers and the length of the tester vector. The smaller the angle between any two testers, the more closely related the testers are, while testers with longer vectors have higher discriminating power, that is, a greater ability to differentiate the crosses for grain yield.
Badu-Apraku et al. [53] [...] The GGE biplot is a superior data-visualization tool widely used in several major areas of agronomy and plant breeding and in genetic studies involving GEI, test location evaluation, genotype evaluation, mega-environment investigation and identification of parental inbreds for hybrid development [61]. This tool allows researchers to graphically extract and utilize information from MET data and other types of two-way data [35]. However, the full potential and shortcomings of this powerful tool are not completely understood by breeders, geneticists, agronomists, ecologists, entomologists and pathologists. The limited use of this tool could be attributed to a lack of understanding of its potential on the part of many researchers.
Furthermore, the major weaknesses as well as potential useful areas of [...] Breeders focus more attention on estimating genetic parameters since these increase the effectiveness of predicting gains from selection for the genetic enhancement of crop cultivars. The GGE biplot has been extensively employed in combining ability analysis and identification of heterotic patterns using diallel data [35,53] and line × tester data [56]. Badu-Apraku and Akinwale [56] [...] when the results of interrelationships among traits using GT biplot and sequential path analysis were compared [46].
In its application to analyze genetic data, classification of genotypes into heterotic groups has been based on the SCA effects only, which is represented by the projections of the entry vectors onto the ATC ordinate. In a situation where the GCA is preponderant over the SCA, classifying genotypes into heterotic groups based on SCA alone, as analysed by GGE biplot, will be grossly inefficient and the groups will not be distinct.
Another major challenge with diallel analysis using the GGE biplot is that only two heterotic groups can be identified even when more groups are present. Inbreds that do not fit into either group remain unclassified [17]. Furthermore, the proportion of parents classified is small relative to the total number of parents involved in the study. This is of great concern in a standard breeding program that has committed considerable time, energy, effort, land, funds and other resources to producing several inbred lines, only to find that just a few can be classified into heterotic groups for the purpose of hybrid development.
Another major shortcoming of GGE biplot analysis of genetic data such as diallel data is that only a fixed statistical model is applied. When genotype is considered a random effect and the experimenter is interested in computing genetic variances and heritability estimates, the GGE biplot becomes limiting since it has not been designed to display these parameter estimates graphically. Furthermore, the GGE biplot tool has not been used in the analysis of data generated using North Carolina Designs I, II, and III and some other genetic designs. The use of the GGE biplot in the analysis of these mating designs could facilitate a better understanding of them.

CONCLUSIONS AND FUTURE DIRECTIONS
The GGE biplot is among the most widely used multivariate analytical tools in the analysis of plant breeding data. The interpretation of GGE biplot analysis of genetic data is more comprehensive, with wider applicability, than that of the conventional statistical methods. Nevertheless, the lack of a discrete statistical test of significance in its analysis has sometimes led researchers to debate the reliability of its results. However, its results have been found to be consistent with those of ANOVA, correlation, [...] that are yet to be fully explored, especially for the tropical maize germplasm.

DATA AVAILABILITY
The dataset of the study is available from the authors upon request.

AUTHOR CONTRIBUTIONS
BBA, BF and RA conceived and designed the reviewed experiments as well as drafted the manuscript. BBA, BF, RA, BA and SAK executed the experiments. SAD and JT assisted in drafting the manuscript. All authors critically reviewed the manuscript.

CONFLICTS OF INTEREST
The authors declare that there is no conflict of interest.