Sign in Register Submit Manuscript

Hapres Home

Location: Home >> Detail

Crop Breed Genet Genom. 2022;4(2):e220002.


Multi-Locus Association Mapping Uncovers the Genetic Basis of Yield and Agronomic Traits in Chile Pepper (Capsicum spp.)

Dennis N. Lozada 1,2,* , Derek W. Barchenger 3, Danise Coon 1,2, Madhav Bhatta 4, Paul W. Bosland 1,2

1 Department of Plant and Environmental Sciences, New Mexico State University, Las Cruces, NM 88003, USA

2 Chile Pepper Institute, New Mexico State University, Las Cruces, NM 88003, USA

3 World Vegetable Center, Shanhua, Tainan 74151, Taiwan

4 Bayer Crop Science, Chesterfield, MO 63017, USA

* Correspondence: Dennis N. Lozada, Tel.: +1-575-646-5171.

Received: 08 April 2022; Accepted: 16 May 2022; Published: 19 May 2022

This article belongs to the Virtual Special Issue "Pepper Breeding and Genetics"


Genome-wide association studies (GWAS) are one of the foundations for modern molecular breeding for plant genetic improvement. Single-locus GWAS methodologies impose constraints in association mapping due to the complex nature of quantitative traits and the often-stringent thresholds used. Multi-locus models, therefore, serve as alternative approaches in identifying significant marker-trait associations. This study used six multi-locus models to determine quantitative trait loci (QTL) associated with yield and agronomic traits in a diverse population of chile pepper evaluated in New Mexico, USA. Using the GWAS models, eight genotyping-by-sequencing (GBS)-derived single nucleotide polymorphism (SNP) markers across six chromosomes were identified to be associated with multiple traits. Plant height shared similar genetic control with plant width, whereas total yield per plant had common QTL with yield components. Epigenetic mechanisms such as methylation and chromatin remodeling and organization were predicted for first pod date and flowering time. The diverse functions of candidate genes identified reflect the complex genomic architecture of the evaluated traits. Allele specific assays for the identified QTL will be developed and validated for marker-assisted selection and genome-wide selection towards the genetic improvement of chile pepper.

KEYWORDS: genome-wide association study; genotyping-by-sequencing; marker-assisted selection; quantitative trait loci; single nucleotide polymorphisms


GBS, genotyping-by-sequencing; GWAS, genome-wide association study; LD, linkage disequilibrium; NM-CAMP, New Mexico Capsicum association mapping panel; QTL, quantitative trait loci; SNP, single nucleotide polymorphism


Chile pepper (Capsicum spp.) is among the most important vegetable and spice crops in the world due to its important diverse uses and cultural significance [1]. Global chile pepper production reached ~61.6 million tons in 2019 [2]. However, chile pepper production is continuously constrained by different factors such as the rapidly changing climate, presence of biotic and abiotic stress factors, labor availability, and costs associated with manual harvesting [3–5]. In the state of New Mexico in the United States, chile peppers are regarded as a major economic and cultural crop [6]. Chile pepper production in New Mexico was 51,000 tons in 2021, a 22% decrease from the previous year [7]. Genetic improvement is crucial to improve production and to meet the increasing demands for chile pepper.

Marker-assisted selection (MAS) has the potential to accelerate crop genetic improvement. The availability of reference genomes for the Capsicum spp. [8–10] facilitates effective molecular breeding, marker discovery, genetic mapping, and candidate gene analysis. GWAS is one of the tools to dissect complex traits and identify trait-marker relationships based on linkage disequilibrium (LD) [11]. The development of a mapping population is faster, as GWAS uses a diverse germplasm instead of a biparental population for genetic mapping, eliminating the need to perform experimental hybridizations [12]. Models for GWAS also evaluate multiple traits simultaneously. Nevertheless, spurious associations can arise from genetic and population stratification [13]. Incorporating factors such as relatedness matrices (e.g., principal components) derived from molecular marker data in the GWAS model is therefore necessary to correct for the presence of population structure [14].

The general model for GWAS employs a mixed linear model (MLM) that includes kinship (K) and population structure (Q) information (i.e., K + Q model) that effectively controls Type I and II error rates [15]. More advanced MLM-based methods including the efficient mixed model association (EMMA; [16]), compressed mixed linear model (CMLM; [17]), efficient mixed model association expedited (EMMAX; [18]), and genome-wide efficient mixed model association (GEMMA; [19]), were later developed for association mapping. These approaches feature a one-dimensional survey of the genome by testing a single marker at a time, making the approach computationally efficient especially when using a large number of markers [20].

Previous GWAS in chile pepper used single-locus approaches to understand the genetic basis of various traits with agronomic and economic importance. The genetic basis of fruit morphology, for example, has been examined by using 373 accessions belonging to 11 Capsicum species across 746K polymorphic single nucleotide polymorphism (SNP) sites resulting in the identification of four novel loci associated with fruit shape [21]. Association mapping for fruit weight and capsaicinoid content in C. annuum identified 16 SNP markers related to variation for fruit weight located within known genes [22]. In another study, Lee et al. [23] identified 17 fruit-related regions and 16 causal genes controlling major fruit-related traits using genotyping-by-sequencing [GBS]-derived SNP markers implemented for GWAS and biparental mapping. The genetic architecture of Phytophthora capsici resistance has also been examined using GWAS, and 117 significant SNP markers across the whole genome were identified to be associated with resistance to the disease [24]. Altogether, these studies demonstrated the utility of single-locus GWAS methods in identifying genomic regions associated with complex traits in chile pepper. The single-locus models, nevertheless, may not render an accurate estimate of marker effects if a trait were controlled by multiple loci [20], especially for large experimental error typical in plant breeding field experiments [25]. The Bonferroni multiple test correction for declaring significant thresholds is also often too stringent, consequently causing many important loci to be declared as non-significant [26]. Moreover, implementing strict thresholds decreases the probability of accepting false positives, but does not minimize the potential of rejecting true positives because of setting very high significance thresholds [27]. Multi-locus GWAS models, therefore, serve as good alternatives to the existing single-locus approaches and has been recommended for the genetic dissection of complex traits.

Multi-locus association mapping involves two major steps in detecting significant marker-trait associations. In the first step, markers are scanned and selected based on a low significance level; after which, a multiple locus method is implemented for markers that pass the initial screening in the second step, and the true quantitative trait loci are confirmed using a likelihood test [20]. Contrary to single-locus association mapping, all potential loci are fitted to a single linear model and their effects are calculated and tested simultaneously in multi-locus GWAS [25], consequently improving power over the single-locus methods [28]. The first multi-locus association mapping approach, multi-locus mixed model [MLMM], was implemented in human and Arabidopsis thaliana datasets [29]. The MLMM considers the loci as fixed effects and features a simple, forward-backward stepwise approach where variance components are re-estimated at each step. Random effect multi-locus models for GWAS were subsequently developed due to the observed inflation of the true marker effects from the predicted values resulting from accounting markers as fixed effects [30]. In recent years, the application of multi-locus GWAS methodologies assuming random marker effects to complement or as alternatives to single-locus approaches has expanded across different plant species. Multi-locus random effect GWAS models have been implemented in major crops including bread wheat (Triticum aestivum L.; [31,32]), maize (Zea mays L.; [33]), rice (Oryza sativa L.; [34,35]), soybean (Glycine max L.; [36,37]), and upland cotton (Gossypium hirsutum; [38]). Altogether, these studies demonstrated the efficiency and power of multi-locus GWAS in dissecting the genetic basis of different complex traits resulting from using less stringent significant criteria. To our knowledge, there are currently no known reports on the application of multi-locus models for association mapping of complex traits in chile pepper.

In the present study, we demonstrated the potential of using multi-locus GWAS approaches in dissecting the genetic architecture of yield, phenology, and plant morphology-related traits in chile pepper (Capsicum spp.) The objectives of this study were to (1) explore the relationships among yield, yield components (green and red mature fruit yield per plant, ten pod weight), and agronomic traits (plant height and width) in a chile pepper mapping population; (2) identify significant genomic regions associated with different yield and agronomic traits using multi-locus GWAS models; and (3) determine candidate genes related to these significant loci. Results from this study will be important for the development and validation of markers for molecular breeding and genetic improvement of current chile pepper germplasm.


Chile Pepper Germplasm

The New Mexico Capsicum association mapping panel (NM-CAMP) consisted of 203 accessions belonging to four cultivated and one wild species of chile pepper (Capsicum spp.; Supplementary Table S1). The majority (156; 77%) of the NM-CAMP comprised of members of C. annuum, including their wild progenitor, ‘chiltepin’ (C. annuum var. glabriusculum). The second largest group consisted of members of C. chinense (34; 17%); followed by C. frutescens (7; 3%), and C. baccatum (5; 2%). A single accession of the wild chile pepper C. chacoense was also included in the NM-CAMP population. The C. annuum group included cultivars previously released by the New Mexico State University (NMSU) Chile Pepper Breeding and Genetics Program such as the ‘NuMex Big Jim’ [39], ‘NuMex Heritage Big Jim’ [40], ‘NuMex Sandia Select’ [41], ‘NuMex Joe E. Parker’ [42], and ‘NuMex Heritage 6-4’ [43], all classified as New Mexican pod type chile pepper. Additionally, jalapeño types such as ‘NuMex Vaquero’ [44], the cayenne, ‘NuMex Las Cruces’ [45], the paprika, ‘NuMex R. Vince Hernandez’ [46], the serrano, ‘NuMex Cajohns’, and ornamental chile peppers ‘NuMex Christmas’, ‘NuMex Easter’, and ‘NuMex Chinese New Year’ [47], and NMSU breeding lines derived from single plant selections were included in the NM-CAMP. The C. baccatum comprised of the ajis from South America, whereas the C. chinense consisted of habaneros and the ‘Superhots’, chile peppers with more than 1 million Scoville Heat Units, such as the ‘Carolina Reaper’, ‘Trinidad Moruga Scorpion’ [48], and ‘7 Pot Primo’. The NM-CAMP members of C. frutescens comprised of the tabasco, including the ‘NuMex Nobasco’, a no-heat type [49] and ‘Siling Labuyo’, a high-heat cultivar from the Philippines.

Collection and Analysis of Phenotypic Data

Yield and agronomic traits were collected from NM-CAMP for the 2021 growing season at the Leyendecker Plant Science Research Center, Las Cruces, NM (32.32° N, 106.76° W), and at the NMSU Los Lunas Agricultural Science Center, Los Lunas, NM (34.81° N, 106.73° W; hereafter referred to as the CRU and LUN locations, respectively), 320 kms (200 miles) N of CRU. Seedlings with 8–10 true leaves were transplanted 30 cm apart in 4.5 m plots with 0.30 m distance between plots in raised beds. The plants were cultivated under standard cultural and management practices including fertilizer application and furrow irrigation for growing chile pepper in New Mexico. Transplanting was done in April (CRU) and May (LUN), whereas a single manual harvest of pepper fruit samples was conducted from September-October 2021. The CRU location has a Belen clay loam class, whereas the LUN environment has a mixture of sandy clay loam (Gila) and Belen soil types. The mean temperatures between April and October and between May and October 2021 for the CRU and LUN locations were 21.5 °C and 19.5 °C, respectively, with the highest average temperature recorded in July at 25.7 °C in CRU (National Climatic Data Center;; data retrieved 10 May 2022) [50].

The NM-CAMP was evaluated in an augmented complete block field design for the CRU and LUN environments, where each block consisted of replicated checks, namely ‘Charger’ (New Mexican type), ‘Centella’ (jalapeño), and ‘C2 792’ (cayenne) and un-replicated entries. The total yield per plant (TYP; in kgs.) was represented as the fresh weight of the mature pods (green and red fruits) divided by the total number of plants of each plot; whereas the yield component trait, 10-pod weight (TPW; in kgs.), was measured as the total weight of randomly selected five mature green and five mature red pepper fruits. Other yield components such as green yield per plant (GYP) and red yield per plant (RYP) represented the fresh weight for green and red mature chile pepper fruits, respectively, divided by the number of individual plants of each plot, in kgs. Plant height (PHT) was the average measurement (in cm) from the ground to the top of the plant for five individual plants in each plot. Plant width (PWDTH; in cm) was measured as the widest point of five individual plants on the top of the canopy. Flowering time (FT) and first pod date (FPD) represented the number of days when the flowers and the pods (fruits) start to develop from the day of transplanting for each entry of the NM-CAMP. Phenotypic data collection was performed on a plot basis for each genotype using a maximum of ten plants for yield and yield components, and five individual plants for the plant morphology traits.

Phenotypic field data were analyzed under an augmented type complete block using a custom script for an augmented complete block design in the statistics program, R [51]. Best linear unbiased estimates (BLUEs) for the phenotypes were calculated in individual locations, whereas the Best Linear Unbiased Predictors (BLUPs) were estimated for the combined analysis across environments. Effects were considered as fixed and random when calculating BLUE and BLUP values, respectively. The broad-sense heritability, H2 at an individual location was calculated as: H2 = σ2G/σ2G + σ2E; where σ2G and σ2E represent the variance due to genotype and residual, respectively. Across locations, H2 was represented as H2 = σ2G/σ2G + σ2GE/n + σ2E; where σ2GE is the variance due to genotype-by-environment interaction and n is the number of environments.

Adjusted values for the trait data were calculated using the following statistical model for analysis for individual environment, where the effects were regarded as fixed:

Yij = µ + Gen + Ck + IDck + Blki + εij(1)

where Yij is the phenotype of interest; µ is the mean effect; Gen is the un-replicated entry effect; Ck and IDck are the effects of the replicated check on each block and the check identifier, respectively; Blki is the effect of the ith block; and εij is the residual effect. Phenotypic values across combined locations were calculated using the following model, where the effects were considered random:

Yikj = µ + Gen + Ck + IDck + Blki + Loci + Loci × Gen + Loci × Ck + Loci × IDck + Blkk (Loci) + εijkl(2)

where Yij is the phenotype of interest; µ is the mean effect; Gen is effect of un-replicated entry; Ck and IDck are the effects of the replicated checks on each block and the check identifier, respectively; Blki is the effect of the ith block; Loci is the effect of location i; Loci × Gen, Loci × Ck, Loci × IDck are the effects of the interactions between environment and the entry, check, and check identifiers, respectively; Blkk (Loci) is the effect of block nested into the location; and εij is the residual effect. The BLUE and BLUP trait values were calculated using models (1) and (2), respectively. Analysis of variance for an augmented design was conducted using the ‘augmentedRCBD’ [52] package in R [53]. Principal components analysis and Pearson correlation coefficient calculation were performed using the software ‘DeltaGen’ [54] and the multivariate function in JMP® 13.2.1 [55], respectively.

DNA Isolation and Genotyping-by-Sequencing

Seeds were planted in F1020 multi-cell trays (American Horticultural Supply, Inc., CA, USA) at the Fabián García Science Center, Las Cruces, NM, USA and were grown in standard greenhouse conditions with temperatures maintained between 26 and 30 °C for cultivating chile pepper [56]. Tissue samples from seedlings (30–45-day old) of single plants per entry were collected using 1.2 mL Qiagen® collection microtubes (Qiagen, MD, USA). Genomic DNA extraction was conducted using ~50 mg of fresh leaf tissue using the Qiagen DNEasy® 96 plant extraction kits through the University of Minnesota Genomics Center DNA Extraction Facility ( The total DNA isolated was quantified using Picogreen® (Thermofisher Scientific, MA, USA) and samples were normalized to 10 ng/µL for genotyping-by-sequencing (GBS).

The GBS procedure for the NM-CAMP was conducted through the University of Minnesota Genomics Center ( as described previously in Lozada et al. [57]. Briefly, a single enzyme digestion protocol was implemented using ApeKI (New England Biolabs, Inc., MA, USA) for ~100 ng of DNA per sample. Enzyme incubation at 75 °C for 2 h was performed and heat-inactivation at 80 °C for 20 min followed. Ligation of the DNA samples were then conducted with 200 units of T4 ligase (New England Biolabs, Inc. MA, USA). Purification of the ligated samples with solid phase reversible immobilization (SPRI) beads, and subsequent amplification (18 cycles) with 2X NEB Taq Master Mix were performed to add the barcodes. Libraries were purified, quantified, and pooled using SPRI beads. Fragments (300–744 bp) were selected and diluted to 1 nM for single end 1 × 100 sequencing using the Illumina NovaSeqTM 6000 sequencer (Illumina, CA, USA). Illumina ‘bcl2fastq’ software (Illumina, CA, USA) was used to demultiplex the raw FASTQ files. For the beginning of each sequence read, the first 12 bases were removed to exclude the adapters. Adapter sequences at the 3’ ends of the reads were removed using Trimmomatic [58]. Burrows-Wheeler Aligner [59] aligned the FASTQ files to the Zunla-1 (C. annuum) reference genome ([9]; Zunla-1 reference genome GCA_000710875.1, PRJNA186921, v.1.0; ( Simultaneous joint calling of variants across all samples was done using the Freebayes Bayesian genetic variant identifier [60]. VCFtools processed raw variant call format (VCF) files to remove variants with the following parameters: (a) minor allele frequency <1%; (b) genotype rates <95%, and (c) samples with genotype rates <50%. TASSEL 5.2.77 [61] was used to convert the VCF files to the HapMap format using where loci with MAF < 0.05 and unmapped markers were further excluded. Names of SNP markers were designated as "Scaffold/Chromosome name Position of the marker (in bp) in the scaffold”. Missing data imputation was accomplished using the LD k-nearest neighbor genotype imputation function [62] in TASSEL 5.2.77.

Genetic Diversity and Population Structure

Different indices were used to calculate the genetic diversity for the NM-CAMP. Tajima’s D statistic [63] was calculated using a ‘sliding window’ analysis (step size = 100 (0.10 Kb) and window size = 500 (0.50 Kb)) in TASSEL 5.2.77. GenoDive program [64] calculated heterozygosity within populations (Hs), total heterozygosity (Ht), and inbreeding coefficient (Gis). Analysis of molecular variance (AMOVA) [65] was performed through the package ‘poppr’ [66] in R. ‘GeneticSubsetter’ [67] package in R calculated polymorphism information content (PIC). Principal component analysis using genome-wide SNP markers were conducted using the “PCA” function in TASSEL v.5.2.77 and the first two PCs were fitted in a bivariate plot in JMP 13.2.1.

Population genetic stratification of the NM-CAMP was evaluated using STRUCTURE [68]. An admixture model was applied using a burn-in of 10,000 iterations, 10,000 Monte Carlo Markov Chain replicates, and number of clusters, K, between 1 and 10 with the number of replicates per K equal to 5. An ad-hoc statistic for ΔK based on the magnitude of changes in the log probability of data between the values of K [69] was used to infer the true number of K that best represent the entries in STRUCTURE HARVESTER [70]. ‘StructuRly’ program [71] visualized the admixture values derived from STRUCTURE for each entry through bar plots.

Linkage Disequilibrium Analysis

Analysis of linkage disequilibrium (LD) was conducted in TASSEL v.5.2.77, where LD coefficients for pairwise intrachromosomal marker pairs were calculated using the “sliding window” type function and a window size of 50 (0.05 Kb). The intrachromosomal LD values (r2) were represented as the square of allele frequency correlations between pairs of loci [72]. Distance at which LD starts to decay was determined by fitting a non-linear regression model [73,74] in the r2 against physical distance (Mb) biplot and identifying the intersection between the regression curve and a critical value (r2 = 0.25) in the plot. Analysis of LD was also performed using the markers identified to be associated with yield and agronomic traits. Intrachromosomal marker pairs with P < 0.05 were declared to be in significant LD.

Multi-locus Association Mapping and Analysis of Candidate Genes

Multi-locus GWAS was conducted using the multi-locus random-SNP-effect Mixed Linear Model (‘mrMLM.GUI’) package [75] in R for the BLUE and BLUP trait values. A total of six GWAS models were implemented to identify QTL: (1) mrMLM [20]; (2) FASTmrMLM [76]; (3) FASTmrEMMA [77]; (4) pLARmEB [78]; (5) pKWmEB [79]; and (6) ISIS EM-BLASSO [80] using default parameters (Table 1). A restricted maximum likelihood (REML) approach was used for FASTmrEMMA and the bootstrap was set to ‘FALSE’ for pLARmEB. The search radius (in kb) of candidate genes for mrMLM and FASTmrMM was set to 20, whereas the number of potentially associated variables selected by LARS for pLARmEB was set to 50. To minimize the confounding effects of a small number of individuals for some representative species, the C. chacoense accession was excluded for the final GWAS analyses. A kinship-principal component (K-PC) model was implemented for the identification of QTL, where the K matrix and the first three PCs were included in the model to correct for population structure. Principal components derived from marker data were calculated using TASSEL v.5.2.77. The threshold for identifying a significant marker-trait association using the multi-locus GWAS models was set to an LOD score of >3.0 to balance high power and low false positive rate for the detection of QTL [25]. Candidate genes within the LD regions for the significant multi-trait QTL were identified by comparing sequences against the genome of ‘Criollo de Morellos 334’ (CM-334; Genome assembly (GA): ASM512225v2) (C. annuum L.) using BLASTn in EnsemblPlants ([81];; accessed on 01 December 2021) with the following criteria: (a) ≥ 95% sequence identity and (b) an e-value cut-off of 10. Genes were annotated based on their biological functions using a gene ontology-based sequence annotation.

Table 1. Overview of the multi-locus association mapping models used to identify QTL using the New Mexico Capsicum association mapping panel.


Phenotypic Diversity for Yield and Agronomic Traits

Analysis of variance for augmented design revealed significant differences (P < 0.0001) among entries for FPD, FT, PHT, and PWDTH within environments for the NM-CAMP (Supplementary Table S2). Overall, the total yield per plant (TYP) for Las Cruces (CRU) was higher than the Los Lunas (LUN) location, although no significant differences were observed between the environments (Supplementary Table S3). Entries in the CRU location were taller and wider, on average, than the LUN environment. Conversely, at the LUN location, earlier flowering time (FT) and first pod date (FPD) (~3 days earlier, on average) occurred as compared to the CRU. Across individual environments, plant height (PHT), FT, and FPD were the most heritable traits, with PHT reaching a broad-sense heritability (H2) value of 0.97 for CRU, whereas TYP was moderately heritable for CRU (0.50) and LUN (0.45). For the combined multi-location analyses, H2 values ranged between 0.20 (TYP and red yield per plant (RYP)) and 0.88 (ten pod weight, TPW). Phenotypic correlations among yield and agronomic traits showed wide range of variation for Best Linear Unbiased Prediction (BLUP) trait values across environments (Table 2). The yield components TPW, RYP, and green yield per plant (GYP) were significantly correlated with TYP with correlation coefficient, r values of 0.15 (P < 0.05), 0.74 (P < 0.0001); and 0.61 (P < 0.0001), respectively, whereas there was a negative, non-significant correlation between GYP and RYP (r = −0.03, P > 0.05). Time to flowering and first pod date (FT and FPD) were highly significantly correlated (r = 0.98; P < 0.0001). A significant correlation was likewise observed for PHT and PWDTH (r = 0.65; P < 0.0001). The principal component (PC) biplot for phenotypic traits support the values for phenotypic correlation. The total yield and yield components formed a separate cluster with the plant morphology and phenology-related traits (Figure 1A). The first and second PCs were related to 32.4 and 25.2% of phenotypic variation, respectively.

Table 2. Phenotypic correlations between different yield and other agronomic traits for the New Mexico Capsicum association mapping panel.
Figure 1. Principal component (PC) biplots from the first two PCs for (A) yield, plant morphology, and phenology-related traits. Group I consisted of PHT, PWDTH, FPD, and FT, whereas Group II comprised the yield (TYP) and yield components (GYP, RYP, and TPW). (B) New Mexico Capsicum association mapping panel (NM-CAMP) from GBS-derived SNP markers showing genetic differentiation based on species. FPD—First pod date; FT—Flowering time; GYP—green yield per plant; PHT—Plant height; PWDTH—Plant width; RYP—Red yield per plant; TPW—Ten pod weight; TYP—Total yield per plant.
GBS-Derived SNP Markers

A total of 25,867 GBS-derived SNP markers were identified across the 12 chromosomes of chile pepper for the NM-CAMP. After removing variants with various filtering criteria (minor allele frequency <1%; genotype rates <95%; samples with genotype rates <50%), 14,922 SNP loci remained for further analysis. Chromosome P3 had the greatest number of SNPs identified (1965), followed by chromosomes P1 and P2, with 1692 and 1505 markers, respectively. Chromosomes P11 (871), P5 (911), and P9 (917) had the least number of SNP markers discovered. In terms of marker density (SNP/Mb), chromosome P8 had the highest SNP density (9.2), followed by P2 (9.1), and P3 (7.5); whereas chromosomes P9, P11, and P7 had the least, with 3.8, 3.9, and 4.1, respectively (Supplementary Table S4). Overall, the whole genome had a marker density of 5.6 SNP/Mb. Across the SNP sites, the most common nucleotide was thymine (T) (29.8%), followed by adenine (A) (24.9%), guanine (G) (23.1%), and cytosine (C) (22.2%). Ambiguous nucleotide calls comprised 133,858 (4.4%) of the total sites. Transition substitutions were identified in 8,149 sites (54.6%), where the ‘A/G’ was the most common (14.3%) followed by the ‘T/C’ (13.9%) type. A total of 6773 sites (45.4%) were transversion substitutions with the ‘T/A’ being the most common (6.8%) and the ‘G/C’ type the least (4.1%), across the whole genome. The proportion of heterozygous SNP sites was 0.04 and the average minor allele frequency across all loci was 0.22.

Analysis of Principal Components, Genetic Diversity, and Population Structure

Principal component analysis (PCA) revealed three distinct clusters based on species (Figure 1B). The first two PCs accounted for 24.1% (PC1) and 2.8% (PC2) of the total genotypic variance. Cluster I was comprised of the C. baccatum, C. chinense, C. frutescens, and C. chacoense complex (46 entries), whereas Clusters II and III consisted of the C. annuum. The chiltepins formed a complex with the ornamentals, jalapeno, serrano, cayenne, and New Mexican pod types previously released by the NMSU Chile Pepper Breeding Program (Cluster II; 112 lines). The last group (Cluster III; 45 genotypes) consisted of the NMSU breeding lines derived from single plant selections of several ‘NuMex’ cultivars including the ‘NuMex Joe E. Parker’, ‘NuMex Big Jim’, and ‘NuMex Sandia Select’.

Tajima’s D value for the NM-CAMP was 2.97. Negative Tajima’s values were observed for all groups, with −1.17, −0.19, and −0.20 for Clusters I, II, and III, respectively. Analysis of molecular variance revealed that most of the genetic variation was a consequence of variation between the Capsicum populations (PCA groups) (73.4%; Table 3). Variation between samples within a population and variation within samples accounted for 17.3 and 9.3% of variation, respectively.

Inference for population stratification using STRUCTURE identified K = 5 (ΔK = 14,014.61) as the optimal number of genetic groups for the NM-CAMP (Figures 2A, B). These clusters corresponded to the different Capsicum species, where the C. annuum and C. annuum var. glabriusculum (chiltepins) were divided into four different clusters. Accordingly, Group I consisted of the C. annuum and chiltepins, whereas the C. chinense, C. baccatum, C. frutescens, and C. chacoense formed a complex (Group II). Clusters III and IV comprised exclusively of the C. annuum and chiltepins, respectively. Group 5 was an admixture of C. annuum and C. annuum var. glabriusculum. The number of cluster K = 3 (ΔK = 5696.21) also showed a high delta K value, demonstrating that this could serve as an alternative clustering to describe the population differentiation among the different species for the NM-CAMP (Figure 2B). The grouping for K = 3 for the population was consistent with the PC biplot with differentiation based on species, where C. annuum formed Cluster II and III, whereas the rest of the Capsicum species comprised a complex for Group I.

Table 3. Analysis of molecular variance using genome-wide SNP markers for the New Mexico Capsicum association mapping panel.
Linkage Disequilibrium

Analysis of linkage disequilibrium (LD) identified 681,178 intrachromosomal marker pairs for the NM-CAMP; of which 530,921 (77.9%) were in significant (P < 0.05) LD (Supplementary Table S4). The analysis of LD for QTL associated with yield and other agronomic traits further identified 777 intrachromosomal marker pairs in significant LD (0.11%) (Supplementary Table S5). Chromosome P1 had the highest number of pairs in significant LD (61,708; 80.1%) whereas chromosome P4 had the least (41,763; 78.9%). The mean coefficient of LD (r2) across all pairs of SNP markers was 0.25. A total of 30,111 (4.4%) pairs of loci were in complete LD (r2 = 1.0). Across the individual chromosomes, the mean distance of marker pairs in significant LD ranged between 2.8 (chromosome P2) and 7.25 Mb (chromosome P9). The average distance (in Mb) of marker pairs in complete LD was highest for chromosome P9 (8.88), followed by P5 (8.55), and P11 (8.07). The LD began to decay at a physical distance of ~2.82 Mb (critical value, r2 = 0.25) across the whole genome for the NM-CAMP (Figure 2C). Decay for LD varied across the different chromosomes, ranging between 0.81 Mb (chromosome P8), and extended up to 8.94 Mb (chromosome P5). Across the different PCA groups, distances at which LD decay began were highest for Cluster III (2.74 Mb), followed by Cluster I (1.69 Mb). Cluster II, comprised of NMSU chile pepper cultivars and chiltepin, had the least LD decay distance among the groups at 0.24 Mb.

Figure 2. Bar plots for the admixture indices for each individual in the New Mexico Capsicum association mapping panel (NM-CAMP) for (A) K = 5 clusters. Clusters 1, 3, and 5 comprised of the C. annuum and C. annuum var. glabriusculum (chiltepins), whereas cluster 2 consisted of the C. baccatum, C. chacoense, C. chinense, and C. frutescens complex. Cluster 4 comprised exclusively of the chiltepins. (B) Inference for the optimal number of clusters for the NM-CAMP using the Evanno method revealed K = 5 with the highest value for ΔK; (C) Decay plot for linkage disequilibrium (LD). The red dashed line represents the critical value for LD (r2 = 0.25) and the blue solid line corresponds to the non-linear regression curve. The intersection between the critical value and the regression curve is the distance at which LD starts to decay (~2.82 Mb).
Quantitative Trait Locus and Candidate Gene Identification

Using six multi-locus GWAS models, 215 QTL distributed in 12 chromosomes were identified to be associated with yield and other agronomic traits for the NM-CAMP based on an LOD score threshold of > 3.0 (Supplementary Table S6). Of the identified QTL, 86 (40%) were detected using at least two different models, traits, and/or environments. The multi-locus GWAS model, pKWmEB, resulted in the greatest number of significant SNPs identified (75), whereas the FASTmrEMMA had the fewest QTL (31). Chromosome P3 had the highest number of identified QTL (28), followed by P2 (24), and P1 (23). Overall, 60 QTL were associated with TPW, whereas 29 and 23 QTL were related with GYP and TYP, respectively. Among the plant morphology-related traits, PHT had a greater number of identified QTL (54) compared to PWDTH (27); whereas for the phenology-related traits, FPD had more QTL (23) than FT (20). A total of eight multi-trait, multi-model QTL were detected in at least two environments (Table 4). The traits PHT and PWDTH shared a common QTL mapped on chromosomes P2 and P11, whereas TYP and GYP had multi-trait QTL on chromosomes P2 and P6. Time to flowering and first pod date also shared common QTL on P1, P6, P7 (Figure 3). All six multi-locus models identified the multi-trait QTL SCM002817.1_37397491 (P6), associated with FPD and FT and related with 2.9–11.9% of phenotypic variation. The QTL SCM002815.1_213170635 (P4) was associated with FPD and PHT and explained ~6.0% of variation for the traits. A total of 40 candidate genes for C. annuum were identified for the multi-trait QTL (Supplementary Table S7). These candidate genes were associated with at least 60 diverse biological functions related to defense response, metabolic processes, oxidation-reduction, phosphorylation, and gene silencing, among others. A candidate gene for the multi-trait QTL SCM002817.1_37397491, T459_13341, mapped on chromosome P6 and associated with FPD and FT, is predicted for epigenetic mechanisms namely chromatin organization and methylation in C. annuum. Another candidate gene for SCM002817.1_37397491, T459_07853, has function associated with the remodeling of chromatin.

Figure 3. Genome-wide quantitative trait loci (QTL) associated with different phenology and plant morphology-related traits using multi-locus models in chile pepper (threshold LOD score >3.0; horizontal dashed line). Loci in red text are multi-trait QTL. The QTL identified by multiple GWAS models are indicated by pink dots above the significance threshold. COM- combined values across environments; CRU- Las Cruces, NM.
Table 4. Multi-trait quantitative trait loci (QTL) identified for the New Mexico Capsicum association mapping panel.


Since its first application in crops two decades ago [74,82], GWAS has become one of the foundational approaches for modern molecular plant breeding. The method itself complements traditional linkage mapping based on biparental populations by providing a higher mapping resolution, often to the gene level, because of higher recombination events that occur in natural populations [83]. The genetic architecture of complex traits such as yield and the often-stringent thresholds in declaring significant associations in single-locus GWAS models, nevertheless, can be limiting when implementing these mapping approaches. In the current study, multi-locus association mapping uncovered the genetic architecture of major yield and agronomic traits in chile pepper. Candidate genes underlying these QTLs were further identified, rendering deeper insights into the genetic basis of these complex traits in Capsicum.

Population Structure in Chile Pepper was Related to Genetic Differentiation Based on Species

Population stratification for the NM-CAMP was inferred using a Bayesian iterative algorithm approach implemented in STRUCTURE. Accordingly, the optimal number of clusters K derived from using an ad-hoc Evanno criterion was observed to be at K = 5, where the C. frutescens, chinense, baccatum, and chacoense formed a complex, whereas the C. annuum was further divided into four distinct clusters. Our results support recent phenotypic and phylogenetic analyses by Parry et al. [84] that demonstrated a more complex genetic relatedness within the C. annuum, whereas the other Capsicum species are more clustered together. These observations highlight the high genetic similarity present in the different species grouped in the same cluster, and the potential for crop improvement by using the wild relatives having close genetic relationships for hybridizations [84]. As a species, the genetic variability of the C. annuum was demonstrated on another study where the annuum genotypes clustered into two different groups based on pod types (New Mexican, jalapeno, ornamental types, etc.) [57]. The “biological significance” of the number of inferred clusters is of prime relevance when interpreting results from STRUCTURE, as the optimal K might not essentially reflect the ideal number of groups, a consequence of it being determined exclusively by a predetermined sampling scheme [68]. Differences in terms of the inferred optimal K value and the number of groups based on other approaches, such as PCA, have been observed in the Capsicum. For instance, in a recent genetic diversity study using a different population of New Mexican chile pepper consisting of 165 genotypes, we observed that K = 2 is the optimal K; however, PCA revealed four different groups based on species [57]. Analysis of linkage disequilibrium showed a moderate LD decay (~2.82 Mb) for the population indicating that a higher number of markers might be necessary in implementing association mapping for the whole population. In the present study, the C. annuum group (PCA Cluster III) showed the highest extent of LD (2.74 Mb), whereas in other studies, a rapid LD decay was observed for the annuum [57,85]. It should be noted that LD, as a phenomenon itself, is affected by multiple factors including the degree of inbreeding, population structure, recombination, rare alleles, etc., where one factor is not considered more dominant over the other [11,74,86]. The presence of potential selective sweeps, as demonstrated by the Tajima’s D values, could have also affected the genome-wide LD on the respective species. Among the PCA groups, Cluster I, comprised of the baccatum, chinense, and frutescens complex, had the lowest value for Tajima’s D. Similarly, the C. frutescens and C. chinense have been observed previously to have low Tajima’s D [57,87], indicative of the presence of rare alleles and differences in allele frequencies, which could have resulted in a more extensive LD compared to the other species. A kinship-principal component (K-PC) model (number of PC = 3) was utilized to correct for genetic (population) stratification, that can cause genome-wide LD between unliked loci consequently resulting to confounding effects for GWAS [29], to identify QTL for yield and agronomic traits in chile pepper.

Co-localization of Quantitative Trait Loci with Previously Identified Significant Marker-trait Associations

In the past, only few single-locus association mapping studies that focused exclusively on a set of traits such as capsaicinoid content, fruit morphology, and fruit weight have been conducted in chile pepper, where QTL in chromosomes P1, P2, P3, P4, P7, P8, P9, and P10, among others, have been identified [21–23,88,89]. Among these traits, five candidate genes, namely, pAMT, C4H, 4Cl, CSE, and FatA, involved in various biosynthetic pathways, were identified for capsaicinoid content [88]. Phenotypes such as total yield, yield components, and phenology-related traits, including FT, have not been previously evaluated. In the present work, 215 QTL were identified across the 12 chromosomes of chile pepper for yield and agronomic traits using six multi-locus methodologies. Among these QTL, eight multi-trait loci distributed in six chromosomes were identified across multiple GWAS models and environments, which can be targeted for future molecular breeding. These identified SNP markers can be used for designing allele specific assays to be used in MAS.

Yield and yield components

In this study, the TYP and its component trait, GYP shared genetic control on chromosomes P2 and P6, whereas the phenology-related traits, FPD and FT, had common QTL on chromosomes P1, P6, and P7. These QTL colocalized with markers previously identified to be associated with various agronomic traits in chile pepper, indicating potential effects of linkage due to genomic location or pleiotropy. The QTL SCM002817.1_189395041, associated with both GYP and TYP, for example, is located upstream of a cluster of SNP markers associated with fruit weight in chromosome P6 of C. annuum [21].

Single-trait QTL such as SCM002819.1_133495639, associated with TPW in chromosome P8 (Supplementary Table S6), was mapped closed (~0.09 Mb) to a SNP marker previously related to weight per fruit in a pepper mapping population [90]. Two other QTL for TPW, namely, SCM002816.1_169617895 and SCM002823.1_197062622, were mapped in proximity (~1.08 and 3.73 Mb, respectively) to loci associated with weight per fruit in chromosomes P5 and P12 [90]. A study by Lee et al. [23] identified a fruit weight-related locus, S06_194967541, on chromosome P6 downstream of the QTL SCM002817.1_189395041, associated with TYP and GYP in the NM-CAMP. More similar to yield, significant QTLs for total fruit weight were reported on chromosomes P1, P2, and P3 by Dwivedi et al. [91] using biparental mapping. High broad-sense heritability (H2 = 0.97) has been reported for fruit shape (i.e., fruit length divided by fruit width) in chile pepper [92], and a QTL on chromosome P3 has been reported to contribute to this trait [93], in addition to QTLs on P8 and P11. We observed high H2 value (0.88; COM) for TPW, which is supported by previous work demonstrating a strong genotype contribution to the variability of fruit weight, as compared to the environment [94]. A single-trait QTL, SCM002819.1_133495639, associated with TPW on chromosome P8, was mapped closed (~0.09 Mb) to a SNP marker previously related to weight per fruit in a pepper mapping population [90], but was in contrast to previous findings [91,93,95].

Plant architecture and morphology

Plant height for the NM-CAMP has been found to have moderate to high broad-sense heritability, ranging between 0.61 and 0.97. These results were consistent with previous observations for PHT, where an H2 value of 0.91 and narrow-sense heritability (h2 = 0.55) suggesting additive effects, were reported [92]. Utilizing a RIL population, Han et al. [93] identified plant height associated QTLs on chromosomes P2, P4, P6, P7, and P8 as well as QTLs on P2 and P5 associated with plant width, supporting our findings of significant QTL on chromosomes P2 and P4 for plant height and width. Conversely, Dwivedi et al. [91] identified a QTL on P5 contributing to 47% of the total phenotypic variance for plant height. Altogether, our results indicated that QTL controlling these traits across different pepper populations are located on similar genomic regions suggesting pleiotropy and/or linkage due to genetic distance and location. Moreover, we demonstrated the potential of multi-trait genetic improvement in chile pepper by targeting breeding and selection across these specific regions of the genome.

Epigenetic Factors Might Play a Role for Phenology-Related Traits in Chile Pepper

The genetic complexity of the evaluated traits was further revealed by the diverse functions of the candidate genes predicted for the identified QTL. Epigenetic mechanisms including methylation and chromatin remodeling and organization were predicted for the candidate genes for the QTL SCM002817.1_37397491 mapped on chromosome P6 and associated with FPD and FT. The role of epigenetic factors in gene expression in chile pepper is currently not well established, although a denser methylation profile for chile pepper compared to the potato (Solanum tuberosum) and tomato (S. lycopersicum) genomes has been documented [96]. Recent evidence on the potential involvement of epigenetic mechanisms on different traits (e.g., disease resistance) in Capsicum have been reported. Results from transcriptomic, QTL mapping, and meta-analyses supported the putative effects of these epigenetic factors, particularly of their association with a cluster of major Phytophthora capsici resistance loci in the short arm of chromosome P5 of chile pepper [57,97,98]. The involvement of non-coding RNAs (ncRNAs) and cytosine methylation in regulating fruit development and ripening in chile pepper has also been suggested [99,100]. It might be relevant, therefore, to examine the epigenomic landscape of the Capsicum spp. to gain a better understanding of gene expression and how this process is regulated by different epigenetic mechanisms to affect the terminal phenotype.


The potential of using multi-locus GWAS approach in dissecting the genetic architecture of complex traits in chile pepper was demonstrated in the current study. Genome-wide QTL were identified for yield, plant morphology, and phenology-related traits. Candidate gene analysis revealed diverse functions related to different biological processes including oxidation-reduction, phosphorylation, defense response, and epigenetic mechanisms demonstrating the genetic complexity of the evaluated traits. The QTL identified in this study will assist in developing molecular markers and genome-wide prediction for breeding and selection of traits toward genetic improvement in Capsicum. Validation of the QTL identified from GWAS will be performed using biparental and other diverse populations in the NMSU chile pepper breeding program.


The following supplementary materials are available online, Supplementary Table S1: Genotypes for the New Mexico Capsicum association mapping population and their inferred clusters based on hierarchical and Bayesian-based model clustering for number of clusters, K = 5. The inferred group designation for each genotype is highlighted. Supplementary Table S2: Mean of squares from analysis of variance (ANOVA) for yield and agronomic traits in chile pepper. Supplementary Table S3: Phenotypic variation for the New Mexico Capsicum association mapping panel. Supplementary Table S4: Overview of linkage disequilibrium for the intrachromosomal marker pairs in the New Mexico Capsicum association mapping population. Supplementary Table S5: Coefficient of linkage disequilibrium (r2) for intrachromosomal SNP loci associated with yield and other agronomic traits in significant LD (P < 0.05). Supplementary Table S6: Significant SNP markers identified for the New Mexico Capsicum association mapping panel using multi-locus GWAS approaches. Supplementary Table S7: Candidate genes for the multi-trait quantitative trait loci (QTL) and their potential biological functions for C. annuum.


The dataset of the study is available from the authors upon reasonable request.


DNL conceived this research, analyzed the genotype and phenotype data, and wrote and first draft of the manuscript. DWB co-wrote the manuscript. MB performed population structure and genetic diversity analyses. DC collected phenotypic data and processed samples for DNA extraction and genotyping-by-sequencing. PWB edited the manuscript.


The authors declare that they have no conflicts of interest.


This research was funded by USDA-Hatch Program, Accession # 1025360 and USDA-NIFA Grant No. 2022-67014-37078.


The authors would like to thank Samuel Hernandez, Zachary Edwards, Guillermo Nunez, and Charles Havlik for their assistance in collecting phenotypic data for the NM-CAMP.






































































































How to Cite This Article

Lozada DN, Barchenger DW, Coon D, Bhatta M, Bosland PW. Multi-locus Association Mapping Uncovers the Genetic Basis of Yield and Agronomic Traits in Chile Pepper (Capsicum spp.). Crop Breed Genet Genom. 2022; 4(2):e220002.

Copyright © 2022 Hapres Co., Ltd. Privacy Policy | Terms and Conditions