Sign in Register Submit Manuscript

Hapres Home

Location: Home >> Detail

Crop Breed Genet Genom. 2020;2(4):e200017.


Altered Steroidal Glycoalkaloid (SGA) Biosynthesis in Diploid Potatoes as Induced by EMS-Mutagenesis

Ashok Somalraju 1, Kaushik Ghose 2, David Main 1, Jason McCallum 1, Benoit Bizimungu 3, Bourlaye Fofana 1,*

1 Charlottetown Research and Development Centre, Agriculture and Agri-Food Canada, Charlottetown, Prince Edward Island, C1A 4N6, Canada

2 Department of Plant and Soil Science, Texas Tech University, Lubbock, TX 79409-2122, USA

3 Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, E3B 4Z7, Canada

* Correspondence: Bourlaye Fofana, Tel.: +1-902-370-1454.

Received: 04 June 2020; Accepted: 16 October 2020; Published: 29 October 2020


Steroidal glycoalkaloids (SGAs) can be toxic to humans at more than 100 mg/100 g dry weight of potato tubers. The objective of the current study was to characterize phenotypically and genotypically a subset of 1750 ethyl methane sulfonate (EMS)-mutagenized diploid potato clones previously reported in our group for altered SGA production. The study reports on a wide range of SGA profiles in 246 EMS-mutant lines, of which 14% showed lower SGA content than the wild types and commercial varieties. An Ampliseq gene panel sequencing of 9 key SGA biosynthetic genes from 87 EMS-treated lines showing varied SGA profiles revealed 61 unique functional SNP mutations in 56 unique EMS-treated individual lines. Mutational frequencies in the target genes ranged from 1/16 kb (SGT2) to 1/341 kb (GAME7), with an average of 1/47 kb. Among these mutations, mutations were detected in GAME7, GAME6, GAME11, GAME4CH6, GAME4CH12, and SGT3 genes of low SGA EMS-treated lines, genes deemed essential for steroidal aglycone hydroxylation, oxidation, and solanidine glycosylation. Subsequent comparative transcriptomic analysis of a low SGA mutant line and a high SGA wild type line showed significant downregulation of UDP-glycosyltransferases and cytochrome P450s expression in the low SGA EMS-mutant line. Using EMS-mutagenesis, this study is the first to show evidence of an effective alteration of SGA production in diploid potato tubers and paves the way for more functional analysis of this mutant population as well as diploid mutant potato cultivar development.

KEYWORDS: diploid; EMS; GAME genes; glycoalkaloids; mutations; potato; SGAs; RNA transcriptome


SGA, Steroidal glycoalkaloid; EMS, Ethyl methane sulfonate; GAME, Glycoalkaloid metabolism; SGT1, Solanidine galactosyltransferase; SGT2, Solanidine glucosyltransferase; SGT3, Rhamnosyltransferase; KASP, Competitive allele specific PCR


Potato (Solanum tuberosum L) is an important tuber crop of the Solanaceae family. It is the most important vegetable crop in Canada, with a total production of 4.8 million tons and a total export estimated at $1.72 billion in the 2016–2017 cropping season [1]. Potatoes are a rich source for carbohydrates, vitamins, minerals, micronutrients and dietary fiber. Despite its dietary and nutritional importance, potato products may contain anti-nutritional factors such as steroidal glycoalkaloids (SGAs) [2,3] as well as acrylamide-forming factors including asparagine [4] that is synthesized by asparagine synthetase 1 and 2 (AS1 and AS2) [5,6]. Because of its wide cultivation and consumption worldwide, management of anti-nutritional factors in potato products is considered as a priority for producers and food industries [6].

SGAs are naturally occurring secondary plant metabolites produced by solanaceous crops [7]. They constitute an elaborate class of compounds in the plant defense mechanism [8] but are toxic to humans at higher concentrations [9]. The acceptable level of SGAs in potato tubers for human consumption is 20 mg/100 g fresh weight (FW) or 100 mg/100 g dry weight (DW), a level established as the upper limit for the release of commercial potato cultivars by USDA [10,11]. SGA poisoning is reported to cause various health conditions, including gastrointestinal disorders, hallucinations, partial paralysis, coma, and even death [12]. During food processing, SGAs are stable and not destroyed by cooking [9]. Several factors including drought [13], high temperature [14], tuber exposure to light, and wounding are reported to increase SGA content in potato tubers [15,16]. SGA content can vary depending on potato varieties and plant tissues during development, with the highest SGA levels observed in the floral and sprouting tissues [17]. A recent study by Krits et al. [18] suggested that SGA biosynthesis occurs mainly in the innermost phelloderm cell layers. To date, as many as 80 different SGAs have been identified in different potato varieties [19]. Alpha-solanine and α-chaconine constitute the major SGAs in potato, accounting for 95% of the total SGAs encountered in cultivated potatoes [20,21]. The ratio of α-solanine to α-chaconine in cultivated potatoes can widely vary depending on the anatomy or cultivar [22]. Structurally, potato glycoalkaloids are nitrogen-containing steroids with a 27 carbon (C27) solanidine (aglycone) backbone to which carbohydrate moieties are attached at the 3-OH position [23–25]. In potatoes, trisaccharides solatriose and chacotriose are attached to the aglycone solanidine at the 3-OH position to form α-solanine and α-chaconine [9,23,26].

SGA biosynthesis can be divided into three major steps: the first step occurs in the cytosol and starts with the condensation of acetyl CoA into mevalonic acid, HMG-CoA, and C5 isoprenoid units. This step is part of the primary metabolism in the mevalonate/isoprenoid pathway, and is catalyzed by a family of enzymes including 3-hydroxy-3-methylglutaryl coenzyme-A synthases and 3-hydroxy-3-methylglutaryl coenzyme-A reductases (HMGR) [27–29]. The C5 isoprenoid units are further condensed to form squalene and 2,3-oxidosqualene from farnesyl diphosphate, and the reactions are mediated by squalene synthase (SQS) and squalene epoxide (SQE) [18,29]. At the end of this step in primary metabolisms, lanosterol synthase (LAS) and cycloartenol synthase (CAS) convert 2,3-oxidosqualene into lanosterol and cycloartenol [28,29]. Cycloartenol leads to various phytosterols including cholesterol, sitosterol, stigmasterol and campesterol, of which cholesterol is the widely accepted precursor for glycoalkaloid formation [30]. Synthesis of cholesterol from cycloartenol is reported to be mediated by sterol side chain reductase 2 (SSR2) and Δ(7)-sterol-C5-desaturase (C5-SD) genes [31,32]. The second step in SGA biosynthesis is the conversion of cholesterol to solanidine. It occurs in the cytosol [19,33] and is considered as secondary metabolism. It is catalyzed by a group of glycoalkaloid metabolism (GAME) genes namely GAME4, GAME6, GAME7, GAME8, GAME11, GAME12 located on chromosomes 6, 7 or 12 [24,29,34]. GAME7 and GAME8 genes are involved in the hydroxylation of cholesterol at C22 and C26 positions, resulting in 22,26-dihydroxycholesterol formation [34]. This is followed by hydroxylations and oxidations at the C16 and C22 positions by GAME11 and GAME6, forming a furostanol-type aglycone intermediate [19,34,35]. This intermediate is further oxidized and transaminated by GAME4 and GAME12 genes, leading to the aglycone which acts as the precursor in the formation of SGAs [34]. The third step of glycoalkaloid metabolism involves the glycosylation of solanidine at the 3-OH position by the addition of sugar moieties and is catalyzed by a set of glycosyltransferases, namely solanidine galactosyltransferase (SGT1), solanidine glucosyltransferase (SGT2), and rhamnosyltransferase (SGT3), leading to the formation of α-solanine and α-chaconine [19,36].

The majority of the genes involved in the second and third steps of SGA biosynthesis are located on chromosomes 7 and 12 [24,34]. Among these genes, chromosome 7 harbours most of the genes including GAME6, GAME7, GAME11, SGT1, and SGT3, whereas chromosome 6 carries GAME8a, GAME8b and chromosome 12 contains GAME4, GAME12 [37]. In potato, two GAME4 have been reported in Phytozome (, with one on chromosome 6 referred to as GAME4CH6 (PGSC0003DMG400010882), and one on chromosome 12, referred to as GAME4CH12 (PGSC0003DMG400024274), both of which are predicted with cytochrome monooxygenase oxidation activities. GAME9 is a transcription factor reported to be involved in the transcriptional upregulation of multiple SGA biosynthetic genes [25,32,38]. Cárdenas et al. [32] demonstrated that GAME9 regulates the expression of SGAs in both tomato and potato. Functional studies of SGTs have shown that downregulation of SGT1 gene in potato reduced the accumulation of α-solanine and increased α-chaconine content, thereby with no overall change in total glycoalkaloids [28]. Conversely, down-regulation of SGT2 lowered α-chaconine content and elevated α-solanine levels [36]. Moreover, Itkin et al. [34] showed that RNAi-mediated GAME4 gene silencing substantially reduces SGA content in potato leaves and tubers and the level remains unchanged even after exposure to light. RNAi-induced gene silencing of the St16DOX (2-oxoglutarate-dependent dioxygenase) gene in transgenic potato lines was also shown to significantly reduce SGAs without affecting tuber yield [24]. Approaches including transcription activator-like effector nucleases (TALENs) [31] and transgenic approaches have also been used successfully in developing low SGA producing potato lines [39].

While genetic engineering approaches are exciting among the scientific community for improving potato tuber characteristics, transgenic potato acceptability by consumers is a challenge. Given their role in plant defense, a complete elimination of SGAs from the plant is not ideal. However, due to the negative anti-nutritional effect of SGAs on human health, the use of consumer-accepted approaches to reducing tuber SGA content in potato cultivars is a prime goal for all breeding programs. The traditional breeding methods to breed-out anti-nutritional factors are labour intensive and time consuming, whereas transgenic approaches are of concern to the public. Moreover, cultivated potato (S. tuberosum L.) is an auto tetraploid crop species with a narrow and highly heterozygous genetic background [40], making its genome complex and difficult to work with, especially for curating small recessive and deleterious mutations from breeding lines [41]. Because potato is highly susceptible to in-breeding depression, it is usually propagated clonally as heterotic F1 hybrids [42,43]. Moreover, widespread self-incompatibility in diploid potato hampers its self-fertilization [43]. Recently, as a proof-of-concept, we have developed and morphologically characterized an EMS-mutagenized pre-breeding diploid potato population, for use in diploid and tetraploid potato breeding [44]. The objective of the current study is to characterize this population at the genetic and phenotypic levels for the identification of lines carrying EMS-induced mutations in SGA biosynthesis genes and displaying altered SGA content in the tubers. Using a panel of SGA biosynthesis genes in a reverse genetics approach and SGA phenotyping, we identified mutant lines with an altered SGA content and showed that mutations in genes including SGT3 and GAME4CH6 may be sufficient to lower the SGA levels in potato tubers.


Plant Materials

The plant materials used in this study were derived from a EMS-mutant population previously developed and described by our group and in which a large phenotypic variation in terms of plant architecture, leaf and stem color, flower color, tuber color, tuber shape and size, as well as yield potential were observed in EMS-treated lines compared to the non-EMS treated controls [44].

Leaf samples were collected from 1776 EMS-treated and 242 non-EMS treated seedlings grown in a greenhouse and were stored at −80 °C for further processing. To maximize throughput and minimize the sequencing efforts, a pooling strategy with 3 to 10 samples per pool was adopted as a first step for pre-screening using four targeted genes SGT1, SGT2, AS1, and AS2. To achieve this, a 6.7 mm diameter leaf disk was sampled from 1,750 of the 1776 EMS-treated, and 140 of the 242 non-EMS treated lines (representing 20 non-EMS treated lines for each of the 7 crosses) and arranged in 2 × 96 well plates containing glass beads. Leaf samples were freeze-dried for pooled DNA extraction. In a second step, based on the data from step 1 screening, single 6.7 mm diameter leaf disks were sampled from individual plants of 364 EMS-treated lines, focusing preferentially on lines that revealed mutations in more than one target gene in the pools, along with 20 non-EMS treated lines (2–3 lines per cross). The samples were arranged individually in 4 × 96 well plates containing glass beads. Leaf samples were freeze-dried for DNA extraction. After the first pre-screening, lines showing mutations only in AS1 and/or AS2 were dropped from the study, and 46 other EMS-treated lines were selected based on a set of desired agronomic traits including tuber forming ability in the field, tuber availability, resistance to light-induced tuber greening, or resistance to Roundup herbicide (data not shown).

Reverse Genetics DNA extraction and targeted gene amplification

Dried leaf materials in the 96 well plates were disrupted and homogenized in a TissueLyser II (Qiagen, Hilden, Germany) at 30 Hz for 1 min and genomic DNA extraction was performed on a KingFisher® 96 robotic system (Thermo Fisher Scientific, Carlsbad, CA, USA) using the Sbeadex Maxi Plant Kit (LGC, Berlin, Germany) according to the manufacturer specification. Four target genes were used to pre-screen the population as a first step (Supplementary Table S1). To enhance the specificity and robustness of the amplification using multiplex PCR, a nested PCR approach was used. The first round of multiplex PCR (PCR I) was run to pre-amplify SGT1, SGT2, AS1, and AS2 genes simultaneously under long range PCR conditions (Supplementary Table S2). The reaction mixture contained 50 ng of genomic DNA, 5 pM of each primer, 4 µL of 5× MyTaq buffer (Bioline, Luckenwalde, Germany), 2 µL of BioStab II PCR optimizer (Sigma Aldrich, Oakville, CA, USA), 1 mM ATP, 20 ng of E.coli SSB (MCLAB, South San Francisco, CA, USA), 20 ng Tth-RecA (MCLAB, South San Francisco, CA, USA), 1.2 U of MyTaq (Bioline, Luckenwalde, Germany), 0.4 U of HerculaseII Fusion (Agilent, Waldbronn, Germany), in a total reaction volume of 20 µL. Cycling conditions consisted of 2 min at 96 °C for pre-denaturation, followed by 20 cycles of 15 s at 96 °C for denaturation, 15 s at 60 °C for annealing, 5 min at 68 °C for extension, and 8 °C for holding. Then, the reaction products from this pre-amplification multiplex PCR I were digested by mixing with 20 µL of 1× Exonuclease buffer containing 2 U of ExoI (NEB, Frankfurt, Germany), incubated at 37 °C for 30 min, and inactivated for 5 min at 95 °C. The digestion products were purified from Long-range PCR primers and diluted.

In a second round of Multiplex PCR (PCRII), 24 amplicons covering all the 4 genes were generated by simultaneously amplifying 2–3 regions of the genes of interest in a total of 10 multiplex PCRs using 48 primers designed in 4 different versions, each elongated at the 5’ end by an inline barcode (Supplementary Table S3). The reaction mixture was as in multiplex PCRI, except that it contained instead 1 µL of the purified Multiplex PCR I products as template, 1 U of MyTaq (Bioline, Luckenwalde, Germany), and 0.2 U of HerculaseII Fusion (Agilent, Waldbronn, Germany) in a 20 µL reaction volume. Cycling conditions were 1 min at 96 °C for pre-denaturation, followed by 20 cycles of 15 s at 96 °C for denaturation, 15 s at 55 °C for annealing, 90 s at 70 °C for extension, and 8 °C for holding. To purify the amplicons, one 96-well plate of amplicon pools was generated by combining 5 µL per well from all 4 Multiplex PCR II plates and mixed. A half volume (120 µL) of the pooled amplicon was purified using Agencourt AMPure XP beads (Beckman Coulter, Krefeld, Germany). In short, 80 µL of AMPure beads were added to each well, mixed by pipetting, and incubated for 10 min at room temperature. The sample was incubated on a magnet for 5 min to collect magnetic beads, and the supernatant was removed. The beads were washed twice with 200 µL of 75% ethanol, air dried, and the DNA was eluted from the beads using 20 µL of 5 mM Tris buffer (pH 9).

Library construction and sequencing

Using the purified amplicons, a total of 96 Illumina libraries with different adaptor indices were constructed using the Ovation Rapid DR Multiplex System 1–96 (NuGEN Technologies, Inc., Redwood, CA, USA) following the kit manual. The libraries were purified, pooled, and fragments in the 300–600 bp range were gel-purified to remove any non-specific amplicons. The final library was sequenced as 300 bp paired-end reads using an Illumina MiSeq V3 Chemistry (Illumina, San Diego, CA, USA).

Bioinformatics analysis of sequence reads and EMS-induced SNP discovery

After the sequencing run, reads were pre-processed by demultiplexing the reads using Illumina's CASAVA v1.8.2 data analysis software (Illumina Inc.). The barcodes were sorted and trimmed, and individual FASTQ files were generated for each sample. To confirm the reference identity of the wild type target sequences, reads from the 3 wild type plants in each cross were aligned to the genomic reference sequence for each target gene (Supplementary Table S1) using Bowtie2 v2.2.3 [45] with default parameters in “local” alignment mode. Next, a consensus sequence for each target gene was generated from the non EMS-treated controls for each cross and was aligned with its respective reference sequences obtained from phytozome (Supplementary Table S1). The alignments were coordinate-sorted and saved in a standard Binary Sequence Alignment/Map (BAM) format. Subsequently, SAMtools mpileup followed by fixed cutoff variant calling and genotyping ( was used to call SNPs. To detect EMS-induced SNPs in the EMS-treated lines, reads from all the 1750 (first round sequencing) or 364 (second round sequencing) EMS-treated lines were aligned using Bowtie2 v2.2.3 as described earlier with both the phytozome reference and the cross-specific consensus sequences, and raw and filtered SNPs were called. SNPs causing codon changes were identified using SnpEff 4.0e ( Exon positions described for each target gene in Phytozome were used to set up a SnpEff database for automatic annotation of VCF files following SnpEff & SnpSift manual [46] and an allele table was constructed for all samples, and mutation rates, mutational effects of missense and nonsense mutations were determined as previously reported [47,48]. To validate some of the detected SNPs, Kompetitive Allele Specific PCR (KASP, LGC Genomics, Beverly, MA, USA) assay was performed to target 22 SNP mutations (Supplementary Table S4) in 49 potato lines shown to carry SNPs mutation in AS1, AS2, SGT1, or SGT2. The KASP assays were performed and analyzed as described by Fofana et al. [48].

SGA Phenotypic Characterization in Mutant Lines Plant materials

After generating and characterizing the potato mutant population at the morphological levels [44], it was of interest to determine whether glycoalkaloid biosynthesis is altered in the pre-screened and reduced population. Among the 364 EMS-most putative mutants detected after the pre-screening sequencing and the 70 non-treated lines, a total of 246 EMS-treated lines and 21 non-treated control lines across the 7 crosses were selected for total SGA quantification using UPLC. The selection of these EMS-treated lines was based on the observed phenotypic characteristics in the field, tuber yield, tuber availability, presence of positive mutations in SGT1 and/or SGT2 from the second pre-screening sequencing data. Three commercial varieties Atlantic, Yukon Gold and Russet Burbank were also analyzed and used as references.

SGA extraction and quantitation

A modified Quick, Easy, Cheap, Effective, Rugged, and Safe (QuEChERS) method [49] was employed for glycoalkaloid extraction. Three tubers per line, each representing a biological replicate, were used. Tuber periderm tissue samples were used for total SGA quantification. To avoid tissue sampling on one side of the tuber surface, four cores were cut at random from different sides of each tuber using a 1.5 cm diameter cork borer, and the 2 mm thick periderm was separated from the core, pooled and lyophilized. The lyophilized samples (79–259 mg) were blended using a polytron (Kinematica, Luzern, Switzerland) at 30,000 rpm in a 20 mL mixture of water: acetonitrile (5:15), acidified with 1% formic acid. Samples were incubated in a sonic bath at 37 °C for 1 h before the addition of 3 g of QuEChERS salts (2.0 g MgSO4, 0.7 g sodium citrate dihydrate (C6H5Na3O7·2H2O), and 0.3 g NaCl). The samples were then vigorously mixed by vortex and subsequently briefly centrifuged at 1200 rpm, to promote phase separation, with the upper acetonitrile layer containing the extracted glycoalkaloids. The upper acetonitrile organic phase was collected, further diluted with acetonitrile as required for calibration curve linearity, and SGA content was analyzed and quantified by UPLC-MS on a Waters Acquity H-Class UPLC and Tandem Quadrupole (TQD) MS system (Waters, Milford, MA, USA). Briefly, a Waters Acquity BEH-amide column (2.1 × 150 mm, 1.7 um particle size) was used for separations, at a flow rate of 0.500 mL/min and column temperature of 45 °C, using the following binary gradient: t0, B = 90%; t6 min, B =65%; t7 min, B = 50%, t8 min, B = 50% (isocratic); t9 min, B = 90%; t10 min, B = 90% (isocratic), where A consisted of 10 mM ammonium formate (NH4HCOO) with 0.1% formic acid in water, and B was acetonitrile, modified from Nikolić et al. [50]. Using this BEH-amide column method, α-chaconine and α-solanine were fully baseline separated with 1 min between peaks, and with all glycoalkaloid peaks eluting by 5.5 min. Electrospray ionization mass spectrometric analysis (ESI-MS) was conducted in positive ion mode (PIM), with 3.9 kV capillary voltage, 90 V cone voltage, 3 V extractor, 150 °C source temperature, nitrogen desolvation gas temperature of 430 °C and flow rate of 500 L/h, and cone gas flow for 3 L/h. A combination of PIM-ESI mass scans (120–1200 amu), a selective ion recording (SIR) for pseudomolecular ions [M + H]+ of α-chaconine (m/z 853), α-solanine (m/z 869), and solanidine (m/z 399) (Supplementary Figure S1), and multiple-reaction monitoring (MRM) for α-chaconine (853/707) and α-solanine (869/707) were used to identify peaks, using authentic standards for α-chaconine, α-solanine, and solanidine (ChromaDex, Irvine, CA, USA). For SIR scans, linear y= mx + b calibration curves of integrated areas under the curve were constructed (MassLynx, Waters, Mississauga, ON, Canada), relating total ion counts (TIC) vs concentration, achieving r2 = 0.996, 0.995, and 0.990 for α-chaconine, α-solanine, and solanidine, respectively (Supplementary Figure S1). The linear detector responses ranged from 4.1 to 0.01 ug/mL. The SGA content was expressed as mean ± standard deviation mg/100 g dry weight from three biological and technical replicates.

Targeted Gene Panel Ampliseq Sequencing of SGA Biosynthetic Genes and Mutation Validation

The glycoalkloid metabolism (GAME) gene clusters involved in the core SGA biosynthetic pathway have been reported using a comparative co-expression analysis of tomato and potato [34] and the core pathway starts prior to the glycosylation steps of the aglycone. Similar to previous reports [48,51], targeted Ampliseq sequencing was performed on 9 genes including SGT1 and SGT2 using the Ion PGMTM system (Thermo Fischer Scientific, Waltham, MA, USA) to determine whether EMS treatment induced mutations in any other SGA biosynthetic genes. The details of the 9 target gene structures are shown in Supplementary Figure S2. The plant materials used in this Ampliseq experiment consisted of 87 EMS-treated and 15 non-treated lines, of which 41 out of 87 EMS-treated lines were selected based on the presence of functional mutations in at least one or both of the SGT target genes after the first (pooled samples) and second (individual samples) Miseq amplicon sequencing screening. The remaining 46 EMS-treated lines were selected based on the agronomic traits including tuber forming ability in the field, tuber availability, resistance to light induced tuber greening, or resistance to Roundup herbicide (data not shown). DNA extraction from each of the 102 lines (87 EMS-treated and 15 non-treated) was performed using Mag-Bind® Plant DNA Plus 96 Kit (Omega Bio-tek, Norcross, GA, USA) as per the manufacturer’s guidelines. The extracted DNA was quantified using a Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, Carlsbad, CA, USA) as per the manufacturer’s specifications.

Custom Ion AmpliSeq panel design

A custom gene panel consisting of nine SGA biosynthesis genes was designed using Ion AmpliSeq Designer (IAD) software v.5.2 (Thermo Fisher Scientific, Carlsbad, CA, USA) as previously reported [48]. The gene panel included three SGT biosynthetic genes (SGT1, SGT2 and SGT3) and 6 GAME genes (GAME4CH6, GAME4CH12, GAME6, GAME7, GAME11 and GAME12) reported by Itkin et al. [34] and was used as targets for Ampliseq sequencing. The custom Ampliseq panel was designed in such a way that it covered 25,544 bp of the 25,669 bp potential target regions, achieving 99.4% coverage. The custom panel included a total of 107 primer pairs divided into two pools of 53 and 54 primer pairs each, respectively (Supplementary Table S5). The 25,669 bp target sequence was uploaded as a BED file and used as a reference for the initial screening of variants from the EMS-treated and non-treated potato plants. Next, similar to pre-screening sequencing, a cross-specific consensus sequence was generated for each target gene by aligning three non-treated controls for each cross and used as a reference to further screen and call true EMS-induced SNPs.

Ampliseq library preparation and sequencing

The quantified genomic DNA was normalized to 10 ng/µL for all the 102 potato lines. A sample pooling strategy was employed to minimize the sequencing efforts and Ampliseq sequencing was performed in two steps: In a first step, a total of 13 DNA pools containing each an equal amount of seven to eight target DNA samples were prepared from the 87 EMS-treated and 15 non-treated potato lines. Ion AmpliSeqTM Library Kit 2.0 (Thermo Fisher Scientific, Carlsbad, CA, USA) was used for the targeted amplification of the 13 DNA pools. For each of the 13 DNA pools, two separate multiplex PCR reactions were setup, one with 53 and the other with 54 AmpliSeqTM primer sets for target amplification. The resulting amplicons from the two PCRs were combined together for each DNA pool, partially digested with FuPa reagent (Thermo Fisher Scientific, Carlsbad, CA, USA) and ligated using IonXpressTM barcode adapters. The barcoded libraries were purified using Agencourt® AMPure® XP Kit (Beckman Coulter, Mississauga, ON, Canada) according to the manufacturer’s specification and further quantified by qPCR using Ion Universal Library Quantitation Kit (Thermo Fisher Scientific). The quantified libraries were equalized to 30 pM and used in template preparation for sequencing. Template preparation was performed using Ion PGMTM Hi-QTM Chef Kit (Thermo Fisher Scientific) on an Ion Chef Instrument (Thermo Fisher Scientific) as per the manufacturer’s instructions. Each sequencing template was prepared by pooling 6 to 8 barcoded libraries and loaded on to a 316 v2 chip, achieving a minimum of 250–300× coverage of the 107 targeted and generated amplicons. The sequencing was performed on an Ion Torrent Personal Genome Machine (PGM) (Thermo Fisher Scientific, Carlsbad, CA, USA) using Ion PGM HiQ Sequencing kit and 850 flows (Thermo Fisher Scientific).

In a second step, pooled libraries containing EMS-induced mutations were identified and their individual DNA samples were used as templates to construct individual AmpliSeq libraries for re-sequencing using the same gene panel as described above, but with a higher coverage (10,000–18,500×).

Read mapping and variants call for SNP detection

The sequencing data analysis was performed on the local IonTorrent Server using pre-installed Ion Torrent Suite Software v 5.6.0 (ThermoFisher Scientific) as in Fofana et al. [48]. The 25,669 bp target genomic region uploaded as a BED file was used as a reference for mapping the sequencing reads. A Torrent Variant Caller 5.8 plugin was selected for SNP and INDEL detection in target genes. The default variant caller plugin setup was selected with the parameters for a minimum value of allele frequency, variant quality score and coverage of the reads for INDEL identification set to 0.1, 10 and 10, respectively for the listed parameters whereas the same parameters were set to 0.1, 10 and 5 for SNP identification. Variant calls with allele frequency ≥50% were considered after the filtration criteria. The position of the variants, nucleotide changes, allele type and type of mutations in EMS-treated lines versus non-treated lines were summarized and reported. The mutation rates were determined as previously described and the mutational effects of the EMS-induced variations were predicted using Phyre 2 investigator software as reported previously by Ghose et al. [52].

Transcriptome Sequencing of High and Low Glycoalkaloid Potato Lines Plant materials and RNA extraction

Following the validation of the EMS-induced mutations and SGA quantification, low and high SGA potato lines were identified among the EMS-treated and non-treated lines, respectively. It was therefore of interest to determine if low and high SGA-producing EMS-treated and control lines differ at the transcriptional levels. A mutant line (EMS4_901) with low SGA content (50.4 mg/100 g DW) and a control line (CTL4_492) with high SGA content (165.1 mg/100 g DW) were used as plant materials for the RNA transcriptomic study. Three tubers from each low and high glycoalkaloid lines were used for skin tissue sampling, with each tuber considered as a replicate. The tubers were thoroughly washed under running water and cleaned with 70% ethanol prior to sampling. Tuber tissue was collected as per the method described by Grunenfelder et al. [15]. Upon collection, the tissue was immediately frozen in liquid nitrogen, and RNA extraction and purification were performed using SpectrumTM Plant Total RNA kit (Sigma-Aldrich, Toronto, ON, Canada) following supplier’s guidelines. The DNAse-treated total RNA was diluted in 100 µL nuclease-free water and supplemented with 1 µL of RNAse OUT. RNA quality was visualized on agarose gel and quantified using NanoDrop (Thermo Scientific, Madison, WI, USA) before shipping the ethanol-precipitated total RNA samples to Macrogen Inc. (Macrogen Inc, Seoul, South Korea) for transcriptome sequencing, following Macrogen’s Sample Submission Guide [53]. Once received, the RNA samples were re-pelleted, and the quality control was performed using a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). RNA QC was ensured before moving to the library preparation as all samples had an RNA integrity number (RIN) ≥ 7.0 and for absence of DNA contamination.

Library construction, transcriptome sequencing, and data analysis

To perform the RNA transcriptome analysis, cDNA libraries were constructed following the Macrogen’s in-house workflow (Macrogen Inc, Seoul, South Korea). Briefly, using 1 µg total RNA as starting material, the cDNA libraries were generated with the TruSeq® Stranded Total RNA library preparation kit (Illumina). The libraries were quantified by qPCR and paired-ends (2 × 100 bp) sequencing was performed on a HiSeq2500 platform (Illumina, Inc). The raw sequencing read data were submitted to the GenBank Short Read Archive (SRA) database under the Bioproject ID# PRJNA533334.

The quality control check of the raw sequencing data at each cycle was done using FastQC V0.10.0 [54]. Trimming the data for the removal of adapters, low quality bases and shorter reads (<36) was carried out using Trimmomatic V0.32 program [55]. The reference genome sequence of Solanum tuberosum (GCF_000226075.1, SolTub_3.0) and the annotation data were downloaded from the NCBI. The reference genome was indexed using HISAT [56] and the trimmed paired end reads were indexed, aligned and mapped to the reference genome using HISAT [56]. After alignment, the processed reads were assembled into transcripts and their abundance was estimated using StringTie v1.3.3b [57,58]. StringTie infers the relative transcript abundance as values of Fragments per Kilobase of transcript per Million reads mapped (FPKM), which were further normalized to the library size. StringTie, as the successor of Cufflinks, was used to determine the known genes and transcript expression levels and to investigate novel transcripts and novel alternatively spliced transcripts, when they existed. Genes with FPKM = 0 across all samples were excluded. The FPKM +1 values were log2 transformed and quantile-normalized to produce more even data distribution and to reduce systematic bias from the pre-processed core libraries and reported as processed normalized FPKM. Differentially expressed genes (DEGs) analysis was performed based on the normalized processed FPKM data from the 12 paired-comparison including the three replicates and 2 time points in each of the two lines using scripts of the comprehensive R archive network (CRAN), version R 3.4.3 ( Statistical significance of the differential expression data was determined by performing independent t-test and fold change considering that there was no difference between groups null hypothesis. False discovery rate (FDR) was controlled by adjusting P-value using the Benjamini-Hochberg algorithm [59]. Genes that satisfied the absolute fold change (|fc|) of ≥2 and p < 0.05 in the independent T-test from at least one of the paired-comparison were reported for each sample and transcript/gene. For DEG sets, correlation analysis, hierarchical clustering (Euclidian method, complete linkage), and multidimensional scaling analyses were performed using the heatmap.2 function provided by the R3.4.3 package gplots option for data visualization. Functional classification and gene-set enrichment of the DEGs was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID, tool v6.8 [60,61] based on Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG, and other functional annotation databases including NCBI ( and Uniprot (

RNAseq Data Validation Using Quantitative RT-PCR

To validate the RNAseq data, quantitative RT-PCR (qPCR) was performed using gene specific primers. The primers were designed from 16 differentially expressed genes selected in 12 gene families through RNAseq data (Supplementary Table S6). The One-step qPCR approach [62] was used. The qScriptTM 1-Step SYBR Green qRT-PCR kit (Quanta Biosciences, Beverly, MA, USA) and CFX96 Real Time system (BioRad, Laboratories, Mississauga, ON, Canada) were used to perform qPCR. Each 25 µL PCR reaction contained a total of 100 ng RNA and 300 nM for each primer. PCR cycling conditions included, cDNA synthesis at 50 °C for 10 min, pre-denaturation at 95 °C for 5 min, denaturation using 40 cycles at 95 ° C for 10 s, annealing at 60 °C for 30 s, and holding at 95 °C for 10 s. A melting curve was generated between 60 °C and 95 °C to ensure the specificity of amplification for each gene. For each sample group, three biological replicates with three technical replicates were analyzed. Relative fold change in expression was calculated using 2^-ΔΔCq method [63] and the data was normalized to 18S rRNA (X67238), used as a housekeeping gene.

Relationship between DEGs and EMS-Induced Mutations in SGA Biosynthesis

To establish a potential link between EMS-induced mutations in the low SGA line and its RNA transcriptomic profile and identify putative DEGs associated with SGA biosynthesis, a phylogenetic analysis was performed. A total of 68 protein sequences including 39 KEGG-enriched DEGs putatively involved in SGA biosynthetic genes were selected from the RNAseq DEGs dataset and used as input sequences along with 28 known SGA biosynthetic genes belonging to gene families such as glycosyltransferases, cytochromes, amino acyl transferases, transcription factors, sterol biosynthesis gene families downloaded from NCBI [37,38,64–66] (Supplementary Table S7). The 68 protein sequences were aligned using ClustalW and a phylogenetic tree was constructed using Neighbor-Join and BioNJ algorithm using 5,000 bootstrap replicates [67]. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. Evolutionary analyses were conducted using MEGA6 [68].


Detection of EMS-Induced Mutations through Pre-Screening Amplicon Sequencing of First 4 Target Genes

After pre-screening the 1,750 EMS-treated and 140 non-treated lines by amplicon resequencing of SGT1, SGT2, AS1 and AS2, a subpopulation of 839 EMS-treated lines was selected from 57 positive pools and used as the core germplasm collection in multiple screenings. A selection was further carried out within this core EMS population and it was reduced to 364 EMS-treated lines for the detection of individual mutant lines. A total of 71 EMS-induced mutations were detected in 166 individual EMS-treated lines. Of these 71 mutations, 10 were located in introns, whereas 61 appeared to be located in coding regions (exons) of the 4 target genes, occurring in 133 EMS-treated lines, with 42 functional mutations found in 98 of the 133 lines (Table 1). A total of 8, 10, 15 and 9 functional mutations were observed in SGT1, SGT2, AS1 and AS2 genes, respectively. These 98 lines included 7 lines carrying mutations in both SGTs and ASs, with 5 plants having mutations in AS2 and SGT2, 1 plant carrying mutations in AS2 and SGT1, and 1plant with mutation in AS1 and SGT1. These 98 lines were further considered in downstream analysis. The mutation frequency in the exonic regions of the 4 targets genes ranged between 1/43 kb and 1/436 kb with individual mutation frequencies of 1/43, 1/45, 1/228, and 1/436 kb, for SGT2 (16 SNPs), AS2 (16 SNPs), SGT1 (11 SNPs), and AS1 (18 SNPs), respectively. Of the 22 SNPs positions assayed by KASP, twelve of the 22 targeted mutations were thus validated in 36 of the 49 lines genotyped. Interestingly, 7 of the 10 targeted mutations in SGT1 and SGT2 were validated in 19 lines tested but the other 10 from 13 M2 families were not (Supplementary Table S8). Altogether, the calculated average mutation frequency was 1/75 kb across the 4 genes in the 1750 EMS lines screened.

Table 1. Number of EMS-induced mutations identified in the exonic regions of SGT1, SGT2, AS1 and AS2 genes from 133 EMS-treated lines following the targeted amplicon sequencing.
Effect of EMS Mutations on Tuber SGA Production

To determine the genotype to phenotype relationship and assess whether the induced mutations led to altered SGA production in the detected EMS mutant lines, SGA content was assessed in 246 EMS-treated and 21 non-treated lines along with 3 commercial potato varieties (Figure 1A–G). SGA content in the tuber of commercial varieties ranged from 58 mg/100 g DW (Yukon Gold) to 71 mg/100 g DW (Russet Burbank) (Figure 1A–G; Supplementary Figure S3), whereas it ranged in the non-treated controls from 73 mg/100 g DW (CTL5_587) (Figure 1E) to 239 mg/100 g DW (CTL3_353) (Figure 1C). The highest tuber SGA variation was observed in EMS-treated lines, ranging from 23 mg/100 g DW (EMS4_403) (Figure 1D)

Figure 1. Variations of total SGA content in EMS-treated and non-treated diploid potato lines in comparison with three commercial potato varieties. Green bars, commercial potato varieties; Blue bars, EMS-treated diploid potato lines; Red bars, non-treated diploid potato controls derived from 7 crosses (AG, respectively). The baseline was set at the lowest non-treated control for comparison with EMS-treated and commercial lines..

to 301 mg/100 g DW (EMS5_512) (Figure 1E). Forty five (45) of the 246 EMS-treated lines (18%) showed a lower total SGA content compared with their respective non-treated controls, whereas 34 (14%) had lower or comparable SGA level with the commercial varieties tested (Figure 1A–G). Taken together, 34 lines (14%) of the EMS-treated lines revealed a lower total SGA content than their respective non-treated controls and had lower or comparable levels with the commercial varieties evaluated. The mutant lines with low SGA profiles than commercial varieties included 2 lines in EMS1, 6 lines in EMS2, 1 line in EMS3, 13 lines in EMS4, 2 lines in EMS5, and 6 in EMS7 (Figure 1A–G). Although a mutant line (EMS6_662) showed a lower SGA level compared to its non-treated controls, the SGA level was higher than those found in the commercial varieties (Figure 1F). Chaconine to solanine ratio ranged from 2.8 to 3.4 in the commercial varieties, 1.5 to 2.6 in the non-treated controls, and 1.0 to 3.4 in the EMS-treated lines. The chaconine to solanine ratio was the highest (3.4x) in line EMS7_749, whereas it was the lowest (1.0×) in line EMS6_653.

Validation of Induced Mutations in SGTs and SGA Biosynthetic Pathway Extended Coverage by Ampliseq Gene Panel Sequencing

To further confirm the EMS-induced mutations observed in SGTs and to increase the coverage of the SGA biosynthesis pathway, an Ampliseq panel sequencing of 9 target genes reported to be involved in the core SGA biosynthetic pathway was performed in 87 EMS-treated and 15 non-treated lines. For each of the 9 target genes sequenced, a 98% base alignment was achieved, with an average coverage depth of 18,500×. A total of 657 EMS-induced SNPs with an allelic frequency of ≥ 50% were observed in 80 of the 87 EMS-treated lines. Of these mutations, 131 mutations were exonic and detected in 67 of the 87 lines subjected to Ampliseq panel sequencing. Among the 131 exonic mutations, 55 were missense and 6 were nonsense mutations, for a total of 61 mutations with predictable functional effects, and were found in 56 EMS-treated lines (Tables 2 and 3). Of the 18 functional mutations detected in SGT1 and SGT2 through the Miseq amplicon sequencing procedure, 13 were confirmed by the Ampliseq panel screening. A high proportion (78%) of heterozygous mutations were observed and 28% of the mutations were predicted to have medium to high mutational effects (Table 3). The results also showed that 3 of the 6 observed nonsense mutations were from EMS-treated lines in cross 4. Nonsense (stop codon) mutations were detected in 4 of the 9 target genes namely SGT2, GAME4CH6, GAME4CH12 and GAME12 (Tables 2 and 3, Figure 2). The mutation frequency in the exonic regions of the targeted genes was the highest in SGT2 (1/16 Kb) and the lowest in GAME7 (1/341 Kb) (Table 2). The average mutation frequency was determined to be 1/47 Kb.

Table 2. Summary of EMS-induced mutations identified in the exons of 9 targeted SGA biosynthetic genes from 87 mutagenized lines.
Table 3. Details of 50 unique SNP positions leading to 61 EMS-induced functional mutations detected across 9 target SGA genes, observed in exonic regions of 56 mutagenized lines.
Figure 2. EMS-induced functional mutations detected using Ampliseq gene panel sequencing of 9 SGA biosynthetic genes including SGT1, SGT2, SGT3, GAME4CHR6, GAME4CHR12, GAME6, GAME7, GAME11 and GAME12. All 60 functional mutations (54 missense and 6 nonsense) are shown. Note that 4, 21, 6, 1, 13, 1, 4 and 10 mutations are listed for SGT1, SGT2, SGT3, GAME4CHR6, GAME4CHR12, GAME6, GAME7, GAME11 and GAME12 respectively. UTRs are indicated by blank boxes; exons by solid black boxes; introns by downward lines; protein conserved domains by red boxes. Green arrows, missense mutations; red arrows, nonsense mutations; plus (+), homozygous mutations; minus (−), heterozygous mutations; asterisks (*), homozygous and heterozygous missense mutations. Numbers are indicated in bp.

Among the EMS-treated lines showing low SGA content, EMS4_403 (23 mg/100 g DW) carried missense mutations in SGT3 and GAME7. In EMS4_898 (71 mg/100 g DW), three positions were found to be mutated in GAME7 only, whereas frameshift mutations leading to a stop codon were identified in SGT2 and GAME4CH6 in line EMS4_901 (50.4 mg/100 g DW) (Supplementary Tables S9 and S10). GAME4CH12 showed missense mutations at multiple positions, including positions 642, 648, 710, 2960, 5266, and as many as 12 lines (EMS4_892, EMS4_446, EMS4_467, EMS4_906, EMS4_917, EMS7_690, EMS7_752, EMS7_760, EMS7_778, EMS7_795, EMS7_805, EMS7_831) carried mutations at one or more of these positions and displayed low SGA phenotypes (Supplementary Table S9; Figure 1D,G). In particular, lines EMS4_892 showed missense mutations predicted with medium to low mutational effect at positions 642, 648, 710, and 5266 (Supplementary Table S9, Table 1) and displayed a low SGA content in its tuber (Figure 1D). In total, 9 EMS-treated lines including 5 lines in cross 4 (EMS4_892, EMS4_446, EMS4_467, EMS4_906, EMS4_917) and 4 lines cross 7 (EMS7_690, EMS7_778, EMS7_795, EMS7_805) carried mutations in GAME4CH12 and produced low SGA content compared to the non-treated controls (Figure 1D,G). Moreover, nonsense and missense mutations were observed in GAME12 of 8 low SGA lines including EMS1_861 (73 mg/100 g DW), EMS1_47 (80 mg/100 g DW), EMS7_690 (51 mg/100 g DW) and EMS7_805 (68 mg/100 g DW) (Supplementary Table S9). Interestingly, the low SGA lines EMS7_725 (49 mg/100 g DW) carried mutations in SGT3 and GAME6. Similar to GAME6 with the low SGA line EMS7_725, GAME11 was found to be altered alongside with SGT2 in the low SGA line EMS7_827 (66 mg/100 g DW) (Supplementary Table S10).

Table 4. Summary statistics of transcriptomic sequencing in EMS4_901 and CTL4_492 lines.
EMS-Induced Mutation Leads to Contrasted Transcriptome in the Wild Type and Mutant Lines

Transcriptome sequencing was successfully conducted on the three biological replicates in each of the low SGA EMS mutant (EMS4_901) and the high SGA control (CTL4_492) lines. Mapped reads in the triplicated samples ranged from 16 to 17 million in the control line and from 16 to 18 millions in the EMS lines, with an average of 89% and 88% overall mapped reads, respectively (Table 4).

Following the mapping, transcript assembly, data transformation and quantile normalization, a total of 21,855 processed expressed genes were observed in at least one replicate of the two lines (Supplementary Table S11). Further statistical analyses of this dataset revealed a total of 1096 significant and differentially expressed genes between the two lines across all replicates, and of which 456 and 640 genes were upregulated and downregulated, respectively in EMS4_901 when compared to CTL4_492 (Supplementary Table S12). Fold change of the upregulated genes in EMS4_901 ranged between 2.0 to 36.4 whereas that of the downregulated genes ranged from 2.0 to 3194 in the same line, in comparison with CTL4_492. A consistent and contrasting gene expression profile was observed between the low SGA mutant (EMS4_901) and the high SGA control (CTL4_492) lines and the altered genes included UDP-glycosyltransferase 74F2-like (XP_006351615.1), UDP-glucose flavonoid 3-O-glucosyltransferase 6-like (XP_006353724.1), UDP-glucuronate 4-epimerase 6-like (XP_006349916.1), cytochrome P450 76A1 (XP_006341183.1), cytochrome P450 83B1-like (XP_006358770.1), flavonoid 3',5'-methyltransferase-like (XP_006363473.1), methyltransferase-like protein 2 (XP_006367297.1) and aspartate aminotransferase (XP_006343100.1) (Figure 3). Interestingly, genes sharing similar conserved domains with SGT or GAME genes were found significantly downregulated in EMS4_901 line compared to CTL4_492 (Table 5). In particular, two isoforms of cytochrome p450 82C4-like (XP_006352674.1; XP_015166509.1) and a cytochrome P450 83B1-like (XP_006358770.1) gene sharing 40–45% sequence similarity and same conserved domain as GAME4CH6, GAME4CH12, GAME6, and GAME7 were found to be significantly downregulated in EMS4_901 compared to CTL4_482 (Table 5). The UDP-glycosyltransferase 74F2-like (XP_006351615.1) was 45% similar to SGT2, whereas glutamine-fructose-6-phosphate aminotransferase 2 (XP_006345399.1) was weakly (15%) similar to GAME12 gene. All these isoforms were also found to be downregulated in EMS4_901 (Table 5).

Figure 3. Heatmap showing the hierarchical clustering of 1096 differentially expressed genes in the mutant (EMS4_901) low and the control (CTL4_492) high SGA producing potato lines. Complete linkage and Euclidian distance were used as a measure of similarity to display expression patterns of DEGs with absolute fold (|fc|) ≥ 2 and T-test p-value < 0.05. Each row corresponds to a gene and each column corresponds to an individual replicate for each cultivar. The mutant line EMS4_901 and the control line CTL4_492 are color-coded in green and red, respectively. The normalized expression z-score scale is indicated by the color key in the top left corner. Blue, down-regulated genes; Yellow, up-regulated genes.
Table 5. Selected DEGs showing similar conserved domain with SGTs and GAME genes and found downregulated in EMS4_901 compared to CTL4_892.

Using FPKM values ≥ 1 from the transcriptomes data, the proportion of known transcripts, known genes, novel transcripts and alternatively spliced variants were found to be higher in EMS4_901 than in CTL4_492 by 7%, 19%, 31%, and 26%, respectively. Furthermore, a total of 4,972 known transcripts, 12,194 known genes, 305 novel transcripts and 1,407 novel alternative splicing transcripts were found to be shared between CTL4_492 and EMS4_90 (Figure 4).

Figure 4. Venn diagram showing the differential number of transcripts observed between the high (CTL4_492) and the low (EMS4_901) SGA producing diploid potato lines. A, known transcripts; B, known genes; C, novel transcripts; D, novel alternative splicing transcripts. The high SGA line (CTL4_492) and le mutant low SGA (EMS4_901) are represented by the blue and the red circles, respectively.
Gene Ontology Classification and KEGG Pathway Analysis of DEGs

To identify GO categories, classify pathways, and assign biological functions associated with the 1096 sets of differentially expressed genes across all six replicates, GO and KEGG pathway enrichment analysis were performed using DAVID tools. The GO enrichment analysis showed 71 DEGs classified into 11 GO terms, with 5, 2, and 4 GO terms associated with biological processes (15 DEGs), molecular functions (16 DEGs), and cellular components (40 DEGs), respectively (Figure 5). KEGG pathway analysis assigned 287 DEGs with high confidence (Bonferroni corrected P < 0.05 and FDR < 0.05) to 20 KEGG pathways, and genes involved in metabolic pathways and biosynthesis of secondary metabolites accounted for 44% of the 287 DEGs (Figure 6). Of interest was the finding of significant enrichment of DEGs in various secondary metabolite biosynthesis pathways including phenylpropanoid biosynthesis (p < 0.001), isoquinoline alkaloid biosynthesis (p < 0.01) and tropane, piperidine and pyridine alkaloid biosynthesis (p < 0.05) (Figure 7). DEGs corresponding to UDP-glycosyltransferases (XP_006367260.2 and XP_006346605.1) were found to be enriched under the biosynthesis of secondary metabolites (sot01110) pathways (Supplementary Table S13).

Figure 5. GO term classification of DEGs between the between the high (CTL4_492) and the low (EMS4_901) SGA producing diploid potato lines using David Tool. Bar plots represent the number of DEGs enrichment in each GO term gene in each biological process, molecular function and cellular component.
Figure 6. KEGG pathway enrichment of DEGs between the low (EMS4_901) and the high (CTL4_492) SGA producing diploid potato lines. Only the top 20 annotated pathways (FDR < 0.05) are shown.
Figure 7. Gene-set enrichment analysis showing significantly enriched KEGG pathways in the low (EMS4_901) and the high (CTL4_492) SGA producing diploid potato lines. An emphasis was put on secondary metabolite biosynthesis pathway. Color-codes indicate levels of enrichment significance as shown in the top left corner as expressed by the expression ratio of EMS4_901/CTL4_492.
RNAseq Data Validation Using Quantitative RT-PCR

To validate the RNASeq data, qPCR targeting 16 genes showed consistent gene expression pattern between the RNAseq profiling and qPCR analyses for 15 of the 16 genes tested (Figure 8; Table 6). The expression of beta-1,3- galactosyltransferase 7-like was the only gene found to be upregulated in EMS4_901 in the qPCR, whereas it was downregulated from the RNseq profiling. Transcription levels were found to be very low in CTL4_492 for UDP-glucuronate 4-epimerase 6-like, probable calcium-binding protein CML-45 and 8-hydroxygeraniol dehydrogenase –like when compared to EMS4_901 (Figure 8).

Figure 8. Quantitative RT-PCR expression patterns for sixteen genes randomly selected from the RNAseq data for validation. The gene expression data are expressed as relative expression from three independent replicates and normalized to the 18S RNA expression. Error bars indicate the standard deviation of means (n = 3).
Table 6. RNAseq gene expression validation using qPCR. Fold change expression obtained by qPCR is shown along with the expression pattern observed in the RNAseq profiling.
Figure 9. Phylogenetic relationship between KEGG-enriched DEGs and known SGA biosynthesis genes. Maximum Likelihood method based on the JTT matrix-based model [1] was used to build the tree. Boostrap values are indicated on the tree. GT, glycosyltransferase, CYP, cytochrome P450, MT, methyltransferase, AT, Amino tranferase, ERF, ethylene responsive factor. Gene IDs followed by number 1 or 2 and letter D or U indicate that 1 or 2 transcripts of the genes were down-regulated (D) or up-regulated (U).
Linking Transcriptome Data to Loss-of-function in SGA Biosynthetic Genes

To link EMS-induced mutations in the low SGA line with its RNA transcriptomic profile and identify putative DEGs associated with SGA biosynthesis, the phylogenetic tree constructed using 39 KEGG-enriched DEGs putatively involved in SGA metabolism and 28 genes known for their roles in SGA biosynthesis showed five main groups representing five gene families, namely glycosyltransferases (GT), methyltransferases (MT), cytochrome P450 (CYP), aminotransferases/transaminases (AT), and ethylene -responsive transcription factor (ERF) (Figure 9). Of the 22 DEGs with glycosyltransferase activity, 14 strongly clustered with the known SGT1, SGT2 and SGT3 under the GT group. The rest of the 8 DEGs with GTs activity were spread into 3 small clusters. Similarly to the GT group, 8 of the 9 DEGs with cytochromes activity clustered with the GAME genes known as cytochromes under the CYP group. All DEGs with amino acyl transferase activity formed a group with the known amino acyl transferase (AT group). The DEGs for methyltransferases and transcription factors for sterol biosynthesis were grouped with known sterol biosynthesis genes under MT and ERF groups, respectively (Figure 9). Interestingly, it was observed that 11 of the 22 DEGs in the GT group, 2 DEGs in the CYP group, 3 DEGs in the MT group, and the single ERF were found to be down-regulated in the RNAseq transcriptomic analysis.


Anti-nutritional factors including SGAs have always been of great concerns for potato breeders, retailors, processors, and consumers [7,9,69]. To address this issue, different approaches including traditional breeding [70,71], RNAi [24,72], TALENs [31], and transgenic [39] approaches have been attempted. Whereas the traditional breeding methods are more sustainable, they are also time-consuming and less efficient [73]. TALENs and RNAi are effective but costly, demand technical expertise, and generate limited number of germplasm with potential off-target effects [74–77], and the transgenic approaches have always been of concern to public due to their inherent nature [78,79]. In the current study, a diploid potato germplasm panel generated by a conventional mutagenesis approach using EMS mutagenesis [44] was characterized at the genomic and phenotypic levels for the identification of mutant lines altered in SGA production. Using next generation sequencing (NGS) of SGA biosynthesis genes and SGA profiling, we show that mutation of GAME4CH6 or SGT3 is sufficient to alter SGA profiles. To our knowledge, the study is unique in its approach by screening a large mutant diploid potato population and identifying mutant lines with altered SGA profiles.

The generation of mutant alleles in potato using EMS mutagenesis has been scarce and is relatively new [44,80]. The use of the chemical mutagenesis approach in plant reverse genetics combined with next generation sequencing allow a high throughput screening capacity of germplasm [81,82]. In the current report, an F1 diploid mutant potato population was screened by Reverse genetics using 9 SGA biosynthesis genes and, as expected, mostly G to A or C to T transition mutations were observed in all targeted genes as previously reported [81]. The calculated mutation frequency in the targeted genes ranged from 1/16 Kb to 1/341 Kb, ranges that fall within those recently reported in flax [48,83], rapeseed [84,85] and tomato [86] treated with different EMS doses. A total of 20% of the EMS-induced SNP variations were located in exons, of which 46% were functional and 22% of these functional exonic mutations were homozygous, thus having potential to cause loss-of- [87] or gain-of-function effects [88]. In fact, although DNA mutations can alter all amino acids, not all substitutions have the same mutational effects on gene functionalities [89]. Hence, depending on the amino acids, homozygous missense mutations with low or medium mutational effects can be either neutral with no effect or hypermorphic, in which case the mutation may cause an increased contribution of the mutated gene to the original function [48,90]. The high heterozygosity (78%) of the functional mutations observed here is in agreement with the F1 nature and the clonally propagated option used for the screened plants. The wide variety of mutations detected in this study, including nonsense mutations via deletions and frameshift, missense mutations via transitions, insertions and deletions demonstrates the effectiveness of EMS-mutagenesis in developing genetically diverse germplasm with novel allelic variants. In the current study, functional mutations were found in almost all key SGA biosynthetic genes as also likely occurred in other metabolic pathways. This finding highlights the high potential of this population as a premium genetic resource for potato breeding programs and for the gene functional characterization in potato by plant biologists.

By phenotyping a set of EMS-treated clones, a wide range of SGA patterns were observed in tuber periderm, and 14% of the EMS-treated lines showed SGA contents lower or equal to commercial varieties whereas 21% of the lines showed lower SGA content than the safety limit established by USDA [11]. SGA content in the commercial potato cultivars was shown to vary from 58 to 71 mg/100 g DW, which is lower than the threshold (100 mg/100 g DW) set for a safe cultivar release, in agreement with the previously reported tuber periderm SGA content [15,22]. Since all plant materials used in the current study were grown, processed and analyzed under the same conditions throughout the study, the observed SGA values most likely reflect the actual SGA phenotype of each line, especially when a highly sensitive method such as UPLC-MS was used. Given the higher toxicity of chaconine compared to solanine, the ratio of chaconine to solanine is very important in potato tubers [91]. The chaconine to solanine ratio was found to be 1:1 in some EMS-treated lines as opposed to the previously reported ratio ranging from 1.2 to 2.6 [22], albeit their total SGA content still remained higher, suggesting a metabolic flow rechanneling in one or the other route after the disruption of one the two SGTs (SGT1, SGT2). Interestingly, 4 mutant lines showed SGA levels as low as 23 to 50 mg/100 g DW, some of which carried heterozygous missense and frame shift mutations in SGT3, GAME4CH6, GAME4CH12, and/or GAME6. Similar observations were previously reported after EMS mutagenesis for metabolic pathways in flax [48], egg plants [92], peppers [93] and tomato [94] for metabolites ranging from SDG lignan to anthocyanin and carotenoid biosynthesis.

By associating genotype to SGA phenotype, potato lines with altered SGA content revealed the presence of mutations in key SGA biosynthetic genes. As such, mutations in GAME6 and GAME11 were found alongside with SGT3 and SGT2, respectively, and led to low SGA levels in EMS7_725 (49 mg/100 g DW) and EMS7_827 (66 mg/100 g DW). GAME6 oxidizes the 22,26-dihydroxycholesterol at C22 whereas GAME11 hydroxylates at C16 leading to the 16, 22, 26-trihydroxycholestreol [37]. The impairment of GAME6 and GAME11, even at a heterozygous level, seems to have a critical impact on SGA production in these two lines despites a potential roles for SGT2 and SGT3 mutations. Likewise, some of the EMS-treated lines (EMS7_690, EMS7_805) carried heterozygous missense mutations in GAME12 and showed low SGA content (51 mg/100 g DW, 68 mg/100 g DW, respectively), whereas some other EMS-treated lines (EMS3_288) showed high SGA content (235 mg/100 g DW) despite carrying heterozygous missense mutations in GAME12, a gene encoding for the enzyme that incorporates nitrogen into the steroidal alkaloid aglycone intermediate backbone [34]. It is known that the transamination activity is critical for glycoalkaloid production [32,34,95]. Here, the low SGA lines carrying mutations in GAME12 also showed mutations in GAME4CH12 and SGT2 (for EMS7_690) and in GAME6, GAME4CH12, and SGT1 (for EMS7_805). We therefore speculate that the heterozygous mutation present in GAME12 may have not altered its function enough to significantly block the pathway and that the alternate allelic variant within the heterozygous background seems to be functional enough for proceeding with the production of 26-amino-furostanol prior to its glycosylation, as it could be seen for line EMS3-288 carrying mutation only in GAME12. Alternatively, a yet unknown transaminase may have also contributed to the transamination process along with the heterozygous allele of GAME12 (Figure 10). Hence, we conclude that the low SGA phenotype observed in EMS7_690 (51 mg/100 g DW), and in EMS7_805 (68 mg/100 g DW) may be associated only with the effects of heterozygous mutations in more than one of GAME6 and GAME4CH12 genes. Contrary to GAME12, the missense heterozygous mutation in SGT3 and GAME7 on one hand (EMS4_403; 23 mg/100 g DW) and the frameshift heterozygous mutations leading to stop codon in GAME4CH6 and SGT2 on the other hand (EMS4_901; 50.4 mg/100 g DW), led both to low SGA phenotypes. SGT3 adds rhamnose units to both solanine and chaconine [96] and is therefore critical at this glycosylation step, whereas GAME7 hydroxylates cholesterol into 22-hydroxycholesterol in the early step of SGA biosynthesis. Taken together, heterozygous mutations of both genes may have led to the low SGA phenotype observed in EMS4_403, a plant producing very small tubers with poor yield (not shown). GAME4a (CYP88B) [66], oxidases furostanol-type saponin aglycone into furostanol-26-aldehyde that serves as a substrate for the transamination by GAME12 [34]. The frameshift mutations identified in SGT2 and GAME4CH6 and observed in EMS4_901 (50.4 mg/100 g DW) were located upstream to the PSPG and CYP450 conserved active domains of these two genes, respectively. This observation supports and increases the likelihood for an alteration in the gene function by these frameshift mutations. But, uncertainties around the mutation of which of the two genes (SGT2 or GAME4CH6) that may have led to the low SGA level in this EMS4_901 line can arise. It is well known that mutation of either SGT1 or SGT2 does not alter the total SGA level [36] and that RNAi-mediated silencing of GAME4 gene resulted in SGA reduction in both potato leaves and tubers [34,66]. It is thus reasonable to assume that the low SGAs observed in the line EMS4_901 is mostly attributable to the frameshift mutation in GAME4CH6. However, the roles for mutations in other genes including uncharacterized transcription factors regulating SGA biosynthetic genes can not be ruled out, and only a formal complementation test in a wild type would provide a definite proof to this assumption. Nonetheless, we conclude that mutations of GAME4CH6 or SGT3 are critical for SGA production (Figure 10) and that the EMS4_901 mutant line thus becomes an ideal candidate for molecular and phenotypic comparative analysis between low and high SGA producing lines. As such, by comparing EMS4_901 with its non-treated control CTL4_492, a high SGA producing line, altered and contrasting transcriptomic patterns were observed between the two lines and downregulated metabolic pathways were observed in the mutant line compared to the wild type. These data were validated by qPCR. Importantly, DEGs pertinent to SGA biosynthesis were found to form tight clusters with the known SGT1, SGT2 and SGT3, cytochromes, GAME genes or transcription factors playing roles in SGA biosynthesis, linking transcriptomic DEGs and SGA biosynthetic genes. Moreover many of these gene transcripts were found to be downregulated in the mutant line, providing further proof of the mutant status and altered metabolism of this EMS line.

Taken together, this study is the first to demonstrate that EMS-mutagenesis can be successfully used to alter the SGA biosynthetic pathway and the content of SGA in the tubers of diploid potatoes. It also paves the way for further functional studies such as complementation tests or gene editing. Given the high level of heterozygous mutant alleles observed and the clonal propagation process, most of these mutations can be maintained in the somaclones. Nonetheless, self-pollination, whenever possible, will be required to fix the heterozygous mutations as homozygous for the true potato seed production and for developing in-bred cultivars as performed in corn. However, this will require solving first the self-incompatibility issues frequently encountered in wild diploid potatoes [41] and a full genome sequencing will help uncovering all potential mutations in the genome, including the promoter regions toward a gene regulatory network analysis. The current study focused only on the SGA biosynthetic pathway. But, many other biological processes may have also been altered within the population as shown by the wide morphological variations already reported [44], making this mutant population a valuable genetic resource for potato biologists and breeders for the functional characterization of biological processes such as light-induced greening or self-compatibility toward mutant diploid potato cultivar development.

Figure 10. EMS-induced alteration of SGA biosynthesis genes leading to α-solanine and α-chaconine from cholesterol in potato (According to Itkin et al., 2013 [34,97] and Cardenas et al., 2016). Red “X” marks indicate the steps/genes showing EMS-induced frameshift mutations in GAME4 and SGT2 genes and missense mutation in SGT3 gene of the low SGA producing EMS4_901 and EMS4_403 lines.


Supplementary File (zip, 4.7MB)


All relevant data are within the manuscript and the supporting Supplementary files provided. The authors confirm that all raw variant call data underlying the findings will be made fully available without restriction upon request to any qualified researcher. The raw RNAseq dataset can be found in the GenBank Short Read Archive (SRA) database under the Bioproject ID# PRJNA533334.


BF: Conception, coordination, design, experiments, data analysis, interpretation and writing of the manuscript; AS and KG: Performed experiments, data analysis, interpretation, drafting and revision of the manuscript; JM: UPLC and mass spectrometry data acquisition and analysis of SGA extracts, revision of manuscript; DM and AS: EMS population nursery and tuber processing; BB: provided original plant materials, coordination, and revision of the manuscript;. All authors read, commented and approved the manuscript.


All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


This study was conducted as part of the AAFC base project 1320 funded to BF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


The authors thank Rick Doucet, Leanne Carter and all summer students at the Charlottetown Research and Development Centre, Charlottetown, PEI, for their help during the course of this project.



































































































How to Cite This Article

Somalraju A, Ghose K, Main D, McCallum J, Bizimungu B, Fofana B. Altered Steroidal Glycoalkaloid (SGA) Biosynthesis in Diploid Potatoes as Induced by EMS-Mutagenesis. Crop Breed Genet Genom. 2020;2(4):e200017.

Copyright © 2020 Hapres Co., Ltd. Privacy Policy | Terms and Conditions