Application of Genomic Big Data in Plant Breeding: Past, Present, and Future

plants Review Application of Genomic Big Data in Plant Breeding: Past, Present, and Future 1, 2, 2,3, Kyung Do Kim y , Yuna Kang y and Changsoo Kim * 1 Department of Bioscience and Bioinformatics, Myongji University, Yongin 17058, Korea; [email protected] 2 Department of Crop Science, Chungnam National University, Daejeon 34134, Korea; [email protected] 3 Department of Smart Agriculture Systems, Chungnam National University, Daejeon 34134, Korea * Correspondence: [email protected] These authors contributed equally to this work. y Received: 28 September 2020; Accepted: 26 October 2020; Published: 28 October 2020 Abstract: Plant breeding has a long history of developing new varieties that have ensured the food security of the human population. During this long journey together with humanity, plant breeders have successfully integrated the latest innovations in science and technologies to accelerate the increase in crop production and quality. For the past two decades, since the completion of human genome sequencing, genomic tools and sequencing technologies have advanced remarkably, and adopting these innovations has enabled us to cost down and/or speed up the plant breeding process. Currently, with the growing mass of genomic data and digitalized biological data, interdisciplinary approaches using new technologies could lead to a new paradigm of plant breeding. In this review, we summarize the overall history and advances of plant breeding, which have been aided by plant genomic research. We highlight the key advances in the field of plant genomics that have impacted plant breeding over the past decades and introduce the current status of innovative approaches such as genomic selection, which could overcome limitations of conventional breeding and enhance the rate of genetic gain. Keywords: next-generation sequencing; single nucleotide polymorphism; marker-trait association; genomic selection; marker-assisted selection 1. Introduction The history of plant breeding might have begun when humans changed from hunter gatherers to farmers, otherwise known as an agricultural society. The key concept of early plant breeding was “crop domestication”, which implies the adaptation of wild plant species, so that humans could sustainably cultivate food plants to feed an ever-increasing population. Researchers estimate crop domestication dates back 9000 to 11,000 years. Since then, humans have bred plants using selection schemes according to the decisions of farmers. In addition, humans have historically traded crop plants from other countries for hundreds of years, which has diversified the agricultural environment and facilitated plant breeding [1]. The concept of scientific breeding started in the 19th century, triggered by Gregor Mendel’s research with garden peas, for which his plant hybridization led to his laws of inheritance. At the time of publishing his research paper, “Experiments on Plant Hybridization”, his work failed to get the attention of other scientists because biology was always interpreted by quantitative matters not by qualitative approaches [2]. However, his work was re-discovered by William Bateson in 1900. After integration with the Boveri–Sutton chromosome theory, scientists finally conceded his experiment was “the core of classical genetics”. Consequently, plant breeding faced a new era by the advances of Plants 2020, 9, 1454; doi:10.3390/plants9111454 www.mdpi.com/journal/plants Plants 2020, 9, 1454 2 of 24 Plants 2020, 9, x FOR PEER REVIEW 2 of 25 modern genetics with a plethora of newly generated breeding populations in many crops. In other words,advances the full-scaleof modern application genetics with of transmissiona plethora of genetics newly generated to plant breeding breeding hadpopulations begun at in that many point (e.g.,crops. introgression In other words, of useful the full-scale traits in application other accessions). of transmission However, genetics at the to plant beginning, breeding plant had breeding begun wasat that only point performed (e.g., introgression by phenotype of useful evaluations, traits in meaningother accessions). that a breeder’s However, ability at the tobeginning, select desirable plant individualsbreeding was was only very performed critical. Soon by phenotype after this, evaluations, the breeding meaning environment that a dramaticallybreeder’s ability changed to select ever sincedesirable the Maxam–Gilbert individuals was sequencing very critical. method Soon wasafter publishedthis, the breeding in the late environment 1970s. The dramatically method was thechanged first to ever provide since nucleotidethe Maxam–Gilbert sequence sequencing information method that couldwas published directly bein usedthe late for 1970s. developing The molecularmethod was markers, the first which to provide are very nucleotide objective andsequen havece ainformation capability forthat early could stage directly selection. be used After for the applicationdeveloping of molecular this method markers, in plant which genetics, are firstvery generationobjective and (Sanger have sequencing), a capability secondfor early generation stage selection. After the application of this method in plant genetics, first generation (Sanger sequencing), (next-generation sequencing technology mostly referring to Illumina chemistry), and third generation second generation (next-generation sequencing technology mostly referring to Illumina chemistry), sequencing technologies (Pacific BioScience or Oxford Nanopore technologies) have emerged and and third generation sequencing technologies (Pacific BioScience or Oxford Nanopore technologies) had significant impacts on plant breeding in a variety of ways [3]. For example, researchers started have emerged and had significant impacts on plant breeding in a variety of ways [3]. For example, to utilize nucleotide information by applying molecular marker systems such as single nucleotide researchers started to utilize nucleotide information by applying molecular marker systems such as polymorphisms (SNPs), which are theoretically infinite in terms of numbers in plant genomes, enabling single nucleotide polymorphisms (SNPs), which are theoretically infinite in terms of numbers in plant expanded marker-assisted selection (MAS) or genomic selection (GS) [4]. Additionally, having the genomes, enabling expanded marker-assisted selection (MAS) or genomic selection (GS) [4]. wholeAdditionally, genome sequencehaving the information whole genome is very sequence powerful information in plant breeding is very powerful because the in locationsplant breeding of traits andbecause their underlyingthe locations genes of traits or chromosomes and their underlyi couldng be genes helpful or chromosomes for the introgression could be of helpful interesting for the traits tointrogression other accessions. of interesting In addition, traits transcriptome to other a sequencingccessions. In (RNAseq) addition, has transcriptome revealed gene sequencing expression changes(RNAseq) under has variousrevealed abiotic gene orexpression biotic stress changes conditions, under various leading abiotic to profiles or bi ofotic genes stress associated conditions, with manyleading agricultural to profiles traits of genes (Figure associated1). with many agricultural traits (Figure 1). FigureFigure 1.1. Key technological milestones milestones in in plant plant breeding. breeding. InIn lectures lectures onon “Plant Genetics” Genetics” atat some some universities, universities, professors professors always always define define genomics genomics as “the as “thestudy study of ofgenetics genetics with with massive massive nucleotide nucleotide sequen sequencece information”. information”. In Infact, fact, plant plant breeding breeding based based onon geneticgenetic tools tools is tois identifyto identify genetic gene variantstic variants of interest of interest that can that be associatedcan be associated with phenotypic with phenotypic differences. Indifferences. this respect, In genomicsthis respect, provides genomics breeders provides and breeders geneticists and with geneticists new opportunities with new opportunities in that modern in molecularthat modern genetics molecular have ogeneticsffered ahave relatively offered limited a relatively number limited of genetic number variations of genetic whereas variations whole genomewhereas information whole genome can theoreticallyinformation providecan theoretica an unlimitedlly provide number an unlimited of genetic number variations of genetic based on SNPs.variations Thus, based the power on SNPs. of the Thus, whole the genome power referenceof the whole becomes genome a critical reference tool becomes for this purpose.a critical tool As stated, for sequencingthis purpose. technologies As stated, advancesequenci quickly,ng technologies and the advance cost is decreasing. quickly, and The the massive cost is amountdecreasing. of genetic The variantsmassive thanks amount to technologicalof genetic variants advances thanks has to enabled technological breeders advances to find marker-trait has enabled associations breeders to (MTA),find whichmarker-trait exploit associations the genotyping (MTA), of which various exploit populations the genotyping with molecular of various markerspopulations covering with molecular the whole genomemarkers and covering analyze the the whole associations genome and between analyze phenotypic the associations variations between and phenotypic genotypic polymorphisms.variations and Thus,genotypic researchers polymorphisms. may be

Application of Genomic Big Data in Plant Breeding: Past, Present, and Future

Reference Genome Sequence of the Model Plant Setaria

Y Chromosome Dynamics in Drosophila

The Bacteria Genome Pipeline (BAGEP): an Automated, Scalable Workflow for Bacteria Genomes with Snakemake

Lawrence Berkeley National Laboratory Recent Work

Telomere-To-Telomere Assembly of a Complete Human X Chromosome W

Human Genome Reference Program (HGRP)

De Novo Genome Assembly Versus Mapping to a Reference Genome

Practical Guideline for Whole Genome Sequencing Disclosure

Y Chromosome

Reference Genomes and Common File Formats Overview

Informatics and Clinical Genome Sequencing: Opening the Black Box

Using DECIPHER V2.0 to Analyze Big Biological Sequence Data in R by Erik S