"Gene Families: Multigene Families And
Total Page:16
File Type:pdf, Size:1020Kb
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/229531275 Gene Families: Multigene Families and Superfamilies Chapter · March 2008 DOI: 10.1002/9780470015902.a0005126.pub2 CITATIONS READS 4 3,866 1 author: Tomoko Ohta National Institute of Genetics 168 PUBLICATIONS 10,844 CITATIONS SEE PROFILE Some of the authors of this publication are also working on these related projects: Weak selection on gene expression processes View project All content following this page was uploaded by Tomoko Ohta on 25 September 2018. The user has requested enhancement of the downloaded file. Gene Families: Multigene Advanced article Families and Superfamilies Article Contents . Overview Tomoko Ohta, National Institute of Genetics, Mishima, Japan . Genomic Organization of Gene Families . Forces that Shape Gene Families Multigene families are defined as groups of genes with sequence homology and related doi: 10.1002/9780470015902.a0005126.pub2 overlapping functions, whereas superfamilies are defined as a group of proteins or genes of common origin with nonoverlapping functions. If a group of proteins or genes contains a domain of common origin, it is a superfamily; a protein or a gene may belong to two or more superfamilies. Overview gives some examples of large superfamilies in the human genome. See also: Gene Families In the genomes of higher organisms, there are numerous gene families, from large to small, and from those with uni- Table 1 Examples of large superfamilies in the human form members to those with diverse members. Research on genome multigene families started in the 1970s, arising mainly through a curiosity about how the genetic variability gen- Number of erated in an individual or existing in a population is related Domain proteins to gene redundancy. Smith (1974) and Hood et al. (1975) Developmental and homeostatic regulators studied the evolution of immunoglobulins and showed that Cadherin 100 this and other families are characterized by concerted evo- Transforming growth factor b-like domain 27 lution; they suggested that unequal crossing-over and gene Hemostasis conversion are contributing to immunoglobulin evolution. Sushi domain 53 To understand the evolution and variation of gene families, Immune response therefore, these illegitimate crossing-over events need to be Immunoglobulin domain 381 considered along with mutation, selection and random drift. PI-PY-Rho GTPase signalling Sequence homology among members of a gene family C2 domain 73 mainly depends on the relative rate of occurrences of mu- FYVE zinc-finger 28 tation and illegitimate crossing-over events, and is influ- PH domain 193 enced to some extent by selection and drift (Ohta, 1983). Ras family 126 An important feature of a superfamily is the differ- Src homology 2 domain 87 ent expression patterns of the gene members. By cont- Src homology 3 domain 143 rast, genes belonging to a multigene family are usually Extracellular matrix adhesion under common regulatory control. It will, therefore, EGF-like domain 108 be interesting to clarify the process of how diverse Fibronectin type III domain 106 expression patterns are acquired. In view of the rapid Protein interaction progress in our understanding of genetic regulatory Ank repeat 145 systems, this issue will be a most prominent one in the EF hand 83 coming years. PH domain 193 As shown by the examples given later, a large superfam- WD40 domain 136 ily can contain single genes and several multigene families. Nuclear interaction Formation of these families must have been fundamental in BTB/POZ domain 97 organism evolution. Almost all genes belong to one or Helix–loop–helix DNA-binding domain 60 more gene families, and homology search is popular as the Homeobox domain 160 first step for identifying gene relationships. Phylogenetic KRAB box 204 analyses are useful for understanding relationships of RNA recognition motif 224 member genes of a gene family. Gene trees may clarify the Zinc-finger, C2H2 type 564 history of gene duplication events and can provide infor- Zinc-finger, C3HC4 type 135 mation for identifying orthologues in comparative studies of homologous genes using data from many species. Table 1 Note: Data are taken from Table 18 of Venter et al. (2001). ENCYCLOPEDIA OF LIFE SCIENCES & 2008, John Wiley & Sons, Ltd. www.els.net 1 Gene Families: Multigene Families and Superfamilies Genomic Organization of Gene Genes of immunoglobulins are known to be encoded by var- iable (V), diversity (D), joining (J) and constant (C) segments. Families Several copies of V, D and J segments exist, and the enormous diversity of immunoglobulins is generated by the Clustered gene families combinatorial usage of V, D, J and light and heavy chains. Many important gene families exist as clusters, that is, gene Somatic mutation also contributes to generating diversity. members are repeated in tandem in the genome. They have To have efficient generation of diversity by combinatorial arisen by unequal crossing-over during meiosis or mitosis usage, however, mutant accumulation in evolution is es- of germ cell lineages. The large copy number of members of sential. Detailed analyses of variable region gene families some multigene families, such as ribosomal ribonucleic suggest that natural selection for enhancing diversity at the acid (RNA) or histone families, is due to a need for large crucial regions of antigen recognition has been operating. amounts of gene product. Gene families of the major histocompatibility complex Genes of ribosomal RNA usually form a multigene (MHC) belong to the immunoglobulin superfamily. They family. In many eukaryotes, a repeating unit contains a have attracted much interest because of associated clinical transcribed region and a nontranscribed spacer, and the problems and exceptionally high polymorphisms. The former encodes 18S, 5.8S and 28S ribosomal RNAs. This MHC has been suggested to be evolving dynamically under unit is repeated hundreds of times. This multigene family is various illegitimate recombinations, as alleles have been characterized by concerted evolution such that homology found that differ by a short segment of sequence that has among repeating units is high even though they show been apparently converted by a homologous gene. See also: divergence when compared with the sequences of different Major Histocompatibility Complex (MHC) species (Eickbush and Eickbush, 2007). Detailed analyses by sequence comparison show that In regard to the histone multigene family, four core natural selection that enhances amino acid diversity at the histone genes are usually clustered in higher organisms and antigen recognition site is important for maintaining high this cluster is repeated tens of times. These genes are polymorphisms (Hughes and Nei, 1988). Simultaneously, expressed as a cohort during the S phase of the cell cycle. gene conversions including the antigen recognition site are There are also a few histone genes that are expressed inde- generating useful variability for selection to work. pendently from the cell cycle. These replication-independent Another remarkable example of a gene family with clus- genes may behave like ordinary tissue-specific genes. tered and dispersed members is that of the olfactory re- Some clustered gene families have more diverse ceptor. This receptor is a seven-transmembrane protein functions than the earlier examples. The best-known case and belongs to the large G-protein-coupled receptor is the globin superfamily. The globin superfamily of mam- (GPCR) superfamily. Functional diversity is encoded in mals consists of three families, that is, the a-like family, the the genome without combinatorial usage or somatic mu- b-like family and myoglobin. The first two families are each tation. It is estimated that there are several hundred GPCR encoded by clusters of genes, whereas myoglobin is encoded genes in mammals; these genes are organized in many clus- by a single gene. Each cluster contains the embryonic and ters and one cluster may have diverse members. This family adult-type together with pseudogenes. Many adult genes also contains many pseudogenes. The catfish has a much exist in multiples, for example, two genes, a1anda2, encode smaller number of olfactory receptor genes than mammals, adult human a-haemoglobin. These two genes may be said and the gene family is thought to have expanded in the to form a small multigene family and are characterized by ancestral lineage of mammals. concerted evolution. See also: Gene Duplication: Evolution Dispersed gene families Superfamilies with clustered and dispersed Many of the dispersed gene families are thought to have members been formed by reverse transcription of RNA and subse- quent integration into the genome. The integrated se- Superfamilies often contain both clustered and dispersed quence, or ‘retrosequence’, is derived from the RNA members. The clustered genes usually form multigene families transcript of a gene, and therefore does not contain introns. with overlapping functions, whereas the dispersed ones may It is likely that most retrosequences degenerate and become have more diverse functions. In some cases, however, clus- pseudogenes; however, there are several interesting cases of tered genes may have distinct functions as exemplified by the retrosequences that have retained function or acquired new Hox genes. See also: Hox Genes: Embryonic Development functions (Long et al., 2003). A functional retrosequence is The immunoglobulin superfamily includes both clustered