The plant genome: an evolutionary perspective on structure and function
The genome sequences of a small but rapidly growing number of plant species have been completely or substantially determined, particularly through initiatives like the SOL-100 project (the project of the SOL Genomics Network [SGN] to sequence 100 genomes from Solanaceae; see http://solgenomics.net/organism/sol100/view) and the mission of the Beijing Genomics Institute – Hong Kong (BGI – HK) to sequence 1000 plant and animal genomes (http://www.genomics.cn/en/bgi.php?id=158). Not surprisingly, the majority of the approximately 20 plant species with completely or almost completely analyzed genomic sequences to date are angiosperms. Nonetheless, plant species representing more ancestral lineages are beginning to attract the interest of more researchers, either because of their phylogenetic position or because they possess unusual properties that make them more amenable to experimental manipulation. Thus, the sequences of a couple of plant genomes from non-angiospermous species, as well as from several algae, are already available.
What scientific issues does the availability of a complete sequence of a genome allow us to address that we could not satisfactorily address before? Some of the obvious benefits of having such information include being able to determine the number of genes in a given genome, their structure, and their organization on the chromosome, and, assuming the transcriptome sequence is available too, to design experiments to globally monitor gene transcription patterns and to study the mechanisms controlling such patterns of expression. Data collected in such analyses constitute a crucial basis for identifying the set of genes involved in a specific process and the cellular and biochemical functions of such genes.
The approaches mentioned above can be applied to each genome in isolation. There are, however, many reasons why bioinformatic approaches that take advantage of multiple sequenced genomes of both closely and distantly related species are much more powerful. The presence of conserved sequences in different species – both coding and non-coding – can alert us to conserved functions, even if none is yet suspected. Luckily, plant researchers study a range of plant species, and based on sequence similarity, a function assigned to a sequence in one species can often be tested for the same function in another. Interspecific variation, as well as intraspecies variation, can also be quite instructive, particularly when correlated with physiological, developmental, and metabolic differences.
The finding that the genome of Arabidopsis thaliana contains more genes than the genome of Homo sapiens was met with surprise in some circles. It is well established that most new genes arise from a process of gene duplication followed by functional divergence. However, duplication of individual genes or a whole genome (i.e. polyploidization), as well as gene inactivation and deletion, occasionally occur in all living organisms, and each such change initially occurs in a single individual. So the ultimate mechanism that determines the number of genes found in the majority of individuals in a given species is, in most cases, selection. Those individuals that are more fit will contribute disproportionately to the next generation, and the number of genes found in such individuals will come to define the species’ genome. When viewed in this light, it is not surprising that most plant genomes likely contain more genes than the genomes of non-plant species – since most plants are true autotrophs (parasitic plants are an exception) and have to synthesize many compounds that other organisms acquire through their food. Furthermore, because plants are at the bottom of the food chain, they have to produce a large number of compounds to defend themselves against predators. This line of reasoning suggests that plant genomes devote a larger number of genes to encoding metabolic functions than animal genomes do, but there is no doubt that genes involved in plant growth and development are also abundant, with many types of differentiating cells found in plants and many physiological and anatomical responses observed in plants coping with major changes in their environments. Thus, it is not surprising that plants come equipped with very complex genomes to help them survive and thrive, and even with comparative genomic approaches, the elucidation of the function of all the genes in even a few plant genomes is clearly going to take quite some time.
The value of genome sequences goes beyond helping us to assign a specific function to a specific gene. The ultimate goal of plant biologists is to understand mechanistically how every facet of the plant’s life cycle is brought about as a consequence of the interplay between the information contained in the genome and cues from the environment the plant finds itself in. In fact, a more holistic question often posed is how such a wide range of morphologically and ecologically distinct species can arise out of a similar set of genes/proteins. Understanding genome structure and the processes that occur within the nucleus (and plastids and mitochondria) is essential to answering such questions, and again a comparative approach is highly beneficial. Furthermore, we strive to understand how plant genomes came to be what they are today – the general mechanisms that govern increases and decreases in the number of genes and how the genes are arranged in the genome. In large part due to all the recent genomic information, our view of the stability of chromosomes and genomes has changed drastically; we nowadays look upon the genome as a most dynamic entity, the structure of which constantly undergoes changes.
Thus, the first six articles in this collection describe the nature and evolution of the hereditary material. In Article 1 (Fransz and de Jong, 2011), the interplay between nucleosomes, DNA and proteins is presented. Such interplay serves to enable transcription of genes, as well as to contribute to the faithful transmission of the genetic material to the next generation. Article 2 (Heslop-Harrison and Schwarzacher, 2011) describes the structural requirements for chromosome stability, such as the presence of a centromere and telomeres, which maintain chromosome integrity and provide the mechanistic conditions for transmission to the next generation. Article 3 (Green, 2011) clarifies the events around one of Nature’s greatest tricks: the cohabitation of originally independent, free-living organisms, which created a new atmosphere. What algae added to these developments is outlined in Article 4 (Tirichine and Bowler, 2011). In Article 5, evolutionary forces that shaped the overall plant genome are summarized (Proost et al., 2011). And then finally in this first section, Article 6 (Chu et al., 2011) provides examples of a rather recently discovered phenomenon: the organization of genes involved in the same process into operon-like structures in eukaryotes. Taken together, our aim with these articles is to outline comprehensively what chromosomes are made of, what their structural and functional components are, and how they change over evolutionary time.
The last nine articles describe specific groups of genes, or ‘gene families’, that contribute to various functions in plants, from gene transcription and translation, to environmental information processing, to metabolite transport, and to various metabolic functions in both primary and specialized metabolism. Michaud et al. (2011) describe the family of genes encoding tRNAs; Feller et al. (2011) describe the surprisingly large families of MYB and bHLH transcription factors in plants. Gish and Clark (2011) chose to focus on a subset of the receptor-like kinase family, which in total numbers more than 600 members in Arabidopsis. Palmieri et al. (2011) discuss the mitochondrial carrier family of inner membrane transporters, which carry out essential transport of small metabolites across membranes.
A lot of fine plant biochemistry research had been accomplished before we had gene sequences, and only then did we begin to realize how complicated things really are – sometimes there are multiple genes that encode the same biochemical function, sometimes different biochemical functions are encoded by similar genes. Strommer (2011) takes on a well-studied group of enzymes, the alcohol dehydrogenases, and analyzes them in view of the new genomic information. Shockey and Browse (2011) and Yonekura-Sakakibara and Hanada (2011) similarly describe the families of carboxyl-CoA ligases and glycosyltransferases, respectively. Many members of these families had been discovered prior to the genomic era, but the advent of large-scale sequencing made us realize that there are many more such genes in the genome than we originally thought – and the functions of many of them still remain to be determined. Nelson and Werck-Reichhart (2011) depict the large and extremely diverse family of the cytochrome P450 oxidoreductases, which seem to be able to use any class of metabolites in plants, as well as xenobiotics, as substrates. And Chen et al. (2011) describe the family of terpene synthases, a group of enzymes whose origin is in primary metabolism but whose members are now responsible mostly for the synthesis of compounds that help the plant with its interactions with the environment.
These gene/enzyme families were chosen because they are relatively well studied, and therefore illustrate the approaches that have been used to study such families and the types of results obtained. These families are otherwise likely to be representative of many other such families in plant genomes. A couple of general conclusions can be drawn from the results described in these articles. First, they illustrate the constant changes that occur in gene number and function within each family in different plant lineages, as some branches of the family expand by duplication and divergence while others contract through gene loss. Another prominent theme is the occurrence of convergent evolution within a family, as similar biochemical functions may evolve in different plant lineages from different branches of the same gene/enzyme family.
We do not yet fully understand how DNA in its diverse forms evolved its function as the conservator of all necessary information for cellular and organismal development and maintenance, and thus became the archive and memory of life itself, with all its fantastic diversity. We are increasingly faced with huge quantities of data consisting of DNA sequences and their genomic organization. Making good use of this information is clearly the most pressing challenge in any area of biology, including plant biology. In this special issue we have tried to provide a comprehensive overview of the evolution of the organization of the genetic material in plants, as well as the functional evolution of its most important components, the genes, and to highlight the major advances that have been made since the dawn of the plant molecular biology era in the late 1970s and early 1980s. Much has been achieved since that time, but clearly we are only just beginning to understand the functions and concerted actions of plant genes and how they have evolved since the start of the plant kingdom to give rise to the existing plethora of species. There is so much left to be discovered!