Volume 22, Issue 9 pp. 2496-2510
Original Article

The molecular basis of invasiveness: differences in gene expression of native and introduced common ragweed (Ambrosia artemisiifolia) in stressful and benign environments

Kathryn A. Hodgins (Corresponding Author)

Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4

Correspondence: Kathryn A. Hodgins, Fax: 604 822 6089; E-mail: [email protected]

Zhao Lai

Department of Biology, The Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN, USA

Kristin Nurkowski

Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4

Jie Huang

Department of Biology, The Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN, USA

Loren H. Rieseberg

Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada, V6T 1Z4

Department of Biology, The Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN, USA

First published: 07 January 2013

Abstract

Although the evolutionary and ecological processes that contribute to plant invasion have been the focus of much research, investigation into the molecular basis of invasion is just beginning. Common ragweed (Ambrosia artemisiifolia) is an annual weed native to North America and has been introduced to Europe where it has become invasive. Using a custom-designed NimbleGen oligoarray, we examined differences in gene expression between five native and six introduced populations of common ragweed in three different environments (control, light stress and nutrient stress), as well as two different time points. We identified candidate genes that may contribute to invasiveness in common ragweed based on differences in expression between native and introduced populations from Europe. Specifically, we found 180 genes where range explained a significant proportion of the variation in gene expression and a further 103 genes with a significant range by treatment interaction. Several of these genes are potentially involved in the metabolism of secondary compounds, stress response and the detoxification of xenobiotics. Previously, we found more rapid growth and greater reproductive success in introduced populations, particularly in benign and competitive (light stress) environments, and many of these candidate genes potentially underlie these growth differences. We also found expression differences among populations within each range, reflecting either local adaptation or neutral processes, although no associations with climate or latitude were identified. These data provide a first step in identifying genes that are involved with introduction success in an aggressive annual weed.

Introduction

Invasive plants have become valuable systems for studying evolution over contemporary timescales, as founder events, admixture and novel selection pressures in the introduced range can result in rapid evolutionary change in introduced populations (Dlugosch & Parker 2008; Prentis et al. 2008; Colautti et al. 2010). Invasive plants also represent a major environmental and economic threat. They erode native biodiversity, harm proper functioning of ecosystems (McNeely 2001; Colautti et al. 2006) and cause significant economic damage, through such costs as lost crop production and the application of control measures (Pimentel et al. 2000; Myers & Bazely 2003; Colautti et al. 2006). Consequently, the considerable ecological and financial burdens of invasive species provide incentive for understanding how plants become invasive. However, despite a wealth of research into invasion biology, our understanding of the molecular mechanisms that underlie the capacity for certain species to become so abundant in their introduced ranges is sparse, largely due to a deficit of genetic and genomic resources for invasive species (Broz et al. 2007; Stewart et al. 2009).

There is no unique suite of traits found in all invasive plants, but a change reported in some alien species is increased growth and reproduction (Elton 1958; Crawley 1987; but see Willis et al. 2000; Thébaud & Simberloff 2001), and such improved vigour could contribute to the rapid spread of these invasive populations. One hypothesis to explain this phenomenon is that invasive plants have evolved along fitness trade-offs, gaining competitive advantage at the expense of other functions that are not selected for in the introduced range (Grime 1977). For example, invaders may have escaped specialized herbivores, pathogens and competitors found in their native ranges (Blossey & Notzold 1995; Keane & Crawley 2002; Callaway & Ridenour 2004), and resources formerly allocated to costly defence mechanisms might be reallocated to other functions such as increased growth, competitive ability or reproduction. Similarly, some invasive species may have switched from a strategy of high tolerance to abiotic environmental stress to a strategy of enhanced competitive or colonizing ability (Alpert et al. 2000; Richards et al. 2006; He et al. 2010). However, the evolutionary mechanisms that have led to these growth differences between invasive and native populations remain controversial (Colautti et al. 2009), and the molecular underpinnings of these differences, where they occur, have rarely been investigated (Stewart et al. 2009; Mayrose et al. 2011).

Identifying genes involved in rapid adaptation to novel environments represents a major challenge for invasion biologists (Lee 2002; Weinig et al. 2007; Prentis et al. 2008). Genes have been identified that could be important during invasion in model species, such as those involved in phenotypic plasticity (Schmitt & McCormac 1995; Schmitt et al. 2003; Galen et al. 2004), defence (Chen 2008; Heil & Ton 2008), tolerance to physiological stress such as drought (Shinozaki & Yamaguchi-Shinozaki 2007; Amudha & Balasubramani 2011), growth rate (Dodd et al. 2005; Teale et al. 2006) and herbicide resistance (Powles & Preston 2006; Gaines et al. 2010). In many cases, however, the extent to which variation in such genes plays a role in determining natural variation in phenotypes among invasive and noninvasive populations is unknown. Quantitative trait loci mapping studies of invasive species can identify genomic regions associated with invasion success (e.g. Paterson et al. 1995; Linde et al. 2001), but require further genetic and functional work to identify the genes that underlie invasive phenotypes. Genomic approaches, such as selective sweep mapping and genome-wide analysis of gene expression, are promising avenues for identifying candidate genes for invasiveness in nonmodel species (Lai et al. 2008; Prentis et al. 2008; Stewart et al. 2009). Examining the changes in gene expression is particularly valuable because it allows assessment of the prediction that invasive populations have evolved faster growth rates by reducing investment in costly mechanisms involved in stress tolerance. For example, differences in the regulation of genes involved in growth or stress tolerance between native and invasive populations would be consistent with this prediction (Lai et al. 2008).

Common ragweed (Ambrosia artemisiifolia) is an aggressive annual weed native to North America and has been introduced into parts of Australia, Asia and Europe. The weed is particularly problematic in France and in many Eastern European countries, such as Hungary where it is very abundant, reaching high population densities (Chauvel et al. 2006; Kiss & Beres 2006). The species is a ruderal and generally found in disturbed habitats, such as ditches and abandoned fields (Bassett & Crompton 1975). It is a well-known agricultural weed and produces highly allergenic pollen (Bassett & Crompton 1975; Laaidi et al. 2003). Both historical records (Chauvel et al. 2006) and genetic evidence (Genton et al. 2005a; Chun et al. 2010, 2011; Gaudeul et al. 2011) indicate that the invasion of common ragweed into Europe was characterized by multiple introductions at both a local and a regional scale. Common garden studies have found evidence for genetic differentiation in quantitative traits between native and introduced populations, with European populations experiencing faster growth and greater reproduction across an array of environments, but with abiotic stress such as drought, or nutrient stress, reducing or reversing the advantage experienced by the introduced populations (Hodgins & Rieseberg 2011).

Our goal is to use genomic techniques to identify candidate genes that may be responsible for the superior growth and reproduction that we have observed in introduced European populations of common ragweed. Specifically, we addressed three questions. (i) Are there differences in gene expression between native and introduced populations of common ragweed? (ii) Are there functional categories of genes that are over-represented? In particular, we predict genes involved in growth or stress tolerance to be over-represented in our set of differentially expressed genes. (iii) Are any gene expression differences between native and introduced populations dependent on the environment and developmental stage? To answer these questions, we performed a microarray experiment using seeds gathered from six introduced populations from Europe and five native populations from North America, followed by RT-qPCR validation for 19 genes. These experiments allowed us to determine whether there is any evidence for differences in gene expression between native and introduced common ragweed that may have evolved during the expansion of the species out of its indigenous range.

Methods

Study species

Common ragweed (Ambrosia artemisiifolia) is an erect annual herb that typically prefers full sun and moderate to slightly dry soil conditions. The species is found in a wide variety of soil types, but will thrive in soil containing high amounts of clay, gravel or sand because of reduced competition from other plants. It is a self-incompatible (Friedman & Barrett 2008), monoecious, wind-pollinated species. It has uniovulate flowers, but a large individual may produce more than 60 000 seeds (Dickerson & Sweet 1971), which can remain viable in the soil for several years. Seed dispersal by water, birds and humans is probably important for the spread of common ragweed, although the seeds possess no obvious dispersal mechanism (Bassett & Crompton 1975).

Plant materials

We used seed material collected from 22 North American and 12 European populations during the fall of 2008. We selected a subset of the populations and maternal families (with some exceptions due to limited seed availability for some families) used in an earlier common garden experiment examining phenotypic differences between the ranges of common ragweed (Hodgins & Rieseberg 2011). We selected six populations from each range based on bioclimatic similarity, to reduce the potential for differences in gene expression due to maternal effects (see Table S1 and Fig. S1, Supporting information). We assessed similarity by obtaining 19 bioclimatic variables from the WorldClim database version 1.4 (release 3) for each population (Hijmans et al. 2005) and summarizing them with principal components analysis (PCA). Using the first two principal components, we took the six populations from each range that gave the shortest Euclidean distances between native and introduced populations.
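The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify exactly how distances were combined, so the sketch ranks each population by its distance to the nearest population of the other range in PC1-PC2 space. The function names, population labels and PC scores are all hypothetical.

```python
import math

def nearest_cross_range_distance(point, other_points):
    """Shortest Euclidean distance from one population to any population of the other range."""
    return min(math.dist(point, other) for other in other_points)

def select_closest(native, introduced, k):
    """Pick the k populations per range that lie closest to the other range.

    native / introduced: dict mapping population name -> (PC1, PC2) score.
    Ranking by nearest cross-range neighbour is an assumption of this sketch.
    """
    def rank(own, other):
        ordered = sorted(
            own,
            key=lambda name: nearest_cross_range_distance(own[name], list(other.values())),
        )
        return ordered[:k]
    return rank(native, introduced), rank(introduced, native)

# Hypothetical PC scores for illustration only (not data from the study).
native = {"N1": (0.0, 0.0), "N2": (1.0, 0.4), "N3": (5.0, 5.0)}
introduced = {"E1": (0.5, 0.0), "E2": (1.0, 1.0), "E3": (-4.0, 3.0)}

native_selected, introduced_selected = select_closest(native, introduced, k=2)
print(native_selected, introduced_selected)
```

With real data, the PC scores would come from a PCA of the 19 WorldClim variables and k would be six.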

Following stratification procedures suggested in Willemsen (1975), we germinated 30 seeds from four or five maternal families from each population on damp filter paper in Petri dishes with 1% plant preservative mixture. We placed the seeds in a germination chamber with a 24 °C day, an 18 °C night and a 14:10 light:dark cycle. On 10 June 2010, after 5–6 days in the germination chamber, we transplanted the seedlings into one-inch pots that were filled with potting soil and misted at regular intervals. The remaining seeds were monitored for another week, and no further germination occurred. For one native population (OTON), little germination occurred, and we were forced to remove this population from the experiment. After 2 weeks, we transplanted the seedlings to 3½-inch pots containing a mixture of half sand and half soil. Plants were watered by automatically flooding the bench with fertilized water.

We randomly assigned the plants from each family to a treatment before transplanting them into the larger pots. Individuals from each maternal family were assigned to two treatments (light stress and nutrient stress) and two controls (control 1 and control 2). Plants within each treatment were randomly assigned to trays, and the trays were randomly assigned a location on the bench. Two weeks following the transplant, when the seedlings had approximately 4–8 true leaves, we harvested tissue from the first control (control 1). New leaves were collected, placed in a foil envelope and immediately frozen in liquid nitrogen upon removal from the plant. To minimize individual differences that are prominent in natural populations, three undamaged individuals from the same family were randomly selected and pooled as one biological sample. All samples were collected in a random order. We then applied a light stress (simulating aboveground neighbour effects) and a nutrient stress to the treatment plants. To simulate aboveground neighbour effects, we constructed one shade box (1.5 m × 0.6 m × 1.3 m) with shade cloth and green filters to reduce the quantity and quality of light, following Hodgins & Rieseberg (2011). To establish the nutrient stress, we flooded the trays with water that lacked fertilizer, while all other trays received fertilizer. Treatments had a visible effect on the phenotype of the plants, particularly plant size. After 2 weeks of growth under stress conditions, we harvested and pooled three individuals per family in the two treatments and the remaining control (control 2) in the same manner as above. Tissue was then stored in a −80 °C freezer. In total, we collected tissue from 36 families represented in all four treatments.

RNA extraction

We extracted RNA samples in a random order to avoid confounding batch effects with the experimental groups. We extracted total RNA using the TRIzol reagent (Invitrogen)/RNeasy (QIAGEN) approach as described in Lai et al. (2006). We used a Nanodrop to ensure that each sample had a concentration of >1.0 µg/µL, an A260/A280 ratio > 1.8 and an A260/A230 ratio > 1.8.

cDNA preparation, hybridization, washing and scanning

In collaboration with NimbleGen, we developed a high-density customized expression 12-plex array with 60-mer probes (see Lai et al. 2012 for details). In total, 45 063 unigenes were represented on the array with two or three probes per unigene for a total of 134 996 features. We prepared the double-stranded cDNA according to the array manufacturer's instructions (Roche NimbleGen; NimbleGen Arrays User's Guide v.5.1), except that in some cases we added 1 µL SUPERase (Ambion) to the first-strand synthesis reaction to prevent RNA degradation. We then measured cDNA yields with the Nanodrop ND-1000 and assessed cDNA quality using the Agilent Bioanalyzer. We labelled the cDNA, performed the hybridization on a 12-bay NimbleGen Hybridization System and washed the arrays as recommended by NimbleGen (NimbleGen Arrays User's Guide v.5.1). Using a GenePix Professional 4200A scanner (Axon Instruments), we acquired individual array images, independently adjusting the PMT gain for each image as recommended. Following this, we analysed the images with the NimbleScan software (NimbleGen) and exported feature intensities as .pair and .xys files.

Preprocessing and analysis

We manually inspected each array using NimbleScan software and removed a small number of features because of poor quality (10 of 143 arrays were affected, with an average of 33.1 probes per array). One array (sample FR7-6, control 1) had a major flaw, and so we removed it from the analysis entirely. We have deposited the microarray design and experimental data at the public repository ArrayExpress (ArrayExpress accession A-MEXP-2254 and E-MTAB-1350). As our method of analysis did not allow for missing data, we replaced the missing intensity scores using the k-nearest neighbour method implemented by the impute package for R (KNN method, 10 nearest neighbours; Troyanskaya et al. 2001). We then preprocessed the microarray data using the oligo package from Bioconductor (release 2.9; Carvalho & Irizarry 2010), applying the robust multichip averaging method (Bolstad et al. 2003; Irizarry 2003; Irizarry et al. 2003). Specifically, the data were log2-transformed, and the probe-level data for each microarray were then background-corrected independently using a probabilistic model. For normalization and summarization, we applied the quantile normalization method followed by robust median averaging. We removed genes for which 95% of the samples had intensities below the mean + 2 SD of the random-probe intensities, leaving 33 464 of the 45 062 unigenes on the array.
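The final filtering rule lends itself to a short illustration. The following Python sketch assumes a single threshold pooled over all random probes, the sample standard deviation, and a "95% or more" reading of the rule; none of these details are stated in the text. The function names and intensity values are hypothetical.

```python
import statistics

def detection_threshold(random_probe_intensities):
    """Background threshold: mean + 2 * SD of the random-probe intensities.

    Uses the sample SD and one pooled threshold; both are assumptions.
    """
    mean = statistics.mean(random_probe_intensities)
    sd = statistics.stdev(random_probe_intensities)
    return mean + 2 * sd

def filter_expressed(expression, threshold, max_fraction_below=0.95):
    """Drop genes for which at least `max_fraction_below` of samples fall below the threshold.

    expression: dict mapping gene name -> list of intensities (one per sample).
    """
    kept = {}
    for gene, values in expression.items():
        fraction_below = sum(v < threshold for v in values) / len(values)
        if fraction_below < max_fraction_below:
            kept[gene] = values
    return kept

# Hypothetical log2 intensities for illustration only.
random_probes = [6.0, 6.5, 7.0, 6.5]
expression = {
    "geneA": [9.1, 8.7, 9.4, 8.9],  # clearly expressed in every sample
    "geneB": [6.2, 6.8, 7.0, 6.1],  # below background in every sample
    "geneC": [6.0, 6.3, 8.5, 6.4],  # expressed in one of four samples
}

threshold = detection_threshold(random_probes)
expressed = filter_expressed(expression, threshold)
print(round(threshold, 3), sorted(expressed))
```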

We then used the R package maanova version 1.22.0 (Cui & Churchill 2003) to identify genes with significantly different expression. We incorporated population and family as random effects and treatment (control 1, control 2, light stress and nutrient stress), range and their interaction as fixed effects. For those genes with a significant interaction effect, we performed contrasts to assess differences between the ranges in each treatment. For those genes without a significant interaction effect, we removed the interaction from the model and tested for range and treatment effects. P-values were determined using 1000 permutations in which samples were randomly shuffled, and the observed values (FS test option in R/maanova) were compared to this distribution. We implemented the false discovery rate (FDR) procedure to correct for multiple comparisons using an FDR cut-off value of 5% (Storey 2002).
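The logic of permutation-based P-values followed by an FDR correction can be illustrated with a simplified Python sketch. It is not the R/maanova procedure: the absolute difference in group means stands in for the FS statistic of the mixed model, and the Benjamini-Hochberg adjustment stands in for the Storey (2002) method cited above. All names and values are hypothetical.

```python
import random
import statistics

def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Permutation P-value for a difference between two ranges.

    The test statistic (absolute difference in group means) is a stand-in
    for the F-type statistic of a full mixed model.
    """
    def stat(lbls):
        native = [v for v, l in zip(values, lbls) if l == "native"]
        introduced = [v for v, l in zip(values, lbls) if l == "introduced"]
        return abs(statistics.mean(native) - statistics.mean(introduced))

    rng = random.Random(seed)
    observed = stat(labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign samples to ranges at random
        if stat(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids P = 0

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical expression values for one gene, six samples per range.
values = [1, 2, 3, 4, 5, 6, 101, 102, 103, 104, 105, 106]
labels = ["native"] * 6 + ["introduced"] * 6
p_gene = permutation_p_value(values, labels)

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q <= 0.05 for q in adjusted]  # 5% FDR cut-off
print(p_gene, adjusted, significant)
```

In a real analysis the permutation step would be run once per gene, and the adjustment applied to the full vector of per-gene P-values.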

To assess the importance of bioclimatic variation in driving differences in gene expression of the native and invasive populations, we used the 19 bioclimatic variables from the WorldClim database version 1.4 (release 3) for each population (Hijmans et al. 2005). Due to the number and the multicollinearity of the bioclimatic data, we used PCA to summarize the patterns of correlation among a subset of the bioclimatic variables. Specifically, we first examined pairwise correlations among the variables and removed one variable from each pair with a high correlation (R > 0.8), leaving eight variables. Following this, we performed PCA on the correlation matrix of these eight variables. The first principal component, PC1bio, was most strongly correlated with several temperature variables (annual temperature, mean diurnal range, isothermality, temperature seasonality), and the second principal component, PC2bio, was most strongly correlated with annual precipitation and mean temperature of the wettest quarter. The ranges (native vs. introduced) were significantly different for PC1bio (t9 = 7.46, P < 0.0001), but not for PC2bio (t9 = 0.01, P = 0.99).
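As an illustration of the variable-pruning step, the Python sketch below drops one variable from each highly correlated pair using a greedy pass. Two details are assumptions rather than statements from the text: the cut-off is applied to the absolute value of the correlation, and the variable encountered first is the one retained. Variable names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(variables, cutoff=0.8):
    """Greedy filter: keep a variable only if |r| <= cutoff with every variable already kept.

    variables: dict mapping variable name -> list of values (one per population).
    """
    kept = []
    for name in variables:
        if all(abs(pearson(variables[name], variables[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical bioclimatic values for five populations (illustration only).
bioclim = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio2": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly collinear with bio1
    "bio12": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with bio1
}

retained = drop_correlated(bioclim)
print(retained)
```

The retained variables would then be standardized and passed to a PCA on their correlation matrix.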

We used the lmer procedure of the lme4 package in R to examine the association between gene expression and climate or latitude (Bates & Maechler 2011). For each treatment, we used general linear mixed models in which the fixed effects were range, a covariate (latitude, PC1bio or PC2bio) and their interaction, and the random effect was population. To assess significance, we used likelihood ratio tests of the full model compared with a reduced model excluding the fixed effect of interest. For those genes with no significant interaction between latitude or PC2bio and range, we removed the interaction and tested the significance of the covariate. A significant interaction between a covariate and range indicated that native and introduced populations differed in the slope of the relation between the covariate and gene expression. As range and the climatic variable PC1bio were confounded, we were unable to statistically disentangle the effects of PC1bio and range. Therefore, we analysed the association between gene expression and climate for the native and the introduced populations separately to determine whether there were any associations with climate within each range. As above, we implemented the FDR procedure to correct for multiple comparisons using an FDR cut-off value of 5%. We also examined the significance of the random effect population. Specifically, for each treatment, we compared the linear mixed model with the fixed effect range and the random effect of population with a reduced linear model (lm function) including only range. Significance was assessed using the exactLRT function from the RLRsim package in R, which provides an exact likelihood ratio test based on simulated values from the finite sample distribution (Scheipl & Bolker 2011). Using this model, we also examined the average proportion of the variation in gene expression explained by population, range and residual variance across all genes.
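For the fixed-effect comparisons, the arithmetic of a likelihood ratio test can be sketched as follows. The sketch assumes that the reduced model drops a single parameter, so that the statistic is referred to a chi-square distribution with 1 degree of freedom; the text does not state the reference distribution used. It does not apply to the test of the population random effect, for which the simulation-based exactLRT was used instead. The log-likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Likelihood ratio test for one dropped fixed-effect parameter (1 df).

    Returns the test statistic and its asymptotic chi-square P-value.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical maximized log-likelihoods of a full and a reduced model.
statistic, p_value = likelihood_ratio_test(-100.0, -101.92073)
print(round(statistic, 3), round(p_value, 3))
```

The log-likelihoods themselves would come from models fitted by maximum likelihood rather than REML, since REML likelihoods are not comparable across models with different fixed effects.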

For functional annotation of the 33 464 genes expressed on the array, we first performed a BLASTX search against the nonredundant protein database (NR) that had been filtered to retain only proteins from green plants using custom Perl scripts. We retained the 20 top hits that had an E-value no greater than 1 × 10^−10. We then used the Gene Ontology (GO) Blast2GO pipeline (B2G4PIPE version 2.3.5 www.blast2go.com/b2ghome) to identify the GO terms associated with each gene (Conesa et al. 2005; Conesa & Götz 2008). We used an annotation cut-off of 55, a GO weight of 5 and a BLAST minimum alignment length of 100. We imported our BLAST results and annotation file into Blast2GO (version 2.5.0). We augmented the annotations using the ANNEX annotation expander, ensuring that only the lowest term per branch remained in the final annotation set. Following this, we performed an enrichment analysis using Fisher's exact test to test for an excess or paucity of gene classes (based on GO terms) in our significant sets of genes relative to all the other expressed genes on the array.
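To make the enrichment test concrete, the following Python sketch computes a one-sided Fisher's exact P-value for over-representation from a 2 × 2 table of counts. Whether Blast2GO reports a one- or two-tailed value is not stated in the text, so the one-sided form is an assumption; the function name is hypothetical. The example counts are those reported for "response to blue light" in Table 1a.

```python
from math import comb

def fisher_enrichment_p(test_annot, ref_annot, test_other, ref_other):
    """One-sided Fisher's exact P-value for over-representation in the test group.

    test_annot / ref_annot: genes carrying the GO term in the test / reference group.
    test_other / ref_other: genes without the term in the test / reference group.
    """
    n_test = test_annot + test_other
    n_annot = test_annot + ref_annot
    total = n_test + ref_annot + ref_other
    # Sum hypergeometric probabilities for outcomes at least as extreme as observed.
    upper = min(n_annot, n_test)
    tail = sum(
        comb(n_annot, k) * comb(total - n_annot, n_test - k)
        for k in range(test_annot, upper + 1)
    )
    return tail / comb(total, n_test)

# Counts for GO:0009637 (response to blue light) as listed in Table 1a.
p_blue_light = fisher_enrichment_p(test_annot=2, ref_annot=1, test_other=55, ref_other=16721)
print(f"{p_blue_light:.1e}")
```

For these counts the sketch gives a value of about 3.4 × 10^−5, in line with the uncorrected P-value listed in the table.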

Real-time-quantitative PCR validation

We used the same RNA samples in the first control for both the microarray and the real-time-quantitative PCR (RT-qPCR) to verify expression patterns, but picked only a single individual per population for RT-qPCR. We synthesized the first-strand cDNA as described in Lai et al. (2006). We used primer3plus to design the primers (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi; Table S2, Supporting information). We carried out the RT-qPCR following Lai et al. (2008). We ran each reaction in triplicate. Dissociation curve analysis followed PCR amplification. We used standard curves that were generated by the Stratagene RT-qPCR software with a cDNA mix from all 11 populations as a template and corresponding gene-specific forward and reverse primers. Overall, we had adequate RT-qPCR results for 20 primers, and we used glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as a control to normalize the actual cDNA amount of each sample, as this housekeeping gene had relatively low variance in expression in both experiments. We then analysed the data using the MxPro RT-qPCR software version 4.10.

To assess whether a similar pattern of expression was found for both the microarray and RT-qPCR, we acquired the normalized expression values from the microarray experiment for the same individuals used in the RT-qPCR experiment. For both the microarray and RT-qPCR, the population with the highest level of expression was set at 100%, and the expression levels of the other populations were calculated relative to this standard. We then performed a two-way ANOVA with the main effects range (introduced or native) and measurement method (RT-qPCR or microarray) as well as their interaction. A significant interaction would suggest incongruence between the methods when determining the expression differences between the ranges. We also took the difference in the average expression level between the native and introduced accessions for each method and gene and then examined the correlation between the methods across the 19 genes used in our study.
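The rescaling and the per-gene range difference described above can be sketched in Python. The sign convention (introduced minus native) is an assumption, and the function names, population labels and expression values are hypothetical.

```python
import statistics

def percent_of_max(expression_by_population):
    """Rescale expression so that the highest-expressing population equals 100%."""
    top = max(expression_by_population.values())
    return {pop: 100.0 * value / top for pop, value in expression_by_population.items()}

def range_difference(scaled, native_pops, introduced_pops):
    """Mean scaled expression of introduced minus native populations (sign is an assumption)."""
    native_mean = statistics.mean(scaled[p] for p in native_pops)
    introduced_mean = statistics.mean(scaled[p] for p in introduced_pops)
    return introduced_mean - native_mean

# Hypothetical normalized expression of one gene in four populations.
qpcr = {"N1": 2.0, "N2": 4.0, "E1": 8.0, "E2": 6.0}
scaled = percent_of_max(qpcr)
difference = range_difference(scaled, ["N1", "N2"], ["E1", "E2"])
print(scaled, difference)
```

Repeating this for each gene and each method would give two vectors of differences whose correlation can then be examined.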

Results

Differential expression

Overall, the treatment by range interaction was significant for 103 genes after the FDR correction, revealing that for these genes, the degree of difference in expression between the native and introduced populations changed depending on the treatment. Contrasts of those genes with a significant interaction term identified 66 that were differentially expressed between the ranges in control 1 (35 up-regulated and 31 down-regulated in the introduced populations), seven that were differentially expressed in control 2 (four up-regulated and three down-regulated in the introduced populations), 86 that were differentially expressed in the light stress treatment (51 up-regulated and 35 down-regulated in the introduced populations) and 27 that were differentially expressed in the nutrient stress (10 up-regulated and 17 down-regulated in the introduced populations; Fig. 1). Many genes (43) showed a reversal in the direction of the expression differences between the ranges across environments, with most reversals involving differences between control 1 and the light stress. For example, 17 genes had reduced expression in the plants from introduced populations in control 1 and greater expression in the plants from the introduced range in the light stress (Table S4, Supporting information), while 15 genes had the reverse expression pattern.

Figure 1. A Venn diagram illustrating the number of genes differentially expressed between native and introduced populations of common ragweed (Ambrosia artemisiifolia) for those genes with a significant range by treatment interaction. (a) A comparison of the control groups and the light stress. (b) A comparison of the control groups and the nutrient stress.

Of the remaining genes, 180 were significantly different between the ranges, and 27 652 genes revealed a significant treatment effect. Of those genes that were significantly different between the native and introduced populations, 103 had greater expression in the plants from introduced populations, while 77 had higher expression in the native populations (Fig. 2). Contrasts between control 1 and the other treatments revealed that 15 581 genes had different expression levels between control 1 and control 2, 19 040 genes had different expression levels between control 1 and the light stress, and 25 006 genes had different expression levels between control 1 and the nutrient stress. Similarly, contrasts between control 2 and the other treatments revealed that 16 489 genes were differentially expressed between control 2 and the light stress, and 25 629 were differentially expressed between control 2 and the nutrient stress. The nutrient stress had the largest effect on the expression of genes on the array. This stress treatment also yielded much lower quantities of RNA, suggesting differences in RNA metabolism between the nutrient stress and the control treatments, which may have contributed to the large number of genes differentially expressed between the nutrient stress and the other groups. There were no large differences in the numbers of genes that were significantly up- and down-regulated in comparisons between the stress treatments and controls.

Figure 2. A heat map of the 180 differentially expressed genes between the native and introduced populations of common ragweed (Ambrosia artemisiifolia) for the control 2 treatment. The normalized expression values for the expressed genes on the array are shown. Heat maps were drawn in R with heatmap.2, and both the genes (rows) and samples (columns) were clustered using dendrograms. The dendrograms were constructed using Euclidean distances and hierarchical clustering.

We found no significant interaction between latitude and range or PC2bio and range for any treatment at either the 5% or the 10% FDR level, and there were no genes with a significant association with latitude or PC2bio in any of the four treatments when we controlled for expression differences due to range. Similarly, within each range, we found no genes with a significant association with PC1bio for any of the four treatments. For our analysis of the variance in gene expression due to population, we found that after the FDR correction, 76 genes were significant in the control 1 treatment, 90 genes were significant in the control 2 treatment, and 29 genes were significant in the light treatment, while no genes were significant in the nutrient stress. On average, we found that most of the variance in gene expression was due to residual variance (control 1 mean = 84.03%, SE = ±0.09; control 2 mean = 83.48%, SE = ±0.09; light mean = 87.4%, SE = ±0.08; nutrient mean = 91.41%, SE = ±0.07), followed by among-population variance within each range (control 1 mean = 13.29%, SE = ±0.09; control 2 mean = 15.23%, SE = ±0.08; light mean = 10.04%, SE = ±0.08; nutrient mean = 5.79%, SE = ±0.07), and then variance between the ranges (control 1 mean = 2.68%, SE = ±0.03; control 2 mean = 1.28%, SE = ±0.05; light mean = 2.54%, SE = ±0.03; nutrient mean = 2.80%, SE = ±0.03).

Putative function of differentially expressed genes

Of the 33 464 genes expressed on the array, 26 823 had a hit matching our criteria in the NR green plant database. The highest percentages of BLAST hits were to Vitis vinifera (16.6%), Arabidopsis thaliana (13.3%), Oryza sativa (11.4%) and Populus trichocarpa (10.8%) with the remainder (34.6%) matching other plant species. Of the genes that had hits in the NR plant database, 16 779 had significant GO annotations, and of the 180 genes significantly different between the ranges, 82 had significant GO annotations (Table S3, Supporting information). Genes with a significant range effect produced no significant GO terms at α = 0.05 after FDR correction for multiple statistical tests. Of the 103 genes with a significant interaction between range and treatment, 57 had significant GO terms assigned (Table S4, Supporting information). We found that this set had a significant over-representation of genes relating to oxidoreductase activity and response to blue light (Table 1a). When we combined the genes that were significant between the ranges in at least one treatment (both range and interaction genes), GO terms relating to oxidoreductase activity, monooxygenase activity, iron ion binding, catalytic activity and metal ion binding were significant (Table 1b). Response to blue light was marginally significant. When we examined the biological process GO terms at level 2 using an FDR-corrected Fisher's exact test, no terms were significantly over- or under-represented among the differentially expressed genes (Fig. 3a). Similarly, when we examined the molecular function GO terms at level 2, there was only a single over-represented term (catalytic activity, P < 0.01; Fig. 3b).

Table 1. The results of a Fisher's exact test examining the number of genes associated with GO terms. The analysis was conducted using Blast2GO, and only significant results (α = 0.05, FDR corrected) are shown. (a) Genes with a significant range by treatment interaction. (b) Genes with a significant range and interaction effect
(a)
GO term | Name | Type | FDR | P-value | Test group | Reference group | Non-annotated test group | Non-annotated reference group
--- | --- | --- | --- | --- | --- | --- | --- | ---
GO:0016682 | Oxidoreductase activity, acting on diphenols and related substances as donors, oxygen as acceptor | Molecular function | 9.4E−5 | 6.0E−8 | 5 | 26 | 52 | 16 696
GO:0016679 | Oxidoreductase activity, acting on diphenols and related substances as donors | Molecular function | 9.4E−5 | 7.1E−8 | 5 | 27 | 52 | 16 695
GO:0009637 | Response to blue light | Biological process | 3.0E−2 | 3.4E−5 | 2 | 1 | 55 | 16 721

(b)
GO term | Name | Type | FDR | P-value | Test group | Reference group | Non-annotated test group | Non-annotated reference group
--- | --- | --- | --- | --- | --- | --- | --- | ---
GO:0016491 | Oxidoreductase activity | Molecular function | 3.9E−3 | 1.5E−6 | 34 | 1710 | 105 | 14 930
GO:0016682 | Oxidoreductase activity, acting on diphenols and related substances as donors, oxygen as acceptor | Molecular function | 5.4E−3 | 5.2E−6 | 5 | 26 | 134 | 16 614
GO:0016679 | Oxidoreductase activity, acting on diphenols and related substances as donors | Molecular function | 5.4E−3 | 6.1E−6 | 5 | 27 | 134 | 16 613
GO:0004497 | Monooxygenase activity | Molecular function | 2.2E−2 | 3.4E−5 | 8 | 141 | 131 | 16 499
GO:0005506 | Iron ion binding | Molecular function | 2.4E−2 | 4.5E−5 | 10 | 240 | 129 | 16 400
GO:0003824 | Catalytic activity | Molecular function | 2.7E−2 | 6.2E−5 | 102 | 9524 | 37 | 7116
GO:0046872 | Metal ion binding | Molecular function | 5.1E−2 | 1.4E−4 | 34 | 2124 | 105 | 14 516
GO:0009637 | Response to blue light | Biological process | 6.7E−2 | 2.0E−4 | 2 | 1 | 137 | 16 639
  • FDR, false discovery rate; GO, Gene Ontology.
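
As an illustration of how the four count columns in Table 1 relate to the reported P-values, the following minimal Python sketch computes a one-sided (over-representation) Fisher's exact test from a single row. It is not the Blast2GO implementation: the function name `fisher_enrichment_p` is hypothetical, the one-sided tail is an assumption about how the test was configured, and the FDR correction step is not reproduced here.

```python
from math import comb

def fisher_enrichment_p(test_with, ref_with, test_without, ref_without):
    """One-sided (over-representation) Fisher's exact test P-value.

    The arguments are the four cells of the 2x2 table: genes annotated with
    the GO term in the test and reference groups, and genes not annotated
    with it in each group. (Hypothetical helper, not part of Blast2GO.)
    """
    n_test = test_with + test_without            # size of the test group
    n_term = test_with + ref_with                # genes carrying the GO term
    n_total = n_test + ref_with + ref_without    # all annotated genes
    denom = comb(n_total, n_test)
    upper = min(n_test, n_term)
    # Sum hypergeometric probabilities for counts at least as large as observed.
    tail = sum(
        comb(n_term, k) * comb(n_total - n_term, n_test - k)
        for k in range(test_with, upper + 1)
    )
    return tail / denom

# Cell counts copied from the first row of Table 1(a), GO:0016682.
p = fisher_enrichment_p(5, 26, 52, 16696)
print(f"one-sided P = {p:.2e}")
```

Under these assumptions, the result should be of the same order of magnitude as the tabled P-value for that row; an exact match is not guaranteed, because the tail convention used by the original software is assumed rather than known.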
Fig. 3. The percentage of genes associated with different Gene Ontology (GO) terms for (a) biological processes and (b) molecular function (level 2) in common ragweed. Genes with a significant range effect or treatment by range interaction are compared with the remaining genes on the microarray that were expressed. As genes can have multiple annotations, the percentages will not sum to 100%.

When examining the genes significantly affected by treatment, we found that the only GO term significantly over-represented after applying the FDR correction was RNA metabolic process. When we compared those genes with significant differences between specific treatments, we found many significant GO terms. We were most interested in gene function with respect to biological process, as well as the comparisons between control 1 and control 2, representing developmental changes, and control 2 and the two stress treatments, which would represent the effects of the treatments on gene expression (Table S5, Supporting information).

For the 193 genes where population explained a significant component of the variation in gene expression, we found that 143 had blast hits, and 93 had associated GO terms. Following FDR correction, we found that there were no GO terms significantly over-represented.

RT-qPCR validation

When comparing the microarray and RT-qPCR data using a two-way ANOVA, we found a significant interaction in only two cases out of the 19 tested (Contig18498 and HU1cat_c16455, ANOVA results not presented). In both cases, the two methods gave the same pattern of significance, with the introduced populations having, on average, greater expression levels, but the RT-qPCR showed a larger difference than the microarray, perhaps because of the reduced sensitivity and dynamic range of microarrays (Nagalakshmi et al. 2008). A Spearman's rank correlation comparing the differences between the introduced and native expression across all genes, as measured using the microarray and RT-qPCR, found a strong positive correlation, as expected (Fig. 4; ρ = 0.78, P < 0.0001, n = 19). Overall, the methods offered similar results when comparing between the ranges.
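
The rank correlation used for this comparison can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the study: the helper names are hypothetical, and the six expression differences are invented placeholder values, not data from the 19 validated genes.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical introduced-minus-native expression differences for six genes;
# these numbers are made up for illustration and are not from the study.
microarray_diff = [0.8, -0.3, 1.2, 0.1, -0.9, 0.5]
qpcr_diff = [1.5, -0.2, 2.9, -0.4, -1.1, 0.9]
print(round(spearman_rho(microarray_diff, qpcr_diff), 3))
```

Because the statistic depends only on ranks, it is insensitive to the larger dynamic range of RT-qPCR: a gene can show a much bigger difference by RT-qPCR than on the array without lowering ρ, provided the ordering of genes is preserved.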

Fig. 4. A comparison of the RT-qPCR results and the microarray experiment for 19 genes from common ragweed. The difference in the expression between the native and introduced range is shown.

Discussion

Although much research has examined ecological and evolutionary factors that contribute to invasion, investigation into the genomic basis of invasiveness is just beginning (Stewart et al. 2009). Here, we identify candidate genes that may contribute to invasiveness in common ragweed based on differences in expression between native and introduced populations. Previously, in common gardens of native and introduced common ragweed, we found more rapid growth and greater reproductive success in introduced populations, particularly in benign and competitive (light stress) environments (Hodgins & Rieseberg 2011), and many of these candidate genes potentially underlie the growth differences that we observed. Specifically, we found 180 genes with differences in expression between the ranges across all treatments and a further 103 genes with differences in expression in at least one treatment, with the greatest number of significant genes found in those treatments with the greatest phenotypic differences between the ranges (control 1 and light stress). Below, we discuss aspects of experimental design and analysis for population studies of expression in nonmodel organisms and briefly examine the potential function of several of these candidate genes.

Variation in gene expression

The distribution of genetic variation within and among populations has been a long-standing question in population genetics. For outcrossing plants, most of the neutral genetic variation is partitioned within populations rather than among populations (Hamrick & Godt 1996) and common ragweed in both North America and Europe is no exception to this pattern (Genton et al. 2005a; Chun et al. 2010). However, the question of how higher levels of molecular variation are distributed within and among plant populations has not often been considered, although population microarray studies have begun to partition variation in gene expression in this way (Oleksiak et al. 2002; Storey et al. 2007; Lai et al. 2008). When considering each treatment separately, our study found that most of the variation in gene expression was within population variation (mean = 87%), and neither population (mean = 13%) nor range (mean = 2%) generally explained a large proportion of the variation in gene expression across all of the expressed genes on the array. As the environment and developmental time were held constant within each treatment, this pattern could reflect the prominence of within population genetic variation for expression in common ragweed, although technical sources of error probably contributed to the variation among the samples within populations. As our interest was mainly in population- and range-level differences in expression, technical replicates were not taken.
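
To make the idea of partitioning expression variation concrete, the sketch below splits the total sum of squares for a single gene into range, population-within-range and within-population components. It is a simplified illustration only: the model actually fitted in the study is not described in this section, sum-of-squares fractions are not identical to variance-component estimates, and the function name, population labels and expression values are all hypothetical.

```python
def partition_variation(data):
    """Split total sum of squares into range, population and residual parts.

    `data` maps range -> population -> list of expression values for one gene.
    Returns the fraction of the total sum of squares at each level.
    (Hypothetical helper; a simplification of a nested variance analysis.)
    """
    all_values = [v for pops in data.values() for vals in pops.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_range = ss_pop = ss_within = 0.0
    for pops in data.values():
        range_values = [v for vals in pops.values() for v in vals]
        range_mean = sum(range_values) / len(range_values)
        # Between-range component: range mean versus grand mean.
        ss_range += len(range_values) * (range_mean - grand_mean) ** 2
        for vals in pops.values():
            pop_mean = sum(vals) / len(vals)
            # Among-population component, nested within range.
            ss_pop += len(vals) * (pop_mean - range_mean) ** 2
            # Residual: individuals around their own population mean.
            ss_within += sum((v - pop_mean) ** 2 for v in vals)
    total = ss_range + ss_pop + ss_within
    return {
        "range": ss_range / total,
        "population": ss_pop / total,
        "within_population": ss_within / total,
    }

# Made-up log expression values for one gene; labels N1, N2, I1, I2 are
# placeholders and do not correspond to populations sampled in the study.
example = {
    "native": {"N1": [7.1, 7.9, 6.8], "N2": [7.4, 8.2, 7.0]},
    "introduced": {"I1": [7.6, 8.5, 7.2], "I2": [7.3, 8.8, 7.9]},
}
fractions = partition_variation(example)
print({level: round(share, 3) for level, share in fractions.items()})
```

In this nested decomposition the three fractions always sum to one, so a large within-population share necessarily leaves little to be explained by population or range.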

Studies of expression differences among populations have been concerned with the relative importance of neutral drift and directional selection (Khaitovich et al. 2004; Whitehead & Crawford 2006; Broadley et al. 2008). The divergence in expression among populations in our study may reflect the action of drift, or divergent selection and local adaptation. Clinal variation along an environmental gradient in introduced plants has often been used as evidence for rapid adaptation to local environments in the new range, particularly when population history is accounted for (Maron et al. 2004; Keller et al. 2009). Interestingly, although we previously found phenotypic associations with climate and latitude (Hodgins & Rieseberg 2011), we found no such associations with gene expression despite significant differences in expression among populations for some genes. This could be due to a number of reasons, such as: (i) limited population sampling within each range (five native and six introduced populations), which could reduce our capacity to detect associations given the high level of technical and biological variation typically found in microarray studies (Whitehead & Crawford 2006); or (ii) the timing or location of expression differences, as we sampled the leaves of plants well before reproductive maturity, and expression differences associated with reproduction (e.g. flowering time) may not have been apparent.

Our study identified many genes that could be contributing to the phenotypic differences that we have observed between the native and introduced populations of common ragweed (Hodgins & Rieseberg 2011). Whether these differences in expression are driven by directional selection during or after invasion or reflect the colonization history of common ragweed has yet to be confirmed. Several studies of population structure in ragweed have shown high levels of admixture and weak population structure in Europe, probably due to multiple introductions (Genton et al. 2005a; Chun et al. 2010, 2011; Gaudeul et al. 2011; Gladieux et al. 2011). This pattern indicates that founder effects and population bottlenecks probably do not play a large role in the expression differences that we observed between the ranges.

In three treatments, samples from the population in eastern North America (MNON) had expression patterns that were more akin to the European samples compared to other North American samples for genes that were significantly different between the ranges (e.g. Fig. 2). Our common garden studies revealed that MNON was closer phenotypically to the European samples relative to the other North American populations, as this population exhibited more rapid growth and greater reproduction compared to other North American populations (Hodgins & Rieseberg 2011). Most of our North American samples were from the Great Plains, where common ragweed is known to be native, as this species has only become abundant in eastern North America in the last 200 years (Bassett & Crompton 1975; Lavoie et al. 2007). As in Europe, common ragweed is problematic in Southern Ontario and is an abundant weed of agricultural fields (Alex & Switzer 1992). Gaudeul et al. (2011) identified slightly greater genetic similarity between Western and Central European populations, and eastern North American populations. However, the populations closest to the areas where we sampled in western North America (Montana and Minnesota) appear to be as similar genetically to those of France and Hungary as the Ontario samples (Gaudeul et al. 2011). This suggests that colonization history may not be the cause of the phenotypic similarity in expression and morphology that we have observed between MNON and the European samples, although investigation into specific populations used in this study would be important to confirm this. The Great Plains has more extreme seasonality, is drier and has a shorter growing season than eastern North America, and the similarity in candidate gene expression between MNON and the European populations could be a response to the more moderate, European-like climatic conditions in Southern Ontario.

To capture differences in gene expression between populations, treatments and regions, levels of biological variation need to be accurately estimated. For our study, we were mainly interested in differences between the native and introduced range, as well as in differences among populations within each range, so adequate biological replication at both levels was required. Poor sampling of populations could lead to the erroneous conclusion that idiosyncratic differences among populations reflect general differences between the ranges and result in a large number of false positives (Whitehead & Crawford 2006; Colautti et al. 2009). As the cost of assessing gene expression declines, larger sample sizes combined with careful experimental design and statistical analysis will reduce spurious findings that may have been common in early studies of gene expression (Whitehead & Crawford 2006). This will also reduce the need to pool individuals due to limitations in cost (Kendziorski et al. 2005). Pooling can be problematic if individuals comprising different pools are not independent, as this could artificially reduce variation among pools and again lead to false positives.

Functional analysis

Annotation using GO terms allows characterization of functional information about genes that can be quantified and compared across genomes. GO analysis has been a popular method for translating a set of differentially expressed genes into a set of functional groups. However, this approach is hampered by two main challenges. First, annotations are biased and incomplete in existing databases, due to a lack of investigation, lags in annotating the database or oversight (King et al. 2003; Khatri & Draghici 2005; Thomas et al. 2012). Second, annotations can be imprecise or incorrect (Schnoes et al. 2009). Many GO terms are inferred through electronic annotation, and although the inferred function often is correct (Camon et al. 2005), these annotations are generally made at a high level, which restricts their utility (Khatri & Draghici 2005). This process is complicated even more with nonmodel organisms where almost all of the annotations are based on homology to a model species, and although conserved function of orthologs is often the case (Koonin 2005), shifts in function can occur across species (Gharib & Robinson-Rechavi 2011; Thomas et al. 2012). Consequently, such annotations and the GO term analyses that are produced from them represent only a coarse investigation of the putative function of differentially expressed genes in nonmodel species. Determining the role of genes in these organisms represents a daunting challenge, and although the sequence information for these organisms is growing exponentially, detailed functional analysis and gene annotation represent the next major hurdle to understanding the molecular basis of ecologically important traits. Given these caveats, we discuss the potential function of the differentially expressed genes below.

Expression differences among treatments can give an initial glimpse into the biological role of genes in species where functional work is limited. We found substantial differences in gene expression among treatments. Changes in expression across a large number of genes among treatments that differ in developmental time or environmental stress are common in studies of plant gene expression (e.g. Kreps et al. 2002; Qin et al. 2008). We found that many GO terms were over-represented depending on the comparisons made between the control and stress treatments; however, the only GO term over-represented across all treatments was RNA metabolism, suggesting that there were differences in transcription/translation rate or RNA stability among the treatments, perhaps reflecting the poorer RNA yields for the nutrient stress treatment. Most of the genes that were differentially expressed among treatments with this GO term were genes involved in transcription, and there were several genes involved in tRNA metabolism, as well as RNA catabolism. Large changes in transcription and RNA stability are known to occur in response to stress in plants and other species (Narsai et al. 2007; Qin et al. 2008; Shalem et al. 2008), and changes in the regulation of genes that control these processes are probably responsible for the large differences in gene expression that we observed among treatments.

When we examined genes differentially expressed between the ranges in at least one treatment, we found that only the biological process of ‘response to blue light’ was significantly over-represented. This suggests that introduced populations may have a different molecular response to light stress, which could be the basis of their more rapid growth in shaded environments relative to the native populations (Hodgins & Rieseberg 2011). Upon further inspection of the candidate gene lists, many of the top blast hits to these genes are known to be involved in stress response, as well as growth and development, and many times, these functions were not annotated, especially in species other than Arabidopsis thaliana. However, about half of the candidate genes either have no close homologues in GenBank or do not have any information concerning their potential function in the literature.

Of the 180 genes differentially expressed between the ranges across all four treatments, there were many genes known to be involved in the metabolism of secondary compounds, such as terpenoids ((-)-α-terpineol synthase, Martin & Bohlmann 2004), sesquiterpenes (e.g. beta-caryophyllene synthase, Wang et al. 2009; sesquiterpene cyclase, Prosser et al. 2002; amorphadiene synthase, Cai et al. 2002), flavonoids (flavonoid 3′-hydroxylase, Seitz et al. 2006) and phenylpropanoids (e.g. cinnamate 4-hydroxylase, Teutsch et al. 1993). Many of these genes were cytochrome P450s, a large gene superfamily involved in the biosynthesis of many plant secondary products and hormones, as well as in the detoxification of xenobiotics (Werck-Reichhart & Feyereisen 2000). In addition, we found genes responsible for the modification of secondary compounds (e.g. UDP-glycosyltransferases; Richman et al. 2005), as well as genes involved in the metabolism of several key precursors to secondary compounds (e.g. lipoxygenases, Porta et al. 2008). We also identified genes involved in the metabolism of important plant hormones or hormone signalling (e.g. cytokinins-zeatin o-glucosyltransferase, Martin et al. 1999; abscisic acid receptor pyl4, Lackman et al. 2011). Secondary compounds have diverse biological roles such as nitrogen storage, UV protection, insect attraction and allelopathy, but their primary function is thought to be chemical defences against pathogens and herbivores (Chen 2008). Although there was no clear pattern with respect to up-regulation and down-regulation of these genes between the ranges, our data suggest that there are constitutive differences in expression between the ranges of many genes involved in the production of these secondary metabolites. Further examination of the chemical composition of native and introduced common ragweed populations as well as comparisons between pathogen and herbivore resistance would be of interest. Previous studies of common ragweed have found greater herbivory in the native range, but no differences in tolerance or resistance to herbivory between native and introduced plants (Genton et al. 2005b; Hodgins & Rieseberg 2011), although pathogens and specialist herbivores have not been examined.

We identified a number of genes probably involved in the response to oxidative stress and detoxification of heavy metals and xenobiotics, such as herbicides, through glutathione (e.g. glutaredoxin-like protein, glutathione S-transferase, hydroxyacylglutathione hydrolase; reviewed in Rouhier et al. 2008; Powles & Yu 2010), as well as cytochrome P450s potentially important for xenobiotic metabolism (e.g. 7-ethoxycoumarin o-deethylase that was up-regulated in native populations; Robineau et al. 1998). Populations of common ragweed are known to have evolved in response to a number of herbicides (e.g. ALS, PPO, and glyphosate resistance; Jordan et al. 2010). Mutations in acetolactate synthase have conferred resistance to ALS-inhibiting herbicides in North American giant ragweed, but there is presently little information regarding the genetic basis of herbicide resistance in common ragweed or the geographic distribution of resistance phenotypes.

Genes with a significant interaction between range and treatment could be important in governing any environment-specific differences in phenotype including those that govern trade-offs in performance. Several of these candidate genes are putatively involved in the production of aromatic amino acids (dehydroquinate dehydratase/shikimate dehydrogenase, 3-deoxy-d-arabino-heptulosonate 7-phosphate synthase, prephenate dehydratase; reviewed in Tzin & Galili 2010), which are not only important components of proteins, but also sources of aromatic secondary compounds, such as indole alkaloids and phenylpropanoids, as well as lignin (Yamada et al. 2008). Two of these enzymes are in the shikimate pathway, the target of glyphosate herbicides (reviewed in Powles 2008), while prephenate dehydratase is known to be involved in the blue light-mediated synthesis of phenylalanine in A. thaliana (Warpeha et al. 2006). All of these genes were significantly up-regulated in the introduced populations exposed to the light stress. Resistance to glyphosate herbicide is becoming increasingly common in ragweed populations (Powles 2008), and resistant populations of common ragweed have been shown to have higher activity in an enzyme in the shikimate pathway (3-deoxy-d-arabino-heptulosonate-7-phosphate synthase; Brewer & Oliver 2009). Accordingly, a follow-up experiment will determine whether there are differences in herbicide resistance between introduced and native populations of common ragweed.

Although our enrichment analysis for genes with expression differences between the ranges did not provide any significantly over-represented biological processes other than blue light response, it is clear that many important genes involved in a variety of stress responses have constitutive differences in expression between the ranges or differences that are induced upon exposure to a stress. Using these differentially expressed genes as candidates, the critical next steps are to (i) identify the genetic changes that may have occurred during the invasion of Europe by common ragweed via evolution and (ii) elucidate the molecular and biological role of the candidate weedy genes.

Acknowledgements

We would like to thank the Otto lab as well as Alessia Guggisberg and Sebastian Renaut for advice with experimental design or analysis, and Rob Colautti and three anonymous reviewers for comments on the manuscript. This research was funded by a Natural Sciences and Engineering Research Council of Canada Award (No. 353026 to L. Rieseberg, S. Otto, K. Adams, and J. Whitton).

L.H.R. conceived of the experiment, helped design the experiment and helped write the paper. K.A.H. designed and carried out the experiment, did the analysis and wrote the paper. K.N. helped with the experiment, RNA extraction and cDNA synthesis. J.H. carried out the RT-qPCRs and Z.L. carried out the microarray hybridizations. All authors provided feedback for the final version of the manuscript.

Data accessibility

The raw microarray data files as well as the normalized and transformed data, protocols and array design information are deposited at ArrayExpress (Accession nos. A-MEXP-2254 and E-MTAB-1350). Other population metadata is provided in Table S1 (Supporting information).
