Volume 55, Issue 5 e23027
RESEARCH ARTICLE

Factors involved in early polarization of the anterior-posterior axis in the milkweed bug Oncopeltus fasciatus

Neta Ginzburg, Mira Cohen, Ariel D. Chipman (Corresponding Author)

The Department of Ecology, Evolution and Behavior, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem, 91904 Israel

Correspondence: Ariel D. Chipman, The Department of Ecology, Evolution and Behavior, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Givat Ram, Jerusalem, 91904, Israel. Email: [email protected]
First published: 22 April 2017
Citations: 11

Funding information: ISF, Grant Number: #75/11

Abstract

The axes of insect embryos are defined early in the blastoderm stage. Genes involved in this polarization are well known in Drosophila, but less so in other insects, such as the milkweed bug Oncopeltus fasciatus. Using quantitative PCR, we looked at differential expression of several candidate genes for early anterior-posterior patterning and found that none of them are expressed asymmetrically in the early blastoderm. We then used an RNA-Seq approach to identify novel candidate genes that might be involved in early polarization in Oncopeltus. We focused on transcription factors (TFs) as these are likely to be central players in developmental processes. Using both homology and domain based identification approaches, we were unable to find any TF encoding transcripts that are expressed asymmetrically along the anterior-posterior axis at early stages. Using a GO-term analysis of all asymmetrically expressed mRNAs, we found an enrichment of genes relating to mitochondrial function in the posterior at the earliest studied time-point. We also found a gradual enrichment of transcription related activities, giving us a putative time frame for the maternal to zygotic transition. Our dataset provides us with a list of new candidate genes in early development, which can be followed up experimentally.

1 Introduction

Initial axis determination is the first significant milestone in the process of embryonic patterning. The anterior-posterior (A-P) axis defines the position of the head and sensory structures, and the arrangement of structures and organs throughout the main length axis of the organism. The dorso-ventral (D-V) axis defines the position of structures orthogonal to the A-P axis, and in relation to the gravitational field of the earth. While axis determination is crucial to all subsequent events in development, it is actually very labile evolutionarily, and there are large differences in the details of the process, even among closely related organisms. This early lability is a main feature of the so-called “developmental hourglass model”, which suggests that early events and late events in embryogenesis are highly variable, while events in mid-embryogenesis tend to be highly conserved (Peel et al., 2005; Raff, 1996).

Axis determination in insects has been extensively studied in one of the main models in biology, the fruit fly Drosophila melanogaster (Hartenstein and Chipman, 2015). However, work on other insects and arthropods indicates that many developmental properties of Drosophila are not representative (Peel et al., 2005), and this is especially true for early processes such as axis formation. For example, the morphogen bicoid, an important anterior maternal determinant in Drosophila, apparently has no orthologs outside the cyclorrhaphan flies (Stauber et al., 1999, 2000). Notably, the A-P and D-V axes in Drosophila are distinct and are determined through different pathways (Hartenstein and Chipman, 2015). There is evidence to suggest that this distinction is not so clear in other insects (Berni et al., 2014; da Fonseca et al., 2009; Sachs et al., 2015; Wilson and Dearden, 2011).

In order to place development in a comparative framework and to gain a broader understanding of development in insects, additional species of insects have been used increasingly for developmental studies over the last 20 years. Many such studies on non-model insects are done using the candidate gene approach, in which orthologs of interesting developmental genes from Drosophila are identified and cloned in the study species. These genes are then investigated in the study species in the context of their role in Drosophila. This candidate gene approach is relatively common, and has provided the basis for much of the knowledge gained about insect development in recent years. However, the candidate gene approach, despite its usefulness, is problematic. In a sense, it is “looking under the lamp post”: if one looks only for important genes in Drosophila development, one will not come across genes that are not important, or simply do not exist in Drosophila. This problem becomes more significant the earlier the stages one is interested in.

We are interested in early embryonic development in the milkweed bug Oncopeltus fasciatus. Oncopeltus is a hemipteran emerging model species, with an important phylogenetic position just basal to the holometabolous radiation (Misof et al., 2014). It is an established lab organism, with many experimental protocols available (Chipman, 2017). In the project reported herein, we aim to find the earliest factors that initialize A-P polarization of the Oncopeltus embryos. We have focused on transcription factors, which are known to be key players in early polarization in many insects (Lynch, 2014). This is a first report from a broader project to find novel candidate genes in Oncopeltus. The general experimental setup is described, although we only discuss results relevant to early A-P axis determination. Our approach involves separating embryos into anterior and posterior halves and then quantifying expression levels of different transcripts in the two halves at different time points, either using real-time quantitative PCR (qPCR) or through comparative RNAseq. This enables us to show that there are no transcription factor encoding genes differentially expressed along the A-P axis before the formation of the blastoderm, but also highlights other factors, including mitochondrial localization, that could be relevant to the primary axis determination events in the Oncopeltus embryo.

2 Results

2.1 Real-time quantitative PCR

Our first analysis was real-time quantitative PCR for 12 genes for which we anticipated we would find a difference in the level of expression between the anterior and the posterior halves. We quantified the expression level of these genes in the two halves at four equally spaced time points before the formation of the embryonic blastoderm, which is when we expect the embryonic axes to be defined: 12, 16, 20 and 24 hours after egg laying (hAEL) at 25˚C. For a few of the genes we added an additional earlier time point at 6 hAEL. The genes we tested were: hunchback (hb), caudal (cad), nanos (nos), wingless (wg), tailless (tll), torso-like (tsl), even-skipped (eve), Delta (Dl), huckebein (hkb), orthodenticle (otd), giant (gt) and wnt8. These genes were selected based on asymmetric expression patterns in Oncopeltus at later stages, or based on their involvement in A-P patterning in other species (Lynch, 2014). The relative expression levels of these genes over time are shown in Figure 1. The most striking result is the fact that none of these genes (with the exception of tll – but see Discussion) are expressed asymmetrically at the earliest stages tested, and their asymmetrical expression only develops at later stages. In other words, none had significantly different expression levels between the anterior and posterior halves at 12 hAEL (Table 1). In the cases where we also looked at the 6 hAEL time-point (not shown), the results were similar to the 12 hAEL time-point.

Figure 1. Real-time quantitative PCR of candidate gene expression in Oncopeltus fasciatus. The X axis represents the age of the embryos (in hours AEL). The Y axis represents expression relative to the reference gene on a logarithmic scale. Blue line and triangles represent the anterior (A) samples, red line and circles mark the posterior (P) samples.

Table 1. Statistical analysis of the difference in expression levels of the studied genes between the anterior and posterior halves at 12 hours after egg laying

Gene    n (A)   n (P)   t-value   P-value
gt      3       3       0.461     0.535
hb      5       6       0.045     0.836
nos     3       3       2.027     0.228
otd     5       6       0.548     0.478
Dl      5       6       0.058     0.815
eve     6       5       0.513     0.492
hkb     5       6       2.029     0.188
tsl     5       6       0.068     0.801
wnt8    3       3       0.004     0.956
tll     3       3       15.13     *0.018
wg      3       3       0.224     0.661
cad     5       6       0.338     0.575
  • n (A) and n (P) are the sample sizes for the anterior and posterior halves.
  • Only tll (marked with an asterisk) is expressed at significantly different levels between the two halves.

2.2 RNAseq

To gain a broader understanding of differential gene expression along the A-P axis throughout embryogenesis in Oncopeltus, we carried out an RNAseq analysis, again comparing anterior and posterior halves of embryos. This is similar to the approach carried out in Drosophila by Ding and Lipshitz (1993b), where the anterior and posterior ends were dissected and sequenced. In the RNAseq analysis, we covered a broader time period, starting slightly earlier than the qPCR analysis and continuing into the late blastoderm stage: our samples were at 10, 19, 26 and 30 hAEL. This approach expands upon the qPCR approach by comparing the full transcriptome and not only selected genes. The time points were chosen to correspond with key stages in Oncopeltus blastoderm development: 10 hAEL is pre-blastoderm and a stage where the embryo is presumed to still be under maternal control. 19 hAEL is just before the earliest expression of gap genes, the earliest reported patterning genes (Ben-David and Chipman, 2010). 26 hAEL is the peak of the gap gene stage (Ben-David and Chipman, 2010). 30 hAEL is the point where several genes begin to be segmentally expressed and the growth zone is first defined (Stahi and Chipman, 2016). The outcome of this analysis was a list of differentially expressed transcripts (DETs) between the anterior and posterior halves at each time point, and between corresponding halves at each consecutive time point (Supplementary file 1). In addition, we extracted two subsamples from this list: one of transcription factor encoding genes (see details below) and one of dramatically differentially expressed transcripts (DDETs: transcripts with a difference of three orders of magnitude in expression level between two samples). The number of DETs and the number of the transcripts in each of the subsets for each comparison are shown in Table 2.

Table 2. Summary of the differentially expressed transcripts recovered from the RNA-Seq analysis, the TF BLAST analysis, and DDETs.
Comparison DETs DETs in TF blast analysis DDETs
10A-10P 349 12 5
19A-19P 362 17 14
26A-26P 2801 258 15
30A-30P 1415 146 15
10A-19A 9868 990 65
19A-26A 13966 1210 89
26A-30A 8508 783 27
10P-19P 8933 922 69
19P-26P 13541 1153 112
26P-30P 6392 584 21
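
The ten comparisons listed in Table 2 follow directly from the sampling design: four spatial comparisons (anterior vs. posterior at each age) and six temporal comparisons (the same half at consecutive ages). The following is a minimal sketch that enumerates them; the function and variable names are illustrative and are not taken from the analysis pipeline.

```python
# Sample labels: age in hours after egg laying (hAEL) plus A (anterior) or P (posterior).
AGES = [10, 19, 26, 30]
HALVES = ["A", "P"]

def spatial_comparisons(ages):
    """Anterior vs. posterior half at each time point."""
    return [(f"{age}A", f"{age}P") for age in ages]

def temporal_comparisons(ages, halves):
    """The same half at each pair of consecutive time points."""
    pairs = []
    for half in halves:
        for earlier, later in zip(ages, ages[1:]):
            pairs.append((f"{earlier}{half}", f"{later}{half}"))
    return pairs

comparisons = spatial_comparisons(AGES) + temporal_comparisons(AGES, HALVES)
for first, second in comparisons:
    print(f"{first}-{second}")
```

The printed labels appear in the same order as the rows of Table 2, from 10A-10P to 26P-30P.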

An overview comparison of all of the samples indicates that time is the major differential factor. The number of differentially expressed transcripts as well as the Jensen-Shannon distances (a statistical measure of the similarity between two probability distributions) are higher in the temporal comparisons than in the spatial comparisons (Table 2, Figure 2).

Figure 2. Distance matrix of the 8 samples. Distances between all the samples from all ages (10, 19, 26, 30 hAEL) and anterior and posterior halves. The distances are measured in Jensen-Shannon distance units. The redder the color (and higher the number in the box), the greater the difference between the two compared samples, as presented in the heat map on the right.
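
For readers unfamiliar with the metric, the Jensen-Shannon distance between two expression profiles can be sketched as follows. This is an illustration only: the exact implementation used to produce the distance matrix is not stated in this section, and the choice of base-2 logarithms (which bounds the distance between 0 and 1) and the example FPKM vectors are assumptions.

```python
import math

def normalize(values):
    """Scale non-negative expression values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    return [v / total for v in values]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(x, y):
    """Square root of the Jensen-Shannon divergence of two profiles."""
    p, q = normalize(x), normalize(y)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding

# Hypothetical FPKM values for the same five transcripts in two samples.
sample_a = [12.0, 0.5, 30.0, 4.0, 0.0]
sample_b = [10.0, 2.5, 28.0, 9.0, 1.0]
print(jensen_shannon_distance(sample_a, sample_b))
```

Identical profiles give a distance of 0, and profiles with no shared expressed transcripts give the maximum of 1 under this convention.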

In order to focus on transcription factors (TFs), which are more likely to have a role in development, we carried out a BLAST analysis on the pool of DETs using 6160 isoforms of 786 TFs from Drosophila as queries. The list of TFs used as queries was based on GO (Gene Ontology) terms and probably includes some proteins that are not actually transcription factors. This dataset was used nonetheless, in order to lose as little data as possible, and the resulting list of genes was subjected to manual analysis, where irrelevant genes were screened out. The analysis highlighted 3.4%–10.3% of the transcripts in each comparison, many of which were subsequently screened out in the manual step.
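
The percentage range quoted above can be reproduced from the counts in Table 2. The short sketch below copies the table values and computes, for each comparison, the share of DETs that had a hit in the TF BLAST analysis; the variable names are illustrative.

```python
# (comparison, DETs, DETs in TF BLAST analysis, DDETs), copied from Table 2
TABLE_2 = [
    ("10A-10P", 349, 12, 5),
    ("19A-19P", 362, 17, 14),
    ("26A-26P", 2801, 258, 15),
    ("30A-30P", 1415, 146, 15),
    ("10A-19A", 9868, 990, 65),
    ("19A-26A", 13966, 1210, 89),
    ("26A-30A", 8508, 783, 27),
    ("10P-19P", 8933, 922, 69),
    ("19P-26P", 13541, 1153, 112),
    ("26P-30P", 6392, 584, 21),
]

# Percentage of DETs in each comparison that matched a Drosophila TF query.
tf_percent = {name: 100 * tf / dets for name, dets, tf, _ in TABLE_2}
for name, pct in tf_percent.items():
    print(f"{name}: {pct:.1f}% of DETs had a TF BLAST hit")

lowest = min(tf_percent.values())
highest = max(tf_percent.values())
print(f"range: {lowest:.1f}%-{highest:.1f}%")  # range: 3.4%-10.3%
```

The lowest share comes from the 10A-10P comparison, the one analyzed in detail in this paper.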

Approximately 1% of the DETs were differentially expressed with a difference of three orders of magnitude or more (DDETs). For most of these we could not find any orthologs in other genomes using BLAST. Many others had hits to transposable elements of various kinds.
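
The DDET criterion (a difference of at least three orders of magnitude) amounts to a threshold on the absolute log10 ratio of two expression values. The sketch below shows one way to apply it. The pseudocount that keeps the ratio defined when a transcript has zero expression in one sample is an assumption made for this illustration; the handling of zeros in the actual analysis is not described in this section, and the function names are hypothetical.

```python
import math

ORDERS_OF_MAGNITUDE = 3  # threshold stated in the text
PSEUDOCOUNT = 0.01       # assumption: keeps the ratio defined for zero FPKM values

def log10_fold_difference(fpkm_a, fpkm_b, pseudocount=PSEUDOCOUNT):
    """Absolute log10 ratio of two FPKM values after adding a pseudocount."""
    return abs(math.log10((fpkm_a + pseudocount) / (fpkm_b + pseudocount)))

def is_ddet(fpkm_a, fpkm_b, threshold=ORDERS_OF_MAGNITUDE):
    """True when the two values differ by at least `threshold` orders of magnitude."""
    return log10_fold_difference(fpkm_a, fpkm_b) >= threshold

# Hypothetical transcript with a very large difference between two samples.
print(is_ddet(2500.0, 1.0))
# A modest difference, such as 64.98 vs. 83.33 FPKM, does not qualify.
print(is_ddet(64.98, 83.33))
```

The size of the pseudocount affects which low-expression transcripts pass the threshold, so it should be chosen with the detection limit of the data in mind.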

2.3 Validation

In past years, we have looked at the expression of several Oncopeltus genes using RNA in situ hybridization. These previous results can serve as a validation for our current analyses. If the expression levels of a previously-studied gene, quantified using the qPCR and RNAseq approaches, are consistent with the qualitative expression pattern over time seen using whole mount staining, this supports the conclusion that the quantitative approaches are correctly identifying the activity of these genes, and probably of the novel genes as well. For at least six genes our previous expression data are consistent with the new results (caudal, tailless, hunchback, giant, torso-like, and wingless). We show this in Figure 3 for two genes, one with an anterior expression pattern (tailless) and one with a posterior expression pattern (caudal).

Figure 3. Validation of the quantitative results with whole mount expression patterns. (a) tailless, (b) caudal. For each of the two genes we present expression levels over time and corresponding images of RNA in situ hybridization on whole embryos. Blue: qPCR experiments shown in relative expression, normalized to the lowest value for that experiment (left y-axis). Red: RNA-seq expression levels in FPKM (fragments per kilobase of transcript per million fragments sequenced; right y-axis). Circles: anterior half. Triangles: posterior half. For tailless (a) expression is first detected weakly in the anterior shortly before 25 hAEL. Expression increases in the anterior but remains low in the posterior until about 33 hAEL when weak expression appears in the posterior (arrow in panel A4). For caudal (b), expression begins weakly in the posterior at around 20 hAEL and is distinct by 28 hAEL. Expression increases rapidly in level and in extent and by 30 hAEL extends into the anterior half. Shortly afterwards it retracts again, leading to the drop in expression levels seen at the 30 hAEL RNAseq time point. Note that the timing of the embryos imaged could be off by up to 2 hours, leading to a slight discrepancy in timing.

2.4 10A vs. 10P comparison

The earliest comparison we focused on was that of the anterior vs. posterior halves at 10 hAEL. This is the comparison we analyzed in greatest detail, and the only one we report results for in the current paper. Transcripts that are expressed differentially between the anterior and posterior at this early stage represent maternally deposited transcripts, and the embryo at this stage is still syncytial, allowing free diffusion of patterning molecules. This comparison yielded a total of 349 DETs. Of these, 12 were identified in the BLAST search as encoding transcription factors. Manual analysis of these genes showed only one of them to truly encode a TF. This was a paired paralog, probably closest to gooseberry (Supporting Information Fig. 1), and it showed a relatively low level of expression and a relatively low degree of differentiation between the two samples. We identified 5 DDETs in this comparison. We were unable to identify any of these through BLAST to all standard databases or using HHPred (Söding et al., 2005), a standard structure prediction algorithm.

2.5 Transcription factor domain search

Since we found almost no differentially expressed transcription factors in the 10-hour sample using the BLAST-based query, we performed a specific search for TF domains in all the DETs of the 10A-10P comparison, using the HMMER (Hidden Markov Model) algorithm as a verification. The search used the gDNA sequences of all 349 DETs in the 10A-10P comparison, and 1775 HMMs of TF motifs as input. The search had hits for 41 genes out of 349 DETs in this comparison, for 43 different families/domains. Even though the Pfams (protein family/domain models; Finn et al., 2016) were selected based on a search of the query “transcription factor”, several of the hits did not have any clear connection to TFs. Out of those hits, only six genes had matches to clear domains/families of TFs. Another gene had a very high e-value and so was discarded from the results. None of these six genes had expression levels (Table 3) similar to what we would expect for a polarizing factor, with the exception of the same paired paralog mentioned above.

Table 3. Transcription factor domain analysis results

Gene ID      10A (FPKM)  10P (FPKM)  Blast2GO description                     Domain (HMM hit) name
XLOC 000529  1.15        2.73        Homeobox protein                         Homeobox domain
XLOC 009098  64.98       83.33       Transposase                              Myb-like DNA-binding domain
XLOC 010876  0.00        0.47        Transcription factor (a)                 PAX- 'Paired box' domain; Homeodomain-like domain
XLOC 016232  28.59       36.34       Protein tis11 isoform x1                 Zinc finger C-x8-C-x5-C-x3-H type (and similar)
XLOC 017821  1.36        2.85        HIV tat-specific factor 1 homolog        RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)
XLOC 019122  14.73       19.11       Peroxisomal targeting signal 2 receptor  WD40- WD domain, G-beta repeat
  • (a) Probable paralog of Paired.
  • Each line represents a gene with a domain hit. Expression levels in FPKM units are shown for the 10 hAEL anterior and posterior samples. The best BLAST hit from the Blast2GO analysis and the domain name from the HMM analysis are shown for each gene.
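
To make the size of the anterior-posterior differences in Table 3 easier to judge, the posterior-to-anterior ratio can be computed for each gene. The sketch below copies the FPKM values from the table; the function name is illustrative, and a ratio is reported as undefined when the anterior value is zero.

```python
# (gene ID, FPKM at 10A, FPKM at 10P), copied from Table 3
TABLE_3 = [
    ("XLOC 000529", 1.15, 2.73),
    ("XLOC 009098", 64.98, 83.33),
    ("XLOC 010876", 0.00, 0.47),
    ("XLOC 016232", 28.59, 36.34),
    ("XLOC 017821", 1.36, 2.85),
    ("XLOC 019122", 14.73, 19.11),
]

def posterior_to_anterior_ratio(anterior, posterior):
    """Return posterior/anterior, or None when the anterior value is zero."""
    if anterior == 0:
        return None
    return posterior / anterior

for gene, anterior, posterior in TABLE_3:
    ratio = posterior_to_anterior_ratio(anterior, posterior)
    label = "undefined (no anterior signal)" if ratio is None else f"{ratio:.2f}"
    print(f"{gene}: P/A = {label}")
```

All six genes in the table have the higher value in the posterior sample, and the largest ratios belong to genes with very low absolute expression levels.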

2.6 Enrichment analysis

An additional approach to finding biologically relevant differences in the various comparisons was carrying out enrichment analysis based on the prevalence of GO terms in differentially expressed genes. This approach looks for statistically significant differences in the distribution of putative functions of the genes, in order to shed some light on general biological processes taking place during the blastoderm stage. This analysis was performed for all the comparisons (Supporting Information File 3).

The only spatial comparison that had significantly enriched GO terms was 10A vs. 10P. For this earliest comparison, the significant results were an enrichment of terms related to mitochondria (FDR corrected P-value = 4.7E-2) and oxidoreductase activity (FDR corrected P-value = 9.0E-3) in the posterior half sample (Supporting Information File 3). This suggests that there is an increased level of mitochondrial activity, probably of maternal source, at the early stages of A-P axis determination.

Looking at general trends, we found an enrichment of terms related to transcription and transcriptional regulation starting at 19 hAEL (GO:0006351 – “transcription, DNA-templated” 10A vs. 19A: FDR corrected P-value =3.6E-; 10P vs. 19P: FDR corrected P-value =2.6E-4. GO:0006355 – “regulation of transcription, DNA-templated” 10A vs. 19A: FDR corrected P-value =2.5E-3; 10P vs. 19P: FDR corrected p-value =1.1E-3), and increasing at 26 hAEL (GO:0006351 – “transcription, DNA-templated” 19A vs. 26A: FDR corrected P-value =3.4E-7; 19P vs. 26P: FDR corrected P-value =8.7E-6. GO:0006355 – “regulation of transcription, DNA-templated” 19A vs. 26A: FDR corrected P-value =3.2E-; 19P vs. 26P: FDR corrected P-value =9.3E-4).

3 Discussion

3.1 Changes in transcription throughout the blastoderm stage

The preliminary analysis of differential transcript levels throughout the blastoderm stage highlights two phenomena. The first is that most of the differences are in the temporal comparisons rather than in the anterior vs. posterior comparisons. This was also supported by the lack of significant enrichment of GO terms in the spatial comparisons (aside from the early blastoderm, in which the important step of A-P axis polarization probably occurs). This result is not surprising. There are significant changes throughout development, as more and more cellular processes take place and the complexity of the embryo increases. Conversely, only a small number of the active transcripts during the blastoderm stage are involved in patterning, and most of them will be expressed at the same level throughout the embryo, with no A-P difference.

A second observation is that the most dramatic differences in sequential samples are between 19 hAEL and 26 hAEL (Figure 2). We suggest that these changes are an indication of the activation of zygotic transcription, and the maternal to zygotic transition (MZT). In Drosophila, the maternal to zygotic transition is suggested to occur in two waves of zygotic transcription: (1) early and minor, and (2) late and major (Pritchard and Schubiger, 1996; Tadros and Lipshitz, 2009). The GO-term enrichment analysis indicates that terms related to transcription and transcription regulation are enriched in the 19 hAEL samples relative to the 10 hAEL samples, and in the 26 hAEL samples relative to the 19 hAEL ones. It is thus possible that a first wave of zygotic transcription occurs between 10 and 19 hAEL, and a second, more significant wave occurs between 19 and 26 hAEL, at the same time at which we see a large change in general transcription. This is consistent both with gene expression studies that show the earliest expression of gap genes at around 19-20 hAEL (Ben-David and Chipman, 2010) and with our qPCR results, which show the expression of many genes increasing significantly between 20 and 24 hAEL (Figure 1).

3.2 The involvement of transcription factors in early A-P polarization

Our working hypothesis was that, similarly to Drosophila and other studied insects [see Lynch (2014) for review], asymmetrically distributed maternal determinants would be common at the earliest developmental stages (Ding and Lipshitz, 1993a), and that at least some of these would be transcripts of TF encoding genes. In many holometabolous insects, initial patterning is determined by a network consisting of several localized maternal transcripts and proteins (e.g. hunchback, orthodenticle, caudal, nanos, Wnt), with interactions that vary between species. Although many of these genes have been shown to be asymmetrically expressed in Oncopeltus during blastoderm stages (Ben-David and Chipman, 2010; Birkan et al., 2011; Liu and Kaufman, 2004; Liu and Kaufman, 2005; Liu and Patel, 2010; Stahi and Chipman, 2016; Weisbrod et al., 2013), there was no evidence for their asymmetrical activity very early in development, when the embryo is still under maternal control.

Our working hypothesis has not been supported by our results, and we have found no convincing evidence for such polarized TF encoding genes. Although some TFs were found among the DETs, none had the sharp expression differences between the A and P poles we would expect from a key symmetry breaking gene (Table 3). The most significant difference we found was in a prd paralog, which is expressed at a very low level in the posterior and is absent from the anterior. In all other cases, the ratio between the expression level in the two halves is 3:2 or lower, and the absolute expression levels are very low. The near absence of differentially expressed TFs in the early blastoderm is also evident from the gene-specific analysis of the real-time quantitative PCR (qPCR) experiments. Notably, tll, the only gene that did have significantly different expression levels, was not identified as a DET in the RNAseq analysis. The expression levels found for this gene in the qPCR experiments were very low and close to the detection limit for this method. Among the transcripts that had dramatic differences in expression level (DDETs), none were recognized as TFs, nor were any TF motifs identified in them. Indeed, none of these transcripts could be identified using any of the approaches we tried.

In the hemipteran pea aphid, Acyrthosiphon pisum, the closest relative of Oncopeltus for which there are data, the only differentially-loaded maternal factor that has been found is hunchback (Huang et al., 2010). Intriguingly, caudal is apparently not maternally deposited in this species (Chang et al., 2013). However, it has been demonstrated that there are differences between the developmental program of oviparous (sexual) and viviparous (asexual) Acyrthosiphon embryos, due to different selective pressures in different embryonic environments (Duncan et al., 2013). Thus, it is not clear whether Acyrthosiphon can be useful for an evolutionary comparison.

There are several possible explanations for the absence (or near absence) of polarizing TFs in the early blastoderm. One such explanation is that polarization is achieved by weak gradients of maternal transcripts. This would account for the minor difference in expression levels we have found for a small number of transcripts. While this is a distinct possibility, we suspect that, at least in some cases, the differences we detect at these low expression levels are within the error range of the RNAseq approach, and we would tend to not attach too much importance to them. Our skepticism is mostly based on the fact that many of the expected polarizing factors (e.g. nanos, even-skipped, caudal, hunchback) are expressed differentially at later stages, as evidenced by our qPCR results, and shown in previous reports based on gene expression staining (Ben-David and Chipman, 2010; Birkan et al., 2011; Liu and Kaufman, 2004; Weisbrod et al., 2013).

A second explanation would be that the polarizing factors are deposited not as mRNA transcripts but as proteins. If this were the case, our analysis would not detect these factors. This interpretation is again called into question by the differential expression of mRNA transcripts later in development. Given that maternal patterning in other insects is known to rely on localization of both transcripts and proteins, it would be surprising if Oncopeltus had specifically lost the set of A-P polarizing maternal transcripts at early stages, but not the proteins.

So, if TFs are not inducing the initial polarization, what are the polarizing maternal factors? As discussed above, there were 5 transcripts that were found to be DDETs in the 10A-10P comparison, but none of them had a significant and clear hit with any of the tools we used. It is possible that these transcripts are non-coding RNAs that have a downstream role in localizing maternally deposited transcripts, which are initially deposited uniformly. Late localization of uniformly deposited transcripts could also rely on structural elements in the egg, possibly through cytoskeletal components.

3.3 Asymmetrical mitochondrial activity

The most noticeable outcome of the enrichment analysis in the 10A-10P comparison was the enrichment of mitochondria and ATP synthesis in the posterior (Supporting Information File 3). Intriguingly, some of the polarized mRNAs uncovered by Ding and Lipshitz (1993b) in their screen were also linked with mitochondrial activity. Axis specification via mitochondrial localization and redox signaling is quite common in metazoan embryos. For example, aggregation of mitochondria was associated with early cleavages in ascidians, Drosophila and nematodes. It has been suggested that redox signaling was co-opted for regulation of patterning [see Coffman and Denegre, (2007) for a review]. Little is known about this phenomenon in insects. However, it is possible that this mitochondrial enrichment is linked with the asymmetrical structural components hypothesized above, and could be part of the mechanism localizing maternal transcripts at later blastoderm stages.

The uncovering of asymmetrical mitochondrial activity emphasizes the effectiveness of the RNAseq approach presented in this work. The purpose of using RNAseq data was to learn about the early development of Oncopeltus without relying on the Drosophila paradigm, and to look “away from the lamp post”. This finding of mitochondrial enrichment in the posterior was not based on any expectations from Drosophila or other previous data, and opens the door for a new series of future studies.

4 Conclusions

The combination of the qPCR and the RNAseq approach has proven useful for breaking away from the “Drosophila paradigm” and for opening new avenues for research in Oncopeltus fasciatus, an emerging developmental model. The main findings are the lack of localized transcription factor mRNAs at the pre-blastoderm stages of development and an asymmetrical distribution of mitochondrial activity at these stages. The preliminary analysis of comparisons at later stages suggests a candidate time-window for the MZT. Further work on the large dataset generated in this project (Supporting Information File 1) will uncover additional candidate genes for further work on blastoderm patterning in Oncopeltus and potentially in other species. Further detailed analysis of the comparisons aside from 10A vs. 10P is ongoing. At the time of this writing, work in our lab has started characterizing some promising genes that have come out of this analysis. The coming years are set to provide many interesting insights into the evolution of early insect development.

5 Methods

5.1 Animal husbandry and embryo collection

Cultures of Oncopeltus fasciatus were kept as previously described (Ben-David and Chipman, 2010; Weisbrod et al., 2013) at 25˚C with a 14/10h light/dark cycle. Embryos were collected in a 2-hour time window and then kept in an incubator until the desired age.

5.2 Material collection

For both the qPCR and for the RNAseq, embryos were collected and cut in a similar way. Embryos at the desired age were collected into petri dishes that were pre-chilled and embedded in dry ice. The embryos froze immediately upon contact with the plate. Frozen embryos were stored at −80˚C until processing.

Embryos were cut in two using a microsurgical knife (Oasis – Glendora, California). Cutting was done on a pre-cooled metal block on dry ice, under a dissecting scope. The anterior was identified by the distinctive respiratory organs (micropyles) at that end. Anterior and posterior halves from 10-15 embryos were pooled for each sample.

5.3 Real-time quantitative PCR

Total RNA was extracted from pooled half-embryos using a ZR RNA MicroPrep (Zymo Cat #R1060). The RNA was eluted with 30µl of buffer. cDNA was synthesized using Bioscript reverse transcriptase (Bio Line, bio-27036) with a poly-T primer. About 5 or 8ng/µl RNA were used in the reaction.

Quantitative PCR was carried out on an ABI PRISM 7000. For each of the 12 genes we used a gene-specific primer (Supplementary file 2). Primers were chosen to give a ∼100 bp product, and were pre-tested to confirm that they gave a single band, and calibrated using three different primer concentrations (0.1µM, 0.3µM, 0.5µM). Each primer was tested with template concentrations covering 4 orders of magnitude and all were found to give consistent results. The reference gene used was actin, and was chosen following a pre-screen of several candidates, where it gave the most consistent results over different embryonic stages.

The experimental setup consisted of three technical repeats of each sample. Each 96-well plate included up to three target genes and the reference gene, as well as a no-template control for each gene. Eight different templates were used (4 time points × two halves). The same template was used for all genes on a given plate. There were at least three repeats of each experiment using separate cDNA preparations.

Results were calculated in the following way: The Ct (the number of cycles needed to cross a defined threshold value) was averaged for the technical triplicates. We then subtracted the average Ct value of the reference gene from this value for each gene, to give the expression difference (dCt) for each biological repeat. The dCt values for the biological repeats were also averaged. The results in Figure 1 are presented as 2 to the negative power of the average dCt. This gives the ratio between the tested gene and the reference gene.
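The calculation can be illustrated with a short Python sketch. This is not the analysis code used for Figure 1: the function name and the Ct values are hypothetical, and the sketch assumes the usual convention in which dCt is the target-gene Ct minus the reference-gene Ct, so that 2 to the power of minus dCt is the target-to-reference ratio.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct):
    """Return 2**(-mean dCt) for one gene in one sample.

    target_ct and reference_ct are lists of biological repeats; each
    repeat is a list of technical-replicate Ct values.
    Assumed convention: dCt = Ct(target) - Ct(reference).
    """
    dcts = []
    for target_reps, ref_reps in zip(target_ct, reference_ct):
        # Average the technical replicates, then take the difference.
        dcts.append(mean(target_reps) - mean(ref_reps))
    # Average dCt over the biological repeats, then convert to a ratio.
    return 2 ** (-mean(dcts))

# Hypothetical Ct values: three biological repeats, three technical each.
target = [[24.0, 24.1, 23.9], [25.0, 25.0, 25.0], [26.0, 26.1, 25.9]]
actin = [[20.0, 20.0, 20.0], [21.0, 21.0, 21.0], [22.0, 22.0, 22.0]]
ratio = relative_expression(target, actin)
```

With these made-up values the target crosses the threshold four cycles after the reference in every repeat, so the ratio is 2 to the power of minus 4, or one sixteenth of the reference level.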

We carried out a t-test analysis using the SPSS software package, to test whether the expression of a given gene was significantly different between two samples. This analysis was only done for the anterior vs. posterior of the 12 hAEL samples, to see if there was a significant difference between the anterior and posterior at the earliest embryonic stages.

5.4 RNAseq sequencing and analysis

Samples were collected from embryos of four age groups: 10, 19, 26, and 30 hours after egg laying (hAEL ± 1 hour). In total, there were eight samples: 4 ages × 2 halves. Each sample had three or six repeats, with each repeat including about 10–15 half-embryos. Spike-In sequences (Life Technologies - ERCC RNA Spike-In Control Mixes; Mix 1) were added before library preparation. Poly-A enriched transcriptome libraries were prepared by the Silberman Institute of Life Sciences' center for genomic technologies using the Illumina TruSeq RNA library preparation kit (Illumina #RS-122-2001), according to the manufacturer's recommended protocol, starting with around 100–200 ng of total RNA per sample. Each sample was barcoded using a unique sequence tag. The amplified indexed libraries were quantified using an Invitrogen Qubit fluorometer and pooled. Pooled libraries were run on a 4% agarose gel and DNA around 270 bp (the length of RNA inserts plus the 3′ and 5′ adaptors) was size selected and recovered in 15 μL elution buffer (QIAGEN). Size selected libraries were then quantified again using the Qubit Fluorometer. Size was verified using the High Sensitive DNA gels on an Agilent 2200 TapeStation instrument. The sequencing was done at the Technion Genome Center with a HighSeq instrument using the NextSeq 500 High Output V2 sequencing Kit (FC-404-2005), in a paired-end configuration, reading 50 bases from each direction.

The raw reads were uploaded to WebApollo for the Oncopeltus genome (https://i5k.nal.usda.gov/JBrowse-oncfas) as an evidence track.

Quality control of sequenced fragments (“reads”), mapping of the reads to the genome, assembly of transcripts, differential analysis and statistical analysis were done at the bioinformatics unit of the Hebrew University Medical School. Raw reads (fastq files) were inspected for quality issues with FastQC (v0.10.1, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). According to the FastQC report, the 51st position was ‘N’ in all reads, thus it was trimmed using the fastx_trimmer program of the FASTX package (version 0.0.13, http://hannonlab.cshl.edu/fastx_toolkit/). Then, reads were quality-trimmed at their 3' end, using an in-house Perl script, with a quality threshold of 30. In short, the script uses a sliding window of 5 bases from the read's end and trims one base at a time until the average quality of the window passes the given threshold. Reads that became shorter than 15 bases were discarded. The remaining reads were further filtered to remove low quality reads, using the fastq_quality_filter program of the FASTX package, with a quality threshold of 25 at 95 percent or more of the read's positions.
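The 3' quality-trimming step can be sketched as follows. The in-house Perl script itself is not reproduced here; this Python function is a hypothetical re-implementation based only on the description above, and it assumes that a window "passes" when its mean Phred score is greater than or equal to the threshold.

```python
def trim_3prime(seq, quals, threshold=30, window=5, min_len=15):
    """Trim low-quality bases from the 3' end of one read.

    seq is the base string and quals the per-base Phred scores (ints).
    Returns the trimmed (seq, quals), or None if the read is discarded.
    """
    end = len(seq)
    while end >= window:
        tail = quals[end - window:end]
        # Assumption: the window passes at mean quality >= threshold.
        if sum(tail) / window >= threshold:
            break
        end -= 1  # drop one base from the 3' end and re-check
    if end < min_len:
        return None  # read became too short and is discarded
    return seq[:end], quals[:end]

# Hypothetical read: 20 good bases followed by 5 poor ones.
read = "A" * 25
scores = [35] * 20 + [10] * 5
trimmed = trim_3prime(read, scores)
```

Because the criterion is the window average rather than the quality of each base, a single low-quality base can remain at the new 3' end if the four bases before it are good enough, as happens in this example.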

The processed fastq files were mapped to the Oncopeltus transcriptome and genome using TopHat (Kim et al., 2013) (v2.0.11). The genome, at scaffolds stage, was taken from HGSC at Baylor College of Medicine (ftp://ftp.hgsc.bcm.edu/I5K-pilot/Milkweed_bug/), the I5K-pilot version, with annotations version 0.5.3. Mapping allowed up to 3 mismatches per read, a maximum gap of 3 bases, and a total edit distance of 6. Due to possibly missing annotations, an attempt to find new exons and junctions was made with both the --microexon-search and the --coverage-search options (full command: tophat -G OFAS.Models.gff3 -N 3 --read-gap-length 3 --read-edit-dist 6 -a 5 --segment-length 18 --microexon-search --coverage-search --read-realign-edit-dist 4 --b2-L 5 --b2-i S,1,0.75 --b2-mp 3,1 --b2-score-min L,-0.5,-0.5 --b2-rdg 3,2 genome processed.fastq).

For further analysis, quantification, normalization and differential expression were done with the Cufflinks (Trapnell et al., 2010, 2012, 2013) package (v2.2.0). First, uniquely mapped reads were used to assemble transcripts for each sample separately, using cufflinks (command: cufflinks -g OFAS.Models.gff3 -b genome -u accepted_hits_uniq.bam). Then, all transcripts were merged to a single annotations file using cuffmerge (command: cuffmerge -g OFAS.Models.gff3 gtfs.txt, where gtfs.txt contains the locations of all individual GTF files that were created with cufflinks). Quantification was then done with cuffquant, using the genome bias correction (-b parameter), and for uniquely mapped reads only. The GTF file that was created by cuffmerge was used for quantification.

Normalization and differential expression were calculated with cuffdiff, using the cuffmerge GTF file, a count threshold (-c parameter) of at least 3 for statistical significance testing, and requiring two replicates for testing relative isoform shift (--min-reps-for-js-test parameter). Several quality control assays, such as counts and FPKM distributions, as well as distance and MDS analyses, were calculated and visualized in R, using the cummeRbund package (Goff et al., 2013) and in-house scripts. Differential expression results were also visualized in R.

In addition, investigation of the amount of each ERCC spike-in sequence in the samples was done by mapping the processed reads to the spike-in sequences using bowtie, version 2.2.1.0 (command: bowtie2 -L 5 -i S,1,0.75 --mp 3,1 --score-min L,-5,-0.2 -x spike-ins -U processed.fastq), then either counting reads that mapped with less than 10% edit distance, or simply counting by running cuffquant and cuffnorm (version 2.2.1).

The differential analysis was done for every age group by comparing the A sample vs. the P sample, and for each sequential sample of the A or P, for a total of 10 comparisons. The summary of those comparisons is organized in tables of the gene expression levels (in units of FPKM – expected fragments per kilobase of transcript per million fragments sequenced), and other statistical parameters for each gene in all the 10 comparisons (Supplementary File 1).
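The set of 10 comparisons can be enumerated with a few lines of Python. This sketch assumes that "each sequential sample" means consecutive time points within the same half, which gives 4 anterior vs. posterior comparisons plus 3 anterior and 3 posterior time-step comparisons; the sample labels follow the "10A", "10P" style used in the text, and the variable names are illustrative.

```python
ages = [10, 19, 26, 30]  # hours after egg laying
halves = ["A", "P"]      # anterior, posterior

# Anterior vs. posterior at each age.
comparisons = [(f"{age}A", f"{age}P") for age in ages]

# Consecutive time points within each half (assumed interpretation).
for half in halves:
    for earlier, later in zip(ages, ages[1:]):
        comparisons.append((f"{earlier}{half}", f"{later}{half}"))

for first, second in comparisons:
    print(f"{first} vs. {second}")
```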

For each pairwise comparison we used Cufflinks to calculate the Jensen-Shannon (JS) distance, which is the square root of the Jensen-Shannon divergence – a statistical tool for quantifying the distance between two probability distributions.
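For readers unfamiliar with the measure, the following Python sketch computes the JS distance between two discrete distributions. It is a generic illustration of the definition, not the Cufflinks implementation; the function names are hypothetical, and the base-2 logarithm (which bounds the distance between 0 and 1) is an assumption of this sketch.

```python
from math import log2, sqrt

def kl_divergence(p, m):
    """Kullback-Leibler divergence of p from m, in bits."""
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance between two abundance vectors."""
    total_p, total_q = sum(p), sum(q)
    # Normalise, so raw abundances can be passed in directly.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # The mixture distribution is the midpoint of p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values from rounding.
    return sqrt(max(jsd, 0.0))

# Hypothetical relative abundances of three isoforms in two samples.
distance = js_distance([10.0, 5.0, 1.0], [2.0, 5.0, 9.0])
```

Identical distributions give a distance of 0, and distributions with no overlap give the maximum distance of 1.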

5.5 Transcript identification, Blast2GO, and enrichment analysis

Blast2GO (B2G) is a program that provides a pipeline for analysis and GO-term annotation of large gene sets (Conesa et al., 2005). The program accepts fasta files of the sequences in question, runs BLAST and maps the BLAST hits to GO terms associated with them.

In this analysis fasta files containing the genomic DNA sequence for each DET in the comparison were created (using python script “make_fasta_of_gdna.py”) and separated according to the samples in which they were upregulated (based on the expression levels).

The BLAST algorithm used was blastx (search protein databases using a translated nucleotide query) using the CloudBLAST service. The search was limited to Arthropoda nr database, e-value lower than 1.0e-3, word size 3, HSP length cutoff of 33 characters. These BLAST results, combined with the TF local BLAST results, were used to filter and annotate the list of genes.

B2G was then used for enrichment analysis to identify specific GO-terms that are over- or under-represented in any of the comparisons, with a Fisher's exact test and FDR correction (False Discovery Rate - a statistical correction of the P-value for designs that include multiple tests) (Blüthgen et al., 2005).
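The logic of this kind of enrichment test can be sketched in plain Python. The code below is not what Blast2GO runs internally; it is a minimal illustration with hypothetical function names and made-up counts, and it uses the Benjamini-Hochberg procedure as one common FDR correction, which is an assumption rather than a statement about the exact correction applied by B2G.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a, b: test-set genes with / without the GO term;
    c, d: reference-set genes with / without the GO term.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x annotated genes in the test set.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    low, high = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts for three GO terms: (a, b, c, d) per term.
tables = [(12, 88, 30, 870), (3, 97, 25, 875), (5, 95, 45, 855)]
raw = [fisher_exact_two_sided(*t) for t in tables]
fdr = benjamini_hochberg(raw)
```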

5.6 Transcription factor domain analysis

Transcription Factor domain analysis was done using the HMMsearch algorithm from the HMMer package (http://hmmer.org/), on both the gDNA and the CDSs (Coding DNA Sequence) of all the significant genes (349 genes in total) in the 10A-10P comparison. This algorithm takes as an input an HMM model for a specific domain and a protein sequence as a query. The gDNA sequences were retrieved and translated to their 6 possible translations using the script “nt_to_aa_fasta_try3.py”. The CDS sequences were retrieved and translated to their 6 possible translations using the script “get_seq_fun.py”, which used the CDS sequences of the Oncopeltus official gene set as first priority, other gene models (Augustus, snap) as second priority, and gDNA if there were no gene models that matched the location of the gene. All the HMM models (1775 models in total) were found based on a search of the term “transcription factor” in the Pfam database (Finn et al., 2016), and then downloaded using the Python script “fetching_hmm_from_http.py”. Afterwards an HMM search was done using the script “run_hmmsearch.py”, and the results were summarized and ordered into an Excel file according to the Oncopeltus gene they matched, using the script “sum_hmmsearch_res_to_rnaseq_very_final.py”. All the scripts can be found at https://github.com/netaginzburg/rna-seq.
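The six-frame translation step can be illustrated with a short, self-contained Python sketch. This is not the code of “nt_to_aa_fasta_try3.py” or “get_seq_fun.py”; the function names are hypothetical, and the sketch assumes the standard genetic code and reports codons containing ambiguous bases as 'X'.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translations(seq):
    """Return a dict of the six reading-frame translations of seq."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

# Hypothetical input sequence.
frames = six_frame_translations("ATGGCCTAA")
```

Each of the six translated sequences would then be written to a protein FASTA file and searched against the Pfam HMM models, so that a domain can be detected regardless of the strand or frame in which it is encoded.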

Acknowledgments

The authors thank Michal Bronstein for technical advice and for building the libraries. They also thank Sharona Elgavish and Yuval Nevo for numerous discussions and preliminary analysis. They are grateful to Barbara Vreede for help and advice throughout this project and for her invaluable expertise in coding. The authors also thank Sivan Ginzburg for his help in coding and other computational matters. Two anonymous reviewers helped focus the manuscript and suggested some important changes. This work was supported by ISF grant #75/11.
