NUCLEAR MONOPLOIDY AND ASEXUAL PROPAGATION OF NANNOCHLOROPSIS OCEANICA (EUSTIGMATOPHYCEAE) AS REVEALED BY ITS GENOME SEQUENCE
Received 13 January 2011. Accepted 11 May 2011.
This article was published online on September 23, 2011. A spelling error in the title was subsequently identified. This notice is included in the online and print versions to indicate that both have been corrected September 29, 2011.
Abstract
Species in the genus Nannochloropsis are promising candidates for both biofuel and biomass production because they accumulate high levels of fatty acids and grow fast; however, their sexual reproduction has not been studied. Reconstructing their metabolic pathways, such as that of polyunsaturated fatty acid (PUFA) biosynthesis, and understanding their biological characteristics, such as nuclear ploidy and reproductive strategy, will facilitate their genetic improvement through gene engineering, mutation, and clonal expansion. In this study, the genome of N. oceanica S. Suda et Miyashita was sequenced with next-generation Illumina GA sequencing technology. The genome was ∼30 Mb in size and contained 11,129 protein-encoding genes. Of these, 59.65% were annotated by alignment against diverse protein databases, and 29.68% were assigned at least one function described in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The low frequency of polymorphic nucleotides (one in 22.06 kb) and the obvious deviation of the major:minor nucleotide ratio from the 1:1 expectation (minor ≥10) indicated the nuclear monoploidy of N. oceanica. The lack of the majority of meiosis-specific proteins implied the asexual reproduction of this alga. In combination, the nuclear monoploidy and asexual propagation led us to favor the hypothesis that N. oceanica is a premeiotic or ameiotic alga. In addition, sequence similarity-based searching identified the elongase- and desaturase-encoding genes involved in the biosynthesis of long-chain PUFAs, which provide the genetic basis of its rich content of eicosapentaenoic acid (EPA). The functional genes and metabolic pathways profiled against its genome sequence will facilitate integrative investigations of this alga.
Abbreviations:
- CTAB: cetyl trimethylammonium bromide
- EPA: eicosapentaenoic acid
- HMM: hidden Markov model
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- PUFA: polyunsaturated fatty acid
The microalgal genus Nannochloropsis in class Eustigmatophyceae consists of ∼6 species currently recognized as N. gaditana, N. granulata, N. limnetica, N. oceanica, N. oculata, and N. salina (Hibberd 1981). They have mostly been known from the marine environment but also occur in fresh and brackish waters (Fawley and Fawley 2007). All the species are small and nonmotile spheres, which have a very simple ultrastructure and do not express any distinct morphological features; therefore, their characterization is mostly done by rbcL and 18S rDNA sequence-based phylogenetic analyses (Andersen et al. 1998). Nannochloropsis used to be known as the marine Chlorella; however, it is different from Chlorella in that it does not contain chls other than chl a (Hibberd 1988, Fisher et al. 1998). Nannochloropsis is generally described as a component of the picoeukaryotic plankton since the sizes of organisms in this genus range from 2 to 5 μm (Hu and Gao 2003). They thus stand at the beginning of the food chain in aquatic ecosystems, playing significant roles in the global carbon and mineral cycles, especially in oligotrophic seawaters (Fogg 1995). Nannochloropsis is able to build up a high concentration of a range of pigments, such as astaxanthin, zeaxanthin, and canthaxanthin (Antia and Cheng 1982, Lubian et al. 2000). Furthermore, these microalgae are widely used for feeding fish larvae and rotifers, as they contain highly nutritional compounds, such as sterols (Veron et al. 1998) and long-chain PUFAs (Sukenik et al. 1989, Rocha et al. 2003). The fatty acids, especially EPA, are also the reason for the good growth of fish fry fed on these algae (Zittelli et al. 1999). In recent years, Nannochloropsis has been considered a promising alga for both biofuel and biomass production because of its ability to accumulate high levels of fatty acids and to grow fast (Gouveia and Oliveira 2009, Rodolfi et al. 2009).
Many microalgae have been cultured on large scales; however, their successful culturing is mainly due to their unique biological characteristics, such as fast growth and salt and alkali tolerances. In contrast, the yield of diverse terrestrial crops has been dramatically increased through domestication and breeding. The reproductive strategy of microalgae is either unclear or unmanageable; therefore, genetic breeding comparable with terrestrial crops has not been initiated in microalgae. Asexually reproductive microalgal species are desirable for breeding; the elite clones can be easily evaluated and maintained in continuous culture. Mutated genes of Nannochloropsis trended toward the recessive nuclear ones, implying their haploidy (Galloway 1990); however, their reproductive strategy remained unclear.
Because microalgae promise numerous products, ranging from straightforward biomass for food and animal feed to valuable extracts including triglycerides that can be converted into biodiesel, microalgal biotechnology has recently emerged into the limelight (Harun et al. 2010). A fuller description of the functional genes of Nannochloropsis will facilitate its genetic manipulation and breeding. For example, its nutritional value and high content of fatty acids make Nannochloropsis well appreciated for feeding rotifers and fish in hatcheries and as a source of algal oil. Therefore, both engineering and mutational modifications of its fatty acid biosynthesis pathways, especially long-chain PUFA biosynthesis, are commercially important. In addition, the reconstruction of its metabolic networks will significantly promote the accumulation of its valuable components (Fu 2009, Hyduke and Palsson 2010).
The development of next-generation sequencing technologies has catalyzed the delivery of fast, inexpensive, and accurate genome information for a wide range of species (Metzker 2010). The genome size of Nannochloropsis has been estimated to be small (Veldhuis et al. 1997). Expressed sequence tag and proteomic analyses have been attempted in Nannochloropsis (Kim et al. 2005, Shi et al. 2008). To promote the biotechnological advancement and breeding of Nannochloropsis, we sequenced the genome of N. oceanica with next-generation Illumina GA sequencing technology in this study, thereby revealing key biological characteristics including its nuclear ploidy and reproductive strategy.
Materials and Methods
N. oceanica and its culturing. The original Nannochloropsis sp. was provided by the Key Laboratory of Mariculture of the Chinese Ministry of Education, affiliated with Ocean University of China. Seawater collected near the Qingdao shore was filtered through a membrane with pores of 0.4 μm in diameter, autoclaved at 121°C for 30 min, and used for preparing f/2 medium (pH 7.8, salinity 30) (Guillard and Ryther 1962, Guillard 1975). The solid medium was prepared by melting 1.2% (w/w) agar into f/2. The alga was cultured at 23 ± 1°C under an irradiance of 70 μmol photons · s−1 · m−2 on a 12:12 light:dark (L:D) cycle. The alga was purified twice by streaking the cells (1 × 10³ cells · mL−1 in liquid medium) onto solid plate medium containing 100 μg · mL−1 ampicillin. When algal colonies appeared after 13 d, a single colony was inoculated into 5 mL of liquid f/2 medium containing 200 μg · mL−1 ampicillin. The culture was then scaled up to 100 mL and verified as bacteria free by evenly inoculating the cells (1 × 10³ cells · mL−1, adjusted with autoclaved liquid medium) onto solid medium without antibiotics. The purified line was identified as N. oceanica by phylogenetic analysis of its 18S rDNA, partial rbcL gene, and internal transcribed spacer (ITS) region sequences (HQ201712–HQ201714).
DNA extraction. Algal cells were collected at the exponential growth phase by centrifugation (3k30, Sigma, Osterode, Germany). The cell pellet was washed three times with polysaccharide elimination buffer (Wang et al. 2005). DNA was extracted with the cetyl trimethylammonium bromide (CTAB) method described previously (Chen et al. 2001). Conventional phenol extraction was applied to purify the DNA, and DNA quality was checked by digesting with HindIII, EcoRV, and EcoRI and separating the products in a 0.8% agarose gel.
Library construction and sequencing. The DNA libraries were constructed with the Paired-ends Library Preparation Kit (Illumina Inc., San Diego, CA, USA) following the manufacturer’s instructions. Two libraries with insert sizes of 500 bp and 2 kb, respectively, were constructed. DNA manipulation, including formation of single-molecule arrays, cluster growth, and paired-end sequencing, was performed on an Illumina GA sequencer following the standard protocols available from Illumina Inc., as described by Boodhun et al. (2008) and widely practiced at the Beijing Genomics Institute (BGI) at Shenzhen. The base-calling pipeline (SolexaPipeline-v1.7) (Sequencing Control Software, Illumina Inc.) was used to process the raw fluorescent images and call sequences.
De novo assembly. Paired-end reads of low quality were filtered out before assembly. Reads were also removed if they were of low complexity, contained a preset proportion of undetermined or low-quality bases, overlapped the adapter over a preset length, or were duplicated or contaminated. The high-quality reads were assembled into scaffolds with SOAPdenovo (Li et al. 2010). To remove possible flaws, the high-quality reads were mapped back to the assembly with BWA software (http://bio-bwa.sourceforge.net/index.shtml), and the suspect regions (shorter than the length of a read) were replaced with the sequence of the reads accounting for ≥80% of the total (>20 in number).
Polymorphic nucleotide search. Polymorphic nucleotides were identified by aligning the qualified reads (quality score >20) to the assembly with SOAP2 (http://soap.genomics.org.cn/soapaligner.html). Once all reads were assigned and the nucleotides leveraging across reads and the gaps in the assembly were removed, the numbers of the most abundant or major (n1) and the second most abundant or minor (n2) nucleotides at each polymorphic site were counted according to the number of reads in which they were located. Of the sites covered by >10 nucleotides (n1 + n2, n2 > 2), the heterozygous ones covered by either >10 n2 or 3–9 n2 but within the expectation of 1n1:1n2 (P > 0.01) were considered polymorphic.
A further correction of the sequence assembly was made at this step by replacing a minor nucleotide in the assembly with the major one when the major nucleotide was supported by >10 reads.
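The site-calling rule described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study: the test behind the P > 0.01 criterion is not named in the text, so an exact two-sided binomial test against the 1:1 expectation is assumed here, the boundary at exactly 10 minor reads is treated as n2 ≥ 10, and the function names are hypothetical.

```python
from math import comb


def binomial_two_sided_p(n1: int, n2: int) -> float:
    """Exact two-sided binomial P-value for a split at least as unbalanced
    as (n1, n2) when both nucleotides are equally likely (1:1 expectation).
    ASSUMPTION: the source does not say which test produced its P-values."""
    n = n1 + n2
    k = min(n1, n2)
    # Lower-tail probability of seeing <= k reads of the rarer nucleotide.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The distribution is symmetric at p = 0.5: double the tail, cap at 1.
    return min(1.0, 2.0 * tail)


def is_polymorphic(n1: int, n2: int, alpha: float = 0.01) -> bool:
    """Apply the coverage and ratio criteria to one candidate site.
    n1 and n2 are read counts of the major and minor nucleotides."""
    if n1 < n2:
        n1, n2 = n2, n1
    # Site must be covered by >10 nucleotides with more than 2 minor reads.
    if n1 + n2 <= 10 or n2 <= 2:
        return False
    # ASSUMPTION: sites with at least 10 minor reads are accepted directly.
    if n2 >= 10:
        return True
    # With 3-9 minor reads, keep the site only if it is compatible with 1:1.
    return binomial_two_sided_p(n1, n2) > alpha
```

Under these assumptions, a site with 6 major and 5 minor reads is retained as compatible with 1:1, whereas a site with 40 major and 3 minor reads is rejected.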
Gene prediction and function annotation. Based on the final assembly, messenger RNAs were predicted using hidden Markov model (HMM)–based GeneMark-ES version 2.3a (Ter-Hovhannisyan et al. 2008), and those encoding peptides shorter than 30 amino acid residues were discarded. The resulting mRNAs were annotated by aligning with the known functional genes in the protein knowledgebase SwissProt and its computer-annotated supplement TrEMBL (http://www.uniprot.org/downloads, release 56.1), GO (http://www.geneontology.org/GO.downloads.database.shtml), KEGG (ftp://ftp.genome.jp/pub/kegg/, release 46), and National Center for Biotechnology Information (NCBI)–NR (ftp://ftp.ncbi.nih.gov/blast/db/, 20100313) databases under the criteria of e-value ≤1 e−5 and identity ≥40%. The rRNA and tRNA genes were identified by referring to rRNA (Pruesse et al. 2007) and tRNA (http://gtrnadb.ucsc.edu/) databases with RNAmmer (Lagesen et al. 2007) and tRNAscan-SE (Lowe and Eddy 1997), respectively. Other noncoding RNAs, including miRNA, sRNA, and snRNA, were identified by referring to the Rfam database (Griffiths-Jones et al. 2005) with Infernal software (http://infernal.janelia.org/) under default parameters. Transposons were determined either by referring to the Repbase transposable elements library (Jurka et al. 2005) or by aligning the genome sequence to the curated transposable element-related proteins, with RepeatMasker and RepeatProteinMasker (http://www.RepeatMasker.org/), respectively. Tandem repeats were predicted using Tandem Repeats Finder (TRF) software (Benson 1999).
To remove the polymorphic nucleotides within the repeated region, the predicted repeats were searched against the assembly. Once a polymorphic nucleotide was identified within a match (identity >95% and length >50 bp), it was deleted from the polymorphic nucleotides list.
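The repeat-based filtering of polymorphic sites can be illustrated with a short Python sketch. It assumes, hypothetically, that repeat matches are available as (scaffold, start, end, identity) tuples with 1-based inclusive coordinates and that polymorphic sites are (scaffold, position) pairs; neither layout is specified in the text, and the function names are illustrative only.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def drop_sites_in_repeats(sites, repeat_matches,
                          min_identity=95.0, min_length=50):
    """Return the sites that do not fall inside a qualifying repeat match.
    A match qualifies when identity > 95% and length > 50 bp, the
    thresholds given in the text."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end, identity in repeat_matches:
        if identity > min_identity and (end - start + 1) > min_length:
            by_scaffold[scaffold].append((start, end))
    merged = {s: merge_intervals(iv) for s, iv in by_scaffold.items()}
    starts = {s: [a for a, _ in iv] for s, iv in merged.items()}

    kept = []
    for scaffold, pos in sites:
        intervals = merged.get(scaffold, [])
        # Index of the last interval starting at or before this position.
        i = bisect_right(starts.get(scaffold, []), pos) - 1
        if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
            continue  # site lies within a repeat match: remove it
        kept.append((scaffold, pos))
    return kept
```

Merging the intervals first keeps each lookup to a single binary search per site, which matters little for ~10³ sites but keeps the sketch usable on larger lists.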
Searching for proteins of interest. The amino acid sequences of previously well-defined proteins of interest were downloaded and used as queries to search the predicted proteins. Among the matches (e-value ≤1 e−5, identity ≥40%), the best match was considered the ortholog of the queried protein.
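The similarity criteria and best-match rule used here can be sketched in Python. The sketch assumes that the search results are available in the standard 12-column BLAST tabular layout (query, subject, percent identity, ..., e-value, bit score) and that "best" means the lowest e-value with ties broken by the highest bit score; the text specifies neither the file format nor the ranking, and the identifiers in the usage example are placeholders, not genes from this study.

```python
def best_hits(lines, max_evalue=1e-5, min_identity=40.0):
    """Map each query to its best subject among hits that pass the
    thresholds (e-value <= 1e-5 and identity >= 40%).
    ASSUMPTION: 12-column tab-separated BLAST output, ranked by e-value
    and then by bit score."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue or identity < min_identity:
            continue
        # Smaller key is better: low e-value first, then high bit score.
        key = (evalue, -bitscore)
        if query not in best or key < best[query][0]:
            best[query] = (key, subject)
    return {query: subject for query, (_, subject) in best.items()}
```

A query with no hit passing both thresholds is simply absent from the returned mapping, which corresponds to "no ortholog identified" in the searches reported below.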
Results
Genome size. Paired-end sequencing reads (73 bp at one end and 75 bp at the other) of the fragments of two libraries (0.5 kb and 2 kb fragments, respectively) generated >4 Gb of raw sequence in total. From each high-quality 73 bp read of the 0.5 kb library, 17mers were extracted, and the number (depth) and the abundance (proportion of the total) of each kind were calculated. The abundance of 17mers at different depths should follow a Poisson distribution; however, such a distribution may be distorted by diverse causes, such as sequencing errors, genomic heterozygosity, and repeated chromosomal fragments. The abundance of 17mers with depths ranging from seven to 67 followed a Poisson distribution, and the depth of the most abundant 17mers was ∼37. Accordingly, the genome size was estimated to be ∼30 Mb. The raw sequence covered the estimated genome of N. oceanica 139-fold. The raw reads were assembled into 886 scaffolds with a total length of 29.5 Mb and a scaffold N50 value of 294.5 kb (Table 1). Raw sequences have been deposited into the NCBI Sequence Read Archive with accession number SRA027386.
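The depth-based estimate of genome size can be illustrated with a minimal Python sketch. It assumes the common convention that genome size is approximated by the total number of k-mer occurrences divided by the peak depth; the exact formula used in this study is not stated. The default depth window of 7 to 67 is the range reported above, while the function names and the toy histogram in the usage note are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer occurring in the reads (k = 17 as in the text)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=7, max_depth=67):
    """Estimate genome size as (k-mer occurrences) / (peak depth).
    Depths outside [min_depth, max_depth] are ignored, since very shallow
    k-mers are mostly sequencing errors and very deep ones are repeats.
    ASSUMPTION: this ratio is a common convention, not a formula quoted
    from the source."""
    usable = {d: n for d, n in histogram.items()
              if min_depth <= d <= max_depth}
    # Peak depth: the depth shared by the largest number of distinct k-mers.
    peak = max(usable, key=usable.get)
    total_occurrences = sum(d * n for d, n in usable.items())
    return total_occurrences / peak, peak
```

As a toy check, a histogram with 900, 1,000, and 900 distinct k-mers at depths 36, 37, and 38 (plus shallow error k-mers that are ignored) yields a peak depth of 37 and an estimate of 2,800, the number of distinct k-mers in the window.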
Statistic | Scaffold | Contig | Predicted mRNA |
---|---|---|---|
Total number | 886 | 8,102 | 11,129 |
Total length (bp) | 29,517,250 | 27,923,828 | 14,964,079 |
Total gap length (bp) | 1,584,442 | – | – |
Average length (bp) | 33,315 | 3,448 | 1,345 |
N50 length (bp) | 294,498 | 12,209 | 1,926 |
N90 length (bp) | 21,227 | 2,156 | 699 |
Maximum length (bp) | 918,621 | 110,256 | 19,923 |
GC content (%) | 53.59 | 53.59 | 56.73 |
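The N50 and N90 lengths in Table 1 can be computed as in the following Python sketch, which assumes the conventional definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least the given fraction of the total length. The function name is hypothetical, and the lengths in the usage note are toy values, not data from this assembly.

```python
def nx_length(lengths, fraction=0.5):
    """Return the Nx length (N50 for fraction=0.5, N90 for fraction=0.9)."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First sequence at which the cumulative length reaches the threshold.
        if running >= threshold:
            return length
    raise ValueError("no sequence lengths supplied")
```

For the toy lengths 80, 70, 50, 40, 30, and 20 (total 290), the two longest sequences already cover half of the total, so the N50 is 70, and the N90 is 30.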
Polymorphic nucleotides. In total, 1,360 polymorphic nucleotides were identified in the genome of N. oceanica (diverse repeats not included). If evenly distributed, this corresponds to a density of one polymorphic nucleotide in ∼22.06 kb, obviously lower than that in other genomes. In the scatter plot (Fig. 1, A and B), the ratios of major to minor (≥10) nucleotides at each polymorphic locus obviously deviated from 1:1, the ratio expected in diploid organisms. Two observations, namely the rareness of polymorphic nucleotides and the inequality of major and minor nucleotides at each polymorphic locus (n2 ≥ 10), therefore suggest that N. oceanica is a monoploid alga. The polymorphic nucleotides detected may originate from natural mutations and clonal expansion in continuous culture.

Fig. 1. Scatter plot of the numbers of major and minor nucleotides at each polymorphic locus: (A) all polymorphic loci; (B) only those loci where the number of major nucleotides was <200.
Functional genes and metabolic network. From the assembly, 11,129 mRNAs (genes) were predicted. The length of the predicted genes was 1,345 bp on average. The protein-encoding regions accounted for ∼50.69% of the total scaffold length. The assembly and the predicted genes have been deposited at DNA Data Bank of Japan (DDBJ)/European Molecular Biology Laboratory (EMBL)/GenBank under the accession AEUM00000000. The version described in this article is the first version, AEUM01000000. In total, 6,639 predicted genes were annotated by searching against the proteins in diverse databases, accounting for 59.65% of the total (Table 2). Of these, 3,303 genes (29.68% of the total) matched entries in the KEGG database, and each was assigned at least one function (Table 3) defined in diverse metabolic pathways such as glycolysis, fatty acid biosynthesis, and photosynthesis. The annotated noncoding RNAs included six miRNAs, three snRNAs, and 86 tRNAs, as well as one 18S, one 28S, and seven 8S RNAs.
Database | Number of annotated proteins |
---|---|
Nr | 5,069 |
SwissProt | 5,621 |
TrEMBL | 5,041 |
GO | 4,523 |
KEGG | 3,303 |
Total | 6,639 |
Functional category | Hits |
---|---|
Cellular processes | |
Behavior | 1 |
Cell communication | 78 |
Cell growth and death | 155 |
Cell motility | 27 |
Development | 21 |
Endocrine system | 148 |
Immune system | 61 |
Nervous system | 34 |
Sensory system | 7 |
Environmental information processing | |
Membrane transport | 98 |
Signal transduction | 195 |
Signaling molecules and interaction | 12 |
Genetic information processing | |
Folding, sorting, and degradation | 209 |
Replication and repair | 200 |
Transcription | 54 |
Translation | 247 |
Human diseases | |
Cancers | 186 |
Immune disorders | 22 |
Infectious diseases | 26 |
Metabolic disorders | 27 |
Neurodegenerative diseases | 87 |
Metabolism | |
Amino acid metabolism | 499 |
Biosynthesis of polyketides and nonribosomal peptides | 15 |
Biosynthesis of secondary metabolites | 97 |
Carbohydrate metabolism | 441 |
Energy metabolism | 199 |
Glycan biosynthesis and metabolism | 108 |
Lipid metabolism | 280 |
Metabolism of cofactors and vitamins | 226 |
Metabolism of other amino acids | 122 |
Nucleotide metabolism | 179 |
Xenobiotics biodegradation and metabolism | 179 |
Protein families | |
Cellular processes and signaling | 198 |
Genetic information processing | 875 |
Metabolism | 338 |
Unexpectedly, a portion of the predicted genes were assigned functions involved in human diseases, including cancers and immune, infectious, and metabolic disorders (Table 3). These assignments are not reasonable, because such functions cannot be characteristics of N. oceanica, a single-celled microalga. It was also noted that these genes were assigned multiple functions.
The function of a protein is defined in the species in which it was characterized; however, the protein may have an ancient evolutionary origin and function differently across species. We found that those proteins of N. oceanica assigned functions involved in human diseases were, in most cases, also assigned functions in genetic information processing, metabolism, and cellular processes, indicating that their functions can be traced back to an early stage of evolution. Domazet-Lošo and Tautz (2010) found two strong peaks in the emergence of cancer proteins, one at the origin of cellular structure and the other around the formation of multicellular organisms. These proteins are involved in cancer formation in humans but perform more general functions of maintaining genome stability (caretakers) and of cellular signaling and growth processes (gatekeepers) in evolutionarily early organisms. We searched the predicted proteins of N. oceanica using well-defined cancer genes (Futreal et al. 2004) as queries and found 45 orthologs (P < 3.00 e−25). The alignments of these proteins with those of mammals showed the existence of homologous domain(s), although the sequences themselves differ in length. The variation in sequence may have allowed the proteins to function differently among species. Rather than functioning in cancers, these genes should function as maintainers of cellular stability and processors of genetic information in N. oceanica. The biological characteristics of a species should be integrated into the annotation and function assignment of its genes.
Asexual reproduction. The nuclear ploidy and reproductive strategy of N. oceanica have remained unresolved. Knowing the reproductive strategy is critical to the breeding of this alga; if it is asexual, mutation and elite clone expansion could be very effective ways of breeding. Meiosis is required for sexual reproduction. Nearly all sexual eukaryotes undergo meiosis, with which some unique characteristics are associated (e.g., recombination, synapsis). As proposed by Schurko and Logsdon (2008), the presence in a genome of multiple genes required specifically for meiosis, in particular those for recombination, formation of synaptonemal complexes, and chromosome cohesion, is a positive indication that the organism is capable of meiosis and, implicitly, sexual reproduction. In contrast, the absence of the majority, or all, of these genes would be consistent with asexuality. To test whether this alga lacks meiosis, the meiosis-specific proteins well defined by Ramesh et al. (2005) and Malik et al. (2008), including Spo11, Hop1, Mnd1, Dmc1, Msh4, Msh5, Mer3, and Rec8 identified in diverse eukaryotes, were downloaded and used as queries to search the predicted proteins of N. oceanica. These proteins tightly matched only two predicted proteins, Noc_GME3248 and Noc_GME7986 (0 < P < 7.00 e−29), the latter involved in DNA mismatch repair. We believe that N. oceanica is not capable of performing meiosis using the near-universal meiotic machinery and accordingly reproduces asexually.
Biosynthesis of long-chain PUFAs. N. oceanica is known to synthesize abundant EPA. In total, 280 lipid metabolism functions hit the predicted proteins; however, the pathway of long-chain PUFA synthesis was not clearly revealed. To elucidate the mechanism of PUFA synthesis, we downloaded the protein sequences of diverse desaturases and elongases involved in the biosynthesis of long-chain PUFAs, which were well identified previously (Hashimoto et al. 2008), and used them as queries to align with the predicted proteins. Three desaturase-encoding genes (Noc_GME1570, Noc_GME8161, and Noc_GME7482, P < 2.00 e−57) and one elongase-encoding gene (Noc_GME10858, P = 3.00 e−72) were identified. A few other genes were also matched; however, their P values were >e−20, and accordingly, they should not be involved in long-chain PUFA biosynthesis. According to Hashimoto et al. (2008), the genes we identified should be involved in the biosynthesis of fatty acids with a carbon chain length >18C. The synthesis of long-chain PUFAs by N. oceanica is therefore well supported; genes encoding both types of key enzymes exist in its genome.
Discussion
In this study, the genome of N. oceanica was sequenced with Illumina GA sequencing technologies. The genome was estimated to be ∼30 Mb in size, which contained 11,129 protein-encoding genes. Of them, 59.65% were annotated by referring to diverse databases, and 29.68% were hit by the functions defined in KEGG database. This collection of functional genes will certainly facilitate the integrated investigations of N. oceanica, especially its genetic improvement and gene engineering.
For algae with small genomes, biological characteristics such as reproductive strategy and nuclear ploidy have long remained unresolved. For example, the ultrasmall unicellular red alga Cyanidioschyzon merolae, which lives in the extreme environment of acidic hot springs, is thought to retain primitive features of cellular and genome organization. The determination of its 16.5 Mb genome showed a mixed gene repertoire of plants and animals, also implying a relationship with prokaryotes (Misumi et al. 2005); yet its reproductive strategy and nuclear ploidy remained unclear. For Nannochloropsis, there were also indications of possible asexual reproduction; only autospore (genetically identical to the parental cells) formation (Fietz et al. 2005, Barsanti et al. 2008) and mitosis (Murakami and Hashimoto 2009) were observed. In the mutational breeding of Nannochloropsis, it was also observed that the mutated genes trended toward the recessive nuclear ones, indicating that the algae are probably haploid (Galloway 1990). In this study, the newly appeared polymorphic nucleotides followed a trend of gradual accumulation with the increase of cell density. These polymorphic nucleotides were probably caused by natural mutation and clonal expansion of the mutants. It was also found that N. oceanica bears only two of a set of meiosis-specific proteins, supporting the view that it does not perform sexual reproduction. These biological characteristics make mutation and clonal expansion convenient breeding strategies for the alga and also facilitate its genetic manipulation. Indeed, mutational breeding and gene engineering have proved effective for the genetic improvement of Nannochloropsis (Chaturvedi and Fujita 2006, Chen et al. 2008).
Meiosis is necessary for sexual reproduction, which distinguishes eukaryotes from prokaryotes; however, its origin and evolution are not fully understood. Many meiotic genes are conserved among animals, fungi, plants, and some eukaryotic protists (Malik et al. 2008). It is possible that meiosis arose during the course of eukaryotic evolution after the divergence of early eukaryotes. In other words, meiosis may have appeared only in some early eukaryotes, but not in all or in the very earliest eukaryotes. If it originated concomitantly with cell nuclei, few clues to its origin could be drawn from living organisms. N. oceanica is asexual; only two orthologs of meiosis-specific proteins were identified in its genome. However, it is not certain whether N. oceanica is a premeiotic or an ameiotic organism; the two proteins identified may be either the remains of a lost meiotic machinery or newly evolved parts toward the creation of a meiotic system.
The full diversity of protists is not described, leaving the possibility of the existence of an ancestrally ameiotic organism. Giardia intestinalis (syn. G. lamblia) was not known to have a sexual cycle before its capacity of meiosis and implicitly sexual reproduction was supported by the identification of a set of meiotic genes (Ramesh et al. 2005) and the observation of diverse recombinant chromosomal fragments (Cooper et al. 2007). In this study, the nuclear monoploidy and asexual propagation in combination led us to favor the hypothesis that N. oceanica was a premeiotic or ameiotic alga.
Genome stability is associated with DNA damage repair. Loss of heterozygosity and chromosome rearrangement are related to homologous or mitotic recombination. Two homozygous cells (loss of heterozygosity) can be produced by mitotic crossovers between homologous regions with different alleles, whereas translocations can be produced when a crossover takes place between repeated genes on different chromosomes (Lee et al. 2009). Two meiosis-specific proteins were found in the similarity-based search of predicted proteins; however, these two proteins may serve as maintainers of genomic stability in the very early stage of evolution. Armillaria, a genus of fungi, undergoes mating between diploid and haploid mycelia, which can result in a recombinant diploid without meiosis (Baumgartner et al. 2010). An alternative explanation for the existence of two meiosis-specific proteins in N. oceanica is that they evolved de novo for somatic recombination before the appearance of highly efficient meiotic recombination. We propose that two evolutionary events took place between prokaryotic and sexual eukaryotic cells: obligate clonal propagation, and occasional cell fusion and somatic recombination.
The genome of the diatom Phaeodactylum contains a large portion of genes believed to have been horizontally transferred from bacteria, allowing the diatom to thrive in diverse environments (Bowler et al. 2008). N. oceanica also contains a certain portion of genes that originated very early in evolution, such as those involved in genetic information processing. Unfortunately, these genes were also assigned cancer-related functions in the KEGG annotation, which reveals an obvious shortcoming of functional annotation. Based on sequence (either nucleic acid or amino acid) similarity, a query will match a group of proteins in the databases; however, only the best match is listed as the output for each query. Bound to the best match is the species in which its function was defined. However, a protein that originated early in evolution and evolved across a wide range of species may function differently in different biological systems. Matches at lower similarities may show the real function of a protein. Therefore, matches should be filtered based not only on similarity but also on species or biological system information. The function of a protein should be tightly linked with the biological characteristics of a species. A protein should be assigned a function found in a closely related species or a shared biological process, not all possible functions. For example, cancer-related functions were assigned to a relatively large portion of N. oceanica predicted proteins in this study. Usually, multiple assignments could be found for these proteins, among which was a function that originated early in evolution. In these cases, the later-evolved functions should be discarded. Therefore, both sequence similarity and biological characteristics should be integrated into functional annotation.
Loss-of-function genetics through either gene knockout or RNA interference has proved essential for understanding the biological functions of genes and engineering the metabolic pathways in a wide range of species. In recent years, miRNA technology has provided a convenient tool for reverse genetic studies. It is noteworthy that a collection of miRNAs and snRNAs was identified in this study, suggesting that Nannochloropsis encodes the essential RNAi components and that miRNA technology may serve as a tool for deciphering gene functions, as was done in Chlamydomonas (Zhao et al. 2009). In addition, genetic transformation has been attempted in Nannochloropsis (Chen et al. 2008). We have also obtained a stable transformant that is resistant to the antibiotic zeocin (data not shown). This work will be facilitated by our profiling of the functional genes and metabolic pathways of N. oceanica.
Acknowledgments
This work was supported by National High Technology Research and Development Program (863 program) of China (2007AA09Z427), Basic Research Program of Municipal Bureau of Science and Technology of Qingdao (09-1-3-22-jch), Opening Research Program of Experimental Marine Biology of Institute of Oceanography, Chinese Academy of Sciences; and Major State Basic Research Development Program (973 program) of China (2011CB200901).