Harnessing novel cytidine deaminases from the animal kingdom for robust multiplexed base editing in rice
Summary
CRISPR-Cas-based cytosine base editors (CBEs) are prominent tools that perform site-specific and precise C-to-T conversions catalysed by cytidine deaminases. However, their use is often constrained by stringent editing preferences for genomic contexts, off-target effects and restricted editing windows. To expand the repertoire of CBEs, we systematically screened 66 novel cytidine deaminases sourced from various organisms, predominantly from the animal kingdom and benchmarked them in rice protoplasts using the nCas9-BE3 configuration. After selecting candidates in rice protoplasts and further validation in transgenic rice lines, we unveiled a few cytidine deaminases exhibiting high editing efficiencies and wide editing windows. CBEs based on these cytidine deaminases also displayed minimal frequencies of indels and C-to-R (R = A/G) conversions, suggesting high purity in C-to-T base editing. Furthermore, we highlight the highly efficient cytidine deaminase OoA3GX2 derived from Orca (killer whale) for its comparable activity across GC/CC/TC/AC sites, thus broadening the targeting scope of CBEs for robust multiplexed base editing. Finally, the whole-genome sequencing analyses revealed very few sgRNA-dependent and -independent off-target effects in independent T0 lines. This study expands the cytosine base-editing toolkit with many cytidine deaminases sourced from mammals, providing better-performing CBEs that can be further leveraged for sophisticated genome engineering strategies in rice and likely in other plant species.
Introduction
Base editing is a prominent genome editing technology that enables precise nucleotide changes in a programmable manner without relying on double-stranded breaks (DSBs) or donor templates (Komor et al., 2016). Cytosine base editors (CBEs) catalyse C·G to T·A base pair transitions, among other base-editing technologies (Molla et al., 2021). In its third generation – BE3 – the system is composed of a Cas9 nickase (nCas9 or Cas9-D10A) tethered to a single-stranded DNA (ssDNA) specific cytidine deaminase at the N-terminus and a uracil glycosylase inhibitor (UGI) at the C-terminus (Harris et al., 2002; Komor et al., 2016). Upon nCas9 binding to its target site, the R-loop is generated, thereby exposing a few nucleotides of the non-complementary DNA strand, in which the cytidine deaminase performs the deamination of target cytosines into uracil bases. DNA repair or replication ultimately converts uracils to thymines, resulting in C·G to T·A base pair changes at the target DNA. As key elements in CBE systems, cytidine deaminases have been extensively optimized in recent years, overcoming critical limitations, such as off-target effects, unwanted indels and stringent motif preferences (Kim et al., 2017; Thuronyi et al., 2019; Zhou et al., 2019; Zuo et al., 2019; Zong et al., 2018).
Several studies have raised concerns about the propensity of cytidine deaminases to generate sgRNA-independent off-target effects, reporting unintended single-nucleotide variants (SNVs) throughout the target genome due to random interactions between cytidine deaminases and exposed ssDNA (Zhou et al., 2019; Zuo et al., 2019). Although UGI partially suppresses base excision repair after cytidine deamination, DSBs can still occur and generate undesirable indels (Ren et al., 2021). Furthermore, highly efficient cytidine deaminases generally show stringent preferences for specific genomic contexts (Song et al., 2020), which limits the editing scope of these base editors and hinders their adoption in multiplex editing approaches and in gRNA library-based semi-saturated targeted mutagenesis for protein evolution (Ren et al., 2021; Kuang et al., 2020; Liu et al., 2020; Li et al., 2020).
Recent studies endeavoured to identify cytidine deaminases that could overcome these drawbacks, thereby decreasing off-target effects and relaxing the preference for genomic contexts (Huang et al., 2023; Xu et al., 2024). A new cytidine deaminase (CD0208P52A) was reported to provide high editing efficiency, low off-target effects and dependence on genomic contexts, and a narrow editing window in mammalian cells (Xu et al., 2024). Similarly, an AlphaFold2-based cytidine deaminase mining revealed compact candidates (Sdd7 and Sdd6) that outperformed editing efficiency of APOBEC/AID BEs, displaying narrow editing windows as demonstrated in soybean (Huang et al., 2023). However, narrow editing windows can limit the genome-targeting scope of BE for targeted mutagenesis and artificial evolution. By comparing a handful of cytidine deaminases from fish and mammals, we recently revealed their divergent properties on base-editing efficiency, purity and specificity (Ren et al., 2021). Hence, exploring cytidine deaminases in multiple organisms is a promising approach to further enrich the CBE toolbox for plant genome editing.
In this study, we screened 66 novel cytidine deaminases from distinct organisms, predominantly from the animal kingdom, and benchmarked them in the BE3 configuration against commonly used CBEs in rice protoplasts and transgenic lines. Our study developed many new CBEs that confer efficient base editing in plants and identified a cytidine deaminase, OoA3GX2, that enables wide-scope base editing owing to its relaxed substrate context dependence.
Results
Screening novel cytidine deaminases sourced from distinct organisms
To identify new cytidine deaminase candidates, the amino acid sequences of well-known cytidine deaminases used in CBEs (Table S1), such as human APOBEC3A (hA3A) (Ren et al., 2021), PmCDA1 (Shimatani et al., 2017; Tang et al., 2019) and hAID (Ren et al., 2018), were used as queries in Protein BLAST searches. After filtering the resulting BLAST hits, 36 cytidine deaminase coding sequences (mostly sourced from the animal kingdom) were synthesized and coupled to T-DNA vectors harbouring BE3 architecture (Figure 1a). We chose to focus on investigating cytidine deaminases from the animal kingdom because current CBEs nearly exclusively use cytidine deaminases from animals. A high GC-rich region of the OsCGRS55 gene (chr 04: 18502459..18502481) was chosen as a suitable target site (Figure 1b) as we recently showed high efficiency at this site (Ren et al., 2021) with multiple Cs present, which allows for base-editing window assessment. We carried out a first round of rice protoplast transformation, in which we evaluated the base-editing efficiencies of these 36 deaminases compared to the CBE controls, hA3A, PmCDA1 and hAID (Figure 1c). Based on next-generation sequencing (NGS) of PCR amplicons, seven deaminases displayed higher or comparable editing efficiencies to the CBE controls. Interestingly, two cytidine deaminases sourced from marine mammals, DnA3X2 and BasA3G, were the two top-performing deaminases (Figure 1c). Using their amino acid sequences as queries, we subsequently searched and identified 30 additional deaminases predominantly sourced from marine mammals, such as whales and dolphins. We identified at least eight additional cytidine deaminases conferring efficient C-to-T transition rates (~10%) at the target site in rice protoplasts (Figure 1d). 
As the protein sequences in the second round are phylogenetically closer to the two best-performing deaminases from the first round, we expected this round to yield a higher number of cytidine deaminases outperforming the CBE controls.

The best cytidine deaminases identified in the first two rounds were tested side-by-side to confirm their efficiencies at the OsCGRS55 target site (Figure 1e). Strikingly, we found at least five deaminases in their respective BE3 outperforming the most efficient previously established base editor, hA3A-Y130F (21%), showing around 25% of the NGS reads displaying C-to-T conversions. These novel deaminases were BasA3G (29%), MmA3AX1 (29.1%), CdA3G (29.3%), OoA3GX2 (22.8%) and LoA3GX1 (29.6%). Moreover, novel base editors showed various base-editing windows across the OsCGRS55 target site (Figure 1f), which may be related to their structural signatures and interactions with the ssDNA. As expected, all cytidine deaminases in BE3 exhibited extremely low C-to-A editing efficiency (~0.05% to 0.1%) and C-to-G editing efficiency (<0.04%) (Figure S1), confirming these deaminases confer specific C-to-T conversion.
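The efficiency and window measurements above come down to counting, per read, which reference cytosines appear as thymines. The following is a minimal sketch of that calculation, not the pipeline used in this study (which relied on dedicated tools described in the Methods). It assumes reads have already been aligned to the target site, have the same length as the reference and contain no indels; the function name and inputs are hypothetical.

```python
from collections import Counter

def c_to_t_profile(reference, reads):
    """Return (overall_pct, per_position_pct) for C-to-T conversions.

    reference: target-site sequence on the edited strand (hypothetical input).
    reads: sequences assumed to be pre-aligned to `reference`, gap-free.
    """
    c_positions = [i for i, base in enumerate(reference) if base == "C"]
    per_position = Counter()
    edited_reads = 0
    usable = 0
    for read in reads:
        if len(read) != len(reference):
            continue  # this simplified model cannot handle indel reads
        usable += 1
        hits = [i for i in c_positions if read[i] == "T"]
        if hits:
            edited_reads += 1  # read carries at least one C-to-T conversion
        per_position.update(hits)
    if usable == 0:
        return 0.0, {}
    overall = 100.0 * edited_reads / usable
    # Positions are reported 1-based, counted from the 5' end of `reference`.
    profile = {i + 1: 100.0 * per_position[i] / usable for i in c_positions}
    return overall, profile
```

The overall value corresponds to the share of reads displaying C-to-T conversions, while the per-position profile gives a rough picture of the editing window.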
Testing top-performing cytidine deaminases in BE3 for multiplexed C-to-T base editing
We next asked whether these new base editors could function efficiently at other genomic sites. To this end, we chose 11 top-performing deaminases to target four endogenous genes in rice, OsALS (Figure 2a), OsGN1a (Figure 2b), OsGS3 (Figure 2c) and OsGW2 (Figure 2d), which are associated with agronomic traits such as herbicide resistance, grain number, grain size and grain weight, respectively. As positive controls, we included hAID, hA3A-Y130F and TadCBEa (Neugebauer et al., 2023), which was derived from the adenine deaminase TadA-8e and was recently demonstrated for C-to-T base editing in plants (Fan et al., 2024). Across these four target sites, we found similar or higher base-editing efficiencies for most of the new CBEs compared to the three CBE controls, with differences possibly related to sequence-specific determinants (Arbab et al., 2020; Song et al., 2020). Importantly, BasA3G-, LoA3GX1-, DlA3G- and OoA3GX2-derived CBEs consistently showed wide base-editing windows at the four target sites (Figure 2e–h). Accordingly, the CBEs based on these four cytidine deaminases resulted in higher editing efficiencies among the new CBEs tested (Figure 2i), as their wider base-editing windows allow them to edit more cytosines. We also evaluated editing purity at the target sites. For all these CBEs, neither indel (Figure 2j) nor C-to-R (R = A/G) conversion (Figure 2k) frequencies differed significantly from those of wild-type (WT) samples, suggesting that these deaminases in BE3 perform base editing with high purity in their C-to-T editing outcomes.

Multiplexed base editing by novel cytidine deaminases CBEs in transgenic rice lines
To validate that these novel deaminases are capable of multiplex base editing in stable rice lines, we selected the five best-performing CBEs for Agrobacterium-mediated transformation of rice calli, together with hA3A-Y130F as the CBE control and a GFP construct as the tissue culture control. Approximately 10–15 individual T0 lines were regenerated and selected for genotyping of each construct. Despite some discrepancy in editing activities between protoplasts and stable transgenic lines for the LoA3GX1-based CBE, the data showed that some deaminases in CBEs performed efficiently at most of the target sites (Figure 3a–d), especially OoA3GX2 and DlA3G, which displayed no statistical differences compared to hA3A-Y130F in the bulked analysis (Figure 3e). In fact, OoA3GX2 (17.6% on average) and DlA3G (27% on average) were the only deaminases in CBEs that displayed much higher C-to-T conversions at the OsGN1a target site than hA3A-Y130F (4.7% on average) (Figure 3b). Consistent with the protoplast data, these data reflect the expanded targeting scope of these deaminases when coupled to the BE3 architecture.

Regarding the base-editing windows, we confirmed the wide base-editing activity of OoA3GX2 CBE at four target sites (Figure S2), consistent with the data from rice protoplasts. Compared to the rice protoplast data, rice transgenic lines seem to harbour higher levels of indels in the surroundings of the targeted regions (Figure 3f compared to Figure 2j), which may be partially related to the mutagenic effects of plant tissue culture (Ren et al., 2021), as the WT (GFP) lines also present the same trend. Notably, the C-to-R conversions (Figure 3g,h) remained minimal, as they did not show statistically significant differences compared to the WT control.
OoA3GX2 CBE shows a more relaxed targeting cytosine context and confers robust multiplexed editing
Cytidine deaminases usually display preferences for specific genomic contexts surrounding the target sites, in which a strong sequence-activity relationship dictates the base-editing outcomes (Arbab et al., 2020; Song et al., 2020). Therefore, we investigated the editing efficiency of the novel cytidine deaminase CBEs on 5′-NC motifs by combining the data from all the target sites. In contrast to the novel deaminases, hA3A-Y130F displayed a stringent preference for 5′-TC motifs (Figure 4a), whereas BasA3G and DlA3G seem to exhibit higher flexibility to edit in different contexts. OoA3GX2, the most promising novel cytidine deaminase CBE of this study, also showed a more balanced preference among GC/CC/TC/AC motifs when compared to hA3A-Y130F CBE (Figure 4b). For instance, hA3A-Y130F CBE targets around 75% of TC motifs, whereas OoA3GX2 targets only 50%, which correspondingly increases the editing rates of low-efficiency NC motifs and expands the scope for genomic target sites. While the genomic context dependence of the A3A deaminase in CBE was restrained by the Y130F amino acid substitution, flexible cytidine deaminases, such as OoA3GX2, display clusters of DNA-binding residues (Figure S3) that could be functionally investigated with strategies such as alanine scanning (Morrison and Weiss, 2001; Xu et al., 2024).
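The 5′-NC motif comparison can be illustrated by grouping each edited cytosine by its 5′ neighbour and expressing the summed editing frequencies as shares. The sketch below assumes uppercase A/C/G/T protospacers and per-position C-to-T frequencies as inputs; the function name and the normalisation into shares are illustrative assumptions, not the exact procedure behind Figure 4.

```python
def motif_shares(protospacer, edit_freq):
    """Share (%) of total C-to-T editing attributed to each 5'-NC motif.

    protospacer: uppercase A/C/G/T string (hypothetical input).
    edit_freq: maps the 1-based position of a cytosine to its C-to-T frequency (%).
    """
    totals = {"AC": 0.0, "CC": 0.0, "GC": 0.0, "TC": 0.0}
    for pos, freq in edit_freq.items():
        idx = pos - 1
        if idx == 0 or protospacer[idx] != "C":
            continue  # no 5' neighbour inside the sequence, or not a cytosine
        totals[protospacer[idx - 1] + "C"] += freq
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {motif: 0.0 for motif in totals}
    return {motif: 100.0 * value / grand_total for motif, value in totals.items()}
```

Under this definition, a deaminase with a stringent TC preference concentrates most of its share in the "TC" entry, whereas a more flexible one spreads it across the four motifs.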

Finally, to evaluate the performance of hA3A-Y130F and OoA3GX2 CBEs in multiplexed base editing, we compared the genotypes of the independent T0 lines from these two constructs (Figure 4c,d). Both CBEs could generate biallelic edits, specifically in OsGW2, which is attributable to the intrinsically high efficiency of this target site. However, hA3A-Y130F CBE could only generate one quadruple mutant (at least chimeric edits in each target site), whereas the novel CBE OoA3GX2 generated three plants with edits at four target sites. This is a result of its less stringent preference for 5′-NC motifs, which enabled OoA3GX2 CBE to perform efficient base editing at the four target sites. Impressively, when comparing the percentage of quadruple multiplexed base-edited plants per novel CBE (Figure 4e), we noticed that OoA3GX2 outperformed hA3A-Y130F by generating twice as many quadruple mutant plants. Finally, genotypes of the representative OoA3GX2 T0 line (Figure 4f) showed alleles with simultaneous C-to-T conversions at three target sites, except for OsGN1a. These findings suggest that the OoA3GX2-derived CBE may be preferred for more robust multiplexed base editing in plants, with a broader editing range, less stringent motif preference and high purity of editing outcomes.
Genome-wide analysis of off-target effects of OoA3GX2 CBE
To assess the potential off-target effects of OoA3GX2 CBE, we conducted whole-genome sequencing (WGS) of three T0 lines edited by OoA3GX2 CBE (lines #2, 4, 7), three T0 lines edited by hA3A-Y130F CBE (positive controls - lines #4, 7 and 8), and three T0 lines transformed with a GFP construct (as tissue culture and transformation controls). The sequencing analysis yielded comparable sequencing depth and genome coverage data across three experimental groups: OoA3GX2, hA3A-Y130F and GFP. The average depth was 133.87X ± 9.57 for OoA3GX2, 114.28X ± 5.63 for hA3A-Y130F and 119.77X ± 6.67 for GFP. Similar genome coverage across the groups was observed, with an average coverage of 98.23% ± 0.02 for OoA3GX2, 98.23% ± 0.03 for hA3A-Y130F and 98.27% ± 0.03 for GFP.
We examined two sets of data: the number of single-nucleotide variants (SNVs) (Figure 5a,b) and the number of insertions/deletions (indels) (Figure 5c,d). Although the average numbers of SNVs and indels are higher in the OoA3GX2 and hA3A-Y130F groups than in the GFP group (Figure 5a,c), statistical tests indicate that these differences are not statistically significant. The parametric ANOVA test for SNVs yielded an F-statistic of 1.80 with a P-value of 0.244; for indels, the F-statistic was 4.34 with a P-value of 0.068. The non-parametric Kruskal–Wallis tests showed H-statistics of 3.47 (P = 0.177) for SNVs and 4.73 (P = 0.094) for indels. In all cases, the P-values were greater than 0.05. Further analyses showed the distribution of SNVs and indels across different genomic regions, and only small fractions of these mutations were found in exons (Figure 5b,d). We further analysed SNVs by grouping them into transitions and transversions. While the OoA3GX2 and hA3A-Y130F groups showed slightly elevated frequencies of C-to-T substitutions, they were not statistically different from the GFP lines (Figure S4). These analyses suggest that we did not detect substantial genome-wide off-target effects by either OoA3GX2 or hA3A-Y130F CBEs in these T0 lines. The detected mutations largely resulted from somaclonal variation during tissue culture, and most of them would not cause phenotypic changes as they reside in intergenic and flanking regions of the genome.
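The reported P-values can be cross-checked from the reported test statistics alone. The sketch below assumes three groups of three T0 lines each, giving 2 and 6 degrees of freedom for the ANOVA, and the usual chi-square approximation with 2 degrees of freedom for the Kruskal–Wallis test; these degrees of freedom are inferred from the experimental design rather than stated in the text. With 2 numerator degrees of freedom, the upper tail of the F distribution has the closed form (1 + 2F/d)^(-d/2), and the chi-square upper tail with 2 degrees of freedom is exp(-H/2).

```python
import math

def anova_p_two_numerator_df(f_stat, df_within):
    """Upper-tail P for an F statistic with 2 numerator degrees of freedom."""
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)

def kruskal_p_two_df(h_stat):
    """Upper-tail P for H under a chi-square approximation with 2 df."""
    return math.exp(-h_stat / 2.0)

# Assumed design: 3 groups x 3 lines, so df_within = 9 - 3 = 6.
p_snv_anova = anova_p_two_numerator_df(1.80, 6)
p_indel_anova = anova_p_two_numerator_df(4.34, 6)
p_snv_kw = kruskal_p_two_df(3.47)
p_indel_kw = kruskal_p_two_df(4.73)
```

Under these assumptions the recomputed values agree with the reported P-values to within rounding of the test statistics, and all of them remain above 0.05.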

However, with this kind of bulk analysis, we cannot rule out that some mutations detected in OoA3GX2 and hA3A-Y130F edited lines could be due to gRNA-dependent off-target effects. We therefore evaluated all these SNVs and indels and identified mutations in three potential off-target sites of the OsGN1a-targeting gRNA and one potential off-target site of the OsGS3-targeting gRNA (Figure 5e). At the OsGN1a off-target site 1, which has two mismatches to the on-target site, a C-to-G mutation was detected in OoA3GX2 line #10 (Figure 5f). Interestingly, biallelic off-target editing by hA3A-Y130F occurred at this off-target site, with one allele being a C-to-T edit and the other allele being a deletion (Figure 5f), suggesting a DNA DSB at this site. At the OsGN1a off-target site 2, which has five mismatches to the on-target site at the 5′-end of the protospacer, a C-to-T mutation was discovered in OoA3GX2 line #2 (Figure 5f). Surprisingly, for the OsGN1a off-target site 3, which also contains five mismatches, two of them in the seed sequence of the protospacer, a C-to-T mutation was detected in hA3A-Y130F line #9 (Figure 5f). Further, at the OsGS3 off-target site 1, which contains five mismatches, a G insertion was discovered in hA3A-Y130F line #8 (Figure 5g). A one-base-pair insertion at the Cas9 cut site is a hallmark of non-homologous end joining repair events, suggesting that a DSB occurred at this off-target site in hA3A-Y130F line #8. Collectively, our analysis identified a few candidate off-target mutations caused by gRNA-dependent off-target effects of CBEs. Of the two CBEs, OoA3GX2 CBE showed less tendency to induce off-target mutations at the predicted off-target sites, suggesting that the OoA3GX2-based CBE is very specific for targeted base editing in rice.
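The distinction between mismatches at the 5′-end of the protospacer and mismatches in the seed sequence can be made concrete with a small helper. This is an illustrative sketch only: it assumes equal-length protospacers written 5′ to 3′ without the PAM, and it treats the 12 PAM-proximal nucleotides as the seed, which is a common convention but not a definition given in this study. The function name is hypothetical.

```python
def count_mismatches(on_target, off_target, seed_length=12):
    """Return (total_mismatches, seed_mismatches) for two protospacers.

    Sequences are 5'->3' without the PAM; the seed is taken to be the
    `seed_length` PAM-proximal (3'-most) nucleotides, an assumed convention.
    """
    if len(on_target) != len(off_target):
        raise ValueError("protospacers must have the same length")
    pairs = zip(on_target.upper(), off_target.upper())
    mismatches = [i for i, (a, b) in enumerate(pairs) if a != b]
    seed_start = len(on_target) - seed_length
    seed_mismatches = sum(1 for i in mismatches if i >= seed_start)
    return len(mismatches), seed_mismatches
```

For a candidate site, a result such as five total mismatches with none in the seed would correspond to the pattern described for the OsGN1a off-target site 2, whereas a non-zero seed count corresponds to site 3.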
Discussion
The effectiveness of cytosine base editing is hampered by intrinsic drawbacks of its main molecular component, the cytidine deaminase. Recently, a few studies (Huang et al., 2023; Xu et al., 2024) have explored putative cytidine deaminase-like proteins and reported them as potential candidates that circumvent such drawbacks of CBEs. Nevertheless, among the few cytidine deaminases in use, hA3A-Y130F, despite its high efficiency and wide use across plant species (Li et al., 2021; Randall et al., 2021; Ren et al., 2021), is highly dependent on genomic contexts, restricting its genomic-targeting scope for multiplex base editing. In this study, we sought to address this issue by mining previously unknown deaminases sourced from the organisms of the phylogenetic tree of life. Remarkably, we found that many of the deaminases sourced from marine mammals outperformed the base-editing efficiencies of well-established CBEs, used in this study as positive controls (i.e. A3A, hA3A-Y130F, hAID and PmCDA1). For example, the deaminase featured in this study, OoA3GX2, is from Orca (killer whale). In the future, it will be interesting to compare these naturally sourced cytidine deaminases with those developed by AI and machine learning, such as Sdd6 and Sdd7 (Huang et al., 2023). Nevertheless, our findings suggest that simple sequence-based approaches may still be suitable for discovering new biotechnological tools for genome engineering, especially when combined with a rapid rice protoplast transient expression system for empirical testing.
During base-editing reactions in the nucleus, the formation of R-loops partially relies on various structural features of the deaminase and its interaction with ssDNA substrates (Bohn et al., 2015; Kouno et al., 2017). Therefore, harnessing unknown deaminases from different biological taxa may yield unique base-editing outcomes. In fact, we found that our novel deaminases displayed differing base-editing working windows as their molecular signatures, which may result from the variable interactions with target ssDNA molecules. For instance, the well-known lamprey-derived deaminase PmCDA1 tends to operate in positions C16 to C19 counting from the PAM (Nishida et al., 2016), whereas a newly engineered version of human APOBEC3 (hA3A) displays higher activity at C13 to C16 relative to the PAM (Yang et al., 2024). These CBEs with relatively narrow editing windows can limit the number of targetable cytosines. Remarkably, here we reported on novel cytidine deaminases in CBEs, such as LoA3GX1, DlA3G and OoA3GX2, which display ultrawide active windows around C7 to C20 relative to the PAM. These CBEs may broaden the genome-targeting scope by making more cytosines accessible within the target sites. In applications such as directed protein evolution, cis-regulatory element perturbation and loss-of-function of plant endogenous target mimics (PeTMs) (Gupta, 2015), the novel broad-window CBEs may offer opportunities aside from the ‘canonical’ working windows with advantageous features.
Deaminases are often limited by their nearest-neighbour preference for specific genomic contexts, which substantially dictates the editing activity, especially in unfavourable regions with difficult access to target cytosines (Arbab et al., 2020). Typically, deaminases from the APOBEC1 family exhibit dependence on 5′-TC motifs to achieve high editing rates (Komor et al., 2016), whereas APOBEC3G (A3G) deaminases tend to be more efficient on 5′-CC dinucleotide motifs (Lee et al., 2020; Liu et al., 2020). Although positions within the activity windows can be edited regardless of dinucleotide motifs (Pallaseni et al., 2022), the genomic context dependence of deaminases still limits the editing efficiency at positions outside these windows. In contrast to the reliance on TC motifs by the widely used hA3A-Y130F, the five best-performing deaminases from our screen showed more balanced editing activities among the four different dinucleotide motifs. Still, except for DlA3G, these deaminases display preferences in the following order: TC > CC > GC > AC. Notably, the most efficient deaminase featured in this study, OoA3GX2, unexpectedly showed a substantial increase in activity at unfavourable dinucleotide motifs, such as GC (1.8-fold) and AC (3.8-fold), compared to hA3A-Y130F. This reduced context dependence allows OoA3GX2 to be adopted for flexible cytosine base editing, reaching genomic regions that were hitherto unfavourable.
Multiplex base editing is often adopted to manipulate multiple traits of agronomic interest in plants (Li et al., 2024; Molla et al., 2021; Ren et al., 2021; Yan et al., 2021). However, many factors can hinder this strategy, such as the NGG-PAM for nCas9 activity, NC dinucleotide motif dependence, targetable cytosines outside the editing window, methylation of CpG islands, low-efficiency sgRNAs, among others. In this study, we used hA3A-Y130F as a state-of-the-art base editor control, which has a well-established performance on many target sites (Li et al., 2021; Ren et al., 2021; Wang et al., 2018; Zhang et al., 2024), including methylated ones. Nevertheless, hA3A-Y130F failed to target four genes (OsGN1a, OsGS3, OsGW2 and OsALS) simultaneously in both rice protoplasts and transgenic lines. Specifically, hA3A-Y130F CBE did not show high base-editing efficiency on the OsGN1a target site, which may be associated with low sgRNA efficiency and the absence of TC dinucleotide motifs. Surprisingly, the OoA3GX2 BE3 produced more than 40% of the reads with C-to-T conversion at a GC motif within the OsGN1a target site, which is typically neglected by APOBEC-like BEs (Kim et al., 2017; Komor et al., 2016). In the other target sites, OoA3GX2 CBE could generate multiple C-to-T conversions throughout its wide editing window in the same allele, reinforcing its usefulness in applications like protein evolution and promoter editing. Notably, the high performance of OoA3GX2 in base editing of OsGN1a, OsGS3, OsGW2 and OsALS simultaneously serves as an example of its flexibility and broader scope due to its low dependence on genomic contexts. This is best demonstrated when OoA3GX2 CBE outperformed hA3A-Y130F CBE with more than twice as many quadruple mutant plants, thus standing out as a suitable cytidine deaminase for multiplex base-editing approaches.
Our WGS-based off-target analyses also revealed OoA3GX2 as a highly specific cytidine deaminase for base editing. Unlike hA3A-Y130F CBE, OoA3GX2 CBE cannot tolerate mismatches at the seed sequence of protospacers. However, it is noteworthy that detecting mutations by WGS can be challenging due to varying mutation frequencies, and increasing sequencing depth is a common approach to enhance mutation-calling accuracy. For mutations with higher frequencies (≥20%), a sequencing depth of at least 200X is generally adequate to identify 95% of mutations accurately (Chen et al., 2020). Furthermore, the off-target mutation rate may be associated with the expression levels of CBEs in the tested plants. Based on a similar WGS analysis pipeline, our data are largely consistent with our previous WGS analysis of CBEs in rice (Ren et al., 2021) and tomato (Randall et al., 2021).
Recently, AlphaFold2-based, structure-guided protein clustering and design were used to engineer next-generation cytidine deaminases. In one study, a novel cytidine deaminase was engineered from a protein from a DddA-like clade that does not possess deamination activity, and the resulting CBE was demonstrated for base editing in both human cells and soybean plants (Huang et al., 2023). In another study, by analysing the three-dimensional structures of thousands of untested cytidine deaminases, together with those of a subset of experimentally validated ones, the authors discovered new deaminases with high editing efficiencies, diverse editing windows and reduced context dependency (Xu et al., 2024). In our study, we obtained experimental data for 66 novel cytidine deaminases and four well-studied cytidine deaminases. Based on our data, adopting similar approaches that employ AI-assisted structure modelling for further discovery and engineering of more versatile cytidine deaminases to enrich the CBE toolbox may be of interest. The top-performing CBEs in our collection have been made available at Addgene, allowing users to further test and use them in plant species other than rice. If needed, the available protein sequences of these cytidine deaminases will facilitate the generation of new vectors with codon optimization for optimal expression in other target plant species.
In summary, we identified novel cytidine deaminases for base editing in rice plants. By rapidly assessing these BEs in rice protoplasts, we pinpointed promising deaminase candidates that could outperform state-of-the-art BEs in editing efficiency and displayed differing base-editing windows. After validating them in rice transgenic lines, we showed that OoA3GX2 BE3 could perform robust multiplex base editing by exhibiting low genomic context dependence, ultrawide base-editing window and highly pure base-editing outcomes. These findings warrant future testing of OoA3GX2 and other top-performing deaminases in other species, including humans and dicot plants. Furthermore, these features helped broaden the scope of targetable genomic sites for CBEs and provided a new tool for protein engineering through directed evolution, as well as the fine tuning of gene expression via CRE engineering (e.g. promoter regions) and PeTM loss-of-function, thus bringing new purposes in the use of CBEs.
Methods
Cytidine deaminases data mining
The cytidine deaminase amino acid sequences were prospected by using known cytidine deaminases, such as rAPOBEC1, PmCDA1, hA3A and hAID as queries in Protein BLAST (BLASTp) searches against the Mammalia RefSeq Protein Database (taxid: 40674) in the NCBI. These amino acid sequences are listed in the Table S1. The resulting hits were filtered based on (i) protein sequence length of 150–300 amino acids, (ii) less than 80% of identity to the query protein sequences and (iii) optimal organismal temperature, with priority given to organisms of ambient or near-ambient temperature for hypothesized function in plant species. Multiple alignments of deaminase amino acid sequences were carried out using the MUSCLE program (Edgar, 2004) and used as inputs for the construction of maximum likelihood-phylogenetic trees (with 1000 bootstrap replicates) by IQ-TREE server (Nguyen et al., 2015). A second round of cytidine deaminases were selected for specific APOBEC3 families based upon the efficiency of first round candidates from the family, using the DnA3X2 and BasA3G sequences identified in the first round of candidates as queries in the RefSeq database.
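The first two filtering criteria are simple numeric thresholds and can be sketched as below. The record type, field names and accession strings are hypothetical; the thresholds (150–300 amino acids, less than 80% identity) are those stated above. Criterion (iii), the organismal temperature, is a prioritisation step that requires curated information and is deliberately not modelled here.

```python
from dataclasses import dataclass

@dataclass
class BlastHit:
    accession: str        # hypothetical identifier of the hit
    length: int           # protein length in amino acids
    pct_identity: float   # identity to the query sequence, in percent

def filter_hits(hits, min_len=150, max_len=300, max_identity=80.0):
    """Keep hits that satisfy criteria (i) and (ii) of the mining step."""
    return [
        hit for hit in hits
        if min_len <= hit.length <= max_len and hit.pct_identity < max_identity
    ]

# Example with made-up records, for illustration only.
example_hits = [
    BlastHit("hit_A", 199, 62.5),   # passes both criteria
    BlastHit("hit_B", 384, 55.0),   # too long
    BlastHit("hit_C", 210, 91.3),   # too similar to the query
]
kept = filter_hits(example_hits)
```

Treating the length bounds as inclusive is an assumption; the text does not specify how boundary values were handled.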
Assembly of vectors
Rice codon-optimized (IDT codon optimization tool) cytidine deaminases coding sequences (CDS) were synthesized by Twist Biosciences as gene fragments and cloned into BsrGI-HF (NEB, catalogue #: R3575*) and BsaI-HFv2 (NEB, catalogue #: R3733*) digested pYPQ265E2 backbone (Addgene ID: 164719), which encodes a BE3 architecture (maize codon-optimized nCas9 fused to UGI) and attL1-attR5 sites. The Gateway-compatible vectors for the top-performing cytidine deaminases are available at Addgene: pYPQ265_BasA3G (Addgene ID: 225145), pYPQ265_EtA3C (Addgene ID: 225146), pYPQ265_OoA3GX2 (Addgene ID: 225147), pYPQ265_DlA3G (Addgene ID: 225148) and pYPQ265_LoA3GX1 (Addgene ID: 225149). Single-guide RNAs (sgRNAs) were cloned into entry vectors previously published in (Lowder et al., 2018) using BsmBI (Thermo scientific, catalogue #: ER045*), which were transferred to a single Gateway-compatible sgRNA expression entry clone with attL5-attL2 sites. The guide RNA oligonucleotides (Table S2) and NCBI accession number of each cytidine deaminase (Tables S3 and S4) are summarized in the supporting information. The T-DNA expression vectors were assembled by a Multisite Gateway LR cloning strategy using attR1-attR2 destination vector pYPQ203 (Addgene no. 86207) for rice.
Rice protoplast transformation
The japonica rice cultivar Kitaake was used in this study for protoplast transformation. Protoplast isolation and transformation were carried out according to a previous report with minor modifications (Sretenovic et al., 2021). Briefly, protoplasts were isolated from 12-day-old rice etiolated stems using enzyme solution, resuspended in 0.55 M sucrose (pH 5.7), and overlaid with W5 solution without mixing. After centrifugation (200 g for 30 min), protoplast cells were collected from the interface between sucrose and W5 solution, washed with W5 solution and resuspended in MMG solution. For each construct, 20 μg of T-DNA plasmid was introduced into 180 μL of rice protoplasts (1 × 10^6 cells/mL) by PEG-mediated transfection. The transfected protoplasts were incubated in the dark for 48 h at 32 °C before collecting samples for genotyping.
Rice stable transformation
Rice Kitaake mature seeds were plated on N6D medium for callus induction at 32 °C under constant dim light (Pan et al., 2022). For stable transformation, the T-DNA constructs were delivered into rice embryogenic calli using Agrobacterium tumefaciens (EHA105 strain) according to a previous report (Hiei and Komari, 2008) with minor modifications. The transformed plants were selected and regenerated on media containing 50 mg/L hygromycin B. The transgenic rice plants were grown in a greenhouse at 29 °C under a 16-h-light/8-h-dark cycle.
DNA extraction and amplicon deep sequencing
Amplicon deep sequencing was carried out to investigate base-editing outcomes in this study. For rice protoplasts, cells were collected and the target regions were amplified directly using the Phire Plant Direct PCR Kit (Thermo Fisher Scientific). For stable transgenic plants, genomic DNA was extracted from leaf tissue using the CTAB method (Stewart Jr and Via, 1993), and the target regions were amplified using Q5 DNA Polymerase (New England Biolabs). All PCR products were barcoded using the Hi-TOM primers (Liu et al., 2019) listed in Table S5, verified by gel electrophoresis and purified with a QIAquick PCR Purification Kit (QIAGEN). The purified PCR products were sequenced using the Illumina MiSeq 2 × 250 bp platform (Genewiz, USA).
Mutation analysis
High-quality paired-end reads were merged with FLASH (https://github.com/ebiggers/flash), split with CRISPRMatchGUI (https://github.com/zhangtaolab/CRISPRMatchGUI) and analysed with CRISPR RGEN Tools (http://www.rgenome.net/be-analyzer/#!). For genotyping T0 lines, zygosity was assigned from the editing frequency as follows: wild-type plants (editing frequency < 10%), chimeric edits (10% < editing frequency < 30%), monoallelic edits (30% < editing frequency < 75%) and biallelic edits (editing frequency > 75%).
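The zygosity thresholds above can be written as a small classification function. The following Python sketch is illustrative only and is not the script used in this study. The function name is hypothetical, and because the text gives strict inequalities, the treatment of values falling exactly on 10%, 30% or 75% is an assumption: they are assigned to the higher class.

```python
def classify_zygosity(editing_frequency_pct: float) -> str:
    """Map a T0 line's editing frequency (percent of reads) to a genotype call."""
    if not 0.0 <= editing_frequency_pct <= 100.0:
        raise ValueError("editing frequency must be between 0 and 100")
    # Assumption: the source thresholds are strict inequalities, so values of
    # exactly 10, 30 or 75 are placed in the higher class here.
    if editing_frequency_pct < 10.0:
        return "wild-type"
    if editing_frequency_pct < 30.0:
        return "chimeric"
    if editing_frequency_pct < 75.0:
        return "monoallelic"
    return "biallelic"


# Hypothetical editing frequencies, not data from this study.
for freq in (4.2, 18.0, 48.5, 96.3):
    print(freq, classify_zygosity(freq))
```

Keeping the cut-offs in one function makes it straightforward to apply the same rule to every target site and line.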
WGS and off-target analysis
Genomic DNA extraction was performed using the DNeasy Plant Mini kit (QIAGEN, Germany). All plant samples were sequenced on the Illumina® MiSeq® platform (Genewiz, U.S.). Wild-type rice (Oryza sativa L.) japonica cultivar Kitaake sequencing data were included in our analysis to help with variant filtering and calling (Gurel et al., 2023). The analysis pipeline followed an adapted protocol (Liu et al., 2021). Quality control was conducted using FastQC (v0.11.9), and Skewer (v0.2.2) was used to remove low-quality reads (Jiang et al., 2014). Cleaned reads were mapped to the rice reference genome OsativaKitaake_499_v3.0 (Li et al., 2017; Liu et al., 2019) using BWA (0.7.17-r1188) (Li and Durbin, 2009). BAM files were sorted with Samtools (v1.10) (Li et al., 2009), and Picard (v2.21.9) was utilized to mark duplicate reads. After pre-processing BAM files, the Genome Analysis Toolkit (GATK, v3.8) was used for realignment and recalibration (McKenna et al., 2010). High-confidence variant detection was achieved with four different tools. Single-nucleotide variants (SNVs) were identified using LoFreq (v2.1.2) (Wilm et al., 2012), MuTect2 (Benjamin et al., 2019) and VarScan (v2.4.6) (Koboldt et al., 2012), while indel calling was done with MuTect2, VarScan and Pindel (v0.2.5b9) (Ye et al., 2009). Only SNVs and indels identified by all of their respective tools were retained. BEDTools (v2.27.1) and BCFtools (v1.17) were used to filter SNVs/indels and process VCF files. The resulting mutations were visualized with IGV software. Cas-OFFinder was employed to predict genome-wide sgRNA-dependent off-target sites, allowing up to 5 mismatches, and IGV was used to double-check potential off-target sites.
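The retention rule for high-confidence variants, read here as keeping only calls reported by every tool assigned to that variant class, amounts to a set intersection. The following Python sketch illustrates the idea under the assumption that each caller's output has already been parsed into (chromosome, position, reference allele, alternate allele) tuples. The function name and the example coordinates are hypothetical and do not reproduce the BEDTools/BCFtools steps used in this study.

```python
from typing import Iterable, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus_variants(call_sets: Iterable[Set[Variant]]) -> Set[Variant]:
    """Return only the variants present in every caller's call set."""
    sets = list(call_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical SNV calls from three callers (not data from this study).
lofreq_calls = {("Chr1", 1200, "C", "T"), ("Chr3", 88, "G", "A")}
mutect2_calls = {("Chr1", 1200, "C", "T"), ("Chr5", 410, "C", "T")}
varscan_calls = {("Chr1", 1200, "C", "T"), ("Chr3", 88, "G", "A")}

high_confidence_snvs = consensus_variants(
    [lofreq_calls, mutect2_calls, varscan_calls]
)
print(sorted(high_confidence_snvs))
```

The same function would apply to indels by passing the MuTect2, VarScan and Pindel call sets instead.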
Author contributions
Y.Q., S.S., M.D. and D.F.C. conceived the study and designed the experiments. D.F.C. and S.S. generated all the molecular constructs. D.F.C., S.S. and Y.C. performed rice protoplast experiments. D.F.C. generated stable rice plants and conducted downstream analyses. M.Z. performed genome-wide off-target analyses. S.C. and S.X. helped with the development of the manuscript. Y.Q. and D.F.C. wrote the paper with input from other authors. All authors read and approved the final manuscript.
Acknowledgements
This work was supported by USDA-NIFA BRAG Program (2018-33522-28789 and 2024-33522-42755) and NSF PGRP Program (IOS-2132693 and IOS-2224203). D.F.C. is a fellow from the São Paulo Research Foundation (FAPESP – Process Number: 2020/07045-3, 2021/13478-2 and 2022/11738-0). S.S. was a fellow of the Foundation for Food and Agriculture Research.
Conflict of interest
Y.Q., S.S., Y.C. and M.D. are inventors on a US Patent Application for discovering novel cytidine deaminases from this study. All other authors declare no competing interests.
Open Research
Data availability statement
The WGS data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA1161160, with run accession numbers SRR30670606 to SRR30670614.