Studying the features of 57 confirmed CRISPR loci in 29 strains of Escherichia coli
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR) system is a novel type of innate defense system in prokaryotes for the destruction of exogenous elements. To gain further insight into the behavior and organization of the system, extensive analysis of the available sequenced genomes is necessary. The dynamic nature of CRISPR loci is potentially valuable for typing and comparative analyses of strains and microbial populations. Only a few systematic bioinformatics investigations of the structure of CRISPR sequences in Escherichia coli strains are available. In this study, 57 CRISPR loci were selected from 29 Escherichia coli strains to investigate their structural characteristics and potential functions using bioinformatics tools. Our results showed that most strains contained several loci, which mainly included conserved direct repeats, while the spacers were highly variable. Moreover, RNA analysis of the sequences indicated that all loci could form stable RNA secondary structures, and the spacers showed homology with phages more often than with plasmids. Only three strains included cas genes around their loci.
Abbreviations
- CAS: CRISPR-associated genes
- CDR: consensus direct repeat
- CRISPR: clustered regularly interspaced short palindromic repeats
- MFE: minimum free energy
- MSA: multiple sequence alignment
- siRNA: small interfering RNA
- smRNA: small RNAs
Introduction
Microorganisms are frequently invaded by exogenous agents such as viruses, plasmids, and other mobile genetic elements, and they have therefore developed a type of innate defense system to counter these invaders 1. Escherichia coli is a very diverse bacterial species whose strains live in different hosts and in a wide range of environments 2. This ability to survive in various environments is related to their high degree of adaptability 3. E. coli strains possess a defense system that has been discovered in about 40% of bacteria and most archaea 4-6. It contains two main parts, (i) CRISPR (clustered regularly interspaced short palindromic repeats) and (ii) CAS (CRISPR-associated genes) proteins [3,7,8], and functions as part of the immune system against bacteriophages, viruses, plasmids, or any other alien DNA [5,9]. Part of the adaptability of E. coli strains may be related to CRISPR sequences 10. These sequences consist of direct repeats of nearly identical sequence, interspaced by unique spacers of similar size [4,5]. CRISPRs encode no protein product, but different studies have indicated that they are transcribed into small RNAs (smRNA) that act like small interfering RNA (siRNA) to hamper the entry of alien DNA 11. The number of CRISPR loci and spacers is highly variable, whereas the sequences of the direct repeats are highly conserved. The spacer sequences are mostly similar to plasmid and phage sequences, which suggests that the spacers are derived from plasmids or phages 9. As mentioned above, the CRISPR complex forms part of the immune system, and the spacers play a key role in defense against plasmids and phages 3, 9. Earlier studies of various E. coli strains have demonstrated vast heterogeneity in spacers and CRISPR/CAS content 12. Although the spacers within one locus have similar lengths, their nucleotide compositions vary 3.
Within a given CRISPR locus, the repeats typically contain palindromic structures and may form RNA secondary structures made up of loops and stems 13. Moreover, CRISPR loci show great polymorphism among strains, a property that has been applied to the identification of clinical strains of Mycobacterium tuberculosis, Streptococcus pyogenes, and Campylobacter jejuni 14. The CRISPR/Cas system currently has many applications in research, medicine, and biotechnology, including the rapid creation of cellular and animal models. One of its most important applications is genome editing, where it provides targeted and efficient changes in a variety of eukaryotic, and especially mammalian, species. Using this system, DNA sequences in the endogenous genome and their functional products can now be readily changed or modulated in nearly any organism of choice 15. For this reason, and because of its interesting RNA-based mechanism of action, the CRISPR/Cas system is the topic of numerous studies 16. At present, the available bioinformatics tools may help investigators in various areas 18-20 to begin their experiments with some predictions, instead of starting with expensive and lengthy practical steps 17. To gain further insight into the behavior and organization of CRISPRs, extensive analysis of the available sequenced genomes is necessary. The systematic study of CRISPR structures may help to uncover more potential roles for these sequences in bacteria. To date, only a few systematic bioinformatics investigations of CRISPR structure in E. coli strains are available. In this study, we used 57 identified and confirmed CRISPR loci to investigate their structural characteristics and potential functions in E. coli strains using bioinformatics tools. CRISPR loci were categorized based on the similarity between direct repeats and the evolutionary relationships of spacers.
Moreover, we searched for homology between the spacer sequences and bacteriophage genomes and plasmids, investigated the presence of cas genes around the CRISPR loci, and predicted the RNA secondary structures of the direct repeats and their stabilities.
Materials and methods
Sequence collection
Genomes of different E. coli strains were searched for in the National Center for Biotechnology Information (NCBI) nucleotide database (http://www.ncbi.nlm.nih.gov/) with default parameters; E. coli CRISPR loci (Table 1) were then identified with the CRISPRFinder server (E-value ≤0.1) (http://crispr.u-psud.fr/Server/). CRISPRFinder identifies structures that have the basic characteristics of CRISPRs 21.
Strain | CRISPR id | Number of CRISPR | Number of spacers | CRISPR length | DR length |
---|---|---|---|---|---|
E. coli BIDMC 19A | NZ_KI929698_3, 4 | 2 | 21, 10 | 1312, 638 | 29, 28 |
E. coli K12AG100 | NZ_LN832404_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28 |
E. coli O104:H4 str. C227-11 | NZ_CP011331_1 | 1 | 13 | 821 | 29 |
E. coli strain BIDMC112 | NZ_KQ087963_2, 4, 5 | 3 | 4, 8, 5 | 273, 516, 333 | 29, 29, 29 |
E. coli strain ED1a | NC_011745_2, 3 | 2 | 17, 13 | 1037, 809 | 28, 28 |
E. coli 53638 | NZ_AAKB02000001_2,6,7 | 3 | 3, 9, 8 | 204, 577, 516 | 28, 29, 29 |
E. coli K12DH10B | NC_010473_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28 |
E. coli strain DH1Ec169 | NZ_CP012127_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28 |
E. coli 157F8092B_41 | NZ_AVCD01000005_1 | 1 | 3 | 211 | 29 |
E. coli FAP1 | NZ_CP009578_4, 5 | 2 | 5, 4 | 333, 272 | 29, 29 |
E. coli ATCC 8739 | NC_010468_1,2,3,5 | 4 | 12, 15, 21, 3 | 762, 943, 1309, 204 | 29, 29, 29, 28 |
E. coli K-12 MC4100 | NZ_HG738867_5, 6 | 2 | 6, 12 | 393, 762 | 28, 29 |
E. coli strain E455 | NZ_JEND02000002_1, 2 | 2 | 14, 6 | 882, 394 | 29, 29 |
E. coli FHI71 | NZ_LM996841_1, 2 | 2 | 17, 8 | 1365, 516 | 29, 29 |
E. coli BIDMC 2B | NZ_KI929774_1, 2 | 2 | 21, 13 | 1312, 822 | 29, 29 |
E. coli strain 48 | NZ_JPQG01000003_2, 3 | 2 | 14, 15 | 882, 938 | 29, 29 |
E. coli strain IH53473 | NZ_LFZH01000009_2 | 1 | 6 | 393 | 28 |
E. coli LF82 | NC_011993_2, 3 | 2 | 9, 22 | 567, 1349 | 28, 28 |
E. coli FHI87 | NZ_LM997016_2 | 1 | 9 | 575 | 27 |
E. coli BIDMC 17A | NZ_KI929714_2,3 | 2 | 13, 21 | 822, 1312 | 29, 29 |
E. coli strain BIDMC104 | NZ_KQ087916_1, 2 | 2 | 11, 9 | 701, 577 | 29, 29 |
E. coli strain IH57218 | LFZJ01000005_2, 3 | 2 | 6, 5 | 394, 333 | 29, 29 |
E. coli K-12 strain ER3454 | NZ_CP010438_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28 |
E. coli GM4792 | NZ_CP011342_8, 9 | 2 | 6, 12 | 393, 762 | 28, 29 |
E. coli K-12 ER3440 | NZ_CP010439_4,5 | 2 | 12, 6 | 762, 393 | 29, 28 |
E. coli strain BIDMC106 | NZ_KQ087951_1, 2 | 2 | 9, 5 | 574, 333 | 26, 29 |
E. coli strain 6409 | NZ_CP010371_4, 5 | 2 | 10, 6 | 639, 393 | 29, 28 |
E. coli PMV-1 | NC_022370_2, 3 | 2 | 8, 6 | 507, 388 | 28, 28 |
E. coli HS | NC_009800_1 | 1 | 3 | 204 | 28 |
Total | | 57 CDRs | 566 spacers | | |
Analysis method
CRISPR loci were grouped based on the similarity between their consensus direct repeat (CDR) sequences. The groups were then clustered by multiple sequence alignment (MSA) using MEGA4 software, so that groups with similar sequences were placed in one cluster. In addition, the spacers were classified according to their evolutionary relationships by constructing a phylogenetic tree with MEGA4. The RNA secondary structures and minimum free energy (MFE) of the direct repeats of each group were predicted using the RNAfold web server (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). These structures are described using a loop-based energy model and the dynamic programming algorithm introduced by Zuker 22. Sequences homologous to the spacers were searched for with NCBI blastn 23. Cas genes in the vicinity of CRISPR loci were searched for with the CRISPR database BLAST tool (http://crispr.u-psud.fr/crispr/BLAST/CRISPRsBlast.php). For these searches, the spacers were blasted against the GenBank databases with an E-value cutoff of 0.1 (E-value ≤0.1) and a matching length of at least 70% of the queried spacer size 21.
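The two thresholds used for the spacer searches (E-value of at most 0.1 and an aligned length of at least 70% of the spacer) can be written as a small filter. The following is a minimal sketch, not the authors' pipeline: the `Hit` record, its field names, and the three example hits are hypothetical and only illustrate how the cutoffs combine. The spacer is taken from Table 3.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit (field names are assumptions)."""
    subject: str    # name of the matched phage or plasmid sequence
    evalue: float   # E-value reported for the alignment
    align_len: int  # aligned length in nucleotides


def keep_hit(hit, spacer_len, max_evalue=0.1, min_fraction=0.7):
    """Return True if the hit passes both thresholds described in the text."""
    long_enough = hit.align_len >= min_fraction * spacer_len
    return hit.evalue <= max_evalue and long_enough


# Spacer NC_010473_4 from Table 3 (32 nt); the hits below are invented.
spacer = "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC"
hits = [
    Hit("hit_A", 0.002, 30),  # passes both thresholds
    Hit("hit_B", 0.05, 20),   # too short: 20 < 0.7 * 32
    Hit("hit_C", 0.5, 32),    # E-value above the cutoff
]

kept = [h.subject for h in hits if keep_hit(h, len(spacer))]
print(kept)
```

Keeping the cutoffs as keyword arguments makes it easy to see how sensitive the phage and plasmid counts are to the chosen thresholds.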
Data validation
To predict the RNA secondary structures and calculate the MFE of the direct repeats in CRISPR loci, the RNAfold web server was used. The current limits of this server are 7500 nt for partition function calculations and 10,000 nt for minimum free energy only predictions. CRISPR loci were obtained from the CRISPRFinder server, last updated on 2014/8/5. The CRISPR database contains 150 analyzed genomes and 563 CRISPRs for archaea, and 2612 analyzed genomes and 3502 CRISPRs for bacteria.
Results
CRISPR loci of Escherichia coli in the CRISPR database
Among the various strains of E. coli in the NCBI database, we selected those that had CRISPR sequences in the CRISPR database. In the end, 29 different E. coli strains were chosen for further study. Some strains had both confirmed and questionable sequences, of which we used only the confirmed ones. Only five strains (17.2%) had a single CRISPR locus; the other strains (82.8%) possessed 2–4 loci in their genome. The number of spacers per locus ranged from 3 to 22, with spacers typically 28–34 bp long, and the number of direct repeats per locus ranged from 4 to 23, with repeats typically 28–29 bp long.
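The percentages quoted above follow directly from the "Number of CRISPR" column of Table 1. As a minimal sketch, the list below transcribes that column in row order (the variable names are illustrative) and recomputes the totals and the share of single-locus strains.

```python
from collections import Counter

# Number of confirmed CRISPR loci per strain, in the row order of Table 1.
loci_per_strain = [
    2, 2, 1, 3, 2, 3, 2, 2, 1, 2, 4, 2, 2, 2, 2,
    2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1,
]

n_strains = len(loci_per_strain)
n_loci = sum(loci_per_strain)
distribution = Counter(loci_per_strain)  # loci count -> number of strains

single = distribution[1]
pct_single = round(100 * single / n_strains, 1)
pct_multi = round(100 * (n_strains - single) / n_strains, 1)

print(n_strains, n_loci)
print(dict(sorted(distribution.items())))
print(pct_single, pct_multi)
```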
Repeat sequences
Grouping of direct repeats
Since the direct repeat (DR) sequences within one CRISPR locus are highly similar or identical, the consensus DR (CDR) sequence of each locus was selected for multiple sequence alignment analysis. Based on the alignment, the 57 CRISPR loci in the 29 E. coli strains were divided into 13 groups, each with the same CDR sequence (Table 2).
Group | Number of CDR | DR consensus | Percentage (%) |
---|---|---|---|
1 | 9 | GTGTTCCCCGCGCCAGCGGGGATAAACCG | 15.8 |
2 | 5 | GAGTTCCCCGCGCCAGCGGGGATAAAGCG | 8.8 |
3 | 11 | CGGTTTATCCCCGCTGGCGCGGGGAACAC | 19.3 |
4 | 7 | GTTCACTGCCGTACAGGCAGCTTAGAAA | 12.3 |
5 | 9 | CGGTTTATCCCCGCTGGCGCGGGGAACTC | 15.8 |
6 | 6 | GGTTTATCCCCGCTGGCGCGGGGAACAC | 10.5 |
7 | 2 | GAGTTCCCCGCGCTAGCGGGGATAAACCG | 3.5 |
8 | 1 | GAGTTCCCGGCGCCAGCGGGGATAAACCG | 1.8 |
9 | 2 | GTGTTCCCCGCGCCAGCGGGGATAAACC | 3.5 |
10 | 1 | GAGTTCCCCGCGCCAGCGGGGATAAACC | 1.8 |
11 | 2 | TTTCTAAGCTGCCTGTACGGCAGTGAAC | 3.5 |
12 | 1 | TTTATCCCCGCTGGCGCGGGGAACAC | 1.8 |
13 | 1 | GTTCCCCGCGCCAGCGGGGATAAACCG | 1.8 |
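The percentage column of Table 2 is each group's CDR count divided by the 57 loci. A minimal sketch of that arithmetic, with the counts transcribed from the table (the dictionary name is illustrative):

```python
# Group number -> number of loci sharing that consensus direct repeat (Table 2).
cdr_counts = {
    1: 9, 2: 5, 3: 11, 4: 7, 5: 9, 6: 6, 7: 2,
    8: 1, 9: 2, 10: 1, 11: 2, 12: 1, 13: 1,
}

total = sum(cdr_counts.values())
percentages = {group: round(100 * n / total, 1) for group, n in cdr_counts.items()}

for group, pct in percentages.items():
    print(f"group {group:2d}: {cdr_counts[group]:2d} loci, {pct:4.1f}%")
```

Because each value is rounded separately, the printed percentages need not add up to exactly 100.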
Clustering of direct repeats
For clustering, one CDR of each group was selected as a representative for multiple sequence alignment. The 13 groups were divided into four clusters (Fig. 1). Groups 1, 2, 7–10, and 13 were placed in the first cluster, with the common sequence GTTCCCCGCGC(C/T)AGCGGGGATAAACC. Groups 3 and 5 were placed in the second cluster, with the common sequence CGGTTTATCCCCGCTGGCGCGGGGAAC(T/A)C. Groups 4 and 11 were placed in the third cluster, with the common sequence (G/T)TTC(A/T)(C/A)(T/A)GC(C/T)G(T/C,A/C,C/T,A/G,G/T,G/A)C(A/G)GC(T/A,T/G,A/T)AA(A/C). Lastly, groups 6 and 12 were placed in the fourth cluster, with the common sequence T(T/A)T(A/C,T/C)CC(C/G)C(G/T,C/G,T/G,G/C)GCG(C/G)GG(G/A,G/A)AC.
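A cluster's common sequence can be illustrated with a simple column-wise consensus of equal-length repeats. This is only a sketch: it assumes the sequences are already aligned without gaps and writes variable positions with the bases sorted alphabetically, so (A/T) here corresponds to (T/A) in the text. It does not reproduce the MEGA4 alignment. The second call re-orients group 11 with a reverse complement before comparison, a step that is not described in the Methods and is shown here only as an assumption worth checking when repeats may have been read from opposite strands.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def column_consensus(seqs):
    """Column-wise consensus of equal-length, gap-free sequences.

    Conserved columns are written as a single base; variable columns are
    written as (X/Y) with the observed bases in alphabetical order.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must all have the same length")
    parts = []
    for column in zip(*seqs):
        bases = sorted(set(column))
        parts.append(bases[0] if len(bases) == 1 else "(" + "/".join(bases) + ")")
    return "".join(parts)


# Consensus direct repeats from Table 2.
group3 = "CGGTTTATCCCCGCTGGCGCGGGGAACAC"
group5 = "CGGTTTATCCCCGCTGGCGCGGGGAACTC"
group4 = "GTTCACTGCCGTACAGGCAGCTTAGAAA"
group11 = "TTTCTAAGCTGCCTGTACGGCAGTGAAC"

print(column_consensus([group3, group5]))
print(column_consensus([group4, reverse_complement(group11)]))
```

For the Table 2 sequences, the reverse complement of group 11 is identical to group 4, so the second consensus contains no variable positions.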
Direct repeats of RNA secondary structure
The RNA secondary structure and MFE of the direct repeat sequence of each of the 13 groups were predicted with the RNAfold web server (Fig. 2). In all groups, the RNA secondary structure was composed of a loop at each end and a stem in the middle. The stem length was 5 bp in group 4 and 6 bp in group 11, while in the other groups it was 7 bp.

RNA secondary structure stability of direct repeats
The MFE of groups 4 and 11 (ΔG > −10 kcal mol⁻¹) was higher than that of the other groups, which means that the RNA secondary structures of these two groups are likely less stable than those of the other groups, because their stems contain fewer base pairs. The MFE values of groups 4 and 11 were −8.60 kcal mol⁻¹ and −9.10 kcal mol⁻¹, respectively, so the RNA secondary structure of group 11, with a 6 bp stem, is more stable than that of group 4, with a 5 bp stem. The other groups, with a stem length of 7 bp, had a lower MFE than groups 4 and 11 and are therefore more stable.
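The link between stem length and stability can be explored with a crude search for the longest uninterrupted hairpin stem in a repeat. The sketch below is a stand-in for intuition only: it is not the RNAfold algorithm, it computes no free energies, and the function name, the minimum loop size of 3 nt, and the option to count G-U wobble pairs are all assumptions. Stem lengths reported by an energy-based folding tool may differ.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def longest_stem(dna, allow_wobble=True, min_loop=3):
    """Find the longest uninterrupted hairpin stem in a transcribed repeat.

    Returns (number_of_pairs, loop_sequence) for the first longest stem found.
    Bulges and internal loops are not considered.
    """
    rna = dna.upper().replace("T", "U")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    n = len(rna)
    best = (0, "")
    for i in range(n):
        # (i, j) is the innermost candidate pair that closes the hairpin loop.
        for j in range(i + min_loop + 1, n):
            k = 0
            # Extend the stem outward while the flanking bases can pair.
            while i - k >= 0 and j + k < n and (rna[i - k], rna[j + k]) in allowed:
                k += 1
            if k > best[0]:
                best = (k, rna[i + 1:j])
    return best


# Consensus direct repeats from Table 2.
repeats = {
    "group 1": "GTGTTCCCCGCGCCAGCGGGGATAAACCG",
    "group 4": "GTTCACTGCCGTACAGGCAGCTTAGAAA",
    "group 11": "TTTCTAAGCTGCCTGTACGGCAGTGAAC",
}

for name, seq in repeats.items():
    pairs, loop = longest_stem(seq)
    print(name, pairs, loop)
```

Running the search with `allow_wobble=False` shows how much of a stem depends on G-U pairs, which is one reason a repeat and its reverse complement need not fold into equally long stems.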
Spacers
Multi sequence alignment of spacers
In total, 566 spacers were found in the 57 CRISPR loci of the 29 E. coli strains. The spacer sequences (Table 3) differed greatly between loci, and such diversity is required of a defense system. Based on the results of MSA, the spacers of the CRISPR loci were classified into 37 groups (Fig. 3). Moreover, no conserved nucleotide was found among the spacers of the different CRISPR loci.
Spacer ID | Sequence of spacer | Similar phage GI | Similar plasmid |
---|---|---|---|
LFZJ01000005_2 | GCAAAAACCGGGCAATCGCAAAAAGGCGTAAT | 725950134, 712914205, 712914839 | pEKO1101, pRK1 |
NC_009800_1 | AACCTACCGTCTTGGCTAGCGGTTGCAGCGAAC | 713322302 | — |
NC_010468_1 | TTCCGCGACCCGGCGATAAGGGAAGATGGGTG | 712914205 | pEC_B24 |
NC_010473_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 418488631, 372199367, 712913174, 712914839, 422934783, 356870600, 849251248, 281199644 | pNDM-1_Dok01, pAPEC1990_61, pPG010208, pNDM10505, pUMNK88 |
NC_011745_2 | ATGCAGCGTTTGTCACTAAAACACTGGTCAAC | 155370093, 543170177 | pHUSEC2011-3, pG-09EL50 |
NC_011993_2 | AGCAGCTTTCCAGCGAGCGCGGTTAACTCACT | 510953439, 640883453 | pVR50H |
NC_022370_2 | TGACGCCATATGCAGATCATTGAGGCGAAACC | 725950304, 698029054 | — |
NZ_AAKB02000001_2 | ATGGTGGGTGGAGTATGTTACCTGTGAA | 510953439*, 418488631, 372199367 | p1ColV5155, pACN001-B |
NZ_CP009578_4 | AAAACCAAACTTCTCCATAAATTCCATAGCCG | 712914205, 640884271, 408905841, 448260273, 372199367 | pO111_1, pEQ2 |
NZ_CP010371_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 418488631, 725950304, 372199367, 712913174 | pNDM102337, pAPEC1990_61 |
NZ_CP010438_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 418488631, 372199367, 422934783 | p6409-202.186kb, pSCEC2, pAPEC1990_61 |
NZ_CP010439_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 712914839, 712913174 | p6409-202.186kb, pSCEC2 |
NZ_CP011331_1 | GGAACTGGCGCTGCTGGAGCAAAACCCGGTAT | 408905841, 712914839, 593780594 | pVZ321-thrLABC |
NZ_CP011342_8 | GGCAAAAACCGGGCAATCGCAAAAAGGCGTAAT | 510953439, 712914205 | pRK1, pEKO1101 |
NZ_CP012127_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 418488631 | p6409-202.186kb, pSCEC2 |
NZ_HG738867_5 | GGCAAAAACCGGGCAATCGCAAAAAGGCGTAAT | 510953439, 712914839, 712914205, 725950134 | pRK1, 378715377 |
NZ_JEND02000002_1 | TGTCGGACACCATAATGATACTAAGTGTCGGA | 712914205, 849121398 | pHUSEC2011-1 |
NZ_JPQG01000003_2 | TAATGAGTCAGGGGAATACCGAATATTTTATA | 698029054, 712914839, 448260273 | pSMS35_8, pO104, pIS15_43, pEQ2 |
NZ_KI929698_3 | CTCAGCGGCAAAAAATACGATCTCGCCGGTGT | 712914839, 849060471 | pUMNF18_IncFV |
NZ_KI929714_2 | GCCGGAAAATATTCATGATGGGGGTGGTTATGG | 388570360, 730984989 | pECN580, pKPC-LKEc |
NZ_KI929774_1 | CTCAGCGGCAAAAAATACGATCTCGCCGGTGT | 712914839 | — |
NZ_KQ087916_1 | AAAACCAAACTTCTCCATAAATTCCATAGCCG | 418487051 | pEQ2 |
NZ_KQ087951_1 | GTCAATAGGCGGCGTCCCGTAGCCGTCCCCTTCGG | 510953439 | — |
NZ_KQ087963_2 | ACATGAATGTCGGTTCAGACCGTGTTTTTACC | 422934783 | pO111_2 |
NZ_LFZH01000009_2 | GACAGAACGGCCTCAGTAGTCTCGTCAGGCTCC | — | pCFSAN029787_01 |
NZ_LM996841_1 | AAAACCAAACTTCTCCATAAATTCCATAGCCG | 712914205, 640884271 | pO111_1, pEQ1 |
NZ_LM997016_2 | GCCATCAGCTATAACACGCGCCGCTTCATCAAGA | 682123018, 593780594 | p3PCN033 |
NZ_LN832404_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 418488631, 712914839 | pSCEC2, pPG010208 |

The homologous investigation of spacers
For each CRISPR locus in the 29 strains of E. coli, sequences homologous to the spacers were searched for with NCBI BLAST. Almost all spacers had some degree of similarity to phage genomes and plasmids.
Cas genes near CRISPR loci
Cas genes were searched for from 10,000 bp upstream to 10,000 bp downstream of the CRISPR loci using the CRISPR database BLAST tool. Cas proteins were found in only three strains (E. coli PMV-1, CRISPR ID: NC_022370_2; E. coli ATCC 8739, CRISPR ID: NC_010468, on the forward strand; and E. coli HS, CRISPR ID: NC_009800_1, on the reverse strand). The CRISPR of E. coli PMV-1 was of subtype I-F/YPEST, and the six CRISPR-associated proteins in its vicinity were endonuclease Cas1, helicase Cas3, protein Csy1, protein Csy2, protein Csy3, and protein Cas6/Csy4. The cas genes in E. coli ATCC 8739 were cas1, cas2, cas3, cas5, cse1, cse2, cse3, and cse4, and those in E. coli HS were cas1, cse2, cse3, cse4, and cas5, of the E. coli subtype (Fig. 4).

Discussion
The CRISPR system was initially identified in E. coli and has since been discovered in many bacteria and most archaea. This system provides acquired resistance against viruses, bacteriophages, and plasmids, likely through an RNA interference-like mechanism 5, 24. In this study, CRISPR loci were searched for in the circular chromosomes of E. coli strains using the CRISPR web server. These sequences are usually found on the circular chromosomes of bacteria and archaea; however, some have also been detected on plasmids 25. The CRISPR web server is the first dedicated online server for finding and analyzing CRISPR sequences and the cas genes located in their vicinity 25. At present, the CRISPR database contains 150 analyzed genomes and 563 CRISPRs for archaea, and 2612 analyzed genomes and 3502 CRISPRs for bacteria. A cas gene database has been characterized by Haft et al. 26, but it provides no information about CRISPR sequences. Most E. coli strains include several questionable CRISPR loci in their genomes; however, only confirmed sequences were used in this study. Numerous spacers and direct repeats were found in most CRISPR sequences. Additionally, the sequences were long in most strains; in this context, Horvath et al. indicated that longer CRISPR sequences are probably more active than shorter ones 10. This characteristic may therefore be used to differentiate active CRISPR sequences from inactive ones in some strains.
Because CRISPR systems are responsive to the environment, they likely play a major role in the adaptation of the host to its surroundings, which may explain the stability of specific bacterial strains in different ecosystems 10. This is perhaps one of the reasons why E. coli strains are able to survive in different environments and hosts. Moreover, evidence from studies of Sulfolobus conjugative plasmids supports the idea that plasmids containing repeat clusters are more stable in host cells; it may therefore be possible to reduce the stability of some pathogenic strains by inactivating specific CRISPR loci in a host or in a particular environment 27.
Several CRISPR loci can exist separately on the genome of a single species. For example, 18 loci have been identified on the genome of Methanocaldococcus jannaschii 6, 10. So far, two CRISPR systems have been identified in E. coli strains, each comprising two loci: the first includes CRISPR 1 and CRISPR 2, and the second, the Ypest system, includes CRISPR 3 and CRISPR 4 3. In this study, 82.8% of the strains carried 2–4 loci in their genome and could therefore harbor both systems, while the remaining strains carried a single locus on their chromosome and appear to have only one system.
The similarity of the spacer sequences to plasmid and phage genomes in the NCBI database was also investigated. According to previous studies, spacer sequences originate from foreign elements such as phages and plasmids 8, 28, 29. Indeed, the ability of the CRISPR system to acquire new spacers from phages and plasmids, and consequently to defend against them in the future, is a unique feature of this system 7. Our results are consistent with these findings and indicate that the spacers are derived from foreign elements. The higher number of bacteriophages than plasmids with similarity to the spacers points to frequent phage attack during the evolution of the strains and to the importance of phages in the acquisition of new spacers. This property may be applied to identify some strains through the phages from which their spacers were derived.
A total of 566 spacers were found in the CRISPR loci of the E. coli strains; only three strains included 3–5 spacers, while the other strains contained more than five. The average length of the spacers was 31 bp, ranging from 28 to 34 bp. Diversity in the length and sequence of spacers affects the activity of CRISPR systems in bacteria 10, 30. In the study by Di et al., CRISPR loci containing a larger number of 30 bp spacers were more active than loci containing a smaller number of 36 bp spacers, which indicates that both the number and the length of spacers affect the activity of CRISPR loci 30. In our study, the spacers had an average length of 31 bp; by analogy with those findings, the CRISPR loci in our selected strains are likely more active than loci with longer spacers. In addition, no conserved nucleotide was observed among the spacers of the strains. These findings indicate the specificity of the CRISPR/Cas system in responding to extrachromosomal elements.
Direct repeats of different CRISPR loci may differ in one or several nucleotides, but they are typically conserved. When a CRISPR locus acquires a new spacer, internal spacers are usually deleted, probably by homologous recombination between CRISPR direct repeats, which helps to limit the size of the CRISPR sequences. The repeats can be polymorphic, particularly the terminal repeat, in which sequence degeneracy has been observed at the 3′ end 10. This observation is essential for the correct interpretation and location of CRISPR loci, since the last spacer/repeat unit, which contains the terminal repeat, is often lost. The repeat location seems to be consistent with the location of nearby cas genes. In addition, variations within the repeat sequences can be seen throughout a CRISPR locus 10. In most of the studied E. coli strains, the direct repeats were almost conserved, and for this reason most of the groups were placed in the first cluster. It can therefore be inferred that polymorphism has been less likely in this cluster than in the other clusters, which suggests that cas genes are probably less often present around the CRISPR loci of this cluster.
The RNA secondary structure and the MFE of the direct repeats were also investigated. CRISPR repeats have a partially palindromic character and can therefore form stable hairpin-like secondary structures 10. Moreover, it seems that CRISPR sequences are transcribed into a single-stranded RNA molecule. As transcription proceeds, two sequential single-stranded RNA segments can interact and form secondary structures by pairing head to tail 10. In all 29 strains, the RNA secondary structures of the direct repeats had a low MFE (ΔG < −10 kcal mol⁻¹, except for groups 4 and 11) and can therefore form a stable structure. Structures with a lower minimum free energy are more stable than those with a higher MFE value 31. In this context, Kunin et al. indicated that the stem-loop structures of some direct repeats probably facilitate contact between the spacer targeting foreign RNA or DNA and the cas-encoded proteins 13. Moreover, the stability of RNA secondary structures may affect the function of CRISPR loci 5.
Furthermore, the cas genes located in the vicinity of the CRISPR loci were searched for in the CRISPR database. According to our search, cas genes were found in only three strains. These findings are similar to those of Yang et al. 5, who studied 32 Staphylococcus aureus strains and found cas genes in the vicinity of the CRISPR loci in only two of them. The CRISPR/Cas system can be transferred among different but related species 5, so the cas genes may have been transferred to E. coli strains PMV-1, ATCC 8739, and HS from other species. The CRISPR systems of the other E. coli strains may be inactive at present, because when the cas genes of a CRISPR locus are deactivated or absent, the locus loses its ability to provide resistance and to integrate new spacers 10.
In the studied E. coli strains, the direct repeats were highly conserved within a locus, while the spacers were variable. Because the spacers protect against different exogenous elements, they must be variable, and this variability means that they can be used to identify strains. In general, the dynamic nature of CRISPR loci is potentially valuable for typing and comparative analyses of strains and microbial populations. Our research indicated that direct repeats are not completely conserved among strains and may differ in one or several nucleotides. The nature of the repeat sequences affects the activity of the CRISPR system through the formation of stable RNA secondary structures. Moreover, cas genes may not be present in all CRISPR systems of E. coli strains.
Acknowledgments
This study was supported by a Grant from the Research Council of Shiraz University of Medical Sciences, Shiraz University of Medical Sciences, Shiraz, Iran.
Conflict of interest
The authors declare that they have no conflict of interest.