Volume 56, Issue 6 pp. 645-653
Research Paper

Studying the features of 57 confirmed CRISPR loci in 29 strains of Escherichia coli

Seyyed Soheil Rahmatabadi
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran

Navid Nezafat
Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran

Manica Negahdaripour
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran

Nasim Hajighahramani
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran

Mohammad Hossein Morowvat
Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran

Younes Ghasemi (Corresponding Author)
Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran

Correspondence: Younes Ghasemi, Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
E-mail: [email protected]
Phone/Fax: +98 7112426729

First published: 12 February 2016

Abstract

The clustered regularly interspaced short palindromic repeats (CRISPR) system is a novel type of innate defense system in prokaryotes that destroys exogenous elements. To gain further insight into the behavior and organization of the system, extensive analysis of the available sequenced genomes is necessary. The dynamic nature of CRISPR loci is potentially valuable for typing and comparative analysis of strains and microbial populations. Few systematic bioinformatics investigations have examined the structure of CRISPR sequences in Escherichia coli strains. In this study, 57 CRISPR loci were selected from 29 Escherichia coli strains to investigate their structural characteristics and potential functions using bioinformatics tools. Our results showed that most strains contained several loci, in which the direct repeats were largely conserved while the spacers were highly variable. Moreover, RNA analysis indicated that the direct repeats of all loci could form stable RNA secondary structures, and the spacers showed homology with phages more often than with plasmids. Only three strains included cas genes around their loci.

Abbreviations

  • CAS: CRISPR-associated genes
  • CDR: consensus direct repeat
  • CRISPR: clustered regularly interspaced short palindromic repeats
  • MFE: minimum free energy
  • MSA: multiple sequence alignment
  • siRNA: small interfering RNA
  • smRNA: small RNAs

    Introduction

    Microorganisms are frequently invaded by exogenous agents such as viruses, plasmids, and other mobile genetic elements, so they have developed a type of innate defense system to counter these invaders 1. Escherichia coli is a highly diverse bacterial species whose strains live in different hosts and in a wide range of environments 2. The ability to survive in various environments is related to their high degree of adaptability 3. E. coli strains possess a defense system, recently discovered in about 40% of bacteria and most archaea 4-6, that contains two main parts: (i) CRISPR (clustered regularly interspaced short palindromic repeats) and (ii) CAS (CRISPR-associated) proteins [3,7,8]. It functions as part of the immune system against bacteriophages, viruses, plasmids, and any other foreign DNA [5,9]. Part of the adaptability of E. coli strains can be attributed to CRISPR sequences 10. These sequences consist of nearly identical direct repeats interspaced by unique spacers of similar size [4,5]. CRISPRs encode no protein product, but different studies have indicated that they are transcribed into small RNAs (smRNA) that act like small interfering RNA (siRNA) to hamper the entry of foreign DNA 11. The number of CRISPR groups and spacers is highly variable, but the sequences of the direct repeats are highly conserved. The spacer sequences are mostly similar to plasmid and phage sequences; it is therefore suggested that the spacers are derived from plasmids or phages 9. As mentioned above, the CRISPR complex forms part of the immune system, and the spacers play a key role in defense against plasmids and phages 3, 9. Earlier studies of various E. coli strains have demonstrated vast heterogeneity in spacers and CRISPR/CAS content 12. Although the spacers within one locus have similar lengths, their nucleotide compositions vary 3.

    In a given CRISPR locus, the repeats always contain palindromic structures and may form RNA secondary structures made up of loops and stems 13. Moreover, CRISPR loci show great polymorphism among strains, a property that is applied to the identification of clinical strains of Mycobacterium tuberculosis, Streptococcus pyogenes, and Campylobacter jejuni 14. The CRISPR/Cas system currently has many applications in research, medicine, and biotechnology, including the rapid creation of cellular and animal models. One of its most important applications is genome editing, where it provides targeted and efficient changes in a variety of eukaryotic, and especially mammalian, species. Using this system, DNA sequences in the endogenous genome and their functional products can now be readily changed or modulated in nearly any organism of choice 15. For this reason, and because of its interesting RNA-based mechanism of action, the CRISPR/Cas system is the topic of numerous studies 16. At present, in place of some expensive and lengthy practical steps 17, the available bioinformatics tools may help investigators in various areas 18-20 to begin their experiments with some predictions. To gain further insight into the behavior and organization of CRISPRs, extensive analysis of the available sequenced genomes is necessary. The systematic study of CRISPR structures may help to uncover more potential roles for these sequences in bacteria. Overall, only a few systematic bioinformatics investigations of CRISPR structure in E. coli strains are available. In this study, we used 57 identified and confirmed CRISPR loci to investigate their structural characteristics and potential functions in E. coli strains using bioinformatics tools. CRISPR loci were categorized by the similarity between their direct repeats and by the evolutionary relationships of their spacers.

    Moreover, we searched for homology between the spacer sequences and bacteriophage genomes and plasmids, investigated the presence of cas genes around the CRISPR loci, and predicted the RNA secondary structures of the direct repeats and their stabilities.

    Materials and methods

    Sequence collection

    The genomes of different E. coli strains were searched in the National Center for Biotechnology Information (NCBI) nucleotide database (http://www.ncbi.nlm.nih.gov/) with default parameters; E. coli CRISPR loci (Table 1) were then identified with the CRISPRFinder server (E-value ≤0.1) (http://crispr.u-psud.fr/Server/). CRISPRFinder identifies structures that have the basic characteristics of CRISPRs 21.

    Table 1. Characteristics of the E. coli strain CRISPR loci used in this study
    Strain | CRISPR id | Number of CRISPR loci | Number of spacers | CRISPR length (bp) | DR length (bp)
    E. coli BIDMC 19A | NZ_KI929698_3, 4 | 2 | 21, 10 | 1312, 638 | 29, 28
    E. coli K12 AG100 | NZ_LN832404_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28
    E. coli O104:H4 str. C227-11 | NZ_CP011331_1 | 1 | 13 | 821 | 29
    E. coli strain BIDMC112 | NZ_KQ087963_2, 4, 5 | 3 | 4, 8, 5 | 273, 516, 333 | 29, 29, 29
    E. coli strain ED1a | NC_011745_2, 3 | 2 | 17, 13 | 1037, 809 | 28, 28
    E. coli 53638 | NZ_AAKB02000001_2, 6, 7 | 3 | 3, 9, 8 | 204, 577, 516 | 28, 29, 29
    E. coli K12 DH10B | NC_010473_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28
    E. coli strain DH1Ec169 | NZ_CP012127_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28
    E. coli 157F8092B_41 | NZ_AVCD01000005_1 | 1 | 3 | 211 | 29
    E. coli FAP1 | NZ_CP009578_4, 5 | 2 | 5, 4 | 333, 272 | 29, 29
    E. coli ATCC 8739 | NC_010468_1, 2, 3, 5 | 4 | 12, 15, 21, 3 | 762, 943, 1309, 204 | 29, 29, 29, 28
    E. coli K-12 MC4100 | NZ_HG738867_5, 6 | 2 | 6, 12 | 393, 762 | 28, 29
    E. coli strain E455 | NZ_JEND02000002_1, 2 | 2 | 14, 6 | 882, 394 | 29, 29
    E. coli FHI71 | NZ_LM996841_1, 2 | 2 | 17, 8 | 1365, 516 | 29, 29
    E. coli BIDMC 2B | NZ_KI929774_1, 2 | 2 | 21, 13 | 1312, 822 | 29, 29
    E. coli strain 48 | NZ_JPQG01000003_2, 3 | 2 | 14, 15 | 882, 938 | 29, 29
    E. coli strain IH53473 | NZ_LFZH01000009_2 | 1 | 6 | 393 | 28
    E. coli LF82 | NC_011993_2, 3 | 2 | 9, 22 | 567, 1349 | 28, 28
    E. coli FHI87 | NZ_LM997016_2 | 1 | 9 | 575 | 27
    E. coli BIDMC 17A | NZ_KI929714_2, 3 | 2 | 13, 21 | 822, 1312 | 29, 29
    E. coli strain BIDMC104 | NZ_KQ087916_1, 2 | 2 | 11, 9 | 701, 577 | 29, 29
    E. coli strain IH57218 | LFZJ01000005_2, 3 | 2 | 6, 5 | 394, 333 | 29, 29
    E. coli K-12 strain ER3454 | NZ_CP010438_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28
    E. coli GM4792 | NZ_CP011342_8, 9 | 2 | 6, 12 | 393, 762 | 28, 29
    E. coli K-12 ER3440 | NZ_CP010439_4, 5 | 2 | 12, 6 | 762, 393 | 29, 28
    E. coli strain BIDMC106 | NZ_KQ087951_1, 2 | 2 | 9, 5 | 574, 333 | 26, 29
    E. coli strain 6409 | NZ_CP010371_4, 5 | 2 | 10, 6 | 639, 393 | 29, 28
    E. coli PMV-1 | NC_022370_2, 3 | 2 | 8, 6 | 507, 388 | 28, 28
    E. coli HS | NC_009800_1 | 1 | 3 | 204 | 28
    Total: 57 CDRs, 566 spacers
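
As a rough consistency check on Table 1, the mean spacer length of a locus can be estimated from the reported columns. The sketch below assumes (this is not stated in the source) that the reported CRISPR length covers exactly n spacers flanked by n + 1 direct repeats of equal length; the function name is illustrative only.

```python
def mean_spacer_length(crispr_length: int, n_spacers: int, dr_length: int) -> float:
    """Estimate mean spacer length, assuming the locus is n spacers plus n + 1 repeats."""
    if n_spacers <= 0:
        raise ValueError("n_spacers must be positive")
    # Assumption: every repeat in the locus has the consensus length.
    repeat_total = (n_spacers + 1) * dr_length
    return (crispr_length - repeat_total) / n_spacers


# Values from Table 1: a 762 bp locus with 12 spacers and a 29 bp repeat,
# and the E. coli HS locus (204 bp, 3 spacers, 28 bp repeat).
k12_estimate = mean_spacer_length(762, 12, 29)
hs_estimate = mean_spacer_length(204, 3, 28)
print(round(k12_estimate, 1), round(hs_estimate, 1))
```

Under this assumption both estimates fall near 31 to 32 bp, which is in line with the spacer lengths reported in the Results.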

    Analysis method

    CRISPR loci were grouped by the similarity of their consensus direct repeat (CDR) sequences. The groups were then clustered by multiple sequence alignment (MSA) using MEGA4 software, so that groups with similar sequences fell into one cluster. In addition, the spacers were classified by their evolutionary relationships, depicted as a phylogenetic tree built with MEGA4. The RNA secondary structures and minimum free energy (MFE) of the direct repeats of each group were predicted with the RNAfold web server (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). These structures are computed with a loop-based energy model and the dynamic programming algorithm introduced by Zuker 22. Sequences homologous to the spacers were searched with NCBI blastn 23. Cas genes in the vicinity of CRISPR loci were searched with the CRISPR database BLAST (http://crispr.u-psud.fr/crispr/BLAST/CRISPRsBlast.php). For these searches, the spacers were blasted against the GenBank databases with an E-value cutoff of 0.1 and a matching length of at least 70% of the queried spacer size 21.
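
The two acceptance criteria for spacer hits (E-value at most 0.1, matching length at least 70% of the spacer) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Hit` record, its field names, and the example hits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str       # hypothetical label for the matched phage or plasmid
    evalue: float      # E-value reported for the alignment
    align_length: int  # aligned length in nucleotides


def passes_cutoff(hit: Hit, spacer_length: int,
                  max_evalue: float = 0.1, min_fraction: float = 0.7) -> bool:
    """Keep a hit only if it meets both thresholds described in the text."""
    long_enough = hit.align_length >= min_fraction * spacer_length
    return hit.evalue <= max_evalue and long_enough


# Made-up hits for a 32 nt spacer.
hits = [
    Hit("phage_A", 0.02, 30),    # passes both thresholds
    Hit("plasmid_B", 0.5, 32),   # fails the E-value cutoff
    Hit("phage_C", 0.05, 20),    # fails the 70% length requirement
]
kept = [h.subject for h in hits if passes_cutoff(h, spacer_length=32)]
print(kept)
```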

    Data validation

    The RNAfold web server was used to predict RNA secondary structures and to calculate the MFE of the direct repeats in CRISPR loci. The current limits of this server are 7,500 nt for partition function calculations and 10,000 nt for MFE-only predictions. CRISPR loci were obtained from the CRISPRFinder server, last updated on 2014/8/5. The CRISPR database contains 150 analyzed genomes with 563 CRISPRs for archaea and 2612 analyzed genomes with 3502 CRISPRs for bacteria.

    Results

    CRISPR loci of Escherichia coli in the CRISPR database

    Among the various E. coli strains in the NCBI database, we selected those that had CRISPR sequences in the CRISPR database. In total, 29 different E. coli strains were chosen for further study. Some strains had both confirmed and questionable sequences; only the confirmed sequences were used. Only five strains (17.2%) had a single CRISPR locus; the other strains (82.8%) possessed 2–4 loci in their genomes. The number of spacers per locus ranged from 3 to 22, with spacers typically 28–34 bp long, and the number of direct repeats per locus ranged from 4 to 23, with repeats typically 28–29 bp long.
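
The strain percentages quoted above can be reproduced from the "Number of CRISPR" column of Table 1. The list below is transcribed by hand from that column in row order, so treat it as an assumption to be checked against the table.

```python
# Loci per strain, transcribed from Table 1 in row order (29 strains).
loci_per_strain = [
    2, 2, 1, 3, 2, 3, 2, 2, 1, 2,
    4, 2, 2, 2, 2, 2, 1, 2, 1, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 1,
]

n_strains = len(loci_per_strain)
n_loci = sum(loci_per_strain)
n_single = sum(1 for n in loci_per_strain if n == 1)

pct_single = round(100 * n_single / n_strains, 1)
pct_multi = round(100 * (n_strains - n_single) / n_strains, 1)
print(n_strains, n_loci, n_single, pct_single, pct_multi)
```

The totals agree with the 29 strains and 57 loci reported in the text.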

    Repeat sequences

    Grouping of direct repeats

    Since the direct repeat (DR) sequences within one CRISPR locus are highly similar or identical, the consensus DR (CDR) sequence of each locus was selected for multiple sequence alignment. Based on the alignment, the 57 CRISPR loci in 29 E. coli strains were divided into 13 groups, each with an identical CDR sequence (Table 2).

    Table 2. Grouping of direct repeats by sequence similarity, with the number and percentage of CDRs in each group
    Group | Number of CDRs | DR consensus | Percentage (%)
    1 | 9 | GTGTTCCCCGCGCCAGCGGGGATAAACCG | 15.8
    2 | 5 | GAGTTCCCCGCGCCAGCGGGGATAAAGCG | 8.8
    3 | 11 | CGGTTTATCCCCGCTGGCGCGGGGAACAC | 19.3
    4 | 7 | GTTCACTGCCGTACAGGCAGCTTAGAAA | 12.3
    5 | 9 | CGGTTTATCCCCGCTGGCGCGGGGAACTC | 15.8
    6 | 6 | GGTTTATCCCCGCTGGCGCGGGGAACAC | 10.5
    7 | 2 | GAGTTCCCCGCGCTAGCGGGGATAAACCG | 3.5
    8 | 1 | GAGTTCCCGGCGCCAGCGGGGATAAACCG | 1.8
    9 | 2 | GTGTTCCCCGCGCCAGCGGGGATAAACC | 3.5
    10 | 1 | GAGTTCCCCGCGCCAGCGGGGATAAACC | 1.8
    11 | 2 | TTTCTAAGCTGCCTGTACGGCAGTGAAC | 3.5
    12 | 1 | TTTATCCCCGCTGGCGCGGGGAACAC | 1.8
    13 | 1 | GTTCCCCGCGCCAGCGGGGATAAACCG | 1.8
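
The grouping step, in which loci with an identical CDR fall into one group, can be sketched with a dictionary keyed by sequence. The locus labels below are hypothetical; the two sequences are the group 1 and group 4 consensus repeats from Table 2. The final line recomputes one Table 2 percentage from its count.

```python
from collections import defaultdict


def group_by_cdr(locus_to_cdr: dict) -> dict:
    """Map each distinct CDR sequence to the list of loci that carry it."""
    groups = defaultdict(list)
    for locus, cdr in locus_to_cdr.items():
        groups[cdr.upper()].append(locus)
    return dict(groups)


# Hypothetical locus labels paired with consensus repeats from Table 2.
example = {
    "locus_a": "GTGTTCCCCGCGCCAGCGGGGATAAACCG",  # group 1 consensus
    "locus_b": "GTTCACTGCCGTACAGGCAGCTTAGAAA",   # group 4 consensus
    "locus_c": "GTGTTCCCCGCGCCAGCGGGGATAAACCG",  # group 1 consensus again
}
groups = group_by_cdr(example)
shares = {cdr: round(100 * len(loci) / len(example), 1) for cdr, loci in groups.items()}

# Table 2 check: group 1 holds 9 of the 57 CDRs.
group1_pct = round(100 * 9 / 57, 1)
print(len(groups), group1_pct)
```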

    Clustering of direct repeats

    For clustering, one CDR from each group was selected as a representative for multiple sequence alignment. The 13 groups were divided into four clusters (Fig. 1). Groups 1, 2, 7–10, and 13 were placed in the first cluster, sharing the common sequence GTTCCCCGCGC(C/T)AGCGGGGATAAACC. Groups 3 and 5 were placed in the second cluster, sharing the common sequence CGGTTTATCCCCGCTGGCGCGGGGAAC(T/A)C. Groups 4 and 11 were placed in the third cluster, sharing the common sequence (G/T)TTC(A/T)(C/A)(T/A)GC(C/T)G(T/C,A/C,C/T,A/G,G/T,G/A)C(A/G)GC(T/A,T/G,A/T)AA(A/C).

    Figure 1. Clustering of the groups based on the common base pairs in CDRs using multiple sequence alignment. One CDR from each group was selected as a representative for the alignment. ★ marks identical base pairs. The numbers 1 to 13 denote the 13 groups of CDRs.

    Lastly, groups 6 and 12 were placed in the fourth cluster, sharing the common sequence T(T/A)T(A/C,T/C)CC(C/G)C(G/T,C/G,T/G,G/C)GCG(C/G)GG(G/A,G/A)AC.
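
The bracketed notation used for the cluster sequences can be illustrated with a column-wise consensus over pre-aligned sequences of equal length. This is a minimal sketch for the simplest case (groups 3 and 5, which have the same length and need no gaps); it is not the MEGA4 alignment procedure, and the function name is illustrative.

```python
def column_consensus(seqs: list) -> str:
    """Return a consensus string, writing variable columns as (X/Y)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    out = []
    for column in zip(*seqs):
        bases = sorted(set(column))  # alphabetical order inside the brackets
        out.append(bases[0] if len(bases) == 1 else "(" + "/".join(bases) + ")")
    return "".join(out)


group3 = "CGGTTTATCCCCGCTGGCGCGGGGAACAC"  # Table 2, group 3
group5 = "CGGTTTATCCCCGCTGGCGCGGGGAACTC"  # Table 2, group 5
consensus = column_consensus([group3, group5])
print(consensus)
```

The result matches the second-cluster sequence given in the text, apart from the alphabetical ordering inside the brackets.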

    Direct repeats of RNA secondary structure

    The RNA secondary structure and MFE of the direct repeat sequence of each of the 13 groups were predicted with the RNAfold web server (Fig. 2). In all groups, the RNA secondary structure consisted of a stem in the middle with a loop at each end. The stem length was 5 bp in group 4 and 6 bp in group 11, while in the other groups it was 7 bp.

    Figure 2. The RNA secondary structures and MFE of the direct repeats of the 13 groups. The numbers indicate the MFE; structures with a lower MFE are more stable than those with a higher MFE.

    RNA secondary structure stability of direct repeats

    The MFE of groups 4 and 11 (ΔG > −10 kcal/mol) was higher than that of the other groups, which means that the RNA secondary structures of these two groups are likely less stable, since their stems contain fewer base pairs. The MFE of groups 4 and 11 was −8.60 kcal/mol and −9.10 kcal/mol, respectively, so the structure of group 11, with a 6 bp stem, is more stable than that of group 4, with a 5 bp stem. The other groups, with 7 bp stems, had lower MFE values and are therefore more stable than groups 4 and 11.
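
The stem lengths discussed above can be checked by hand by counting consecutive base pairs outward from the innermost pair of the hairpin. The sketch below is not the RNAfold energy model and computes no free energy; the inner pair positions (0-based) were read off the Table 2 sequences by eye and are assumptions, as is the choice to count G-U wobble pairs.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def stem_length(dna_repeat: str, i: int, j: int, allow_wobble: bool = True) -> int:
    """Count consecutive base pairs extending outward from inner pair (i, j)."""
    rna = dna_repeat.upper().replace("T", "U")  # transcribe the repeat
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    pairs = 0
    while i >= 0 and j < len(rna) and (rna[i], rna[j]) in allowed:
        pairs += 1
        i -= 1  # step outward on the 5' side
        j += 1  # step outward on the 3' side
    return pairs


group1 = "GTGTTCCCCGCGCCAGCGGGGATAAACCG"
group4 = "GTTCACTGCCGTACAGGCAGCTTAGAAA"
group11 = "TTTCTAAGCTGCCTGTACGGCAGTGAAC"

# Inner pair positions are assumptions chosen by inspection of each sequence.
print(stem_length(group1, 10, 15), stem_length(group4, 9, 15), stem_length(group11, 12, 18))
```

Under this counting the three repeats give stems of 7, 5, and 6 bp, in line with the values in the text; the sixth pair in group 11 is a G-U wobble, so it disappears when `allow_wobble=False`.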

    Spacers

    Multi sequence alignment of spacers

    In total, 566 spacers were found in the 57 CRISPR loci of the 29 E. coli strains. The spacer sequences (Table 3) differed greatly within each locus, a diversity that is required of a defense system. Based on the results of MSA, the spacers of the CRISPR loci were classified into 37 groups (Fig. 3). Moreover, the MSA found no conserved nucleotide in the spacers of different CRISPR loci.

    Table 3. Genetic elements showing similarity to spacer sequences
    Spacer ID | Sequence of spacer | Similar phage GI | Similar plasmid
    LFZJ01000005_2 | GCAAAAACCGGGCAATCGCAAAAAGGCGTAAT | 725950134, 712914205, 712914839 | pEKO1101, pRK1
    NC_009800_1 | AACCTACCGTCTTGGCTAGCGGTTGCAGCGAAC | 713322302 |
    NC_010468_1 | TTCCGCGACCCGGCGATAAGGGAAGATGGGTG | 712914205 | pEC_B24
    NC_010473_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 418488631, 372199367, 712913174, 712914839, 422934783, 356870600, 849251248, 281199644 | pNDM-1_Dok01, pAPEC1990_61, pPG010208, pNDM10505, pUMNK88
    NC_011745_2 | ATGCAGCGTTTGTCACTAAAACACTGGTCAAC | 155370093, 543170177 | pHUSEC2011-3, pG-09EL50
    NC_011993_2 | AGCAGCTTTCCAGCGAGCGCGGTTAACTCACT | 510953439, 640883453 | pVR50H
    NC_022370_2 | TGACGCCATATGCAGATCATTGAGGCGAAACC | 725950304, 698029054 |
    NZ_AAKB02000001_2 | ATGGTGGGTGGAGTATGTTACCTGTGAA | 510953439*, 418488631, 372199367 | p1ColV5155, pACN001-B
    NZ_CP009578_4 | AAAACCAAACTTCTCCATAAATTCCATAGCCG | 712914205, 640884271, 408905841, 448260273, 372199367 | pO111_1, pEQ2
    NZ_CP010371_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 418488631, 725950304, 372199367, 712913174 | pNDM102337, pAPEC1990_61
    NZ_CP010438_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 418488631, 372199367, 422934783 | p6409-202.186kb, plasmid pSCEC2, pAPEC1990_61
    NZ_CP010439_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 712914839, 712913174 | p6409-202.186kb, pSCEC2
    NZ_CP011331_1 | GGAACTGGCGCTGCTGGAGCAAAACCCGGTAT | 408905841, 712914839, 593780594 | pVZ321-thrLABC
    NZ_CP011342_8 | GGCAAAAACCGGGCAATCGCAAAAAGGCGTAAT | 510953439, 712914205 | plasmid pRK1, pEKO1101
    NZ_CP012127_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 725950304, 418488631 | p6409-202.186kb, pSCEC2
    NZ_HG738867_5 | GGCAAAAACCGGGCAATCGCAAAAAGGCGTAAT | 510953439, 712914839, 712914205, 725950134 | pRK1, 378715377
    NZ_JEND02000002_1 | TGTCGGACACCATAATGATACTAAGTGTCGGA | 712914205, 849121398 | pHUSEC2011-1
    NZ_JPQG01000003_2 | TAATGAGTCAGGGGAATACCGAATATTTTATA | 698029054, 712914839, 448260273 | pSMS35_8, pO104, pIS15_43, pEQ2
    NZ_KI929698_3 | CTCAGCGGCAAAAAATACGATCTCGCCGGTGT | 712914839, 849060471 | pUMNF18_IncFV
    NZ_KI929714_2 | GCCGGAAAATATTCATGATGGGGGTGGTTATGG | 388570360, 730984989 | pECN580, pKPC-LKEc
    NZ_KI929774_1 | CTCAGCGGCAAAAAATACGATCTCGCCGGTGT | 712914839 |
    NZ_KQ087916_1 | AAAACCAAACTTCTCCATAAATTCCATAGCCG | 418487051 | pEQ2
    NZ_KQ087951_1 | GTCAATAGGCGGCGTCCCGTAGCCGTCCCCTTCGG | 510953439 |
    NZ_KQ087963_2 | ACATGAATGTCGGTTCAGACCGTGTTTTTACC | 422934783 | pO111_2 DNA
    NZ_LFZH01000009_2 | GACAGAACGGCCTCAGTAGTCTCGTCAGGCTCC | | pCFSAN029787_01
    NZ_LM996841_1 | AAAACCAAACTTCTCCATAAATTCCATAGCCG | 712914205, 640884271 | pO111_1 DNA, pEQ1
    NZ_LM997016_2 | GCCATCAGCTATAACACGCGCCGCTTCATCAAGA | 682123018, 593780594 | p3PCN033
    NZ_LN832404_4 | CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC | 418488631, 712914839 | pSCEC2, pPG010208
    Figure 3. Grouping of spacer sequences based on evolutionary relationships. The numbers 1 to 37 denote the 37 groups. Strains located in one group show the greatest evolutionary similarity. The evolutionary distance scale is 0.5. The spacer ID identifies the corresponding spacers.

    The homologous investigation of spacers

    For each CRISPR locus in the 29 E. coli strains, sequences homologous to the spacers were searched with NCBI BLAST. Almost all spacers showed some degree of similarity to phage genomes and plasmids.

    Cas genes near CRISPR loci

    Cas genes were searched in the CRISPR database BLAST from 10,000 bp upstream to 10,000 bp downstream of each CRISPR locus. Cas genes were found in only three strains (E. coli PMV-1, CRISPR ID: NC_022370_2, and E. coli ATCC 8739, CRISPR ID: NC_010468, on the forward strand; E. coli HS, CRISPR ID: NC_009800_1, on the reverse strand). The CRISPR of E. coli PMV-1 was of subtype I-F/YPEST; the six Cas proteins encoded in its vicinity were endonuclease Cas1, helicase Cas3, Csy1, Csy2, Csy3, and Cas6/Csy4. The cas genes in E. coli ATCC 8739 were cas1, cas2, cas3, cas5, cse1, cse2, cse3, and cse4, and those in E. coli HS were cas1, cse2, cse3, cse4, and cas5, of the E. coli subtype (Fig. 4).
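
The search window described above (10,000 bp on either side of a CRISPR locus) can be expressed as an interval-overlap test. This is a sketch only: the coordinates are invented for illustration, and whether a gene must lie fully inside the window or merely overlap it is an assumption (overlap is used here).

```python
def overlaps_flank_window(locus_start: int, locus_end: int,
                          gene_start: int, gene_end: int,
                          flank: int = 10_000) -> bool:
    """True if the gene overlaps the locus extended by `flank` bp on each side."""
    window_start = max(1, locus_start - flank)  # clamp at the first base
    window_end = locus_end + flank
    return gene_start <= window_end and gene_end >= window_start


# Hypothetical coordinates: a 762 bp locus starting at position 50,001.
locus = (50_001, 50_762)
upstream_gene = (41_000, 42_000)   # inside the upstream flank
distant_gene = (20_000, 21_000)    # more than 10,000 bp away
edge_gene = (60_500, 61_500)       # straddles the downstream window edge

results = [overlaps_flank_window(*locus, *gene)
           for gene in (upstream_gene, distant_gene, edge_gene)]
print(results)
```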

    Figure 4. The cas genes in the vicinity of CRISPR loci. Cas genes were searched from 10,000 bp upstream to 10,000 bp downstream of the CRISPR sequence. In E. coli strains PMV-1 and ATCC 8739, the cas genes are located at different positions within 10,000 bp upstream of the CRISPR sequence, while in strain HS they are located downstream of the CRISPR sequence.

    Discussion

    The CRISPR system was initially identified in E. coli and has more recently been found in many bacteria and most archaea. This system provides acquired resistance against viruses, plasmids, and bacteriophages, likely through an RNA interference-like mechanism 5, 24. CRISPR loci were searched in the circular chromosomes of E. coli strains using the CRISPR web server. These sequences are usually found on the circular chromosomes of bacteria and archaea; however, some have been detected on plasmids 25. The CRISPR web server is the first dedicated online server for finding and analyzing CRISPR sequences and the cas genes located in their vicinity 25. At present, the CRISPR database contains 150 analyzed genomes and 563 CRISPRs for archaea, and 2612 analyzed genomes and 3502 CRISPRs for bacteria. A cas gene database has been characterized by Haft et al. 26; however, it provides no information about CRISPR sequences. Most E. coli strains include several questionable CRISPR loci in their genomes; only confirmed sequences were used in this study. Numerous spacers and direct repeats were found in most CRISPR sequences. Additionally, the sequences were long in most strains; in this context, Horvath et al. indicated that longer CRISPR sequences are probably more active than shorter ones 10. This characteristic may therefore be used to differentiate active CRISPR sequences from inactive ones in some strains.

    Since CRISPR systems are responsive to the environment, they likely play a main role in the adaptation of the host to its surroundings, which could explain the stability of specific bacterial strains in different ecosystems 10. This may be one reason why E. coli strains are able to survive in different environments and hosts. Moreover, evidence from studies of Sulfolobus conjugative plasmids supports the idea that plasmids containing repeat clusters are more stable in host cells; it may therefore be possible to reduce the stability of some pathogenic strains by inactivating specific CRISPR loci in a host or in a particular environment 27.

    Several CRISPR loci can exist separately on the genome of a single species. For example, 18 loci have been identified on the genome of Methanocaldococcus jannaschii 6, 10. So far, two CRISPR systems have been identified in E. coli strains, each with two locus types: the first contains CRISPR 1 and CRISPR 2, and the second, the Ypest system, contains CRISPR 3 and CRISPR 4 3. In this study, 82.8% of strains carried 2–4 loci in their genomes, so both systems may be present in these strains, while the remaining strains carried one locus on their chromosome and appear to have only one system.

    The similarity of the spacer sequences to plasmid and phage genomes in the NCBI database was also investigated. According to previous studies, spacer sequences originate from foreign elements such as phages and plasmids 8, 28, 29. Indeed, the ability of the CRISPR system to acquire new spacers from phages and plasmids, and consequently to defend against them in the future, is a unique feature of this system 7. Our results are consistent with these findings and indicate that the spacers are derived from foreign elements. The higher number of bacteriophages than plasmids with similarity to the spacers reflects the high frequency of phage attack during the evolution of the strains and the significance of phages in the acquisition of new spacers. This property could be applied to identify some strains by the phages from which their spacers were derived.

    A total of 566 spacers were found in the CRISPR loci of the E. coli strains; only three strains included 3–5 spacers, while the other strains contained more than five. The average spacer length was 31 bp, ranging from 28 to 34 bp. Diversity in the length and sequence of spacers affects the activity of CRISPR systems in bacteria 10, 30. In the study by Di et al., CRISPR loci containing a larger number of 30 bp spacers were more active than loci containing a smaller number of 36 bp spacers, which indicates that the number and length of spacers affect the activity of CRISPR loci 30. In our study, all strains possess spacers with an average length of 31 bp; therefore, the CRISPR loci in our selected strains are likely more active than loci with 36 bp spacers. In addition, no conserved nucleotide was observed in the spacers of the strains. These findings indicate the specificity of the CRISPR/Cas system in responding to extrachromosomal elements.

    The direct repeats of different CRISPR loci can differ in one or several nucleotides, but they are typically conserved. When a CRISPR locus receives a new spacer, internal spacers are usually deleted, probably by homologous recombination between CRISPR direct repeats, which helps to limit the size of CRISPR sequences. The repeats can be polymorphic, particularly the terminal repeat, in which sequence degeneracy has been observed at the 3′ end 10. This observation is essential for the correct interpretation and location of CRISPR loci, since the last spacer/repeat unit, containing the terminal repeat, is often lost. The repeat location seems to be consistent with the location of nearby cas genes. In addition, variations within the repeat sequences can be seen throughout a CRISPR locus 10. In most of the studied E. coli strains, the direct repeats were almost conserved; for this reason, most of the groups were located in the first cluster. It can therefore be inferred that polymorphism has been less likely in this cluster than in the other clusters, which suggests that cas genes are probably less often present around the CRISPR loci of this cluster.

    The RNA secondary structure and the MFE of the direct repeats were also investigated. CRISPR repeats have a partially palindromic character and can therefore form stable hairpin-like secondary structures 10. Moreover, CRISPR sequences appear to be transcribed into a single-stranded RNA molecule. As transcription progresses, two sequential single-stranded RNA segments can interact and form secondary structures by pairing head to foot 10. In all 29 strains, the RNA secondary structures of the direct repeats had a low MFE (ΔG < −10 kcal/mol, except for groups 4 and 11) and can therefore form a stable structure. Structures with a lower minimum free energy are more stable than those with a higher MFE 31. In this context, Kunin et al. indicated that the stem-loop structures of some direct repeats probably facilitate contact between the spacer targeting the foreign RNA or DNA and the cas-encoded proteins 13. Moreover, the stability of RNA secondary structures may affect the function of CRISPR loci 5.

    Furthermore, the cas genes located in the vicinity of CRISPR loci were searched through the CRISPR database. According to our search, cas genes were found in only three strains. These findings are similar to those of Yang et al. 5, who studied 32 Staphylococcus aureus strains and found cas genes in the vicinity of CRISPR loci in only two. The CRISPR/Cas system can be transferred among different but related species 5, so the cas genes may have been transferred to E. coli strains PMV-1, ATCC 8739, and HS from other species. The CRISPR systems of the other E. coli strains may be inactive at present, because when the cas genes of a CRISPR locus are deactivated or absent, the locus loses the ability to provide resistance and to integrate new spacers 10.

    In the studied E. coli strains, the direct repeats were highly conserved within one locus, while the spacers were variable. Since the spacers protect against different exogenous elements, they must be variable. Because of this variability, the spacers can be used to identify strains. In general, the dynamic character of CRISPR loci is potentially valuable for typing and comparative analysis of strains and microbial populations. Our research indicated that the direct repeats are not completely conserved among strains and may differ in one or several nucleotides. The nature of the repeat sequences affects the activity of the CRISPR system through the formation of stable RNA secondary structures. Moreover, cas genes may not be present in all CRISPR systems of E. coli strains.

    Acknowledgments

    This study was supported by a grant from the Research Council of Shiraz University of Medical Sciences, Shiraz, Iran.

      Conflict of interest

      The authors declare that they have no conflict of interest.
