Volume 56, Issue 6 pp. 645-653

Research Paper

Studying the features of 57 confirmed CRISPR loci in 29 strains of Escherichia coli

Seyyed Soheil Rahmatabadi

Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran

Pharmaceutical Science Research Center, Shiraz University of Medical Science, Shiraz, Iran

Navid Nezafat

Pharmaceutical Science Research Center, Shiraz University of Medical Science, Shiraz, Iran

Manica Negahdaripour

Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran

Pharmaceutical Science Research Center, Shiraz University of Medical Science, Shiraz, Iran

Nasim Hajighahramani

Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran

Pharmaceutical Science Research Center, Shiraz University of Medical Science, Shiraz, Iran

Mohammad Hossein Morowvat

Pharmaceutical Science Research Center, Shiraz University of Medical Science, Shiraz, Iran

Younes Ghasemi (Corresponding Author)

Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran

Pharmaceutical Science Research Center, Shiraz University of Medical Science, Shiraz, Iran

Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran

Correspondence: Younes Ghasemi, Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran

E-mail: [email protected]

Phone/Fax: +98 7112426729

First published: 12 February 2016

https://doi.org/10.1002/jobm.201500707

Citations: 19

Abstract

The clustered regularly interspaced short palindromic repeats (CRISPR) system is a novel type of innate defense system in prokaryotes for the destruction of exogenous elements. To gain further insight into the behavior and organization of the system, extensive analysis of the available sequenced genomes is necessary. The dynamic nature of CRISPR loci is possibly valuable for typing and comparative analyses of strains and microbial populations. Few systematic bioinformatics investigations have addressed the structure of CRISPR sequences in Escherichia coli strains. In this study, 57 CRISPR loci were selected from 29 Escherichia coli strains to investigate their structural characteristics and potential functions using bioinformatics tools. Our results showed that most strains contained several loci, which mainly included conserved direct repeats, while the spacers were highly variable. Moreover, RNA analysis of the sequences indicated that all loci could form stable RNA secondary structures, and the spacers showed homology with phages more often than with plasmids. Only three strains included cas genes around their loci.

Abbreviations

CAS: CRISPR-associated genes
CDR: consensus direct repeat
CRISPR: clustered regularly interspaced short palindromic repeats
MFE: minimum free energy
MSA: multiple sequence alignment
siRNA: small interfering RNA
smRNA: small RNAs

Introduction

Microorganisms are frequently invaded by exogenous agents such as viruses, plasmids, and other mobile genetic elements, and they have developed a type of innate defense system to counter these invaders 1. Escherichia coli is a very diverse bacterial species whose strains live in different hosts and in a wide range of environments 2. This ability to survive in various environments is related to their high degree of adaptability 3. E. coli strains possess a defense system, recently discovered in about 40% of bacteria and most archaea 4-6, that contains two main parts: (i) CRISPR (clustered regularly interspaced short palindromic repeats) and (ii) CAS (CRISPR-associated) proteins [3,7,8]. It functions as part of the immune system against bacteriophages, viruses, plasmids, or any alien DNA [5,95]. Part of the adaptability of E. coli strains can be related to CRISPR sequences 10. These sequences are built from direct repeats of nearly identical sequence, interspaced by unique spacers of similar size [4,5]. CRISPRs encode no protein product, but different studies have indicated that they are transcribed into small RNAs (smRNA) that act as small interfering RNA (siRNA) to hamper the entry of alien DNA 11. The number of CRISPR groups and spacers is highly variable, whereas the sequences of the direct repeats are highly conserved. The spacer sequences are mostly similar to plasmid and phage sequences; it has therefore been suggested that the spacers are derived from plasmids or phages 9. As mentioned above, the CRISPR complex forms part of the immune system, and the spacers play a key role in defense against plasmids and phages 3, 9. Earlier studies of various E. coli strains have demonstrated vast heterogeneity in spacers and CRISPR/CAS content 12. Although the spacers within one locus have a similar length, their nucleotide compositions vary 3.
In a specific CRISPR locus, the repeats always contain palindromic structures and may form RNA secondary structures made up of loops and stems 13. Moreover, CRISPR loci show great polymorphism among strains, a property that is applied to the identification of clinical strains of Mycobacterium tuberculosis, Streptococcus pyogenes, and Campylobacter jejuni 14. The CRISPR/Cas system currently has many applications in research, medicine, and biotechnology, including the rapid creation of cellular and animal models. One of its most important applications is genome editing, where it provides targeted and efficient changes in a variety of eukaryotic, and especially mammalian, species. Using this system, DNA sequences in the endogenous genome and their functional products can now be readily changed or modulated in nearly any organism of choice 15. The CRISPR/Cas system is also the topic of numerous studies because of its interesting RNA-based mechanism of action 16. At present, in place of some expensive and lengthy experimental steps 17, the available bioinformatics tools may help investigators in various areas 18-20 to begin their experiments with some predictions. To gain further insight into the behavior and organization of CRISPRs, extensive analysis of the available sequenced genomes is necessary. The systematic study of CRISPR structures may help to identify further potential roles for these sequences in bacteria. To date, few systematic bioinformatics investigations of CRISPR structure in E. coli strains are available. In this study, we used 57 identified and confirmed CRISPR loci to investigate their structural characteristics and potential functions in E. coli strains using bioinformatics tools. CRISPR loci were categorized based on the similarity between direct repeats and the evolutionary relationships of spacers. Moreover, we searched for homology of the spacer sequences with bacteriophage genomes and plasmids, investigated the presence of cas genes around the CRISPR loci, and predicted the RNA secondary structures of the direct repeats and their stabilities.

Materials and methods

Sequence collection

The genomes of different E. coli strains were searched in the National Center for Biotechnology Information (NCBI) nucleotide database (http://www.ncbi.nlm.nih.gov/) with default parameters; the CRISPR loci of these strains (Table 1) were then identified with the CRISPR finder server (E-value ≤0.1) (http://crispr.u-psud.fr/Server/). CRISPR finder identifies structures with the basic characteristics of CRISPRs 21.

Table 1. Characteristics of the E. coli CRISPR loci used in this study

Strain	CRISPR id	Number of CRISPR loci	Number of spacers	CRISPR length (bp)	DR length (bp)
E. coli BIDMC 19A	NZ_KI929698_3, 4	2	21, 10	1312, 638	29, 28
E. coli K12AG100	NZ_LN832404_4, 5	2	12, 6	762, 393	29, 28
E. coli O104:H4 str. C227-11	NZ_CP011331_1	1	13	821	29
E. coli strain BIDMC112	NZ_KQ087963_2, 4, 5	3	4, 8, 5	273, 516, 333	29, 29, 29
E. coli strain ED1a	NC_011745_2, 3	2	17, 13	1037, 809	28, 28
E. coli 53638	NZ_AAKB02000001_2,6,7	3	3, 9, 8	204, 577, 516	28, 29, 29
E. coli K12DH10B	NC_010473_4,5	2	12, 6	762, 393	29, 28
E. coli strain DH1Ec169	NZ_CP012127_4,5	2	12, 6	762, 393	29, 28
E. coli 157F8092B_41	NZ_AVCD01000005_1	1	3	211	29
E. coli FAP1	NZ_CP009578_4, 5	2	5, 4	333, 272	29, 29
E. coli ATCC 8739	NC_010468_1,2,3,5	4	12, 15, 21, 3	762, 943, 1309, 204	29, 29, 29, 28
E. coli K-12 MC4100	NZ_HG738867_5, 6	2	6, 12	393, 762	28, 29
E. coli strain E455	NZ_JEND02000002_1, 2	2	14, 6	882, 394	29, 29
E. coli FHI71	NZ_LM996841_1, 2	2	17, 8	1365, 516	29, 29
E. coli BIDMC 2B	NZ_KI929774_1, 2	2	21, 13	1312, 822	29, 29
E. coli strain 48	NZ_JPQG01000003_2, 3	2	14, 15	882, 938	29, 29
E. coli strain IH53473	NZ_LFZH01000009_2	1	6	393	28
E. coli LF82	NC_011993_2, 3	2	9, 22	567, 1349	28, 28
E. coli FHI87	NZ_LM997016_2	1	9	575	27
E. coli BIDMC 17A	NZ_KI929714_2,3	2	13, 21	822, 1312	29, 29
E. coli strain BIDMC104	NZ_KQ087916_1, 2	2	11, 9	701, 577	29, 29
E. coli strain IH57218	LFZJ01000005_2, 3	2	6, 5	394, 333	29, 29
E. coli K-12 strain ER3454	NZ_CP010438_4, 5	2	12, 6	762, 393	29, 28
E. coli GM4792	NZ_CP011342_8, 9	2	6, 12	393, 762	28, 29
E. coli K-12 ER3440	NZ_CP010439_4,5	2	12, 6	762, 393	29, 28
E. coli strain BIDMC106	NZ_KQ087951_1, 2	2	9, 5	574, 333	26, 29
E. coli strain 6409	NZ_CP010371_4, 5	2	10, 6	639, 393	29, 28
E. coli PMV-1	NC_022370_2, 3	2	8, 6	507, 388	28, 28
E. coli HS	NC_009800_1	1	3	204	28
Total		57 CDRs	566 spacers

Analysis method

The CRISPR loci were grouped according to the similarity of their consensus direct repeat (CDR) sequences. The groups were then clustered by multiple sequence alignment (MSA) using MEGA4 software, so that groups with similar sequences were placed in one cluster. In addition, the spacers were classified by their evolutionary relationships, using a phylogenetic tree built with MEGA4 software. The RNA secondary structures and minimum free energy (MFE) of the direct repeats of each group were predicted using the RNAfold web server (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). These predictions use a loop-based energy model and the dynamic programming algorithm introduced by Zuker 22. Sequences homologous to the spacers were searched by NCBI blastn 23. Cas genes in the vicinity of CRISPR loci were searched with the CRISPR database BLAST tool (http://crispr.u-psud.fr/crispr/BLAST/CRISPRsBlast.php). To find the cas genes, the spacers were blasted against the GenBank databases with an E-value cutoff of 0.1 and a matching length of at least 70% of the query spacer size 21.
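
The E-value and coverage thresholds described above can be expressed as a small filter. The following is a minimal sketch and not the authors' pipeline: the `Hit` record, its field names, and the example hits are hypothetical, and only the two thresholds (E-value cutoff of 0.1, matching length of at least 70% of the spacer) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit of a spacer query."""
    subject: str      # name of the matched phage or plasmid (illustrative)
    evalue: float     # BLAST E-value of the hit
    align_len: int    # length of the aligned region in nucleotides


def keep_hit(hit, spacer_len, max_evalue=0.1, min_cover=0.70):
    """Return True if the hit passes both thresholds stated in the Methods."""
    return hit.evalue <= max_evalue and hit.align_len >= min_cover * spacer_len


# Spacer taken from Table 3 (NC_010473_4); the hits below are invented.
spacer = "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC"
hits = [
    Hit("hypothetical_phage_A", 0.004, 30),    # passes both thresholds
    Hit("hypothetical_plasmid_B", 0.05, 20),   # alignment too short
    Hit("hypothetical_phage_C", 2.0, 32),      # E-value too high
]
kept = [h.subject for h in hits if keep_hit(h, len(spacer))]
print(kept)
```

Expressing the coverage as a fraction of the query length keeps the filter valid for spacers of different sizes, which matters here because the spacers are not all the same length.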

Data validation

The RNAfold web server was used to predict RNA secondary structures and to calculate the MFE of the direct repeats in CRISPR loci. The current limits of this server are 7500 nt for partition function calculations and 10,000 nt for minimum free energy only predictions. CRISPR loci were obtained from the CRISPR finder server, which was last updated on 2014/8/5. The CRISPR database contains 150 and 2612 analyzed genomes and 563 and 3502 CRISPRs for archaea and bacteria, respectively.

Results

CRISPR loci of Escherichia coli in the CRISPR database

Among the various strains of E. coli in the NCBI database, we selected the strains that had CRISPR sequences in the CRISPR database. Finally, 29 different E. coli strains were chosen for further study. Some strains had both confirmed and questionable sequences, of which only the confirmed sequences were used. Only five strains (17.2%) had a single CRISPR locus; the other strains (82.8%) possessed 2–4 loci in their genomes. The number of spacers per locus ranged from 3 to 22, with spacers typically 28–34 bp long, and the number of direct repeats ranged from 4 to 23, with repeats typically 28–29 bp long.
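
The proportions quoted above follow directly from the "Number of CRISPR" column of Table 1. As a minimal sketch, the tally can be reproduced as follows; the list is transcribed from Table 1 in row order, and the variable names are illustrative only.

```python
# Number of confirmed CRISPR loci per strain, transcribed from Table 1 (row order).
loci_per_strain = [
    2, 2, 1, 3, 2, 3, 2, 2, 1, 2,
    4, 2, 2, 2, 2, 2, 1, 2, 1, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 1,
]

n_strains = len(loci_per_strain)
n_loci = sum(loci_per_strain)
n_single = sum(1 for n in loci_per_strain if n == 1)

# Percentages of strains with one locus and with more than one locus.
single_pct = round(100 * n_single / n_strains, 1)
multi_pct = round(100 * (n_strains - n_single) / n_strains, 1)

print(n_strains, n_loci, n_single, single_pct, multi_pct)
```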

Repeat sequences

Grouping of direct repeats

Since the direct repeat (DR) sequences in one CRISPR locus are highly similar or identical, the consensus DR (CDR) sequence of each locus was selected for multiple sequence alignment analysis. Based on the alignment, the 57 CRISPR loci in 29 E. coli strains were divided into 13 groups, each with the same CDR sequence (Table 2).

Table 2. Grouping of direct repeats based on the similarity between them, with the number and percentage of CDRs in each group

Group	Number of CDRs	DR consensus	Percentage (%)
1	9	GTGTTCCCCGCGCCAGCGGGGATAAACCG	15.8
2	5	GAGTTCCCCGCGCCAGCGGGGATAAAGCG	8.8
3	11	CGGTTTATCCCCGCTGGCGCGGGGAACAC	19.3
4	7	GTTCACTGCCGTACAGGCAGCTTAGAAA	12.3
5	9	CGGTTTATCCCCGCTGGCGCGGGGAACTC	15.8
6	6	GGTTTATCCCCGCTGGCGCGGGGAACAC	10.5
7	2	GAGTTCCCCGCGCTAGCGGGGATAAACCG	3.5
8	1	GAGTTCCCGGCGCCAGCGGGGATAAACCG	1.8
9	2	GTGTTCCCCGCGCCAGCGGGGATAAACC	3.5
10	1	GAGTTCCCCGCGCCAGCGGGGATAAACC	1.8
11	2	TTTCTAAGCTGCCTGTACGGCAGTGAAC	3.5
12	1	TTTATCCCCGCTGGCGCGGGGAACAC	1.8
13	1	GTTCCCCGCGCCAGCGGGGATAAACCG	1.8

Clustering of direct repeats

For clustering, one CDR from each group was selected as a representative for multiple sequence alignment. The 13 groups were divided into four clusters (Fig. 1). Groups 1, 2, 7–10, and 13 were placed in the first cluster, all with the common sequence GTTCCCCGCGC(C/T)AGCGGGGATAAACC. Groups 3 and 5 were placed in the second cluster, with the common sequence CGGTTTATCCCCGCTGGCGCGGGGAAC(T/A)C. Groups 4 and 11 were placed in the third cluster, with the common sequence (G/T)TTC(A/T)(C/A)(T/A)GC(C/T)G(T/C,A/C,C/T,A/G,G/T,G/A)C(A/G)GC(T/A,T/G,A/T)AA(A/C). Lastly, groups 6 and 12 were placed in the fourth cluster, with the common sequence T(T/A)T(A/C,T/C)CC(C/G)C(G/T,C/G,T/G,G/C)GCG(C/G)GG(G/A,G/A)AC.

Figure 1. The clustering result of the groups based on the common base pairs in CDRs using multiple sequence alignment. One CDR of each group was selected as a representative for the alignment. ★ represents the similar base pairs. The numbers from 1 to 13 represent the 13 groups of CDRs.
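
When consensus repeats are compared as plain strings, a locus read from the opposite strand appears as a different sequence. The sketch below, which is not part of the authors' MEGA4 workflow, checks the 13 consensus sequences of Table 2 for exact reverse-complement pairs. On the sequences as printed, groups 1 and 3, groups 4 and 11, and groups 6 and 9 are reverse complements of each other; this is an observation about the printed strings only, and how it bears on the clustering is left open.

```python
# Consensus direct repeats (CDRs) transcribed from Table 2, keyed by group number.
CDR = {
    1: "GTGTTCCCCGCGCCAGCGGGGATAAACCG",
    2: "GAGTTCCCCGCGCCAGCGGGGATAAAGCG",
    3: "CGGTTTATCCCCGCTGGCGCGGGGAACAC",
    4: "GTTCACTGCCGTACAGGCAGCTTAGAAA",
    5: "CGGTTTATCCCCGCTGGCGCGGGGAACTC",
    6: "GGTTTATCCCCGCTGGCGCGGGGAACAC",
    7: "GAGTTCCCCGCGCTAGCGGGGATAAACCG",
    8: "GAGTTCCCGGCGCCAGCGGGGATAAACCG",
    9: "GTGTTCCCCGCGCCAGCGGGGATAAACC",
    10: "GAGTTCCCCGCGCCAGCGGGGATAAACC",
    11: "TTTCTAAGCTGCCTGTACGGCAGTGAAC",
    12: "TTTATCCCCGCTGGCGCGGGGAACAC",
    13: "GTTCCCCGCGCCAGCGGGGATAAACCG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written in A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


# Pairs of groups (a, b), a < b, whose consensus repeats are exact reverse complements.
rc_pairs = [
    (a, b)
    for a in sorted(CDR)
    for b in sorted(CDR)
    if a < b and reverse_complement(CDR[a]) == CDR[b]
]
print(rc_pairs)
```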

RNA secondary structure of direct repeats

The RNA secondary structure and MFE of the direct repeat sequence of each of the 13 groups were predicted with the RNAfold web server (Fig. 2). In all groups, the RNA secondary structure was composed of two rings at both ends and a stem in the middle. The stem length in groups 4 and 11 was 5 and 6 bp, respectively, while in the other groups it was 7 bp.

RNA secondary structure stability of direct repeats

The MFE of groups 4 and 11 (ΔG>−10 kcal mol⁻¹) was higher than that of the other groups, which means that the RNA secondary structures of these two groups are likely less stable than those of the other groups, since their stems contain fewer base pairs. The MFE values of groups 4 and 11 were −8.60 kcal mol⁻¹ and −9.10 kcal mol⁻¹, respectively, so the RNA secondary structure of group 11, with a 6-bp stem, is more stable than that of group 4, with a 5-bp stem. The other groups, with 7-bp stems, had lower MFE values than groups 4 and 11 and are therefore more stable.
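
The link between stem length and stability can be illustrated with a naive stem counter. This is a sketch under stated assumptions, not the RNAfold energy model: it only counts the longest run of consecutive base pairs in a single hairpin, allowing Watson-Crick pairs and G-U wobble pairs and requiring at least three unpaired bases in the loop. The function name and the `min_loop` parameter are illustrative. Under these assumptions, the counts for the group 1, 4, and 11 consensus repeats agree with the stem lengths of 7, 5, and 6 bp reported above.

```python
# Allowed RNA base pairs: Watson-Crick pairs plus the G-U wobble pair (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(dna_repeat, min_loop=3):
    """Length of the longest uninterrupted hairpin stem in the transcribed repeat.

    Every position pair (i, j) separated by at least `min_loop` bases is tried
    as the innermost pair of a hairpin, and the stem is extended outward while
    the flanking bases can still pair.
    """
    rna = dna_repeat.upper().replace("T", "U")
    n = len(rna)
    best = 0
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            while i - k >= 0 and j + k < n and (rna[i - k], rna[j + k]) in PAIRS:
                k += 1
            best = max(best, k)
    return best


# Representative consensus repeats from Table 2.
representatives = {
    1: "GTGTTCCCCGCGCCAGCGGGGATAAACCG",
    4: "GTTCACTGCCGTACAGGCAGCTTAGAAA",
    11: "TTTCTAAGCTGCCTGTACGGCAGTGAAC",
}
stems = {group: longest_stem(seq) for group, seq in representatives.items()}
print(stems)
```

A count of paired bases ignores stacking energies and loop penalties, so it can rank repeats only roughly; the MFE values themselves still have to come from a thermodynamic predictor such as RNAfold.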

Spacers

Multiple sequence alignment of spacers

In total, 566 spacers were found in the 57 CRISPR loci of the 29 E. coli strains. The spacer sequences (Table 3) differed greatly within each locus. Such diversity is required for a defense system. Based on the results of MSA, the spacers of the CRISPR loci were classified into 37 groups (Fig. 3). Moreover, MSA revealed no conserved nucleotide among the spacers of different CRISPR loci.

Table 3. Genetic elements showing similarity to spacer sequences

Spacer ID	Sequence of spacer	Similar phage GI	Similar plasmid
LFZJ01000005_2	GCAAAAACCGGGCAATCGCAAAAAGGCGTAAT	725950134, 712914205	pEKO1101, pRK1
		712914839
NC_009800_1	AACCTACCGTCTTGGCTAGCGGTTGCAGCGAAC	713322302	—
NC_010468_1	TTCCGCGACCCGGCGATAAGGGAAGATGGGTG	712914205	pEC_B24
NC_010473_4	CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC	725950304, 418488631	pNDM-1_Dok01,pAPEC1990_61
		372199367, 712913174	pPG010208
		712914839, 422934783	pNDM10505
		356870600, 849251248	pUMNK88
		281199644
NC_011745_2	ATGCAGCGTTTGTCACTAAAACACTGGTCAAC	155370093, 543170177	pHUSEC2011-3
			pG-09EL50
NC_011993_2	AGCAGCTTTCCAGCGAGCGCGGTTAACTCACT	510953439, 640883453	pVR50H
NC_022370_2	TGACGCCATATGCAGATCATTGAGGCGAAACC	725950304, 698029054	—
NZ_AAKB02000001_2	ATGGTGGGTGGAGTATGTTACCTGTGAA	510953439*, 418488631	p1ColV5155
		372199367	pACN001-B
NZ_CP009578_4	AAAACCAAACTTCTCCATAAATTCCATAGCCG	712914205, 640884271	pO111_1
		408905841, 448260273	pEQ2
		372199367
NZ_CP010371_4	CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC	418488631, 725950304	pNDM102337
		372199367, 712913174	pAPEC1990_61
NZ_CP010438_4	CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC	725950304, 418488631	p6409-202.186kb
		372199367, 422934783	plasmid pSCEC2
			pAPEC1990_61
NZ_CP010439_4	CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC	725950304, 712914839	p6409-202.186kb
		712913174	pSCEC2
NZ_CP011331_1	GGAACTGGCGCTGCTGGAGCAAAACCCGGTAT	408905841,712914839	pVZ321-thrLABC
		593780594
NZ_CP011342_8	GGCAAAAACCGGGCAATCGCAAAAAGGCGTAAT	510953439,712914205	plasmid pRK1
			pEKO1101
NZ_CP012127_4	CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC	725950304, 418488631	p6409-202.186kb
			pSCEC2
NZ_HG738867_5	GGCAAAAACCGGGCAATCGCAAAAAGGCGTAAT	510953439, 712914839	pRK1
		712914205, 725950134	378715377
NZ_JEND02000002_1	TGTCGGACACCATAATGATACTAAGTGTCGGA	712914205, 849121398	pHUSEC2011-1
NZ_JPQG01000003_2	TAATGAGTCAGGGGAATACCGAATATTTTATA	698029054, 712914839	pSMS35_8
		448260273	pO104
			pIS15_43, pEQ2
NZ_KI929698_3	CTCAGCGGCAAAAAATACGATCTCGCCGGTGT	712914839, 849060471	pUMNF18_IncFV
NZ_KI929714_2	GCCGGAAAATATTCATGATGGGGGTGGTTATGG	388570360	pECN580
		730984989	pKPC-LKEc
NZ_KI929774_1	CTCAGCGGCAAAAAATACGATCTCGCCGGTGT	712914839	—
NZ_KQ087916_1	AAAACCAAACTTCTCCATAAATTCCATAGCCG	418487051	pEQ2
NZ_KQ087951_1	GTCAATAGGCGGCGTCCCGTAGCCGTCCCCTTCGG	510953439	—
NZ_KQ087963_2	ACATGAATGTCGGTTCAGACCGTGTTTTTACC	422934783	pO111_2 DNA
NZ_LFZH01000009_2	GACAGAACGGCCTCAGTAGTCTCGTCAGGCTCC	—	pCFSAN029787_01
NZ_LM996841_1	AAAACCAAACTTCTCCATAAATTCCATAGCCG	712914205	pO111_1 DNA
		640884271	pEQ1
NZ_LM997016_2	GCCATCAGCTATAACACGCGCCGCTTCATCAAGA	682123018, 593780594	p3PCN033
NZ_LN832404_4	CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC	418488631, 712914839	pSCEC2
			pPG010208

Homology search of spacers

For each CRISPR locus in the 29 strains of E. coli, sequences homologous to the spacers were searched by NCBI blast. Almost all spacers had a degree of similarity with some phage genomes and plasmids.
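
Before homology searches, it can be useful to see which loci carry an identical spacer, since identical queries return identical hits. The following sketch groups a subset of the Table 3 entries by spacer sequence; the dictionary holds only ten of the listed loci and the variable names are illustrative.

```python
from collections import defaultdict

# Subset of Table 3: CRISPR locus ID -> spacer sequence as listed in the table.
table3_spacers = {
    "NC_010473_4": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "NZ_CP010371_4": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "NZ_CP010438_4": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "NZ_CP010439_4": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "NZ_CP012127_4": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "NZ_LN832404_4": "CAGCGTCAGGCGTGAAATCTCACCGTCGTTGC",
    "NZ_CP009578_4": "AAAACCAAACTTCTCCATAAATTCCATAGCCG",
    "NZ_KQ087916_1": "AAAACCAAACTTCTCCATAAATTCCATAGCCG",
    "NZ_LM996841_1": "AAAACCAAACTTCTCCATAAATTCCATAGCCG",
    "NC_009800_1": "AACCTACCGTCTTGGCTAGCGGTTGCAGCGAAC",
}

# Collect the locus IDs that share each spacer sequence.
loci_by_spacer = defaultdict(list)
for locus_id, spacer in table3_spacers.items():
    loci_by_spacer[spacer].append(locus_id)

# Keep only spacers found in more than one locus.
shared = {seq: ids for seq, ids in loci_by_spacer.items() if len(ids) > 1}
for seq, ids in shared.items():
    print(seq, len(ids), ids)
```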

Cas genes near CRISPR loci

Cas genes were searched from 10,000 bp upstream to 10,000 bp downstream of the CRISPR loci with the CRISPR database BLAST tool. Cas genes were found in only three strains (E. coli PMV-1, CRISPR ID: NC_022370_2; E. coli ATCC 8739, CRISPR ID: NC_010468, on the forward strand; and E. coli HS, CRISPR ID: NC_009800_1, on the reverse strand). The CRISPR of E. coli PMV-1 was of subtype I-F/YPEST, and six Cas proteins were found in its vicinity: endonuclease Cas1, helicase Cas3, Csy1, Csy2, Csy3, and Cas6/Csy4. The cas genes in E. coli ATCC 8739 were cas1, cas2, cas3, cas5, cse1, cse2, cse3, and cse4, and those in E. coli HS were cas1, cse2, cse3, cse4, and cas5, of subtype E. coli (Fig. 4).
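
The search window described above can be written as a simple coordinate test. This is a minimal sketch: the function name and all coordinates are hypothetical, and it assumes that a gene counts as being "in the vicinity" when any part of it overlaps the window, which the text does not specify.

```python
def within_flank(locus_start, locus_end, gene_start, gene_end, flank=10_000):
    """Return True if a gene overlaps the CRISPR locus extended by `flank` bp.

    Coordinates are 1-based positions on the same replicon, with start <= end.
    """
    window_start = max(1, locus_start - flank)
    window_end = locus_end + flank
    return gene_start <= window_end and gene_end >= window_start


# Hypothetical coordinates for illustration only.
locus = (50_000, 50_762)
print(within_flank(*locus, 41_000, 42_500))   # gene inside the upstream flank
print(within_flank(*locus, 30_000, 35_000))   # gene too far upstream
```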

Discussion

The CRISPR system was first identified in E. coli and has since been found in many bacteria and most archaea. It provides acquired resistance against viruses, plasmids, and bacteriophages, likely through an RNA interference-like mechanism 5, 24. In this study, CRISPR loci were searched in the circular chromosomes of E. coli strains using the CRISPR web server. These sequences are usually found on the circular chromosomes of bacteria and archaea; however, some have also been detected on plasmids 25. The CRISPR web server is the first dedicated online server for finding and analyzing CRISPR sequences and the cas genes located in their vicinity 25. At present, the CRISPR database contains 150 analyzed genomes and 563 CRISPRs for archaea, and 2612 analyzed genomes and 3502 CRISPRs for bacteria. A cas gene database has been described by Haft et al. 26, but it provides no information on CRISPR sequences. Most E. coli strains include several questionable CRISPR loci in their genomes; only confirmed sequences were used in this study. Numerous spacers and direct repeats were found in most CRISPR sequences, and in most strains the sequences were long. In this context, the studies of Horvath et al. indicated that longer CRISPR sequences are probably more active than shorter ones 10, so this characteristic may help to differentiate active from inactive CRISPR sequences in some strains.

Since CRISPR systems are responsive to the environment, they likely play a major role in the adaptation of the host to its surroundings, which may explain the stability of specific bacterial strains in different ecosystems 10. This is perhaps one of the reasons why E. coli strains are able to survive in different environments and hosts. Moreover, evidence from studies of Sulfolobus conjugative plasmids supports the idea that plasmids containing repeat clusters are more stable in host cells; the stability of some pathogenic strains might therefore be reduced by inactivating specific CRISPR loci in the host or in a particular environment 27.

Several CRISPR loci can exist separately on the genome of a single species. For example, 18 loci have been identified on the genome of Methanocaldococcus jannaschii 6, 10. So far, two CRISPR systems have been identified in E. coli strains, each with two loci: the first contains CRISPR 1 and CRISPR 2, and the second, the Ypest system, contains CRISPR 3 and CRISPR 4 3. In this study, 82.8% of the strains carried 2–4 loci in their genomes, so both systems could be present in these strains, whereas the remaining strains carried one locus on their chromosome and appear to possess only one system.

The similarity of the spacer sequences to plasmid and phage genomes in the NCBI database was also investigated. Previous studies have shown that spacer sequences originate from foreign elements such as phages and plasmids 8, 28, 29. Indeed, the ability of the CRISPR system to acquire new spacers from phages and plasmids, and consequently to defend against them in the future, is a unique feature of this system 7. Our results are consistent with these findings and indicate that the spacers are derived from foreign elements. The higher number of bacteriophages than plasmids with similarity to the spacers reflects the frequency of phage attack during the evolution of the strains and the significance of phages in the acquisition of new spacers. This property could be applied to identify some strains by the phages from which their spacers were derived.

A total of 566 spacers were found in the CRISPR loci of the E. coli strains; only three strains contained 3–5 spacers, while the other strains contained more than five. The average spacer length was 31 bp, ranging from 28 to 34 bp. Diversity in the length and sequence of spacers affects the activity of CRISPR systems in bacteria 10, 30. In the studies of Di et al., CRISPR loci containing a larger number of spacers 30 bp in length were more active than loci containing fewer spacers 36 bp in length, which indicates an effect of spacer number and length on the activity of CRISPR loci 30. In our study, all strains possess spacers with an average length of 31 bp; therefore, CRISPR loci in our selected strains are likely more active than previously studied strains with shorter spacers. In addition, no conserved nucleotide was observed in the spacers of the strains. These findings indicate the specificity of the CRISPR/Cas system in responding to extrachromosomal elements.

The direct repeats of different CRISPR loci may differ in one or several nucleotides, but they are typically conserved. When a CRISPR locus acquires a new spacer, internal spacers are usually deleted, probably by homologous recombination between direct repeats, which helps to limit the size of the CRISPR locus. The repeats can be polymorphic, particularly the terminal repeat, in which sequence degeneracy has been observed at the 3′ end 10. This observation is essential for the correct interpretation and location of CRISPR loci, since the last spacer/repeat unit, which contains the terminal repeat, is often lost. The repeat location seems to be consistent with the location of nearby cas genes. In addition, variations within the repeat sequences can be seen throughout a CRISPR locus 10. In most of the studied E. coli strains, the direct repeats were almost conserved; for this reason, most of the groups were located in the first cluster. It can thus be inferred that polymorphism is less likely to have occurred in this cluster than in the others, which suggests that cas genes are probably less often present around the CRISPR loci in this cluster.

The RNA secondary structure and the MFE of the direct repeats were also investigated. CRISPR repeats have a partially palindromic character and can therefore form stable hairpin-like secondary structures 10. Moreover, it seems that CRISPR sequences are transcribed into a single-stranded RNA molecule. As transcription proceeds, two sequential single-stranded RNAs can interact and form secondary structures by pairing head to foot 10. In all 29 strains, the RNA secondary structures of the direct repeats had a low MFE (ΔG<−10 kcal mol⁻¹, except for groups 4 and 11) and can therefore form stable structures. Structures with a lower minimum free energy are more stable than those with a higher MFE value 31. In this context, Kunin et al. indicated that the stem-loop structures of some direct repeats probably facilitate contact between the spacer targeting foreign RNA or DNA and the cas-encoded proteins 13. Moreover, the stability of RNA secondary structures may affect the function of CRISPR loci 5.

Furthermore, cas genes located in the vicinity of CRISPR loci were searched in the CRISPR database. According to our search, cas genes were found in only three strains. These findings are similar to those of Yang et al. 5, who studied 32 Staphylococcus aureus strains and found cas genes in the vicinity of CRISPR loci in only two of them. The CRISPR/Cas system can be transferred among different but related species 5, so the cas genes of E. coli strains PMV-1, ATCC 8739, and HS may have been acquired from other species. The CRISPR systems of the other E. coli strains may be inactive at present, because when the cas genes of a CRISPR locus are inactivated or absent, the locus loses its ability to provide resistance and to integrate new spacers 10.

In the studied E. coli strains, the direct repeats were highly conserved within a locus, while the spacers were variable. Since the spacers protect against different exogenous elements, they must be variable, and this variability means that they can be used to identify strains. In general, the dynamic nature of CRISPR loci is possibly valuable for typing and comparative analyses of strains and microbial populations. Our research indicated that direct repeats are not completely conserved among strains and may differ in one or several nucleotides. The nature of the repeat sequences affects the activity of the CRISPR system through the formation of stable RNA secondary structures. Moreover, cas genes may not be present in all CRISPR systems of E. coli strains.

Acknowledgments

This study was supported by a grant from the Research Council of Shiraz University of Medical Sciences, Shiraz, Iran.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1 Terns, R.M., Terns, M.P., 2014. CRISPR-based technologies: prokaryotic defense weapons repurposed. Trends Genet., 30, 111–118. https://doi.org/10.1016/j.tig.2014.01.003
2 Lobersli, I., Haugum, K., Lindstedt, B.A., 2012. Rapid and high resolution genotyping of all Escherichia coli serotypes using 10 genomic repeat-containing loci. J. Microbiol. Methods, 88, 134–139. https://doi.org/10.1016/j.mimet.2011.11.003
3 Dang, T.N., Zhang, L., Zollner, S., Srinivasan, U., et al., 2013. Uropathogenic Escherichia coli are less likely than paired fecal E. coli to have CRISPR loci. Infect. Genet. Evol., 19, 212–218. https://doi.org/10.1016/j.meegid.2013.07.017
4 Wakefield, N., Rajan, R., Sontheimer, E.J., 2015. Primary processing of CRISPR RNA by the endonuclease Cas6 in Staphylococcus epidermidis. FEBS Lett., 20, 3197–3204. https://doi.org/10.1016/j.febslet.2015.09.005
5 Yang, S., Liu, J., Shao, F., Wang, P., et al., 2015. Analysis of the features of 45 identified CRISPR loci in 32 Staphylococcus aureus. Biochem. Biophys. Res. Commun., 464, 894–900. https://doi.org/10.1016/j.bbrc.2015.07.062
6 Lillestol, R.K., Redder, P., Garrett, R.A., Brugger, K., 2006. A putative viral defence mechanism in archaeal cells. Archaea, 2, 59–72. https://doi.org/10.1155/2006/542818
7 Goren, M.G., Yosef, I., Auster, O., Qimron, U., 2012. Experimental definition of a clustered regularly interspaced short palindromic duplicon in Escherichia coli. J. Microbiol. Methods, 423, 14–16.
8 Sapranauskas, R., Gasiunas, G., Fremaux, C., Barrangou, R., et al., 2011. The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucl. Acids Res., 39, 9275–9282. https://doi.org/10.1093/nar/gkr606
9 Pul, U., Wurm, R., Arslan, Z., Geissen, R., et al., 2010. Identification and characterization of E. coli CRISPR-cas promoters and their silencing by H-NS. Mol. Microbiol., 75, 1495–1512. https://doi.org/10.1111/j.1365-2958.2010.07073.x
10 Horvath, P., Romero, D.A., Coute-Monvoisin, A.C., Richards, M., et al., 2008. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J. Bacteriol., 190, 1401–1412. https://doi.org/10.1128/JB.01415-07
11 Grissa, I., Vergnaud, G., Pourcel, C., 2007. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinform., 8, 172. https://doi.org/10.1186/1471-2105-8-172
12 Diez-Villasenor, C., Almendros, C., Garcia-Martinez, J., Mojica, F.J., 2010. Diversity of CRISPR loci in Escherichia coli. Microbiol., 156, 1351–1361. https://doi.org/10.1099/mic.0.036046-0
13 Kunin, V., Sorek, R., Hugenholtz, P., 2007. Evolutionary conservation of sequence and secondary structures in CRISPR repeats. Gen. Biol., 8, 1–7. https://doi.org/10.1186/gb-2007-8-4-r61
14 Bolotin, A., Quinquis, B., Sorokin, A., Ehrlich, S.D., 2005. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiol., 151, 2551–2561. https://doi.org/10.1099/mic.0.28048-0
15 Hsu, P.D., Lander, E.S., Zhang, F., 2014. Development and applications of CRISPR-Cas9 for genome engineering. Cell, 157, 1262–1278. https://doi.org/10.1016/j.cell.2014.05.010
16 Bondy-Denomy, J., Davidson, A.R., 2014. To acquire or resist: the complex biological effects of CRISPR-Cas systems. Trends Microbiol., 22, 218–225.
10.1016/j.tim.2014.01.007
CAS PubMed Web of Science® Google Scholar
17 Gholami, A., Shahin, S., Mohkam, M., Nezafat, N., et al., 2015. Cloning, characterization and bioinformatics analysis of novel cytosine deaminase from Escherichia coli AGH09. Int. J. Peptide Res. Ther., 21, 365–374.
10.1007/s10989-015-9465-9
CAS Web of Science® Google Scholar
18 Ghasemi, Y., Dabbagh, F., Rasoul-Amini, S., Borhani Haghighi, A., et al., 2012. The possible role of hsps on behçet's disease. A bioinformatic approach. Comput. Biol. Med., 42, 1079–1085.
10.1016/j.compbiomed.2012.08.009
CAS PubMed Web of Science® Google Scholar
19 Nezafat, N., Ghasemi, Y., Javadi, G., Khoshnoud, M.J., et al., 2014. A novel multi-epitope peptide vaccine against cancer. an in silico approach. J. Theor. Biol., 349, 121–134.
10.1016/j.jtbi.2014.01.018
CAS PubMed Web of Science® Google Scholar
20 Zamani, M., Nezafat, N., Negahdaripour, M., Dabbagh, F., et al., 2015. In silico evaluation of different signal peptides for the secretory production of human growth hormone in E. coli. Int. J. Pep. Res. Ther., 21, 261–268.
10.1007/s10989-015-9454-z
CAS Web of Science® Google Scholar
21 Grissa, I., Vergnaud, G., Pourcel, C., 2007. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucl. Acids Res., 35, 52–57.
10.1093/nar/gkm360
PubMed Web of Science® Google Scholar
22 Zuker, M., Stiegler, P., 1981. Optimal computer folding of lare RNA sequences using thermodynamics and auxiliary information. Nucl. Acids Res., 9, 133–148.
10.1093/nar/9.1.133
CAS PubMed Web of Science® Google Scholar
23 Altschul, S.F., Gish, W., Miller, W., Myers, E.W., et al., 1990. Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
10.1016/S0022-2836(05)80360-2
CAS PubMed Web of Science® Google Scholar
24 Touchon M., Charpentier S., Clermont O., Rocha E.P., et al., CRISPR distribution within the Escherichia coli species is not suggestive of immunity-associated diversifying selection. J. Bacteriol. 193 2011. 2460–2467.
10.1128/JB.01307-10
CAS PubMed Web of Science® Google Scholar
25 Grissa, I., Bouchon, P., Pourcel, C., Vergnaud, G., 2008. On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing. Biochem., 90, 660–668.
10.1016/j.biochi.2007.07.014
CAS PubMed Web of Science® Google Scholar
26 Haft, D.H., Selengut, J., Mongodin, E.F., Nelson, K.E., 2005. A guild of forty-five CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput. Biol., 6, 60.
Google Scholar
27 Greve, B., Jensen, S., Brugger, K., Zillig, W., et al., 2004. Genomic comparison of archaeal conjugative plasmids from Sulfolobus. Archaea, 1, 231–239.
10.1155/2004/151926
CAS PubMed Google Scholar
28 Mojica, F.J., Diez-Villasenor, C., Garcia-Martinez, J., Soria, E., 2005. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol., 60, 174–182.
10.1007/s00239-004-0046-3
CAS PubMed Web of Science® Google Scholar
29 Pourcel, C., Salvignol, G., Vergnaud, G., 2005. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiol., 151, 653–663.
10.1099/mic.0.27437-0
CAS PubMed Web of Science® Google Scholar
30 Di, H., Ye, L., Yan, H., Meng, H., et al., 2014. Comparative analysis of CRISPR loci in different Listeria monocytogenes lineages. Biochem. Biophys. Res. Commun., 454, 399–403.
10.1016/j.bbrc.2014.10.018
CAS PubMed Web of Science® Google Scholar
31 Wolfsheimer, S., Hartmann, A.K., 2010. Minimum-free-energy distribution of RNA secondary structures: entropic and thermodynamic properties of rare events. Phys. Rev. E, 82, 021902.
10.1103/PhysRevE.82.021902
PubMed Web of Science® Google Scholar

Volume 56, Issue 6, June 2016, pages 645-653