
CRISPRs Finder program online
- 1 Fasta Format
- 2 CRISPR definition
- 3 CRISPRFinder description
- 4 Output Format
- 5 External links
- 6 References
FASTA Format
Definition
The first line starts with a greater-than sign ">" and contains a name or other identifier for the sequence. This is the sequence header and must be on a single line. The remaining lines contain the sequence data. The sequence can be in upper or lower case letters. Anything other than letters (numbers, for example) is ignored. Multiple sequences can be present in the same file as long as each sequence has its own header.
Supported Nucleic acid code
- A --> adenosine
- C --> cytidine
- G --> guanine
- T --> thymidine
- U --> uridine
- R --> G A (purine)
- Y --> T C (pyrimidine)
- K --> G T (keto)
- M --> A C (amino)
- S --> G C (strong)
- W --> A T (weak)
- B --> G T C
- D --> G A T
- H --> A C T
- V --> G C A
- N --> A G C T (any)
- "-" --> gap of indeterminate length
Any other characters will be deleted.
Example
The FASTA format is a plain text format which looks something like this:
>Escherichia coli UTI89|886538|887045
GTTCACTGCCGTACAGGCAGCTTAGAAA TGACGCCATATGCAGATCATTGAGGCGAAACC
GTTCACTGCCGTACAGGCAGCTTAGAAA ACGTTCGCACCGGTCAGGGTACTGCGCAGCGT
GTTCACTGCCGTACAGGCAGCTTAGAAA GAAACCAGAGCGCCCGCATAAAACAGGCACAA
GTTCACTGCCGTACAGGCAGCTTAGAAA GCCAGCATAAAACCGCCTTTGATATTTTATTG
GTTCACTGCCGTACAGGCAGCTTAGAAA TCAGCCGGAGGCTCTCAATTTCAGCCGCGCGG
GTTCACTGCCGTACAGGCAGCTTAGAAA AGCACGGCTGCGGGGAATGGCTCAATCTCTGC
GTTCACTGCCGTACAGGCAGCTTAGAAA TGATGGCGCAGCAGTCCTCCCTCCTGCCGCCA
GTTCACTGCCGTACAGGCAGCTTAGAAA CTGAACGTTGAAGAGTGCGACCGTCTCTCCTT
GTTCACTGCCGTACAGGCAGTATTCACA
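The reading rules above can be illustrated with a short Python sketch. It is not the code used by CRISPRFinder: the function name `parse_fasta` is hypothetical, and converting the sequence to upper case is an assumption made here for convenience.

```python
# Supported nucleic acid codes from the list above, plus "-" for gaps.
ALLOWED = set("ACGTURYKMSWBDHVN-")

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Keep only supported codes; digits, spaces and other
            # characters are dropped. Upper-casing is an assumption.
            chunks.append("".join(c for c in line.upper() if c in ALLOWED))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Applied to the example above, this sketch would return a single record whose sequence is the concatenation of all data lines with the spaces removed.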
CRISPR definition
CRISPR structure
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) form a peculiar repeat structure found in many prokaryotic genomes. They show characteristics of both tandem and interspaced repeats. They have been described in a wide range of prokaryotes, including the majority of Archaea and many Eubacteria (Jansen 2002 [4]). A CRISPR locus is mainly characterized by:
- Direct repeats (DRs) and spacers: a CRISPR is a succession of 21-47 bp sequences called direct repeats (DRs), separated by unique sequences of similar length (spacers). Sometimes the DR at one end of the CRISPR is not fully conserved; it is then called a degenerate DR.
- A leader sequence: the CRISPR locus is generally flanked on one side by a common leader sequence of 200-350 bp.
- A family of cas genes: CRISPR-associated (cas) genes are always found closely linked to the repetitive sequences.
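To make the DR/spacer layout concrete, the following Python sketch splits a locus into spacers given a known DR sequence. It is only an illustration: the function name `split_crispr` is hypothetical, and it uses exact string matching, which is a simplification and not how CRISPRFinder detects repeats.

```python
def split_crispr(locus, dr):
    """Locate exact copies of dr in locus and return (positions, spacers).

    positions: start index of each exact DR copy, left to right.
    spacers:   sequences lying between consecutive DR copies.
    """
    positions = []
    start = locus.find(dr)
    while start != -1:
        positions.append(start)
        # Resume the search after the current copy (non-overlapping).
        start = locus.find(dr, start + len(dr))
    spacers = [locus[p + len(dr):q] for p, q in zip(positions, positions[1:])]
    return positions, spacers
```

Because the search is exact, a degenerate DR (such as the last repeat in the FASTA example above) would not be matched, so the spacer preceding it would be missed by this sketch.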
DRs and Spacers
In a given strain, several CRISPRs can be found, with the same or different DR sequences, but only one of each kind is associated with the cas genes. The spacers differ from one CRISPR to another.
The nature of the unique sequences is still largely unknown, but several recent studies identified some of them as fragments of foreign DNA, mostly of viral origin (Bolotin 2005 [1]; Mojica 2005 [6]; Pourcel 2005 [7]).
It is proposed that these spacers derive from phages and subsequently help protect the cell from infection.
Cas Genes
Some genes, called cas for CRISPR-associated, are found in the vicinity of CRISPRs (Jansen 2002 [4]). Their exact number is not known and apparently varies from one strain to another. However, a core of 4 genes is regularly identified. Phylogenetic studies performed on the Cas proteins suggest that CRISPRs are acquired by horizontal transfer (Godde 2006 [2]; Haft 2005 [3]). This is further supported by their presence on megaplasmids.
Leader sequence
Several observations suggest that CRISPR loci are transcribed into small RNAs, possibly with the leader acting as a promoter, and that these RNAs might play a role similar to siRNAs in blocking the entry of foreign sequences (Tang 2002 [8]; Makarova 2006 [5]).
CRISPRFinder description
Maximal repeats
A maximal repeat is a repeat with no possible extension to the right or the left without incurring a mismatch.
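The definition can be checked directly for a pair of exact copies. The Python sketch below is illustrative only (the function name `is_maximal_repeat` and its arguments are hypothetical) and handles exact repeats, without the mismatch tolerance that a repeat-finding package may allow.

```python
def is_maximal_repeat(seq, i, j, length):
    """True if seq[i:i+length] and seq[j:j+length] are identical copies
    that cannot be extended to the left or to the right without a mismatch."""
    if i == j or seq[i:i + length] != seq[j:j + length]:
        return False
    # Left extension is possible only if both copies are preceded
    # by the same character.
    left_ext = i > 0 and j > 0 and seq[i - 1] == seq[j - 1]
    # Right extension is possible only if both copies are followed
    # by the same character.
    end_i, end_j = i + length, j + length
    right_ext = end_i < len(seq) and end_j < len(seq) and seq[end_i] == seq[end_j]
    return not left_ext and not right_ext
```

For instance, in `TTACGTGGACGTCC` the two copies of `ACGT` form a maximal repeat, whereas the two copies of `CGT` inside them do not, because both are preceded by `A`.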
Program structure
The main idea of the CRISPRFinder program is to find possible CRISPR locations and then to check whether these regions contain a cluster that matches the CRISPR structure.
- 1) Possible localizations: possible CRISPR locations are found by detecting maximal repeats (see the definition above). This step is carried out with the Vmatch package. The default parameters are the following:
  - a repeat length of 23 to 55 bp
  - a gap size between repeats of 25 to 60 bp
  - one nucleotide mismatch between repeats
- 2) CRISPR features: the criteria a CRISPR should meet are the following:
  - The spacer size compared to the DR size. This filter is mainly added to eliminate structures having, for example, a 45 bp DR and a 20 bp spacer. By default, the spacer size should be from 0.6 to 2.5 times the DR size.
  - The spacers should not be identical. This filter is set to eliminate tandem repeats. The spacers are compared by aligning them (using the default parameters of the ClustalW program). The spacer similarity percentage is calculated with the function percentage_identity() of the BioPerl interface (AlignIO methods, ClustalW interface). By default, this parameter is set to 60%.
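The two filters can be sketched in Python as follows. This is not the CRISPRFinder implementation: the function names are hypothetical, `difflib.SequenceMatcher` is used as a simple stand-in for a ClustalW alignment followed by percentage_identity(), and reading the 60% default as the maximum similarity allowed between any two spacers is an assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations

def spacer_size_ok(dr_len, spacer_len, low=0.6, high=2.5):
    """Spacer length must lie between low and high times the DR length."""
    return low * dr_len <= spacer_len <= high * dr_len

def percent_similarity(a, b):
    """Rough similarity percentage between two sequences.

    Stand-in for an alignment-based identity; not equivalent to ClustalW.
    """
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()

def spacers_distinct(spacers, max_similarity=60.0):
    """True if no pair of spacers exceeds max_similarity (assumed meaning
    of the 60% default)."""
    return all(
        percent_similarity(a, b) <= max_similarity
        for a, b in combinations(spacers, 2)
    )
```

With the default bounds, a 45 bp DR with a 20 bp spacer is rejected by `spacer_size_ok`, and a candidate whose spacers are all identical (a tandem repeat) is rejected by `spacers_distinct`.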
Parameter description
Questionable CRISPRs
There are two kinds of "questionable" CRISPRs:
- Small CRISPRs, i.e. structures having only two or three DRs
- Structures where the repeated motifs (DRs in a CRISPR) are not 100% identical.
One way to "critically investigate" a questionable CRISPR is to check whether it lies within a coding sequence: CRISPRs are usually non-coding and do not belong to genes. Another way is to check the internal conservation of the candidate DRs and the divergence of the candidate spacers. More definitive evidence might be provided by typing a collection of strains from the species, which requires some bench work.
External links
References
- Abouelhoda, M. I., S. Kurtz, and E. Ohlebusch. 2004. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2:53-86.
- Bolotin, A., B. Quinquis, A. Sorokin, and S. D. Ehrlich. 2005. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151:2551-61.
- Godde, J. S., and A. Bickerton. 2006. The repetitive DNA elements called CRISPRs and their associated genes: evidence of horizontal transfer among prokaryotes. J Mol Evol 62:718-29.
- Kurtz, S., and C. Schleiermacher. 1999. REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics 15(5):426-427.
- Haft, D. H., J. Selengut, E. F. Mongodin, and K. E. Nelson. 2005. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1:e60.
- Jansen, R., J. D. van Embden, W. Gaastra, and L. M. Schouls. 2002. Identification of a novel family of sequence repeats among prokaryotes. Omics 6:23-33.
- Makarova, K. S., N. V. Grishin, S. A. Shabalina, Y. I. Wolf, and E. V. Koonin. 2006. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct 1:7.
- Mojica, F. J., C. Diez-Villasenor, J. Garcia-Martinez, and E. Soria. 2005. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60:174-82.
- Pourcel, C., G. Salvignol, and G. Vergnaud. 2005. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151:653-63.
- Tang, T. H., J. P. Bachellerie, T. Rozhdestvensky, M. L. Bortolin, H. Huber, M. Drungowski, T. Elge, J. Brosius, and A. Huttenhofer. 2002. Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus. Proc Natl Acad Sci U S A 99:7536-41.