CRISPRs Web service User's manual

Summary

1 Site description

2.1 Data summary
2.2 CRISPRdb Navigation
2.3 CRISPRs Utilities
2.4 BLAST CRISPRs
2.5 FlankAlign
2.6 My CRISPRdb
2.7 CRISPRtionary

3 Colour code
4 Questionable CRISPRs
5 Useful links

Site description

Data summary

A number of tools are available here:

The CRISPR database itself, in which microbial genomes have been pre-processed to search for CRISPR structures. This database can be browsed in an intuitive way.
The second tool is the CRISPRFinder tool. You can use this page to analyse your own data.
The third utility provides a global overview of CRISPRs present in the database. Lists of known DRs and spacers, and of CRISPRs ordered according to the number of spacers, are provided, with links to detailed information for the corresponding CRISPRs. Sequence alignments of selected DR regions can be produced using CLUSTAL W, together with dendrograms.
The BLAST CRISPR page is useful for validating a questionable CRISPR. From this page, a candidate DR region (or spacer) can be compared to all DRs (or spacers) characterised so far from clear-cut CRISPR structures present in the database.
The FlankAlign link is useful to compare CRISPR flanking sequences, for instance when looking for the homolog of a CRISPR locus in other strains, or when trying to validate a questionable CRISPR, by searching for a leader sequence.
MyCRISPRdb allows you to store your own data. Your data will be available together with that of the main database for comparison with the CRISPRcompar tool.
The last resource is the "spacers dictionary". This is a very helpful tool for analysing sequences from multiple alleles derived from the same locus. Such data would be produced, for instance, when investigating the diversity of a CRISPR within a species by sequencing the locus in different isolates. This tool can then be used to automatically number spacers, produce a "dictionary", and code the alleles using this dictionary. Sample files are provided to illustrate how this works and what it does (Pestis Dictionary and CRISPR_YP1_Pestis).

CRISPRdb Navigation

The first database page displays the list of prokaryotic public genomes (stored in the database and updated frequently). The species may be displayed in two ways:

a) View the strains alphabetical browser
b) View the strains taxonomy browser

The colour code indicates whether a CRISPR has been detected or not: strains without a CRISPR are coloured in yellow, strains having at least one CRISPR are coloured in pink and strains having only questionable CRISPRs are in orange.

Upon selecting a strain name, a page displays the strain properties, the available genomes (chromosome and plasmids) and indicates how many CRISPRs have been found. The colour code is the same as above.
The button leads to the CRISPRs properties page giving more details on the found CRISPRs. The button Find cas genes leads to a table showing the list of CRISPR-associated genes annotated in all the genomes (plasmids or chromosomes) in the corresponding taxon and their position on the genome.
Example

The CRISPRs properties page indicates each CRISPR's id together with its position on the genome, the number of spacers and the consensus DR sequence.

Querying a CRISPR locus leads to a page containing all its properties: the DR consensus shown in yellow, the spacers shown in different colours, and their positions in the genome.

The left flanking sequence is given by default. It is the sequence that ends at the begin position of the CRISPR and has a specified length (100 bp by default).
The user may change this value to get longer or shorter sequences by modifying the length value and then clicking on the flanking_sequences button.
The right flanking sequence is given by default. It is the sequence that begins from the end position of the CRISPR and has a specified length (100 bp by default). The user may change this value to get longer or shorter sequences by modifying the length value and clicking on the flanking_sequences button.
The CRISPR sequence is the sequence from the first nucleotide of the first DR to the last nucleotide of the last DR. To get this sequence or to modify its start or end positions, the user should click on the flanking_sequences button.
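
The definitions of the left flank, the right flank and the CRISPR sequence can be illustrated with a short Python sketch. It is not the code used by the site: the function name flanking_sequences, the 1-based inclusive coordinates, and the choice to exclude the CRISPR itself from both flanks are assumptions made for this example.

```python
def flanking_sequences(genome, begin, end, length=100):
    """Return (left_flank, crispr, right_flank) for one CRISPR locus.

    Assumption: `begin` and `end` are 1-based inclusive positions, where
    `begin` is the first nucleotide of the first DR and `end` is the last
    nucleotide of the last DR. `length` is the flank size in bp.
    """
    if not (1 <= begin <= end <= len(genome)):
        raise ValueError("expected 1 <= begin <= end <= len(genome)")
    if length < 0:
        raise ValueError("length must be non-negative")

    start0 = begin - 1  # 0-based index of the first CRISPR nucleotide
    # Left flank: up to `length` bp ending just before the CRISPR.
    left = genome[max(0, start0 - length):start0]
    # CRISPR sequence: first DR through last DR.
    crispr = genome[start0:end]
    # Right flank: up to `length` bp starting just after the CRISPR.
    right = genome[end:end + length]
    return left, crispr, right
```

Near the ends of a replicon the flanks are simply shorter than the requested length; a circular genome would need wrap-around handling that this sketch leaves out.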

CRISPR Utilities

This page provides a global overview of CRISPRs present in the database and offers the possibility of downloading some overview files.

Exhaustive DR list and DR alignment

Align DR sequences

View and blast repeated Spacers list

E-value <= 0.1
the hit sequence size >= 70% of the queried sequence size
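
The two thresholds above read as the conditions a BLAST hit must meet to be reported in the repeated spacers list. Assuming that reading, a minimal Python sketch of the filter follows; the Hit record and the keep_hit function are hypothetical names, and the comparison is done with integer percentages to avoid floating-point rounding at the 70% boundary.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit."""
    subject_id: str
    evalue: float
    hit_length: int  # size of the hit sequence, in nucleotides


def keep_hit(hit, query_length, max_evalue=0.1, min_percent=70):
    """Return True when the hit meets both thresholds listed above."""
    good_evalue = hit.evalue <= max_evalue
    # hit size >= 70% of the query size, written with integers only
    long_enough = hit.hit_length * 100 >= min_percent * query_length
    return good_evalue and long_enough


hits = [Hit("spacer_a", 0.05, 25), Hit("spacer_b", 0.05, 20), Hit("spacer_c", 0.5, 30)]
kept = [h.subject_id for h in hits if keep_hit(h, query_length=30)]
```

With a 30-nucleotide query, only spacer_a passes: spacer_b is shorter than 21 nucleotides and spacer_c has an E-value above 0.1.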

Number of spacers in each CRISPR
Download files

Exhaustive DR list. Only DRs of confirmed CRISPRs are reported,
Exhaustive Spacers list. Only spacers of confirmed CRISPRs are reported,
List of genomes having confirmed CRISPRs,
List of genomes without CRISPRs.

BLAST CRISPRs

The BLAST against the CRISPRdb database finds regions of local similarity between the submitted nucleotide sequence(s) and the catalogue of DRs and/or spacers of confirmed CRISPRs.
It is used as follows:

Enter the query sequence and select Direct Repeats and/or Spacers.
The result is displayed on the next page. For more details, the user may click on the sequence id to display more information about the related CRISPR.

FlankAlign

Purpose of the page
Example of use

Aquifex aeolicus VF5

NC_000918

Aquifex aeolicus VF5

NC_000918

Reverse selection

RevComp (opposite flanking sequence)

iii) Align flanking sequences

My CRISPRdb

Create your login
Find and store your CRISPRs
Your private space

main database section

CRISPRtionary

The use of the Spacer Dictionary Creator is illustrated by an example based on Yersinia pestis sample data (see related paper).
The user should provide:

a dictionary file, which is an Excel file containing a sheet with three columns in the currently used format (will be reduced to two in the future): a column for the spacer id, a column for alias names (may be empty) and a column for the spacer sequence (as an example see the file).
The tool can be run without loading any dictionary file.
a FASTA file with multiple allele sequences (see the example).

Next, the user has to select the sheet of the Excel dictionary to be used in the analysis.
Then, each sequence of the FASTA file is analysed by CRISPRFinder and the related CRISPR is detected. In some cases, no CRISPR is detected because the Direct Repeat sequences are too divergent, but this problem is handled in the next step.
As the sequences are generally short or may contain some sequencing errors, the DR of each cluster may not be defined accurately in this step, so the user should manually select one DR from the DRs listed, as shown in the following figure. When identical DRs are detected in the database, a link to more details appears next to the DR sequence. This may help the user in selecting the appropriate DR sequence.

Finally, the selected DR is compared by BLAST against all the submitted sequences and all the CRISPRs are shown (even degenerate ones that did not appear on the previous page). An id is assigned to each spacer. The id is either taken from the dictionary (when the spacer is already in the dictionary) or assigned a new number and added to the dictionary (if it is not yet listed there).
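
The id assignment described in this last step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's implementation: the function name code_alleles is hypothetical, spacers are assumed to be already extracted from each allele, matching is assumed to be exact (case-insensitive), and new ids are assumed to continue from the highest numeric id in the dictionary.

```python
def code_alleles(alleles, dictionary=None):
    """Code each allele as a list of spacer ids.

    alleles:    dict mapping allele name -> list of spacer sequences
                (in locus order).
    dictionary: dict mapping spacer sequence -> id, or None to start
                with an empty dictionary.
    Returns (coded_alleles, updated_dictionary).
    """
    # Work on an upper-cased copy so the caller's dictionary is untouched.
    updated = {seq.upper(): sid for seq, sid in (dictionary or {}).items()}
    numeric_ids = [sid for sid in updated.values() if isinstance(sid, int)]
    next_id = max(numeric_ids, default=0) + 1

    coded = {}
    for name, spacers in alleles.items():
        ids = []
        for seq in spacers:
            seq = seq.upper()
            if seq not in updated:
                # Unknown spacer: give it a new number and record it.
                updated[seq] = next_id
                next_id += 1
            ids.append(updated[seq])
        coded[name] = ids
    return coded, updated


coded, updated = code_alleles(
    {"allele_1": ["ACGT", "TTAA"], "allele_2": ["ttaa", "GGCC", "CCCC"]},
    dictionary={"ACGT": 1, "GGCC": 2},
)
```

Here allele_1 is coded as [1, 3] and allele_2 as [3, 2, 4]; the spacers TTAA and CCCC were not in the dictionary and receive the new ids 3 and 4.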

Colour code

The colour code indicates whether a CRISPR has been detected or not:

a) Yellow
b) Pink
c) Orange

Species without a CRISPR are coloured in yellow, species having at least one CRISPR are coloured in pink and species having only questionable CRISPRs are in orange.
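
The colour rule can be written as a small decision function. This Python sketch is illustrative only; the function name strain_colour is hypothetical, and it assumes that "at least one CRISPR" means at least one confirmed (non-questionable) CRISPR.

```python
def strain_colour(confirmed, questionable):
    """Return the display colour for a strain or species.

    confirmed:    number of confirmed CRISPRs found.
    questionable: number of questionable CRISPRs found.
    """
    if confirmed < 0 or questionable < 0:
        raise ValueError("counts must be non-negative")
    if confirmed >= 1:
        return "pink"    # at least one CRISPR
    if questionable >= 1:
        return "orange"  # only questionable CRISPRs
    return "yellow"      # no CRISPR detected
```

A genome with two confirmed and one questionable CRISPR is therefore pink, because the confirmed CRISPRs take precedence.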

Questionable CRISPRs

There are two kinds of "questionable" CRISPRs:

Small CRISPRs, i.e. structures having only two or three DRs
Similar structures, such as particular kinds of tandem repeats (not eliminated) or structures in which the repeated motifs (DRs in a CRISPR) are not 100% identical.

They stop being questionable if the DR consensus is found elsewhere in the database in a convincing CRISPR.
Many of these structures are not true CRISPRs, and they need to be critically investigated. One way to "critically investigate" is to see whether the questionable CRISPR seems to lie within a coding sequence. CRISPRs are usually non-coding, and do not belong to genes. Another way is to check the internal conservation of the candidate DRs, and the divergence of the candidate spacers. More definitive evidence might be provided by the typing of a collection of strains from this species. Some bench work is needed there.
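
The two categories above, and the rule that lifts the "questionable" status, can be condensed into a simplified Python check. This is a sketch of the stated criteria only, not the classification code used by the database: the function name is_questionable and its parameters are hypothetical, and DR identity is tested by exact string comparison.

```python
def is_questionable(direct_repeats, consensus=None, confirmed_consensuses=()):
    """Return True when a candidate CRISPR should be flagged as questionable.

    direct_repeats:        list of the DR sequences found in the candidate.
    consensus:             DR consensus of the candidate, if known.
    confirmed_consensuses: DR consensuses of convincing CRISPRs already
                           in the database.
    """
    drs = [dr.upper() for dr in direct_repeats]
    if len(drs) < 2:
        raise ValueError("a candidate CRISPR needs at least two DRs")

    # A DR consensus already seen in a convincing CRISPR removes the doubt.
    known = {c.upper() for c in confirmed_consensuses}
    if consensus is not None and consensus.upper() in known:
        return False

    small = len(drs) <= 3              # only two or three DRs
    not_identical = len(set(drs)) > 1  # repeats are not 100% identical
    return small or not_identical
```

A structure flagged by this check still needs the manual inspection described above (coding context, DR conservation, spacer divergence) before it is accepted or rejected.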

Useful links

Cas genes (TIGR genome properties query page)
CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. (C. Pourcel, G. Salvignol, and G. Vergnaud. 2005, Microbiology 151:653-63)