RSA-tools - Tutorials

The aim of these tutorials is to give a theoretical and practical introduction to the Regulatory Sequence Analysis Tools (RSAT) software suite. The most convenient way to follow the tutorial is to display the tutorial pages in a separate window and to use the tools in the current one.


The RSAT home page displays two frames. The frame on the left contains a menu presenting the available tools. Each time you click on a tool name, the right frame displays the form for the corresponding tool.

The tools are organized in a modular way: rather than having a single form for the complete analysis, we found it more convenient to present separate forms for the successive steps of a given analysis. A typical analysis thus consists of using several tools in succession (for example sequence retrieval -> motif discovery -> pattern matching -> feature-map). For this purpose, the tools are interconnected, allowing you to automatically send the result of one request as input for the next request (piping). The links between tools are illustrated in the flow chart below. An advantage of this modular organization is that you can either follow a full pipeline through the tools, or directly enter at any step of an analysis with your own external data.

We will analyze some practical examples to get familiar with the different tools, and the way they are interconnected.

The tutorial contains different parts, illustrating the typical situations encountered when analysing regulatory sequences:

  1. Pattern matching: you know the regulatory motif (e.g. the consensus for a transcription factor), and you are interested in one or several particular sequences (e.g. promoters of a gene of interest, or binding fragments obtained from ChIP-on-chip experiments); you look for the matching positions within the sequences.
  2. Genome-scale pattern matching: you know the regulatory motif, and you would like to scan the genome to detect genes that have this motif in their regulatory regions; these genes may be considered potential target genes for the transcription factor of interest.
  3. Motif discovery (or pattern discovery): you know the sequences but not the regulatory motif. You have a set of functionally related regulatory sequences (e.g. promoters of co-expressed genes, or peaks collected from ChIP-seq experiments), and you suspect that they are enriched in binding sites for one or several transcription factors. You thus want to detect a motif "ab initio" from the sequences.
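
As a minimal illustration of the first situation (string-based pattern matching), the Python sketch below looks for a consensus written with IUPAC degenerate nucleotide codes on both strands of a sequence. It is not the code of RSAT's dna-pattern: the function names, the 0-based positions, the "D"/"R" strand labels and the choice to report overlapping matches are all assumptions made for this example.

```python
import re

# IUPAC degenerate nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) nucleotide string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def match_positions(sequence, pattern):
    """Find all (possibly overlapping) matches of a degenerate pattern on both strands.

    Returns sorted (start, strand, site) tuples; start is 0-based on the direct
    strand, strand is "D" (direct) or "R" (reverse complement of the pattern).
    """
    seq = sequence.upper()
    hits = []
    for strand, pat in (("D", pattern.upper()), ("R", reverse_complement(pattern))):
        regex = "".join(IUPAC[code] for code in pat)
        # A zero-width lookahead lets finditer report overlapping occurrences.
        for m in re.finditer("(?=(" + regex + "))", seq):
            hits.append((m.start(), strand, m.group(1)))
    return sorted(hits)


print(match_positions("ACGATAAGTTCTTATCTG", "GATAAG"))
# [(2, 'D', 'GATAAG'), (10, 'R', 'CTTATC')]
```

A pattern that is its own reverse complement (a palindrome such as CACGTG) is reported twice at the same position, once per strand; a real tool may merge or filter such matches.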


    Representations of transcription factor binding motifs

  1. String-based representations
  2. Position-specific scoring matrices (PSSM)
  3. Sequence logos
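
The three representations listed above are closely related. The following Python sketch, which assumes a small set of hand-made aligned sites and an equiprobable background (both purely illustrative), derives a count matrix, a frequency matrix with a pseudo-count, log-odds weights (the scores of a PSSM), the per-position information content that sets the height of each stack in a sequence logo, and a strict consensus string. The pseudo-count convention and the natural logarithm used for the weights are choices of this sketch, not necessarily those of RSAT.

```python
import math

ALPHABET = "ACGT"


def count_matrix(sites):
    """Count each nucleotide at each position of aligned, equal-length sites.

    Returns one row per position, with counts in A, C, G, T order.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    return [[sum(1 for site in sites if site[j].upper() == base) for base in ALPHABET]
            for j in range(width)]


def frequency_matrix(counts, prior, pseudo=1.0):
    """Relative frequencies, with a pseudo-count shared among bases according to the prior."""
    freqs = []
    for row in counts:
        total = sum(row)
        freqs.append([(n + pseudo * prior[base]) / (total + pseudo)
                      for n, base in zip(row, ALPHABET)])
    return freqs


def weight_matrix(freqs, prior):
    """Log-odds weights (natural log): positive where a base is enriched relative to the prior."""
    return [[math.log(f / prior[base]) for f, base in zip(row, ALPHABET)] for row in freqs]


def information_content(freqs, prior):
    """Relative entropy of each position, in bits (the total stack height in a logo)."""
    return [sum(f * math.log2(f / prior[base]) for f, base in zip(row, ALPHABET))
            for row in freqs]


def consensus(freqs):
    """Strict consensus: the most frequent base at each position."""
    return "".join(ALPHABET[row.index(max(row))] for row in freqs)


# Toy, hand-made aligned sites (illustration only, not a curated motif collection).
sites = ["CACGTG", "CACGTG", "CACGCG", "CATGTG"]
prior = {base: 0.25 for base in ALPHABET}  # assumed equiprobable background
freqs = frequency_matrix(count_matrix(sites), prior)
print(consensus(freqs))  # CACGTG
print([round(ic, 2) for ic in information_content(freqs, prior)])
```

Positions where all four toy sites agree get a higher information content than the two positions that tolerate a substitution.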

    Sequence retrieval

  1. from RSAT
  2. from EnsEMBL

    Pattern matching

  1. dna-pattern: string-based pattern matching
  2. patser: matrix-based pattern matching (obsolete)
  3. Detailed protocol for matrix-scan:
    Turatsinze, J.V., Thomas-Chollier, M., Defrance, M. and van Helden, J. (2008) Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc, 3, 1578-1588. PubMed 18802439
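
The basic principle of matrix-based pattern matching is to score each position of a sequence by summing the PSSM weights of the residues in a sliding window. The sketch below illustrates this idea in Python on both strands; it is not RSAT's implementation, and the toy count matrix (ten hypothetical sites that all read GATAAG), the pseudo-count of 1, the equiprobable background and the fixed score threshold are assumptions for illustration. To stay self-contained, it repeats the conversion from counts to log-odds weights.

```python
import math

ALPHABET = "ACGT"


def log_odds(counts, prior, pseudo=1.0):
    """Turn a count matrix (one [A, C, G, T] row per position) into log-odds weights."""
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([math.log((n + pseudo * prior[b]) / (total + pseudo) / prior[b])
                        for n, b in zip(row, ALPHABET)])
    return weights


def scan(sequence, weights, threshold):
    """Score every window on both strands; keep those scoring >= threshold.

    Returns (start, strand, site, score) tuples; start is 0-based on the direct
    strand and site is always shown as read on the direct strand.
    """
    seq = sequence.upper()
    width = len(weights)
    # Reverse-complement matrix: reverse the positions and each A,C,G,T row
    # (T,G,C,A is the complement of A,C,G,T in the same order).
    rc_weights = [row[::-1] for row in reversed(weights)]
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if not set(site) <= set(ALPHABET):
            continue  # skip windows containing N or other non-ACGT letters
        for strand, matrix in (("D", weights), ("R", rc_weights)):
            score = sum(matrix[j][ALPHABET.index(b)] for j, b in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, site, round(score, 2)))
    return hits


# Toy count matrix: 10 hypothetical sites, all reading GATAAG (illustration only).
counts = [[10 if b == c else 0 for b in ALPHABET] for c in "GATAAG"]
weights = log_odds(counts, {b: 0.25 for b in ALPHABET})
for hit in scan("ACGATAAGTTCTTATCTG", weights, threshold=7.0):
    print(hit)
```

In practice a hand-picked threshold such as 7.0 is rarely appropriate; thresholds are usually derived from the distribution of scores (for instance as P-values), which this sketch does not attempt.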

    Motif discovery

      String-based motif discovery

  1. Counting word occurrences in DNA sequences.
  2. oligo-analysis: detection of over-represented oligonucleotides (words).
  3. dyad-analysis: detection of over-represented spaced pairs of oligonucleotides.
  4. position-analysis: detection of words having a positional bias in sequences aligned on some reference position.
  5. Detailed protocol for string-based motif discovery:
    Defrance, M., Janky, R., Sand, O. and van Helden, J. (2008) Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat Protoc, 3, 1589-1603. PubMed 18802440
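
The principle behind oligo-analysis, counting each word and asking whether it occurs more often than expected by chance, can be sketched in Python as follows. This is not RSAT's implementation. The sketch assumes an independent-nucleotide background with given base frequencies (all non-zero) and counts overlapping occurrences on the direct strand only, whereas RSAT offers richer background models and strand options. It computes a binomial P-value for observing at least the counted number of occurrences, multiplies it by the number of possible words (4^k) to obtain an E-value, and reports sig = -log10(E-value). The function names and the default E-value cut-off are hypothetical.

```python
import math
from collections import Counter


def count_words(sequences, k):
    """Count overlapping k-mers on the direct strand; also return the number of positions."""
    counts = Counter()
    positions = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
                positions += 1
    return counts, positions


def binomial_upper_tail(occ, n, p):
    """P(X >= occ) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if occ <= 0:
        return 1.0
    if occ > n:
        return 0.0
    log_terms = [math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + x * math.log(p) + (n - x) * math.log1p(-p)
                 for x in range(occ, n + 1)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def over_represented(sequences, k, base_freqs, max_evalue=1.0):
    """Rank k-mers by binomial over-representation against an independent-nucleotide model.

    Returns (word, occurrences, expected, E-value, sig) tuples sorted by E-value.
    """
    counts, n = count_words(sequences, k)
    n_tested = 4 ** k  # number of possible words of length k
    results = []
    for word, occ in counts.items():
        p = math.prod(base_freqs[b] for b in word)  # expected frequency per position
        evalue = binomial_upper_tail(occ, n, p) * n_tested
        if evalue <= max_evalue:
            sig = -math.log10(evalue) if evalue > 0 else math.inf
            results.append((word, occ, round(n * p, 4), evalue, sig))
    return sorted(results, key=lambda r: r[3])


toy_sequences = ["TTGATAAGCC", "AGATAAGT", "CGATAAGA"]  # illustration only
for row in over_represented(toy_sequences, 6, {b: 0.25 for b in "ACGT"}):
    print(row)
```

With these three toy sequences, GATAAG is the only word passing the cut-off: over 11 counted positions its expected count is about 0.003, whereas words seen once get E-values around 11.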

    Comparison and clustering of PSSM

  1. compare-matrices
  2. matrix-clustering

    Building control sets

  1. Random models
  2. Selecting random genes
  3. Generating random sequences
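
Random sequences used as negative controls are often generated with a Markov chain, so that they keep the short-word composition of a reference set. The Python sketch below estimates an order-k Markov model from input sequences and samples new sequences from it. It is a minimal illustration, not RSAT's own generator; the toy training sequences, the fixed random seed and the uniform fallback for contexts never observed in training are assumptions.

```python
import random
from collections import Counter, defaultdict


def train_markov(sequences, order):
    """Count transitions (context of `order` bases -> next base) from overlapping words."""
    model = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - order):
            word = seq[i:i + order + 1]
            if set(word) <= set("ACGT"):
                model[word[:order]][word[order]] += 1
    return model


def random_sequence(model, order, length, rng):
    """Sample one sequence: start from an observed context, then extend base by base."""
    contexts = list(model)
    weights = [sum(model[c].values()) for c in contexts]
    seq = list(rng.choices(contexts, weights=weights)[0])
    while len(seq) < length:
        context = "".join(seq[len(seq) - order:])  # last `order` bases ("" if order is 0)
        transitions = model.get(context)
        if transitions:
            bases, counts = zip(*transitions.items())
            seq.append(rng.choices(bases, weights=counts)[0])
        else:
            seq.append(rng.choice("ACGT"))  # unseen context: uniform fallback (assumption)
    return "".join(seq[:length])


rng = random.Random(42)  # fixed seed for reproducibility
model = train_markov(["ACGATAAGTTCTTATCTG", "TTGATAAGCC"], order=2)  # toy training set
print([random_sequence(model, 2, 30, rng) for _ in range(3)])
```

An order-0 model only reproduces nucleotide frequencies; higher orders reproduce the frequencies of longer words, at the cost of needing more training data to estimate every context.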

    Applications

  1. Microarray analysis: prediction of regulatory motifs from clusters of co-expressed genes.
  2. Collecting peak sequences from the Galaxy Web site.
  3. peak-motifs: motif detection in full-size datasets of ChIP-seq peak sequences.
  4. Combining RSAT and NeAT to predict metabolic pathways and their regulation.

Last update 15 Jan 2012 - by