================================================================
RSSVM (RNA Sampler + SVM): A Support Vector Machine based RNA Motif
Identifier
(version 1.0, Aug 2008)
Copyright (C) 2005-2007
Xing Xu, Yongmei Ji, Gary D. Stormo
Department of Genetics, Washington University in Saint Louis
================================================================
COPYRIGHT:
----------
The RSSVM software was written by Xing Xu in the Gary Stormo lab,
Department of Genetics, Washington University School of Medicine.
RSSVM is distributed freely to the scientific community "as is" in the
hope that it will be useful, but without any warranty. If you are a
commercial user or want to commercialize it, please contact the authors.
INTRODUCTION:
-------------
RSSVM stands for RNA Sampler + Support Vector Machine. It is a
computational program developed by Xing Xu, Yongmei Ji, and Gary Stormo
at Washington University in Saint Louis.
RSSVM employs Support Vector Machines (SVMs) to efficiently distinguish
functional RNA motifs from random RNA structures. It uses a set of
distinctive features to represent the common RNA secondary structure and
structural alignment predicted by RNA Sampler, a tool for accurate RNA
common structure prediction, and is trained with functional RNAs from a
variety of bacterial RNA motif/gene families covering a wide range of
sequence identities.
The details of the algorithm are described in the paper "Discovering
cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines"
(PLoS Computational Biology).
The program is written in Perl and C. When tested on a large number of known
and random RNA motifs, RSSVM shows a significantly higher sensitivity
than other leading RNA identification programs while maintaining the
same false positive rate. RSSVM performs particularly well on sets with
low sequence identities. The combination of RNA Sampler and RSSVM
provides a new, fast and efficient pipeline for large-scale discovery of
regulatory RNA motifs.
To better understand how RSSVM works, please read the paper
"Discovering cis-Regulatory RNAs in Shewanella Genomes by Support
Vector Machines".
INSTALLATION:
-------------
Programs required for running RSSVM:
1. Install RNAz. RSSVM needs a modified version of RNAz that calculates
   the z-score of a set of sequences. Download RNAz_modified.
2. Install LIBSVM. Download the latest LIBSVM from the author's website.
3. Install RNA Sampler. Download RNA Sampler from its own page, hosted
   next door to RSSVM.
4. Install Perl 5.8.8 or above.
Thanks to the authors of RNAz, LIBSVM and RNA Sampler for making their
code available.
Install RSSVM:
To compile this package you need a C compiler.
1. Download the RSSVM-v1.0.tar package and unpack it to a local directory $DIR
>tar -xzf RSSVM-v1.0.tar
2. Edit the driver script "RSSVM_driver.pl" to update the paths for
RNAz, LibSVM and RSSVM by changing the following lines:
my $RNAZ_PTH = "your path for RNAz modified version";
my $LIBSVM_PTH = "your path for LibSVM";
my $RSSVM_PTH = "your path for RSSVM";
3. Test if RSSVM works
>perl RSSVM_driver.pl
If the usage message is printed, the installation is correct;
otherwise go back and check the installation steps.
USAGE:
------
Usage: RSSVM_driver.pl [-p path] [-q fasta file]
Required Parameters:
[-p <Absolute path of the directory containing the input sequence file and
the corresponding base-pairing probability matrix files>]
[-q <name of the input fasta file>]
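The two required parameters can be assembled programmatically when many inputs have to be processed. The following is only a sketch in Python, not part of RSSVM: the helper names build_rssvm_command and run_rssvm are hypothetical, the check that -q is a bare file name is an assumption based on the parameter description above, and the code assumes the driver writes its report to standard output.

```python
import posixpath
import subprocess


def build_rssvm_command(driver, input_dir, fasta_name):
    """Return the argument list for one RSSVM run (nothing is executed here)."""
    # -p is documented as an absolute path, so reject relative ones early.
    if not posixpath.isabs(input_dir):
        raise ValueError("-p expects an absolute path: %r" % input_dir)
    # Assumption: -q is a file name inside that directory, not a path.
    if "/" in fasta_name:
        raise ValueError("-q expects a file name, not a path: %r" % fasta_name)
    return ["perl", driver, "-p", input_dir, "-q", fasta_name]


def run_rssvm(driver, input_dir, fasta_name):
    """Run the driver and return whatever it prints (assumed: stdout)."""
    cmd = build_rssvm_command(driver, input_dir, fasta_name)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, build_rssvm_command("RSSVM_driver.pl", "/data/example", "tRNA.RSout") yields the same arguments one would type by hand; the directory "/data/example" is a placeholder.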
EXAMPLES:
---------
In the /example directory, there is an example output file of RNA Sampler:
tRNA.RSout.
Go to the directory RSSVM_V1.0/example and see the "*.cmdline" files for
some examples of running RSSVM at the command line.
>../RSSVM_driver.pl -p /home2/xingxu/tools/RSSVM-V1.0/example -q tRNA.RSout
Tips:
The name of each sequence in the fasta file should contain no spaces.
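The tip about sequence names can be checked before running RSSVM. The sketch below is an illustration in Python and not part of the package; the function name names_with_spaces is hypothetical, and it assumes standard FASTA headers that start with ">".

```python
def names_with_spaces(fasta_text):
    """Return (line_number, name) pairs for FASTA headers containing whitespace."""
    bad = []
    for lineno, line in enumerate(fasta_text.splitlines(), start=1):
        if line.startswith(">"):
            name = line[1:]  # everything after ">" is treated as the name
            if any(ch.isspace() for ch in name):
                bad.append((lineno, name))
    return bad


if __name__ == "__main__":
    # Made-up sequences, used only to show the check.
    sample = ">tRNA_1\nGCGGAUUUAGCUC\n>tRNA 2\nGCCCGGAUAGCUC\n"
    for lineno, name in names_with_spaces(sample):
        print("line %d: sequence name %r contains a space" % (lineno, name))
```

An empty result means every sequence name satisfies the tip; otherwise the offending names can be fixed, for instance by replacing spaces with underscores.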
RESULTS EXAMPLE:
----------------
# RSSVM 1.0 #
Input file: /home2/xingxu/public_html/RSSVM/data/RSSVM_V1.0/example/tRNA.RSout
N = 6         # Number of sequences
ID = 40.830   # Mean pairwise identity
Z = -2.027    # Mean z score
SCI = 0.864   # Structure conservation index based on common structures predicted by RNA Sampler
I = 0.770     # Information content of alignments of stem regions
MI = 0.655    # Mutual information of alignments of stem regions
SVM RNA-class probability: 1.000   # The P-value of SVM classification, which can be used as a cutoff
Prediction: RNA   # P-value > 0.5 (more stringent P-value cutoffs can be used to reduce false positives)
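When many reports have to be screened, the fields shown in the example can be read into a dictionary and filtered with a stricter cutoff. This is a sketch in Python under stated assumptions, not part of RSSVM: it assumes the report keeps the "KEY = value" and "SVM RNA-class probability:" layout of the example, the names parse_rssvm_report and passes_cutoff are hypothetical, and the 0.9 cutoff is only an illustrative stricter value.

```python
import re

# Sample values copied from the example report in this README.
SAMPLE = """# RSSVM 1.0 #
N = 6 # Number of sequences
ID = 40.830 # Mean pairwise identity
Z = -2.027 # Mean z score
SCI = 0.864 # Structure conservation index
I = 0.770 # Information content of alignments of stem regions
MI = 0.655 # Mutual information of alignments of stem regions
SVM RNA-class probability: 1.000
Prediction: RNA
"""


def parse_rssvm_report(text):
    """Extract the numeric features, SVM probability and prediction."""
    flat = " ".join(text.split())  # tolerate wrapped lines
    report = {}
    for key in ("N", "ID", "Z", "SCI", "I", "MI"):
        # The lookbehind keeps "I" from matching inside "SCI" or "MI".
        pattern = r"(?<![A-Za-z])" + re.escape(key) + r" = (-?\d+(?:\.\d+)?)"
        match = re.search(pattern, flat)
        if match:
            report[key] = float(match.group(1))
    if "N" in report:
        report["N"] = int(report["N"])
    match = re.search(r"SVM RNA-class probability: (\d+(?:\.\d+)?)", flat)
    if match:
        report["probability"] = float(match.group(1))
    match = re.search(r"Prediction: (\S+)", flat)
    if match:
        report["prediction"] = match.group(1)
    return report


def passes_cutoff(report, cutoff=0.5):
    """True if the SVM RNA-class probability exceeds the chosen cutoff."""
    return report["probability"] > cutoff


if __name__ == "__main__":
    parsed = parse_rssvm_report(SAMPLE)
    print(parsed)
    print("passes stricter cutoff:", passes_cutoff(parsed, cutoff=0.9))
```

The default cutoff of 0.5 mirrors the rule stated in the report; raising it trades sensitivity for fewer false positives.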
BUGS, QUESTIONS, COMMENTS:
--------------------------
Please email questions, comments and suggestions to:
Xing Xu, xingxu AT genetics.wustl.edu,
Yongmei Ji, yji AT genetics.wustl.edu,
Gary D. Stormo, stormo AT genetics.wustl.edu.
RSSVM is still under active development. Please check its latest version
at http://ural.wustl.edu/~xingxu/RSSVM/
A standalone version of RSSVM and its web server are coming soon.