Sca - Usage
Usage
Either start the application by clicking on "Start Application", or
download it and start it with
java -jar sca.jar
in a terminal.
Input
The header lines of the FASTA sequence files must have the following
structure:
>gi|13383270|gb|AB049155| /<u>Avian</u>/2(PB1)/H9N2/Japan/<u>1997</u>/// Influenza A virus ...
The underlined fields are the important ones: the family field
(/Avian/) and the year field (/1997/). The number of "/" signs up to
the year field is significant here!
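The following minimal sketch illustrates why the number of "/" signs matters. It assumes the fields are located purely by position after splitting the header on "/" (family at index 1, year at index 5, as in the example header above). The class `FastaHeader` and its `parse` method are hypothetical names chosen for illustration and are not taken from Sca itself.

```java
// Hypothetical helper (not part of Sca): pulls family and year out of a header line.
final class FastaHeader {
    final String family;
    final int year;

    private FastaHeader(String family, int year) {
        this.family = family;
        this.year = year;
    }

    // Assumption: fields are addressed by their position between "/" signs.
    // Index 0 is everything before the first "/", index 1 the family, index 5 the year.
    static FastaHeader parse(String headerLine) {
        String[] fields = headerLine.split("/", -1); // -1 keeps trailing empty fields
        if (fields.length < 6) {
            throw new IllegalArgumentException("Too few '/'-separated fields: " + headerLine);
        }
        String family = fields[1].trim();
        int year = Integer.parseInt(fields[5].trim());
        return new FastaHeader(family, year);
    }

    public static void main(String[] args) {
        FastaHeader h = parse(
            ">gi|13383270|gb|AB049155| /Avian/2(PB1)/H9N2/Japan/1997/// Influenza A virus");
        System.out.println(h.family + " " + h.year); // prints "Avian 1997"
    }
}
```

Under this assumption, a header with a missing or extra "/" before the year shifts every later field, so the year would be read from the wrong position.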
Interaction
The graph can be adapted to your needs, mostly by using the mouse
wheel and clicking into the graph:
-
Select an area: Hold down the control key and the left mouse
button and move the mouse to select a portion of the displayed
area, which becomes the new displayed area. As long as the button
stays pressed, a rectangle shows the borders of the selection.
Releasing the mouse button redisplays the graph with the selected
area.
-
Move the graph: Pressing the left mouse button and moving the
mouse lets you move the graph to the desired position.
-
Scale the graph: Turning the mouse wheel scales the display.
-
Resize the graph: Pressing the "R" key resizes the graph to its
optimum display. That means all points are visible and the scale
of the axes is as small as possible.
-
Colorize the graph: [Edit -> Change Colors] Changes the color of
a displayed host name.
-
Change the formula: [Edit -> Set Formula] Changes the value that
is calculated for each sequence. See below for details.
The formula
The formula is entered in a dialog. It is a (short) numerical
expression consisting of numbers, the basic arithmetic operators
"+", "-", "*", "/", parentheses, and specific functions that count
codons or nucleotide sequences. As usual, "*" and "/" are evaluated
before "+" and "-". The formula is always calculated for each single
source sequence, not for the sum of all sequences.
The specific functions are:
-
#C(...): Number of codons. This function takes any number of
arguments, separated by commas. Every argument must consist of 3
letters from A, C, G, T or the wildcard ".", which stands for any
nucleotide. For example, #C(.CC) matches every codon that ends
with CC. With empty parentheses, #C() counts all codons in the
sequence.
-
#N(...): Number of nucleotide (sub)sequences. It takes exactly one
argument, the (sub)sequence to count, which consists of the
nucleotides A, C, G, T and the wildcard characters "." (a single
nucleotide) and "_" (any number of nucleotides, including 0). With
empty parentheses, #N() counts the nucleotides in the sequence.
-
#A(...): Number of amino acids. Each argument is a single letter
naming an amino acid. Using #A() is the same as #C().
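The counting rules for #C and #N can be sketched as follows. This is an illustration under stated assumptions, not Sca's actual implementation: codons are read in frame starting at the first nucleotide, an incomplete trailing codon is ignored, and #N counts non-overlapping matches from left to right with "_" matching as few nucleotides as possible. The class and method names (`CountSketch`, `countCodons`, `countSubsequences`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of #C(...) and #N(...); names and details are assumptions.
final class CountSketch {

    // #C: counts codons that match at least one pattern; no patterns counts all codons.
    // Assumption: reading frame starts at index 0, a trailing partial codon is ignored.
    static int countCodons(String seq, String... patterns) {
        int count = 0;
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            String codon = seq.substring(i, i + 3);
            if (patterns.length == 0) {
                count++;
                continue;
            }
            for (String p : patterns) {
                if (codonMatches(codon, p)) {
                    count++;
                    break; // count each codon once, even if several patterns match
                }
            }
        }
        return count;
    }

    // A pattern has 3 characters; "." matches any nucleotide.
    static boolean codonMatches(String codon, String pattern) {
        for (int k = 0; k < 3; k++) {
            char p = pattern.charAt(k);
            if (p != '.' && p != codon.charAt(k)) return false;
        }
        return true;
    }

    // #N: counts occurrences of one pattern; the empty pattern counts all nucleotides.
    // Assumption: matches do not overlap and "_" matches lazily (0 or more nucleotides).
    static int countSubsequences(String seq, String pattern) {
        if (pattern.isEmpty()) return seq.length();
        String regex = pattern.replace("_", "[ACGT]*?").replace(".", "[ACGT]");
        Matcher m = Pattern.compile(regex).matcher(seq);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        String seq = "ATGGCCACCTAA"; // codons: ATG GCC ACC TAA
        System.out.println(countCodons(seq));            // prints 4
        System.out.println(countCodons(seq, ".CC"));     // prints 2
        System.out.println(countSubsequences(seq, ""));  // prints 12
        System.out.println(countSubsequences(seq, "CC")); // prints 2
    }
}
```

Whether Sca counts overlapping occurrences for #N, or how it resolves "_" when several matches are possible, is not stated in this document; the sketch picks one plausible reading.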
An example
Assume you want to display the GC-content of the set of sequences,
measured at the third codon position. This can be done using the
formula: "#C(..C,..G) / #C()", which means: divide the number of
codons ending with C or G by the total number of codons in the
sequence. If you want to have it in percent, take
"#C(..C,..G) / #C() * 100".
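To make the arithmetic concrete, the sketch below evaluates both formulas by hand for one short sample sequence. It assumes in-frame codons starting at the first nucleotide and floating-point division; how Sca itself handles a sequence without any complete codon is not documented here, so the sketch simply rejects that case. The class name `Gc3Example` and the sample sequence are made up for illustration.

```java
// Hypothetical worked example for "#C(..C,..G) / #C()" on a single sequence.
final class Gc3Example {

    // Fraction of codons whose third nucleotide is C or G.
    // Assumption: codons are read in frame from index 0; a trailing partial codon is ignored.
    static double fractionEndingInGorC(String seq) {
        int total = 0;
        int endingInGorC = 0;
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            total++;
            char third = seq.charAt(i + 2);
            if (third == 'C' || third == 'G') endingInGorC++;
        }
        if (total == 0) {
            throw new IllegalArgumentException("Sequence contains no complete codon");
        }
        return (double) endingInGorC / total;
    }

    public static void main(String[] args) {
        String seq = "ATGGCCACCTAA"; // codons: ATG GCC ACC TAA, three of four end in C or G
        double fraction = fractionEndingInGorC(seq);
        System.out.println(fraction);       // prints 0.75
        // "/" and "*" have the same precedence and are applied left to right.
        System.out.println(fraction * 100); // prints 75.0
    }
}
```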
2013-06-25 07:57