Execute

After downloading the application, start it from a terminal with

java -jar /path/to/sca.jar

Input

The header lines of the FASTA sequence files must have the following structure:

>gi|13383270|gb|AB049155| /<u>Avian</u>/2(PB1)/H9N2/Japan/<u>1997</u>/// Influenza A virus ...

The underlined fields are the important ones: the family field (/Avian/) and the year field (/1997/). The number of "/" characters before the year field is significant!
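To illustrate why the number of "/" characters matters, here is a minimal Python sketch that pulls the two fields out of the example header by position. It is not the application's own parser: the function name parse_header and the two index constants are hypothetical, and the positions are inferred only from the single example header shown above.

```python
# Hypothetical field positions, inferred from the one example header:
# splitting on "/" puts the family at index 1 and the year at index 5.
FAMILY_INDEX = 1
YEAR_INDEX = 5


def parse_header(header):
    """Return (family, year) from a FASTA header line split on "/"."""
    fields = header.split("/")
    if len(fields) <= YEAR_INDEX:
        raise ValueError("header has too few '/'-separated fields")
    return fields[FAMILY_INDEX], int(fields[YEAR_INDEX])


header = ">gi|13383270|gb|AB049155| /Avian/2(PB1)/H9N2/Japan/1997/// Influenza A virus ..."
family, year = parse_header(header)
print(family, year)
```

If a header contained one "/" more or less before the year, a purely positional lookup like this one would pick up a different field, which is the kind of problem the remark above warns about.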

Interaction

The graph can be adapted to your needs, mostly by using the mouse wheel and clicking in the graph:

Select an area: Hold down the control key and the left mouse button and move the mouse to select a portion of the displayed area, which becomes the new displayed area. While the buttons are held down, a rectangle shows the borders of the selected area. Releasing the mouse button redisplays the graph with the selected area.
Move the graph: Press the left mouse button and move the mouse to drag the graph to the desired position.
Scale the graph: Turn the mouse wheel to scale the display.
Resize the graph: Pressing the "R" key resizes the graph to its optimum display. That means all points are visible and the scale of the axes is as small as possible.
Colorize the graph: [Edit -> Change Colors] changes the color of a displayed host name.
Change the formula: [Edit -> Set Formula] changes the value calculated for each sequence. See below for details.

The formula

The formula is entered in a dialog. It is a (short) numerical term consisting of numbers, the basic arithmetic operators "+", "-", "*", "/", parentheses, and specific functions that count codons or nucleotide sequences. As usual, "*" and "/" are evaluated before "+" and "-". The formula is always calculated for each source sequence individually, not for the sum of all sequences.

The specific functions are:

#C(...): Number of codons. This function may take multiple arguments, separated by commas. Every argument must consist of 3 characters, each one of A, C, G, T or the wildcard ".", which represents any nucleotide. For example, #C(.CC) matches every codon that ends with CC. Empty parentheses, #C(), count all codons in the sequence.
#N(...): Number of nucleotide (sub)sequences. Only one argument is allowed: the (sub)sequence to count, consisting of the nucleotides A, C, G, T and the wildcard characters "." (a single nucleotide) and "_" (any number of nucleotides, including 0). Empty parentheses, #N(), count the nucleotides in the sequence.
#A(...): Number of amino acids. Each argument is a single letter naming an amino acid. #A() is the same as #C().
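The wildcard rules for #C and #N can be sketched in Python as follows. This is only an illustration under stated assumptions, not the application's implementation: the function names count_codons and count_subsequences are hypothetical, codons are read in frame from the first nucleotide with an incomplete trailing codon ignored, a codon that matches several arguments is counted once, and #N matches are counted without overlap with "_" taking the shortest possible run.

```python
import re

NUCLEOTIDES = "ACGT"


def count_codons(sequence, *patterns):
    """Sketch of #C(...): count in-frame codons matching any pattern."""
    # Assumption: frame starts at index 0; a trailing partial codon is dropped.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    if not patterns:
        return len(codons)  # empty parentheses: all codons
    for pattern in patterns:
        if not re.fullmatch(r"[ACGT.]{3}", pattern):
            raise ValueError(f"invalid codon pattern: {pattern!r}")
    # "." is also the regex wildcard, so a pattern can be used as a regex directly.
    return sum(1 for codon in codons
               if any(re.fullmatch(p, codon) for p in patterns))


def count_subsequences(sequence, pattern=""):
    """Sketch of #N(...): count non-overlapping matches of one pattern."""
    if pattern == "":
        return len(sequence)  # empty parentheses: all nucleotides
    # Assumption: "_" matches the shortest possible run (lazy quantifier).
    translation = {".": "[ACGT]", "_": "[ACGT]*?"}
    parts = []
    for char in pattern:
        if char in NUCLEOTIDES:
            parts.append(char)
        elif char in translation:
            parts.append(translation[char])
        else:
            raise ValueError(f"invalid character in pattern: {char!r}")
    return sum(1 for _ in re.finditer("".join(parts), sequence))


sequence = "ATGGCCTTAAAG"  # codons: ATG GCC TTA AAG
print(count_codons(sequence))                # 4
print(count_codons(sequence, ".CC"))         # 1 (GCC)
print(count_codons(sequence, "..C", "..G"))  # 3 (ATG, GCC, AAG)
print(count_subsequences(sequence))          # 12
print(count_subsequences(sequence, "A.G"))   # 2 (ATG, AAG)
```

Whether the application counts overlapping #N matches, or how it resolves "_" when several match lengths are possible, is not stated above, so those two choices may differ from the real behaviour.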

An example

Assume you want to display the GC-content of the set of sequences. This can be done using the formula "#C(..C,..G) / #C()", which means: divide the number of codons ending with C or G by the total number of codons in the sequence (strictly speaking, this is the GC-content at the third codon position). If you want the result in percent, use "#C(..C,..G) / #C() * 100".
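The example can be traced by hand with a short Python sketch that evaluates such a formula for one sequence. It is an assumption-laden illustration rather than the application's evaluator: the names count_codons and evaluate are hypothetical, only #C is supported, and codons are read in frame from the first nucleotide. Each #C(...) call is first replaced by its count, and the remaining arithmetic is then evaluated with the usual operator precedence.

```python
import ast
import operator
import re

# The four basic operators the formula syntax allows.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def count_codons(sequence, patterns):
    """Count in-frame codons matching any pattern (all codons if none given)."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    if not patterns:
        return len(codons)
    return sum(1 for codon in codons
               if any(re.fullmatch(p, codon) for p in patterns))


def _evaluate_node(node):
    """Evaluate numbers, + - * / and unary minus; reject everything else."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        left = _evaluate_node(node.left)
        right = _evaluate_node(node.right)
        return OPERATORS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate_node(node.operand)
    raise ValueError("unsupported expression in formula")


def evaluate(formula, sequence):
    """Evaluate a formula containing #C(...) calls for a single sequence."""
    def substitute(match):
        args = [a.strip() for a in match.group(1).split(",") if a.strip()]
        return str(count_codons(sequence, args))

    arithmetic = re.sub(r"#C\(([^)]*)\)", substitute, formula).strip()
    return _evaluate_node(ast.parse(arithmetic, mode="eval").body)


sequence = "ATGGCCTTAAAG"  # codons: ATG GCC TTA AAG, three of them end in C or G
print(evaluate("#C(..C,..G) / #C()", sequence))        # 0.75
print(evaluate("#C(..C,..G) / #C() * 100", sequence))  # 75.0
```

For this sequence the first formula becomes 3 / 4 and the second 3 / 4 * 100, evaluated from left to right because "/" and "*" have the same precedence.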

2025-06-05 10:02
