Institute of Bioinformatics Münster

Start the application either by clicking on "Start Application" or by downloading it and running

java -jar sca.jar

in a terminal.

The header lines of the FASTA sequence files must have the following structure:

>gi|13383270|gb|AB049155| /<u>Avian</u>/2(PB1)/H9N2/Japan/<u>1997</u>/// Influenza A virus ...

The underlined fields are the important ones: the family field (/Avian/) and the year field (/1997/). The number of "/" signs up to the year field is significant here!
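Because the position of the fields depends on counting "/" signs, it can help to check header lines before loading a file. The following is a minimal sketch, not the application's own parser: the function name `parse_header` is hypothetical, and it assumes that the family is the first field after the first "/" and the year the fifth, as in the example header (without the underline markup).

```python
from typing import Tuple


def parse_header(header: str) -> Tuple[str, int]:
    """Extract the family and year fields from a FASTA header line.

    Assumption: fields are located purely by counting "/" separators,
    with the family in slot 1 and the year in slot 5 of the split result.
    """
    fields = header.split("/")
    if len(fields) < 6:
        raise ValueError("header has too few '/'-separated fields")
    family = fields[1].strip()
    year = int(fields[5])  # raises ValueError if the slot is not a number
    return family, year


example = ">gi|13383270|gb|AB049155| /Avian/2(PB1)/H9N2/Japan/1997/// Influenza A virus ..."
print(parse_header(example))  # ('Avian', 1997)
```

A header with an additional or missing "/" before the year shifts the fields, so in this sketch the year lookup either fails or returns the wrong field.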

The graph can be adapted to your needs, mostly by using the mouse wheel and by clicking into the graph:

- Select an area: Hold down the control key and the left mouse button and move the mouse to select a portion of the displayed area, which then becomes the new displayed area. As long as the buttons are held down, a rectangle shows the borders of the selected area. Releasing the mouse button redisplays the graph with the selected area.
- Move the graph: Press the left mouse button and move the mouse to drag the graph to the desired position.
- Scale the graph: Turn the mouse wheel to scale the display.
- Resize the graph: Pressing the "R" key resizes the graph to its optimal display. That means all points are visible and the scales of the axes are as small as possible.
- Colorize the graph: [Edit -> Change Colors] Changes the color of a displayed host name.
- Change the formula: [Edit -> Set Formula] Changes the value calculated for each sequence. See below for details.

The formula is entered in a dialog. It is a (short) numerical term consisting of numbers, the basic arithmetic operators "+", "-", "*", "/", parentheses, and specific functions that count codons or nucleotide sequences. As usual, "*" and "/" are evaluated before "+" and "-". The formula is always calculated for each single source sequence, not for the sum of all sequences.

The specific functions are:

- #C(...): Number of codons. This function may take multiple arguments, separated by commas. Every argument must consist of 3 letters from A, C, G, T or the wildcard ".", which stands for any nucleotide. For example, #C(.CC) matches every codon that ends with CC. Empty parentheses, #C(), count all codons in the sequence.
- #N(...): Number of nucleotide (sub)sequences. Only one argument is allowed: the (sub)sequence to be counted, which consists of the nucleotides A, C, G, T and the wildcard characters "." (for a single nucleotide) and "_" (for any number of nucleotides, including 0). Empty parentheses, #N(), count the nucleotides in the sequence.
- #A(...): Number of amino acids. Each argument is a single letter naming an amino acid. #A() is the same as #C().
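To make the counting rules more concrete, here is a rough Python sketch of how #C(...) and #N(...) could behave. It is not the application's code. The function names are hypothetical, and several details are assumptions: codons are read from the first nucleotide without overlap, an incomplete trailing codon is ignored, #N matches are counted without overlap, and "_" matches as few nucleotides as possible.

```python
import re
from typing import List


def count_codons(sequence: str, patterns: List[str]) -> int:
    """Sketch of #C(...): count codons matching any pattern ('.' = any nucleotide)."""
    seq = sequence.upper()
    # Assumption: reading frame starts at index 0, codons do not overlap.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not patterns:  # corresponds to #C()
        return len(codons)
    for pattern in patterns:
        if len(pattern) != 3:
            raise ValueError("every codon pattern must have 3 characters")

    def matches(codon: str, pattern: str) -> bool:
        return all(p == "." or p == c for p, c in zip(pattern.upper(), codon))

    return sum(1 for codon in codons if any(matches(codon, p) for p in patterns))


def count_subsequences(sequence: str, pattern: str) -> int:
    """Sketch of #N(...): count occurrences of a (sub)sequence pattern."""
    seq = sequence.upper()
    if not pattern:  # corresponds to #N()
        return len(seq)
    # '.' = exactly one nucleotide, '_' = zero or more nucleotides (non-greedy).
    regex = "".join(
        "." if ch == "." else ".*?" if ch == "_" else re.escape(ch)
        for ch in pattern.upper()
    )
    # Assumption: matches are counted left to right without overlap.
    return len(re.findall(regex, seq))


seq = "ATGGCCACCTAA"  # codons: ATG GCC ACC TAA
print(count_codons(seq, [".CC"]))         # 2
print(count_codons(seq, []))              # 4
print(count_subsequences("ACGTACGT", "A_T"))  # 2
```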

Assume you want to display the GC content of the set of sequences (more precisely, the GC content at the third codon position). This can be done using the formula "#C(..C,..G) / #C()", which means: divide the number of codons ending with C or G by the total number of codons in the sequence. If you want the result in percent, use "#C(..C,..G) / #C() * 100".
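As a cross-check of the percent formula, the same value can be computed by hand for each sequence separately. The sketch below is an illustration under assumptions, not the application's implementation: the function name `gc3_percent` and the two sample sequences are made up, and codons are read from the first nucleotide without overlap.

```python
def gc3_percent(sequence: str) -> float:
    """Evaluate '#C(..C,..G) / #C() * 100' for one sequence (sketch)."""
    seq = sequence.upper()
    # Assumption: reading frame starts at index 0; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    ending_in_c_or_g = sum(1 for codon in codons if codon[2] in "CG")
    return ending_in_c_or_g / len(codons) * 100


# Hypothetical sample data; the value is computed per sequence, not over the sum.
sequences = {"seq1": "ATGGCCACCTAA", "seq2": "ATGAAATTTTGA"}
for name, seq in sequences.items():
    print(name, gc3_percent(seq))
# seq1 75.0
# seq2 25.0
```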

2013-06-25 07:57