Target genome | Reference genome | Number of predicted genes |
---|---|---|
P. californicus | A. mellifera | 711 |
P. californicus | C. floridanus | 2,207 |
P. californicus | S. invicta | 3,151 |
P. californicus | P. barbatus | 8,944 |
P. californicus | Summary of unique predictions from A. mellifera, C. floridanus, S. invicta, and P. barbatus | 15,013 |
GeneModelMapper (GeMoMa) is an annotation pipeline that predicts protein-coding genes in a target genome by using the genome of an already annotated, related species as a reference. First, it extracts the protein-coding exons from a well-annotated reference genome (see Figure 1). The individual exons are then matched to locations on the target genome with tblastn. GeMoMa then tries to fit the gene models resulting from the tblastn run to the genome, using additional RNA sequences from the target species as evidence. The quality of the resulting models can vary considerably, depending on the quality and completeness of the reference annotation. Four species related to P. californicus (A. mellifera, C. floridanus, S. invicta, and P. barbatus; see Overview) were used as reference genomes (see Table in Results). In the final step, the predictions from the runs with the different reference genomes are merged and passed through a final annotation filter. Lastly, identical duplicate genes are removed from the resulting annotation.
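The merging and duplicate-removal step can be illustrated with a minimal sketch. This is not GeMoMa's own filter: the `GeneModel` type, the `merge_runs` function, and all coordinates below are hypothetical, and the sketch assumes that two predictions count as identical when their contig, strand, and exon coordinates all agree.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    """A predicted gene model (hypothetical, simplified representation)."""
    contig: str
    strand: str
    exons: tuple      # sorted (start, end) pairs on the target genome
    reference: str    # reference species the prediction came from


def merge_runs(runs):
    """Combine the predictions of all runs and drop identical duplicates.

    Two models are treated as identical when contig, strand, and all exon
    coordinates agree; the first one encountered is kept.
    """
    seen = set()
    merged = []
    for run in runs:
        for model in run:
            key = (model.contig, model.strand, model.exons)
            if key not in seen:
                seen.add(key)
                merged.append(model)
    return merged


# Toy predictions from two reference runs (made-up coordinates).
barbatus_run = [
    GeneModel("scaffold1", "+", ((100, 250), (400, 520)), "P. barbatus"),
    GeneModel("scaffold2", "-", ((10, 90),), "P. barbatus"),
]
invicta_run = [
    # Same structure as the first P. barbatus model, so it is a duplicate.
    GeneModel("scaffold1", "+", ((100, 250), (400, 520)), "S. invicta"),
    GeneModel("scaffold3", "+", ((700, 880),), "S. invicta"),
]

merged = merge_runs([barbatus_run, invicta_run])
print(len(merged))  # 3
```

Because runs are processed in the order given, listing the closest reference species first means its copy of a duplicated model is the one that is kept.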
The final GeMoMa gene prediction draws on reference species that are related to P. californicus to different degrees. The closest relative of P. californicus is P. barbatus, followed by S. invicta and then C. floridanus; A. mellifera, a bee rather than an ant, is the most distant. The smaller the phylogenetic distance between a reference species and P. californicus, the more predictions are based on that species.
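As a quick check of this relationship, the share that each reference genome contributes to the combined set can be computed from the counts in the results table. The snippet below is only illustrative arithmetic on those published numbers; the variable names are arbitrary.

```python
# Gene counts per reference genome, copied from the results table.
predictions = {
    "P. barbatus": 8944,
    "S. invicta": 3151,
    "C. floridanus": 2207,
    "A. mellifera": 711,
}

total = sum(predictions.values())
shares = {
    species: round(100 * count / total, 1)
    for species, count in predictions.items()
}

print(total)  # 15013, matching the summary row of the table
for species, share in shares.items():
    print(f"{species}: {share}%")
# P. barbatus: 59.6%
# S. invicta: 21.0%
# C. floridanus: 14.7%
# A. mellifera: 4.7%
```

The per-reference counts add up exactly to the 15,013 reported in the summary row, and the shares fall in the same order as the relatedness of the reference species.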
Download GeMoMa annotation: GeMoMa_annotation.gff
Download Protein sequences: GeMoMa_proteins.fasta
Download Transcript annotation: GeMoMa_transcript.fasta