Genomic signature refers to the characteristic frequency of oligonucleotides in a genome or sequence.
Genomes are not random sequences of bases, and neither is their oligonucleotide composition. Oligonucleotides may be present
in a genome more often (over-representation) or less often (under-representation) than expected from its nucleotide composition,
due to the evolutionary pressure within the cell's internal environment. As the internal biochemical
environment of phylogenetically related organisms is similar,
their oligonucleotide composition is similar as well.
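
As an illustration of over- and under-representation, the short Python sketch below compares the observed frequency of one oligonucleotide with the frequency expected from the single-base composition of the same sequence. It is only a minimal example under stated assumptions: the function name `odds_ratio` is hypothetical, only one strand is read, and the expectation uses a simple independence (zero-order) model, which is not necessarily the model used by this website.

```python
from collections import Counter


def odds_ratio(sequence, oligo):
    """Observed/expected frequency of `oligo` in `sequence`.

    Assumption: the expected frequency is the product of the single-base
    frequencies (zero-order model). Values above 1 suggest
    over-representation, values below 1 under-representation.
    """
    seq = sequence.upper()
    oligo = oligo.upper()
    k = len(oligo)
    windows = len(seq) - k + 1
    if k == 0 or windows <= 0:
        raise ValueError("oligonucleotide must be non-empty and no longer than the sequence")

    # Observed frequency: overlapping occurrences divided by the number of windows.
    hits = sum(1 for i in range(windows) if seq[i:i + k] == oligo)
    observed = hits / windows

    # Expected frequency from the nucleotide composition alone.
    base_counts = Counter(seq)
    expected = 1.0
    for base in oligo:
        expected *= base_counts[base] / len(seq)
    if expected == 0.0:
        raise ValueError("oligonucleotide contains a base absent from the sequence")

    return observed / expected


if __name__ == "__main__":
    toy = "ACGT" * 10  # toy sequence with equal base frequencies
    print("AC:", odds_ratio(toy, "AC"))  # above 1: more frequent than expected
    print("AA:", odds_ratio(toy, "AA"))  # 0: never occurs in the toy sequence
```

In the toy sequence every base has the same frequency, yet "AC" is far more common than the base composition predicts while "AA" never occurs, which is the kind of bias a genomic signature captures.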
This website has been optimized to compare octanucleotide composition of sequences by computing the
Genomic Signature Distance (see video).
To select this specific method, we performed various experiments that are described in the documents below.
We searched many combinations of frequencies and distances to find the one with the highest performance, and
found that the best candidate was the Genomic Signature Distance.
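
The general idea of comparing two sequences through their octanucleotide composition can be sketched in Python as follows. This is only a hedged illustration: the exact formula of the Genomic Signature Distance is given in the documents listed below and is not reproduced here, so the sketch uses the sum of absolute frequency differences as a placeholder distance. The function names `oligo_frequencies` and `placeholder_distance` are hypothetical and not part of this service.

```python
from collections import Counter


def oligo_frequencies(sequence, k=8):
    """Relative frequency of every overlapping k-mer made only of A, C, G, T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip windows containing ambiguous symbols such as N.
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid k-mers found in the sequence")
    return {word: n / total for word, n in counts.items()}


def placeholder_distance(freq_a, freq_b):
    """Sum of absolute frequency differences between two profiles.

    Assumption: this is a stand-in for illustration only,
    NOT the Genomic Signature Distance used by the website.
    """
    words = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in words)


if __name__ == "__main__":
    seq_a = "ACGTACGTACGTACGTTTGACA"
    seq_b = "ACGTACGTACGTACGTTTGACC"
    profile_a = oligo_frequencies(seq_a)  # k=8: octanucleotides
    profile_b = oligo_frequencies(seq_b)
    print(placeholder_distance(profile_a, profile_a))  # identical profiles
    print(placeholder_distance(profile_a, profile_b))  # small but non-zero
```

Profiles are stored as dictionaries so that only the octanucleotides actually present need to be kept, instead of all 65,536 possible ones.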
- Searching the oligonucleotide composition of bacterial genomes confirms that:
- Genomic signature is similar for members of the same bacterial species (check
this example for Bacillus spp.).
- Even though distances among members of the same species are very small, distances based on longer
oligonucleotides allow related genomes to be clustered together.
A nice example is the clustering of strains belonging to the same Salmonella enterica serotype here
(500 randomly selected genomes; source: Ensembl Genomes).
- The clustering of the marine cyanobacterium Prochlorococcus is also a nice example.
- When two or more chromosomes are present within a cell, the genomic signatures of those elements are extremely similar.
- The longer a plasmid present within a prokaryotic cell, the smaller the distance between the chromosome and the plasmid.
- Computing distances for genomic signatures:
- This website computes the Genomic Signature Distance
between the oligonucleotide frequencies of different sequences. How to compute
Genomic Signature Distances faster is described in this document.
- To learn why the Genomic Signature Distance was selected to compare oligonucleotide frequencies, see this video
or check the documents below.
- Different types of oligonucleotide frequencies and distances are described in this file.
- The calculation of different combinations of frequencies and distances, with the aim of correctly assigning
DNA sequences to their sources or to the most similar genomes, is described in this file.
- The influence of the G+C content of sequences on distance is described in this file.
- Software used by this service
The website was developed in PHP and runs on an Apache server under the Linux operating system.
For some computationally intensive tasks, C programs are used.
- Recent improvements
More info here.
Send your comments to Joseba.Bikandieus
This project was originally funded by the Spanish Ministry for Education and Science under the Programme
CONSOLIDER-INGENIO 2010, and more recently by the European Food Safety Authority (EFSA),
grant agreement GP/EFSA/AFSCO/2015/01/CT2 (New approaches in identifying and characterizing microbial and chemical hazards),
and by the Government of the Basque Country.