GScompare
Comparing oligonucleotide-based genomic signatures among sequences
Comparison of Genomes: Ensembl Genomes Release 35 (April 2017; 44,039 genomes).

Genomic signature refers to the characteristic frequency of oligonucleotides in a genome or sequence.
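As an illustration, here is a minimal Python sketch of how oligonucleotide (k-mer) frequencies can be counted in a sequence and turned into relative frequencies. The function name `oligo_frequencies` is hypothetical. Using overlapping windows on one strand and skipping windows that contain non-ACGT characters are assumptions. This is not necessarily how GScompare computes its frequencies.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def oligo_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every possible k-mer in seq (hypothetical helper).

    Overlapping windows on one strand; windows containing anything other
    than A, C, G or T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        word
        for word in (seq[i:i + k] for i in range(len(seq) - k + 1))
        if set(word) <= set(BASES)
    )
    total = sum(counts.values())
    # Report all 4**k possible k-mers, including those that never occur.
    return {
        w: (counts[w] / total if total else 0.0)
        for w in ("".join(p) for p in product(BASES, repeat=k))
    }

# Dinucleotide (k = 2) frequencies of a toy sequence.
freqs = oligo_frequencies("ACGTACGTAA", 2)
print(freqs["AC"], freqs["AA"])
```

The result is a vector with one entry per possible oligonucleotide. Vectors of this kind are what a distance between genomic signatures compares.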

Genomes are not random sequences of bases, and neither is their oligonucleotide composition. An oligonucleotide may occur in a genome more often (over-representation) or less often (under-representation) than expected from the genome's nucleotide composition, owing to evolutionary pressure within the internal environment of the cell. Because phylogenetically related organisms have similar internal biochemical environments, their oligonucleotide composition is also similar.
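To make "more or less often than expected from its nucleotide composition" concrete, the sketch below compares each observed k-mer frequency with the frequency expected if bases occurred independently at the sequence's own base composition. The name `observed_expected` and this zero-order independence model are illustrative assumptions, not the model used by this website. For k = 2 the ratio corresponds to the common dinucleotide relative abundance f_XY / (f_X f_Y), computed here on one strand only.

```python
from collections import Counter

def observed_expected(seq: str, k: int) -> dict[str, float]:
    """Observed/expected ratio for each k-mer present in seq (hypothetical helper).

    Expected frequency = product of the mononucleotide frequencies of the
    k-mer's bases (zero-order model). Ratio > 1: over-represented;
    ratio < 1: under-represented.
    """
    seq = seq.upper()
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    words = [w for w in words if all(b in "ACGT" for b in w)]
    if not words:
        return {}

    base_counts = Counter(b for b in seq if b in "ACGT")
    n_bases = sum(base_counts.values())
    base_freq = {b: c / n_bases for b, c in base_counts.items()}

    word_counts = Counter(words)
    n_words = len(words)
    ratios = {}
    for word, count in word_counts.items():
        expected = 1.0
        for b in word:
            expected *= base_freq[b]
        ratios[word] = (count / n_words) / expected
    return ratios

ratios = observed_expected("ATATATAT", 2)
print(ratios)
```

In this toy sequence, A and T each make up half of the bases, so every dinucleotide of A and T is expected with frequency 0.25. "AT" and "TA" are over-represented, and "AA" and "TT" never occur.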

This website has been optimized to compare the oligonucleotide composition of sequences by computing the Genomic Signature Distance (see video). To select this method, we performed various experiments, which are described in the documents below. We tested many combinations of frequencies and distances to find the one with the highest performance, and found that the best candidate was the Genomic Signature Distance.
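The exact formula of the Genomic Signature Distance is given in the documents referenced on this page and is not reproduced here. The sketch below only shows the general pattern of comparing two sequences through their oligonucleotide frequency vectors. As a plainly named stand-in, it uses the Manhattan distance (sum of absolute differences), which is not the Genomic Signature Distance. Both function names are hypothetical.

```python
from collections import Counter
from itertools import product

def kmer_profile(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # windows containing N etc. are ignored
    return [counts[w] / total if total else 0.0 for w in words]

def manhattan_distance(seq_a: str, seq_b: str, k: int) -> float:
    """Stand-in distance between k-mer profiles.

    NOT the Genomic Signature Distance used by this website.
    """
    pa, pb = kmer_profile(seq_a, k), kmer_profile(seq_b, k)
    return sum(abs(x - y) for x, y in zip(pa, pb))

print(manhattan_distance("ACGT" * 10, "ACGT" * 10, 2))
print(manhattan_distance("AAAA", "TTTT", 1))
```

Because both profiles sum to 1, this stand-in ranges from 0 (identical profiles) to 2 (no shared k-mers). The experiments described above compared many frequency/distance combinations of this general form.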

  • General recommendations:
    • To compare complete bacterial genomes, we recommend computing the Genomic Signature Distance for hepta- or octanucleotides.
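    • Comparing the composition of longer oligonucleotides among sequences uses more information and is therefore more accurate, but
    • Comparing the composition of longer oligonucleotides among short sequences may be useless: there are 4^k possible oligonucleotides of length k, so a short sequence contains too few of them to estimate each frequency reliably.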
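The trade-off above is simple arithmetic. A sequence of length L contains L - k + 1 overlapping oligonucleotides of length k, spread over 4^k possible words. The hypothetical helper below computes the average number of occurrences per possible word, counting one strand only. It is a rough rule of thumb for reasoning about sequence length, not a threshold used by this website.

```python
def mean_count_per_oligo(seq_length: int, k: int) -> float:
    """Average occurrences per possible k-mer in a sequence of seq_length bases.

    Counts overlapping windows on one strand only (hypothetical helper).
    """
    n_words = max(seq_length - k + 1, 0)
    return n_words / 4 ** k

for length in (1_000, 100_000, 5_000_000):
    for k in (4, 8):
        print(f"{length:>9} bp, k={k}: {mean_count_per_oligo(length, k):.3f}")
```

For octanucleotides (65,536 possible words), a 5 Mb sequence gives about 76 occurrences per word on average. A 1 kb fragment gives about 0.015, so most octanucleotide frequencies in it would be zero.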
       
  • Analyzing the oligonucleotide composition of bacterial genomes confirms that:
    • Genomic signature is similar for members of the same bacterial species (check this example for Bacillus spp.).
    • Even though distances among members of the same species are very small, distances based on longer oligonucleotides allow related genomes to be clustered together. A nice example is the clustering of strains belonging to the same Salmonella enterica serotype here (500 randomly selected genomes; source: Ensembl Genomes).
    • The clustering of the marine cyanobacterium Prochlorococcus is also a nice example.
    • When two or more chromosomes are present within a cell, the genomic signatures of those elements are extremely similar (more).
    • The longer a plasmid within a prokaryotic cell, the smaller the distance between its genomic signature and that of the chromosome (more).
       
  • Computing distances for genomic signatures:
    • This website computes the Genomic Signature Distance among oligonucleotide frequencies from different sequences. This document describes how to compute Genomic Signature Distances faster.
    • To learn why the Genomic Signature Distance was selected to compare oligonucleotide frequencies, see this video or check the documents below:
      • Different types of oligonucleotide frequencies and distances are described in this file.
      • The calculation of different combinations of frequencies and distances, with the aim of correctly assigning DNA sequences to their sources or to the most similar genomes, is described in this file.
      • The influence of the G+C content of sequences on distances is described in this file.

       
  • Software used by this service
    The website was developed with PHP running on an Apache server with a Linux operating system. For some computationally intensive tasks, C programs are used.
     
  • Recent improvements
    More info here.
     
  • Contact
    Send your comments to Joseba.Bikandieus
     
  • Acknowledgment
    This project was originally funded by the Spanish Ministry for Education and Science under the Programme CONSOLIDER-INGENIO 2010, and more recently by the European Food Safety Authority (EFSA), grant agreement GP/EFSA/AFSCO/2015/01/CT2 (New approaches in identifying and characterizing microbial and chemical hazards), and by the Government of the Basque Country.