Alignment-free sequence comparison tools available for research purposes
Category Name Features Implementation Authors
Pairwise & multiple sequence comparisons ALF Calculates pairwise similarity scores (using the N2 measure) for sequences in a FASTA file Software (C++) Göke et al. (2012)
decaf+py Package: 13 word-based measures, 5 Lempel-Ziv complexity-based measures, Average Common Substring and W-metric Software (Python) Ren et al. (2013)
multiAlignFree Multiple alignment-free sequence comparison using 5 word-based statistics R package
NASC Non-Aligned Sequence Comparison: 4 word-based measures (e.g. Mahalanobis distance); 2 IT-based measures (Kolmogorov complexity) Matlab framework Vinga & Almeida (2003)
Whole-genome phylogeny CAFE Alignment-free analysis platform for studying the relationships among genomes and metagenomes (offers 28 word-based dissimilarity measures) Software (C) Lu et al. (2017)
CVTree3 Phylogeny reconstruction from whole genome sequences based on word composition Web service
DLTree Automated whole genome/proteome-based phylogenetic analysis based on alignment-free dynamical language method. Web service Wu et al. (2017)
FFP Feature Frequency Profile-based measures for whole genome/proteome comparisons (from viral to mammalian scale) Software (C/Perl)
jD2Stat (JIWA) Generation of the distance matrix using D2S statistics to extract k-mers from large-scale unaligned genome sequences Software (Java) Chan et al. (2014)
kSNP v3 Word-based identification of SNPs in a set of genome sequences, and estimation of phylogenetic trees based upon those SNPs Software (C)
kr Efficient word-based estimation of mutation distances from unaligned genomes Software (C) Haubold et al. (2009)
FSWM Fast approach to estimate phylogenetic distances between large genomic sequences based on inexact word matches Software (C++), Web service Leimeister et al. (2017)
kmacs k-mismatch average common substring approach to alignment-free sequence comparison Leimeister & Morgenstern (2014)
Spaced Fast alignment-free sequence comparison using spaced-word frequencies (a few minutes for a pair of eukaryotic genomes of a few hundred Mb) Leimeister et al. (2014)
SlopeTree Whole genome phylogeny that corrects for Horizontal Gene Transfer Software (C++) Bromberg et al. (2016)
Underlying Approach Phylogeny of whole genomes using composition of subwords (Underlying Approach) Software (Java) Comin & Verzotto (2012)
Sequence similarity search tool RAFTS3 Searches of similar protein sequences against a protein database (>300 times faster than BLAST) Matlab Vialle et al. (2016)
Next Generation Sequencing AAF Phylogeny directly from unassembled genome sequence data Software (Python) Fan et al. (2012)
ChimeRScope Prediction of fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads Software (Java) Li et al. (2017)
d2-tools Word-based comparison (d2S measure) of metatranscriptomic samples from NGS reads Software (Python/R) Wang et al. (2012)
FastGT Calling common SNVs (single nucleotide variants) directly from raw sequencing reads Software (C) Pajuste et al. (2017)
FOCUS Identification of species in metagenomic samples based on composition usage Software (Python) Pham et al. (2017)
GSM Estimation of abundances of microbial genomes in metagenomic samples Software (Python) Silva et al. (2014)
kallisto Abundance quantification of transcripts from RNA-seq data Software (C++) Bray et al. (2016)
LAVA NGS-based computational SNP array for calling SNPs in dbSNP and Affymetrix’s Genome-Wide Human SNP Array. Software (C) Shajii et al. (2016)
MICADo Detection of mutations in targeted NGS data (distinguishes patient-specific mutations) Software (Python) Rudewicz et al. (2016)
RNA-Skim RNA-Seq quantification at transcript-level (transcriptome quantification in <10 min per sample by using just a single thread on a commodity computer) Software (C++) Zhang & Wang (2014)
Sailfish Alignment-free word-based estimation of isoform abundances from a set of reference sequences and RNA-seq reads Software (C++) Patro et al. (2014)
Salmon Patro et al. (2015)
stringMLST An assembly- and alignment-free program capable of rapidly typing bacterial isolates directly from raw sequence reads Software (Python) Gupta et al. (2017)
QCluster Clustering of reads with alignment-free measures and quality values Software (C++) Comin et al. (2015)
Taxonomer User-friendly metagenomics analysis tools; detection of microorganisms from next-generation sequencing data in real time Web service Flygare et al. (2016)
Annotation of long noncoding RNA lncScore Prediction of long noncoding RNA from assembled novel transcripts Software (Python) Zhao et al. (2016)
FEELnc Prediction of lncRNAs from RNA-seq samples based on a Random Forest model trained with multi k-mer frequencies and relaxed open reading frames. Software (Perl/R) Wucher et al. (2017)
Horizontal Gene Transfer (HGT) alfy Alignment-free local homology calculation for detecting horizontal gene transfer Software (C)
rush Detection of recombination between two unaligned DNA sequences Software (C) Haubold et al. (2013)
Smash Identification and visualization of genomic rearrangements between pairs of DNA sequences Software (C) Pratas et al. (2015)
TF-IDF Detection of HGT regions and the transfer direction in nucleotide/protein sequences Software (C++)
Regulatory elements D2Z Identification of functionally related homologous regulatory elements Software (Perl) Kantorovitz et al. (2007)
MatrixREDUCE Prediction of functional regulatory targets of TFs by predicting the total affinity of each promoter and orthologous promoters Software (Python) Ward & Bussemaker (2008)
RRS Detection of functionally similar groups of enhancers and their regions Software (Perl/C) Koohy et al. (2010)
Sequence clustering d2_cluster Clustering EST and full-length cDNA sequences Software (C) Burke et al. (1999)
d2-vlmc Word-based clustering of metatranscriptomic samples using variable length Markov chains Software (Python) Liao et al. (2016)
mBKM Clustering of DNA sequences using Shannon entropy and Euclidean distance Software (Java) Wei et al. (2012)
kClust Large-scale clustering of protein sequences (down to 20-30% sequence identity) Software (C++) Hauser et al. (2013)
Other COMET Rapid classification of HIV-1 nucleotide sequences into subtypes based on prediction by partial matching compression Web service Struck et al. (2014)
HabiSign Comparison of metagenomes and identification of habitat-specific sequences Web service Ghosh et al. (2011)
MetaFast Calculating statistics of metagenome sequences and the distances between them Software (Java) Ulyantsev et al. (2016)
VaxiJen Antigen prediction based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Web service Doytchinova & Flower (2007)
VirHostMatcher Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequency Software (C++) Ahlgren et al. (2017)
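
Many entries in the table above are described as word-based. As a rough illustration of what that term means, the sketch below counts overlapping words (k-mers) in two sequences and compares the count vectors in two simple ways: the D2 statistic (the sum, over all words, of the product of the two counts) and the Euclidean distance. It is a minimal sketch only and is not the implementation of any tool listed here; the function names `kmer_counts`, `d2`, and `euclidean` are hypothetical and chosen for this example.

```python
from collections import Counter
import math


def kmer_counts(seq, k):
    """Count overlapping words of length k in a sequence (illustrative helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 statistic: sum over words of count_in_a * count_in_b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Words missing from b contribute 0 because Counter returns 0 for absent keys.
    return sum(n * cb[w] for w, n in ca.items())


def euclidean(a, b, k):
    """Euclidean distance between the two word-count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = set(ca) | set(cb)
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))


if __name__ == "__main__":
    x, y = "ACGTACGT", "ACGTTGCA"
    print(d2(x, y, 2), euclidean(x, y, 2))
```

Larger D2 values indicate more shared word content, while smaller Euclidean distances indicate more similar word profiles. Practical tools typically normalize for sequence length and background composition, which this sketch omits.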