Bioinformatics develops methods and software tools for understanding biological data. Databases have become the heart of molecular biology, and since the advent of next-generation sequencing they have kept growing. Three of the most commonly used free databases are hosted by:
- National Center for Biotechnology Information (NCBI)
- European Molecular Biology Laboratory (EMBL)
- DNA Data Bank of Japan (DDBJ)
BankIt is used for small submissions, while Sequin is used for larger submissions.
There are many formats for storing DNA sequences. The most commonly used are GenBank and FASTA. Conversion from one format to another can be done with a program called READSEQ.
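To make the FASTA format concrete, here is a minimal sketch of reading and writing it in plain Python. It assumes the usual layout (a header line starting with ">" followed by one or more sequence lines). The function names and the two example records are made up for illustration, and this is not how READSEQ itself works.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(records, width=60):
    """Return FASTA text for (header, sequence) pairs, wrapping at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical records, not taken from any real database entry.
example_text = ">seq1 hypothetical record\nACGT\nACGT\n>seq2\nTTGA\n"
example_records = list(read_fasta(StringIO(example_text)))
rewrapped = format_fasta(example_records, width=4)
```

For real work, an established library or READSEQ is a better choice, since GenBank records carry annotation that a sketch like this ignores.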
Some important information about the sections and terms used can be learned from here.
Pairwise sequence alignment is the procedure of arranging two sequences of DNA, RNA, or protein to identify regions of similarity between them. The aligned sequences may be identical, similar, or homologous, and homologs may in turn be paralogs, orthologs, or xenologs. Heuristic alignment methods find multiple candidate alignments quickly. They take a query sequence and search a database of sequences for similar sequences (i.e. targets).
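As an illustration of what arranging two sequences means, the sketch below implements a simple global alignment by dynamic programming (the Needleman-Wunsch approach). This is an exact method rather than a heuristic one, and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, top, bottom = global_alignment("ACGT", "AGT")
```

With these scores, aligning "ACGT" against "AGT" places a single gap opposite the "C". Real tools use substitution matrices and affine gap penalties instead of three fixed numbers.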
Basic Local Alignment Search Tool (BLAST) finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. The FASTA program performs similar searches.
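The idea of quickly finding candidate targets for a query can be sketched with a short-word (k-mer) index. This is only a toy illustration of word seeding, not BLAST's or FASTA's actual implementation: it computes no statistical significance, and the database contents, sequence names, and word length k=4 are assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every k-mer to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seed_hits(query, index, k):
    """Count shared k-mer occurrences per target, most hits first."""
    hits = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for name, _pos in index.get(query[qpos:qpos + k], ()):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical two-sequence database.
database = {"target_a": "ACGTACGTGA", "target_b": "TTTTGGGGCC"}
index = build_kmer_index(database, k=4)
candidates = find_seed_hits("CGTACG", index, k=4)
```

Targets that share many words with the query are the ones worth aligning in detail, which is why this kind of pre-filtering is much faster than aligning the query against every database sequence.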
Multiple sequence alignment (MSA) is the alignment of three or more sequences. Clustal Omega (CLUSTALO) is a useful tool for MSA.
Alternative MSA tools include:
- BIOEDIT: http://www.mbio.ncsu.edu/BioEdit/bioedit.html
- CINEMA: http://130.88.97.239/CINEMA2.1/
- DIALIGN-TX: http://dialign-tx.gobics.de/index
- DCA: https://bibiserv.cebitec.uni-bielefeld.de/dca/submission.html
- KALIGN: http://msa.sbc.su.se/cgi-bin/msa.cgi
- KALIGNVU: http://msa.sbc.su.se/cgi-bin/msa.cgi
- MAFFT: https://toolkit.tuebingen.mpg.de/mafft
- PRANK: http://www.ebi.ac.uk/goldman-srv/prank/prank/
- PRALINE: http://www.ibi.vu.nl/programs/pralinewww/
- PROBCONS: http://probcons.stanford.edu/
- SEAVIEW: http://doua.prabi.fr/software/seaview
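Once any of these tools has produced an MSA, its columns can be inspected directly. The following is a minimal sketch that derives a simple majority consensus from sequences that are already aligned. The three aligned sequences are invented for the example, "-" is assumed to be the gap character, and ties are broken alphabetically purely to keep the result deterministic.

```python
from collections import Counter


def consensus(aligned):
    """Return the majority residue of each column, ignoring gaps."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != "-")
        if not counts:
            result.append("-")  # column contains only gaps
            continue
        best = max(counts.values())
        # Alphabetical tie-break: an arbitrary choice for this sketch.
        result.append(min(r for r, n in counts.items() if n == best))
    return "".join(result)


# Hypothetical alignment of three sequences.
example_msa = ["ACG-T", "ACGAT", "A-GAT"]
example_consensus = consensus(example_msa)
```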
Phylogenetic analysis can be done in the following ways:
- Distance-based using clustering – UPGMA, Neighbor-Joining
- Distance-based using optimal search criteria – Minimum Evolution
- Character-based using optimal search criteria – Maximum Parsimony, Maximum Likelihood
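To show how a distance-based clustering method works, here is a minimal UPGMA sketch: it repeatedly merges the two closest clusters and averages their distances to all the others, weighted by cluster size. The taxon names and distances are invented, the input is assumed to be a complete pairwise distance table, and branch lengths are left out to keep the example short.

```python
def upgma(names, distances):
    """Return a Newick topology (no branch lengths) built by UPGMA.

    `distances` maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(distances)
    while len(sizes) > 1:
        # Pick the closest pair; labels break ties deterministically.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        for other in sizes:
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            # Size-weighted average of the two old distances.
            d[frozenset((merged, other))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        del d[pair]
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Hypothetical distance table for four taxa.
taxa = ["A", "B", "C", "D"]
table = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(taxa, table)
```

Here A and B are joined first, then C, then D. Neighbor-Joining follows a similar merge loop but chooses pairs with a corrected criterion, so it does not assume equal evolutionary rates across lineages.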
PHYLIP can be used for inferring phylogenies. The neighbor-joining approach can be done with the help of
Gene prediction tools include:
Tools For Eukaryotes:
HMMGENE (http://www.cbs.dtu.dk/services/HMMgene/)
GENEID (http://genome.crg.es/geneid.html)
NETGENE2 (http://www.cbs.dtu.dk/services/NetGene2/)
GENEMARK (http://exon.gatech.edu/GeneMark/)
GENSCAN (http://genes.mit.edu/GENSCAN.html)
GENLANG (http://arete.ibb.waw.pl/PL/html/gene_lang.html)
AUGUSTUS (http://bioinf.uni-greifswald.de/augustus/submission.php)
Tools For Prokaryotes:
EASYGENE (http://www.cbs.dtu.dk/services/EasyGene/)
GENEMARK (http://opal.biology.gatech.edu/genemark/gmhmmp.cgi)
AMIGENE (http://www.genoscope.cns.fr/agc/tools/amigene/Form/form.php)
PRODIGAL (http://prodigal.ornl.gov/)
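None of the tools listed above works as simply as this, but the most basic signal in prokaryotic gene finding, an open reading frame (ORF), can be sketched in a few lines. The scan below assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA, looks only at the forward strand, and uses an arbitrary minimum length. The input sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    `start` is the index of the ATG, `end` is exclusive and includes the
    stop codon; `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


# Hypothetical sequence containing one short ORF.
example_seq = "CCATGAAATTTGGGTAACC"
example_orfs = find_orfs(example_seq)
```

Dedicated gene finders go well beyond this by modeling codon usage, ribosome binding sites, and both strands, which is why they are preferred for real genomes.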
Comparative-based online gene prediction tools include:
AUGUSTUS (http://bioinf.uni-greifswald.de/augustus/)
GRAILEXP (https://omictools.com/gene-recognition-and-analysis-internet-link-tool)
SGP2 (http://genome.crg.es/software/sgp2/sgp2.html)
GENVIEW2 (http://zeus2.itb.cnr.it/~webgene/wwwgene.html)
TWINSCAN (http://mblab.wustl.edu/software.html)
GENEWISE2 (http://www.ebi.ac.uk/Tools/psa/genewise/)
GENOMESCAN (http://genes.mit.edu/genomescan.html)
GENEMACHINE (https://genemachine.nhgri.nih.gov/)
Benchmarks help us decide which tool to use.
NCBI Genome stores genomic data. The most common tools for genome comparison are as follows:
- UCSC Genome Browser: http://genome.ucsc.edu/cgi-bin/hgGateway
- Ensembl Project Browser: http://www.ensembl.org/index.html
- VISTA Browser: http://pipeline.lbl.gov/cgi-bin/gateway2
- ECR Browser: https://ecrbrowser.dcode.org/