PAM (Dayhoff) matrices
The point accepted mutation (PAM) substitution model, also known as the Dayhoff substitution model, is an amino acid substitution model derived from empirica...
The point accepted mutation (PAM) substitution model, also known as the Dayhoff substitution model, is an amino acid substitution model derived from empirica...
Important note: The information contained in the course syllabus, other than the absence policies, may be subject to change with reasonable advance notice, a...
Like the forward algorithm, we can use the backward algorithm to calculate the marginal likelihood of a hidden Markov model (HMM). Also like the forward algo...
The Viterbi algorithm identifies a single path of hidden Markov model (HMM) states. This is the path which maximizes the joint probability of the observed da...
The Viterbi algorithm is used to efficiently infer the most probable “path” of the unobserved random variable in an HMM. In the CpG islands case, this is the...
Important note: The information contained in the course syllabus, other than the absence policies, may be subject to change with reasonable advance notice, a...
In this example we will calculate the likelihood \(P(D|T,h)\) where \(D\) is a single site, \(T\) is a rooted tree topology, and \(h\) is the node heights fo...
The Sankoff algorithm can efficiently calculate the parsimony score of a tree topology. Felsenstein’s pruning algorithm can efficiently calculate the probabi...
Long branch attraction is the phenomenon where two branches which are in truth not sisters are inferred to be sister branches when using maximum parsimony in...
The likelihood of a tree is the probability of a multiple sequence alignment or matrix of trait states (commonly known as a character matrix) given a tree to...
Certain mutations are more surprising than others. DNA is composed of a string of nucleotides, which are either pyrimadines (cytosine or thymine) or purines ...
The principle behind maximum parsimony based inference is to explain the data using the smallest cost. In its most basic form, all events are given equal cos...
Important note: The information contained in the course syllabus, other than the absence policies, may be subject to change with reasonable advance notice, a...
Neighbor joining is similar to UPGMA/WPGMA, but infers unrooted trees. As a consequence, and unlike UPGMA/WPGMA, it does not require that the multiple sequen...
Two related methods for infer phylogenetic trees from multiple sequence alignments (MSAs) are the Unweighted Pair Group Method with Arithmetic Mean (UPMGA) a...
The most commonly used nucleotide substitution models for phylogenetic reconstruction belong to the general time reversible (GTR) family. The main reason for...
The crudest way to compare two trees is to somehow quantify the difference between them as a single number. There are many methods of quantifying the differe...
Bifurcating trees have found enormous utility in biology. Depending on what the tips of a tree represent, they can be used as a model to help understand vari...
Hidden Markov models (HMMs) are a class of Markov models where the same states of a random variable (e.g. the four nucleotides of DNA) can be generated by di...
Markov processes model the change in random variables along a time dimension, and obey the Markov property. Informally, the Markov property requires that the...
When computing PSSMs, averaging does not take into account the amount of information available in a multiple sequence alignment (MSA) used to construct a PSS...
Bayesian statistics is the practice of updating the probability of the value of some parameter θ of model M being the true value, based on observations (D fo...
The probability of a value can mean its probability of being the true value of some parameter θ, conditional on some model M or ensemble of models. It can al...
Global and local alignment tools like BLAST are typically used to search for and align homologous sequences which are related by common descent. However we m...
Hash tables are a very good general solution for providing quick access to values (e.g. k-mer positions) based on keys (e.g. k-mers). There is an even more e...
To access elements in an array or matrix, programming languages typically implement one of two addressing schemes. One of the schemes uses one-based numberin...
BLAST and FASTA are based on building an index from the query sequences, and scanning the database sequences for k-mers which are present in the query index....
Pure Smith–Waterman is unpopular because of its poor scaling. Given two sequences of unequal lengths n and m, we have to compute nm intermediate values. We t...
Amino acids, nucleotides or any other evolutionary character are replaced by others at some rate. For example, imagine an evolutionary sequence with three po...
There are many ways to compute score matrices for amino acid alignment. Typically score matrices are based on the log-odds of amino acid replacements, and so...
The Smith–Waterman algorithm is basically the Needleman–Wunsch, but with a few modifications to turn it into a local alignment algorithm. These are:
Global alignment algorithms are used to align two globally similar sequences. Commonly global alignment is used to identify first whether two protein, DNA or...
Recursion is where a single function inside a computer program calls itself. This turns out to be an extremely versatile tool in the computer science toolbox...
Important note: The information contained in the course syllabus, other than the absence policies, may be subject to change with reasonable advance notice, a...