ProbCons

In bioinformatics and proteomics, ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences. It is one of the most accurate protein multiple sequence alignment programs, having repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.

Algorithm

The following describes the basic outline of the ProbCons algorithm.

Step 1: Reliability of an alignment edge

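For every pair of sequences x and y, compute the probability that letters x_i and y_j are paired in a^*, an alignment generated by the model:

P(x_i \sim y_j \mid x, y) = \Pr[x_i \sim y_j \in a^* \mid x, y] = \sum_{a \in \mathcal{A}} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y]

Here \mathcal{A} is the set of all alignments of x and y, and \mathbf{1}\{x_i \sim y_j \in a\} is equal to 1 if x_i and y_j are paired in the alignment a and 0 otherwise.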
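As a rough illustration of Step 1, the sketch below computes pairing probabilities with a forward-backward recursion. It assumes a deliberately simplified one-state weighting scheme instead of the pair hidden Markov model that ProbCons uses. The function name posterior_match_probs and the weights match, mismatch, and gap are hypothetical values chosen only for the example.

```python
def posterior_match_probs(x, y, match=4.0, mismatch=1.0, gap=0.5):
    """Return post[i][j], the probability that x[i] is paired with y[j].

    Assumption: an alignment's probability is proportional to the product
    of its column weights (match/mismatch for paired letters, gap for an
    unpaired letter). This is a toy stand-in for a trained pair-HMM.
    """
    n, m = len(x), len(y)

    def weight(a, b):
        return match if a == b else mismatch

    # fwd[i][j]: total weight of all alignments of x[:i] with y[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * weight(x[i - 1], y[j - 1])
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * gap
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * gap

    # bwd[i][j]: total weight of all alignments of x[i:] with y[j:]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                bwd[i][j] += bwd[i + 1][j + 1] * weight(x[i], y[j])
            if i < n:
                bwd[i][j] += bwd[i + 1][j] * gap
            if j < m:
                bwd[i][j] += bwd[i][j + 1] * gap

    total = fwd[n][m]  # normalising constant: weight of all alignments
    # Weight of alignments that pair x[i] with y[j], divided by the total.
    return [
        [fwd[i][j] * weight(x[i], y[j]) * bwd[i + 1][j + 1] / total
         for j in range(m)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in posterior_match_probs("HEAGAW", "PAWHE"):
        print(" ".join(f"{p:.3f}" for p in row))
```

Because each letter of x is paired with at most one letter of y in any single alignment, every row of the returned matrix sums to at most 1.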

Step 2: Maximum expected accuracy

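The accuracy of an alignment a^* with respect to another alignment a is defined as the number of common aligned pairs divided by the length of the shorter sequence:

\operatorname{accuracy}(a, a^*) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} \mathbf{1}\{x_i \sim y_j \in a^*\}

Calculate the expected accuracy of each candidate alignment a, where P(x_i \sim y_j \mid x, y) is the pairing probability from Step 1:

E_{a^*}[\operatorname{accuracy}(a, a^*) \mid x, y] = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} P(x_i \sim y_j \mid x, y)

This yields a maximum expected accuracy (MEA) alignment:

E(x, y) = \arg\max_a E_{a^*}[\operatorname{accuracy}(a, a^*) \mid x, y]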
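One way to find an MEA alignment is a Needleman-Wunsch-style dynamic program that scores each paired position by its pairing probability and charges nothing for gaps. The sketch below assumes the pairing probabilities are already available as a nested list; the function name mea_alignment is hypothetical.

```python
def mea_alignment(post):
    """Return (pairs, expected_accuracy) for a posterior matrix.

    post[i][j] is the probability that letter i of x pairs with letter j
    of y. pairs is a list of 0-based (i, j) index pairs in sequence order.
    """
    n = len(post)
    m = len(post[0]) if n else 0

    # best[i][j]: maximum summed posterior over alignments of x[:i], y[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + post[i - 1][j - 1],  # pair x_i with y_j
                best[i - 1][j],                           # leave x_i unpaired
                best[i][j - 1],                           # leave y_j unpaired
            )

    # Trace back from the corner to recover one optimal set of pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()

    shorter = min(n, m)
    expected_accuracy = best[n][m] / shorter if shorter else 0.0
    return pairs, expected_accuracy


if __name__ == "__main__":
    toy = [[0.9, 0.05], [0.05, 0.8]]
    print(mea_alignment(toy))
```

The returned score is the summed probability of the chosen pairs divided by the length of the shorter sequence, matching the definition of expected accuracy in this step.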

Step 3: Probabilistic Consistency Transformation

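The pairing probabilities for all pairs of sequences x, y from the set of all sequences \mathcal{S} are now re-estimated using all intermediate sequences z:

P'(x_i \sim y_j \mid x, y) = \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z) P(z_k \sim y_j \mid z, y)

This step can be iterated.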
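The re-estimation amounts to averaging matrix products of pairwise probability matrices over every intermediate sequence. The sketch below is a minimal pure-Python version under two assumptions: a sequence aligned to itself pairs each letter with itself with probability 1 (so the terms z = x and z = y each contribute the original matrix), and matrices are stored in a dictionary keyed by sequence index pairs. The names consistency_transform, post, and num_seqs are hypothetical.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols]
            for row in a]


def consistency_transform(post, num_seqs):
    """Apply one round of the consistency transformation.

    post maps (x, y) with x < y to a len(x)-by-len(y) probability matrix.
    Returns a new dictionary with the same keys.
    """
    def lookup(a, b):
        # Matrices are stored once per unordered pair; transpose on demand.
        return post[(a, b)] if (a, b) in post else transpose(post[(b, a)])

    updated = {}
    for (x, y), pxy in post.items():
        # z = x and z = y each contribute the original matrix unchanged.
        acc = [[2.0 * v for v in row] for row in pxy]
        for z in range(num_seqs):
            if z == x or z == y:
                continue
            through_z = matmul(lookup(x, z), lookup(z, y))
            acc = [[u + v for u, v in zip(r1, r2)]
                   for r1, r2 in zip(acc, through_z)]
        updated[(x, y)] = [[v / num_seqs for v in row] for row in acc]
    return updated


if __name__ == "__main__":
    # Three one-letter sequences: 0 and 1 both pair confidently with 2.
    toy = {(0, 1): [[0.5]], (0, 2): [[1.0]], (1, 2): [[1.0]]}
    print(consistency_transform(toy, 3))
```

In the toy input, the weak direct evidence for pairing sequences 0 and 1 is raised because both letters pair confidently with the same letter of sequence 2. Iterating the step means feeding the returned dictionary back into the function.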

Step 4: Computation of guide tree

Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score. Cluster similarity is defined as a weighted average over pairwise sequence similarities.
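A minimal sketch of such a clustering is shown below. It assumes the weights in the weighted average are the cluster sizes (the UPGMA convention), which the text above does not spell out. The function name build_guide_tree and the nested-tuple tree representation are hypothetical.

```python
def build_guide_tree(sim):
    """Greedy agglomerative clustering on a symmetric similarity matrix.

    sim[i][j] is the similarity (for example the MEA score) between
    sequences i and j. Returns a nested tuple of sequence indices.
    """
    n = len(sim)
    clusters = {i: (i, 1) for i in range(n)}  # id -> (subtree, size)
    # Similarities between live clusters, keyed by (smaller id, larger id).
    scores = {(i, j): sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > 1:
        a, b = max(scores, key=scores.get)  # most similar pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            s_ac = scores.pop((min(a, c), max(a, c)))
            s_bc = scores.pop((min(b, c), max(b, c)))
            # Size-weighted average of the two old similarities (assumption).
            scores[(c, next_id)] = (
                (size_a * s_ac + size_b * s_bc) / (size_a + size_b)
            )
        del scores[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1

    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    toy = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0],
    ]
    print(build_guide_tree(toy))
```

Each inner tuple is one merge, so the tree can be read from the leaves upward as the order in which sequences or groups would be aligned.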

Step 5: Compute MSA

Finally, compute the MSA using progressive alignment or iterative alignment.

This article is derived from Wikipedia and licensed under CC BY-SA 4.0.
