RNA structures are essential in many biological processes and are often conserved in evolution. Examples of such conserved structures are found in tRNA (1), rRNA (2,3), tmRNA (4), RNase P RNA (5) and SRP RNA (6). Many computational methods have been developed for predicting RNA structures. That two given columns form a pair together. This can be incorporated in the prediction by extending a stem if immediate neighbours can form base pairs. Another obvious change is to remove non-standard base pairs from individual sequences. This approach does not change the relative prior distributions of allowed structures. This method was successfully used in the RNA part of the non-coding RNA gene-finding algorithm by Rivas and Eddy (18). When gaps are treated as unknown nucleotides, a gapped sequence position should have probability one for any nucleotide. The method improves on a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. RNA sequences with a probabilistic model for secondary structures.
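The treatment of gaps as unknown nucleotides can be illustrated with a short sketch. It builds the vector of observation probabilities for one sequence position, one entry per nucleotide, which is the quantity a likelihood calculation would start from at a leaf of the tree. The function name `leaf_likelihoods`, the set of accepted gap symbols and the mapping of T to U are assumptions made for this illustration, not part of the original method description.

```python
NUCLEOTIDES = "ACGU"
GAP_SYMBOLS = "-.N"  # assumed set of symbols treated as "unknown nucleotide"


def leaf_likelihoods(symbol):
    """Return P(observed symbol | true nucleotide) for A, C, G, U in that order."""
    symbol = symbol.upper().replace("T", "U")  # assumption: DNA input is read as RNA
    if symbol in GAP_SYMBOLS:
        # A gap is compatible with every nucleotide, so each entry is one.
        return [1.0] * len(NUCLEOTIDES)
    if symbol in NUCLEOTIDES:
        # An observed nucleotide is compatible only with itself.
        return [1.0 if n == symbol else 0.0 for n in NUCLEOTIDES]
    raise ValueError(f"unsupported symbol: {symbol!r}")


if __name__ == "__main__":
    print(leaf_likelihoods("G"))
    print(leaf_likelihoods("-"))
```

Because every entry is one for a gap, that position contributes no information about the nucleotide at the leaf, which is what "unknown" means here.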

The KH-99 algorithm uses a stochastic context-free grammar (SCFG) to produce a prior probability distribution of RNA structures. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. This work improves the KH-99 algorithm primarily by making it faster and more robust to alignment errors. It assumes an alignment and gives one common structural prediction for all the sequences. That a given column is involved in a pair. As a side note, no loops of length two are allowed in this implementation, as opposed to the KH-99 algorithm. It was derived from the KH-99 algorithm by summing the loop rate matrix and a reduction of the base-pair rate matrix to single positions. In the KH-99 algorithm, the tree was estimated through a maximum likelihood method using the SCFG model. The interpretation of this is most obvious in terms of sequencing errors, but the method works for alignment errors and structure differences, too. A much faster method is to estimate the tree first. The tree is calculated from pairwise distances using the neighbour-joining algorithm (20), with branch lengths then adjusted to maximum likelihood estimates. In Pfold, pairwise distances between sequences are calculated using maximum likelihood. Column probabilities are calculated using the likelihood approach of Felsenstein (14). The evolution of column pairs is modelled using a rate matrix for base pairs (i.e. a 16 × 16 matrix).
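The column probability calculation can be sketched for a single unpaired column with Felsenstein's pruning recursion. This is a minimal illustration under stated assumptions, not the Pfold implementation: it substitutes the Jukes-Cantor model with uniform nucleotide frequencies for the rate matrices used in the paper, represents the tree as nested lists of `(child, branch_length)` pairs, and treats any symbol outside A, C, G, U as an unknown nucleotide. All function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGU"


def jc_transition(t):
    """Jukes-Cantor transition probabilities for branch length t (assumed stand-in model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihoods(node, column):
    """Likelihood of the data below `node`, conditional on each nucleotide at `node`.

    A node is either a leaf name (str) or a list of (child, branch_length) pairs.
    `column` maps leaf names to the symbol observed in this alignment column.
    """
    if isinstance(node, str):
        symbol = column[node]
        if symbol in NUCLEOTIDES:
            return [1.0 if n == symbol else 0.0 for n in NUCLEOTIDES]
        return [1.0] * 4  # gap or other symbol: treated as an unknown nucleotide
    result = [1.0] * 4
    for child, t in node:
        p = jc_transition(t)
        below = partial_likelihoods(child, column)
        for i in range(4):
            # Sum over the child's nucleotide j, then multiply across children.
            result[i] *= sum(p[i][j] * below[j] for j in range(4))
    return result


def column_likelihood(tree, column):
    """Probability of one alignment column, with uniform root frequencies (assumption)."""
    return sum(0.25 * value for value in partial_likelihoods(tree, column))


if __name__ == "__main__":
    tree = [([("s1", 0.1), ("s2", 0.1)], 0.05), ("s3", 0.2)]
    print(column_likelihood(tree, {"s1": "A", "s2": "A", "s3": "-"}))
```

For a column pair, the same recursion would run over 16 base-pair states with a 16 × 16 transition matrix in place of the 4 × 4 one; that extension is not shown here.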

