
Protein Sequencing

Protein sequencing was one of the earliest techniques developed in molecular phylogenetics. Because the primary structure of a protein is genetically determined, the degree of similarity between the amino acid sequences of the same protein in two species reflects how closely those species are related. One of the most studied amino acid sequences is that of cytochrome c, an ancient protein common to all aerobic organisms. The more differences there are between two species' sequences of the protein, the more distantly related the species are. The amino acid sequence of human cytochrome c is identical to that of the chimpanzee, but differs from that of the dog by 13 amino acids and from that of the tuna by 31 amino acids (Campbell 1999).
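The comparison described above amounts to counting the positions at which two aligned sequences differ. The following is a minimal Python sketch of that count. It assumes the sequences are already aligned to the same length, and the short fragments in it are made up for illustration; they are not real cytochrome c data.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical ten-residue fragments in one-letter amino acid code.
reference = "GDVEKGKKIF"
close_relative = "GDVEKGKKIF"    # identical to the reference
distant_relative = "GDVAKGKKTF"  # differs at two positions

print(count_differences(reference, close_relative))
print(count_differences(reference, distant_relative))
```

A larger count suggests a more distant relationship, in the same way that the 31 differences for the tuna against 13 for the dog do in the example above.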

The actual process of sequencing the proteins is achieved using gel electrophoresis, a technique first developed in the mid-1960s. Gel electrophoresis separates protein molecules by size and charge, and consequently allows small changes in their amino acid sequences to be detected. Once the sequence of amino acids that makes up a particular protein is known, it can be compared with the sequences of the same protein from other species. From the degree of similarity between species, phylogenetic trees can be hypothesized, as Miyamoto and Goodman did for the orders of Eutheria in 1986. In that study, the amino acid data for α- and β-haemoglobins, myoglobin, lens α-crystallin A, fibrinopeptides A and B, cytochrome c and ribonuclease were translated into mRNA sequences following the genetic code. The sequence data were then arranged in an extended tandem alignment and analysed as if each species were represented by a single gigantic polypeptide. The species were compared directly, and trees were constructed that minimised the total number of nucleotide replacements (NR). This method of tree construction was developed by Goodman and Moore in the 1970s (Goodman et al. 1979), and in this study it produced several most-parsimonious trees.
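Two of the steps described above can be sketched in Python: joining several proteins into one tandem sequence per species, and counting the minimum number of nucleotide replacements needed to turn one amino acid into another under the standard genetic code. This is only an illustrative sketch. The function names, species labels and toy sequences are assumptions, and the pairwise count shown here is a simplification: the published method minimises replacements over a whole tree, not between two species at a time.

```python
from itertools import product

# Standard genetic code, written for mRNA with bases ordered U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each amino acid (and "*" for stop) to all of its codons.
CODONS_FOR = {}
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(amino_acid, []).append("".join(codon))


def min_replacements(aa_a: str, aa_b: str) -> int:
    """Fewest single-base changes separating any codon pair for two amino acids."""
    return min(
        sum(x != y for x, y in zip(codon_a, codon_b))
        for codon_a in CODONS_FOR[aa_a]
        for codon_b in CODONS_FOR[aa_b]
    )


def tandem(alignments: dict, species: str) -> str:
    """Join a species' aligned proteins end to end, in a fixed protein order."""
    return "".join(alignments[name][species] for name in sorted(alignments))


def pairwise_nr(alignments: dict, species_a: str, species_b: str) -> int:
    """Sum the minimum replacements over every position of the tandem alignment."""
    seq_a = tandem(alignments, species_a)
    seq_b = tandem(alignments, species_b)
    return sum(min_replacements(a, b) for a, b in zip(seq_a, seq_b))


# Hypothetical toy data: two short aligned "proteins" for two species.
toy_alignments = {
    "protein_1": {"species_x": "MW", "species_y": "IF"},
    "protein_2": {"species_x": "D", "species_y": "E"},
}

print(tandem(toy_alignments, "species_x"))
print(pairwise_nr(toy_alignments, "species_x", "species_y"))
```

Because most amino acids are encoded by more than one codon, taking the minimum over all codon pairs gives the most parsimonious count for each position. Gaps in the alignment are not handled in this sketch.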