Wednesday, May 07, 2008

On Codon Usage

Each amino acid in a protein sequence is represented by a 3-letter 'word' (codon) in the genetic code. Since there are 4 'letters' (A, C, G, T), there are 4 x 4 x 4 = 64 possible words to represent 20 amino acids plus the stop signals. The code is unambiguous - each codon represents only a single amino acid. It is also redundant - most amino acids are represented by multiple codons (glycine, for example, can be represented 4 different ways). One might think that the diversity of life on the planet would come with a matching diversity in what codons mean. This is not the case. There are differences in codon preference, both within and across species, but the meaning of each codon is almost universal. For example, in humans the triplet ATC (20.8 codons/1000 codons) is preferred over the triplet ATA (7.5 codons/1000 codons), and in the yeast S. cerevisiae ATT is preferred to both of those (30.1 codons/1000 codons). However, in each of those cases - and in virtually every other species - all three of those triplets code for isoleucine. Codon preference is related to the abundance of the corresponding transfer RNA (tRNA). (Larry Moran touches upon codon bias and why mutations that change the codon but not the amino acid may not be neutral in an article here)
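The counting in the paragraph above can be checked with a few lines of Python. This is only a rough sketch: the names `STANDARD_CODE`, `synonymous_codons` and `usage_per_thousand` are made up for illustration, and the table is the standard code written in the usual compact T, C, A, G ordering with `*` marking stop codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one letter per codon, in T/C/A/G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return every codon that codes for the given one-letter amino acid."""
    return sorted(c for c, aa in STANDARD_CODE.items() if aa == amino_acid)


def usage_per_thousand(seq):
    """Count each codon in a coding sequence, scaled to codons per 1000 codons.

    Any trailing bases that do not make up a full codon are ignored.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    return {codon: 1000 * n / len(codons) for codon, n in counts.items()}


print(len(STANDARD_CODE))                      # number of possible codons
print(len(set(STANDARD_CODE.values()) - {"*"}))  # number of amino acids
print(synonymous_codons("G"))                  # the glycine codons
print(synonymous_codons("I"))                  # the isoleucine codons
```

Running `usage_per_thousand` over a whole genome's coding sequences is roughly how figures like "20.8 codons/1000 codons" are produced, although the published tables come from curated sequence sets rather than a toy function like this one.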

There is experimental evidence for a universal genetic code:
mRNAs can be correctly translated by the protein synthesizing machinery of very different species. For example, human hemoglobin mRNA is correctly translated by a wheat-germ extract [...] bacteria efficiently express recombinant DNA molecules encoding human proteins such as insulin.
(Stryer, L. Biochemistry 3rd Ed. p 108)
A universal code is the basis of many techniques (and headaches) in the lab. For example, in vitro protein synthesis can involve rabbit reticulocyte lysates (or wheat germ, as above) translating non-rabbit proteins. Non-mouse sequences can be used to introduce genes into mice. E. coli is often used for recombinant protein production. In this last case, the difference in codon preference between E. coli and other species is a common problem for high-level recombinant expression (e.g. if a codon is preferred in humans - CCC for proline - but not in E. coli, the scarce tRNA for that codon could limit protein production).
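One common way around the codon preference problem is to recode the gene with synonymous codons the host prefers, leaving the protein unchanged. Here is a minimal sketch of that idea, not any particular tool's method. The function `recode` and the table name `HUMAN_ILE_USAGE` are hypothetical, and the only usage numbers in it are the two human isoleucine figures quoted earlier in this post; for E. coli you would substitute a full usage table for that host.

```python
from itertools import product

# Standard genetic code, one letter per codon, in T/C/A/G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def recode(seq, host_usage):
    """Swap each codon for the synonymous codon the host uses most often.

    host_usage maps codon -> frequency (codons per 1000 codons). Codons whose
    amino acid has no entry in host_usage are left as they are, and trailing
    bases that do not make up a full codon are dropped.
    """
    out = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        amino_acid = STANDARD_CODE[codon]
        candidates = [c for c in host_usage if STANDARD_CODE[c] == amino_acid]
        out.append(max(candidates, key=host_usage.get) if candidates else codon)
    return "".join(out)


# Illustration only: the two human isoleucine frequencies quoted in the post.
HUMAN_ILE_USAGE = {"ATC": 20.8, "ATA": 7.5}

original = "ATGATAATCATA"   # Met-Ile-Ile-Ile, using the rarer ATA twice
optimized = recode(original, HUMAN_ILE_USAGE)
print(optimized)            # ATA codons are replaced by ATC
```

Always picking the single most frequent codon is the crudest possible strategy; it is used here only because it makes the principle easy to see.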

That the genetic code is universal is not entirely true; some inter-species differences are being discovered. Some species, such as ciliated protozoa, have slight variations (in some ciliates, TAA and TAG code for glutamine rather than acting as stop codons). Mitochondria are another important exception.

Mitochondria carry their own circular DNA, which encodes, among other things, a set of 22 tRNAs. Because mitochondria don't use the set of nuclear-encoded tRNAs, they aren't restricted to the standard code. In fact, human mitochondrial codon use differs from nuclear codon use in 4 places. For example, in the isoleucine example above, the codon AUA (ATA in the DNA) codes for methionine in mitochondria (see table, reproduced from Stryer). This isn't news, but something I failed to appreciate before. A difference in codon usage between species might not be surprising (in fact the consistency in usage among species is surprising - until you consider the far-reaching effects a change in codon use would have: every protein would be affected). A difference in usage within a single cell is more striking, unless you're familiar with endosymbiotic theory.
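To see what those 4 differences do in practice, here is a small sketch that translates the same DNA with both codes. The post only spells out the AUA change; the other three reassignments used below (UGA to tryptophan, AGA and AGG to stop) are taken from the commonly cited vertebrate mitochondrial code, and the names `VERTEBRATE_MITO_CODE` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, one letter per codon, in T/C/A/G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Vertebrate mitochondrial code: the standard code with four codons reassigned.
VERTEBRATE_MITO_CODE = dict(
    STANDARD_CODE,
    ATA="M",  # isoleucine -> methionine
    TGA="W",  # stop -> tryptophan
    AGA="*",  # arginine -> stop
    AGG="*",  # arginine -> stop
)


def translate(seq, code):
    """Translate codon by codon without halting at stops ("*" marks them).

    Trailing bases that do not make up a full codon are ignored.
    """
    return "".join(
        code[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3)
    )


dna = "ATGATATGAAGA"  # ATG ATA TGA AGA
print(translate(dna, STANDARD_CODE))         # MI*R
print(translate(dna, VERTEBRATE_MITO_CODE))  # MMW*
```

The same four codons read as Met-Ile-stop-Arg in the nucleus and Met-Met-Trp-stop in the mitochondrion, which is why a mitochondrial gene generally can't just be moved to the nucleus and expressed as is.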

Endosymbiotic theory, popularized by Lynn Margulis, describes the origins of two eukaryotic organelles: mitochondria and chloroplasts. These organelles were once autonomous organisms that were taken up by other cells in a symbiotic relationship. Both organelles have strong resemblances to the proposed parent prokaryotes, as detailed in the above link. Codon use separate from nuclear DNA can be added to that list.

Read more about the different codon usage sets here.
Codon preference numbers from here.


Anonymous said...

I had no idea that mitochondria were so messed up. I think I heard that eGFP is just GFP with codons optimized for expression in mammalian cells. I guess this means jellyfish have different codon usage than mammals.

kamel said...

I guess this means jellyfish have different codon usage than mammals.

It's possible; there are some minor variations from the standard code in some species if you check out the codon tables in the second-to-last link. But it could be that the optimization is in terms of codon preference (see the isoleucine example above). The codons may mean the same thing in the two species, but the eGFP has been changed to use ones that have more common tRNAs in mammalian cells for more efficient translation. I'm not sure which is the case.

Thomas Schoedl said...

If you want a graphical representation of the codon usage differences, visit my website - graphical codon usage analyzer.