On Feb 26, 2007, A.Th.E.N.S. Presents

What Determines Gene Expression Levels?

by Grzegorz Kudla, PhD

An E. coli cell contains around 4,000 genes and 10^7 protein molecules. The abundance of individual proteins ranges from fewer than 10 to more than 10^4 copies per cell. Can we predict, using sequence data, which genes will be efficiently expressed? And can we identify the sequence features responsible for the expression characteristics of genes? I will present a new experimental platform to study these questions: a library of GFP genes with randomized codon usage. The library makes it possible to measure the effects of coding sequence variation on gene expression levels while controlling for promoter and noncoding sequence variation. I sequenced 120 clones from the library and expressed them in E. coli cells under a range of experimental conditions. I am currently using various data mining approaches to interpret the results. Initial analyses suggest considerable heterogeneity among sites with respect to their effects on expression, and an unexpectedly strong effect of coding sequence on transcription efficiency.