Ribosome Traffic on mRNAs Maps to Gene Ontology: Genome-wide Quantification of Translation Initiation Rates and Polysome Size Regulation

The expression of genes can be considered as a two-stage process, beginning with transcription and the production of an mRNA, followed by translation of that mRNA into protein by the cell’s ribosome population. Gene expression must be tightly regulated to control protein composition, enabling the cell to rapidly respond to a wide range of environmental conditions. For this reason, cells exert fine control over gene expression, both at the transcriptional, and post-transcriptional level.


The expression of genes can be considered as a two-stage process, beginning with transcription and the production of an mRNA, followed by translation of that mRNA into protein by the cell’s ribosome population. Gene expression must be tightly regulated to control protein composition, enabling the cell to rapidly respond to a wide range of environmental conditions. For this reason, cells exert fine control over gene expression, both at the transcriptional [1], [2]and post-transcriptional level [3]–[6].

One key mechanism of post-transcriptional control of gene expression is translational regulation. The process of translation can be divided in three main phases, namely initiation, elongation and termination. Whereas termination is generally believed to be a fast process and therefore not limiting for translation [7], the respective contributions of initiation and elongation to translational regulation are still under debate [8].

On one hand, the translation initiation rate, or the rate at which ribosomes access the 5′ untranslated region (5′ UTR) and start translating the ORF, is regulated in part by formation of secondary structures in the 5′ leader [9], [10]. The presence of secondary structures inhibits the ability of an mRNA to sequester ribosomes, thereby lowering the effective translation initiation rate. The 5′ leader composition is characteristic of each mRNA, resulting in a heterogeneity of the ribosome recruitment process among the transcripts [11], [12]. Despite the importance of this process in gene expression regulation, there are currently no estimates of in vivo, mRNA-specific translation initiation rates based on refined traffic models, and how they regulate genome-wide patterns of protein expression. On the other hand, there is increasing evidence that translation elongation itself controls gene expression, being regulated by the rate of supply of tRNAs, particularly in microorganisms with codon biased genomes. Within families of isoacceptor tRNAs, members are not all present at the same concentration in the cell, leading to variation in delivery times, and the introduction of stochastic pauses [13]. Such pauses control ribosome transit, regulating ribosome queue formation. There is evidence that a ramp of slow codons near the 5′ end of some open reading frames regulates the flow of ribosomes onto an mRNA [14], [15], and pausing during elongation on any mRNA will affect queue dynamics, and thus the flux (or current) of ribosomes along the mRNA. However, there is no knowledge of how, on a genome-wide scale, the dynamic flux of ribosomes along an mRNA might be crucial in regulating protein expression.

Here, we address these two problems: first, we estimate mRNA-specific in vivo translation initiation rates on a genome-wide basis by integrating a computational model of mRNA translation with experimental datasets of ribosome occupancy. Crucially, we show that translation initiation rates are correlated with gene function. Second, we show that the translation dynamics response of each mRNA is characteristic of its gene ontology, by elucidating how ribosome traffic, moving with variable speed across the codon field, responds to a range of initiation rates. We also show that codon arrangement rather than codon usage, clearly separates mRNAs into distinct classes typified by their responses to variations of the translation initiation rate. This suggests that not only codon usage but also codon arrangement is a selectable determinant of gene expression.


The model

Our model describes how ribosomes bind to the mRNA, move along it performing the translation, and dissociate from the mRNA at the stop codon, releasing the finished protein into the cytoplasm [16]. The mRNA is represented by a unidimensional lattice, with each site denoting a codon. Ribosomes are represented by particles occupying 9 codons [15] that attempt to bind the mRNA with a rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e001 , provided that the binding region is not obstructed by another ribosome. The particle on-rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e002  mimics the initiation of translation, in which several processes have been condensed into just one step. The factors influencing the initiation of translation, such as secondary structures in the 5′UTR, concentration of initiation factors and ribosome availability, are all included in this parameter and will be discussed below. Subsequently, ribosomes advance on the polynucleotide chain (elongation) following a two-state dynamics: (1) recognition of the cognate tRNA with rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e003  depending on the codon  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e004 , and (2) translocation towards the next codon with rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e005  (see Figure 1). At the last codon, the ribosomes detach and release the protein with a rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e006  (termination).


Figure 1. The model.

Particles representing ribosomes move along a unidimensional lattice (the mRNA chain) in which each site represents a codon. For the sake of illustration, in the sketch a particle covers 3 codons, while in the model we considered particles occupying 9 codons [15]. (A) Schematic representation of ribosome dynamics: along the mRNA, ribosomes with the A site on codon  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e007  capture the cognate tRNA with a rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e008 , then keep it and advance with a rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e009 , provided that the following codon is empty. (B) The entire translation process can be viewed as particles moving on a lattice. Ribosomes attempt to initiate the translation with a rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e010 . Then they move according to the dynamical rules introduced above and at the end of the lattice the ribosomes detach with a termination rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e011 . Particles can queue if the bottlenecks in the lattice cannot support the incoming flow.

The cognate tRNA-capture rates  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e012  can be estimated from data on tRNA abundances, which are assumed to be proportional to their gene copy numbers [17], and by considering further corrections such as the wobble base pairing (see Supplementary Information, Text S1). Effects of competition for near-cognate and non-cognate tRNAS were found not to materially affect any of the conclusions of this study (see Supplementary Information, Text S1), and are therefore neglected. The translocation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e013  has been measured to be 35  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e014  [18], and is codon independent. The termination rate is determined by the concentration of the release factors; the termination process is assumed to be fast, comparable to the translocation [7].

Moreover, the model takes into consideration steric interactions between ribosomes, so that even if a ribosome sitting on codon  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e015  has already captured the cognate tRNA, it cannot translocate if the next codon is occupied by another ribosome. Hence, it is an exclusion process [19] exhibiting different regimes characterised by the flow of particles and by their density along the lattice. In particular, if the sequence contains slow sites, then queues of particles behind the slow sites or high density phases appear when the on-rate of particles is of the same magnitude as the bottleneck rate.

In contrast to commonly used exclusion models [14], [20], our model accounts for the processes involved in the mechano-chemical ribosome cycle, condensing them in two main steps: capture of the tRNA and translocation. It includes the crucial fact that ribosomes can capture a cognate tRNA while they wait for the next lattice position to become vacant. In contrast, ribosomes from simpler exclusion models unrealistically “lose” immediately the captured tRNA if they cannot move to the next codon. This is a key difference, which leads to different dynamics of ribosome traffic and transitions between traffic regimes [16]. This effect is further enhanced by the fact that the time scales related to the capture of the tRNA and translocation are strongly separated, with the translocation being much faster. Furthermore, the two-state ribosome reproduces the dwell-times observed in single-molecule experiments [21].

In summary, our model predicts the current of ribosomes  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e016  or translation rate, and the density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e017 of ribosomes on a particular mRNA (number of ribosomes divided by the ORF length), taking as input the specific sequence of codons of the mRNA. Both the translation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e018  and the ribosome density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e019  are predicted as a function of the translation initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e020 , i.e. the rate at which ribosomes arrive at the start AUG codon; their functional dependence on the initiation rate thus varies from sequence to sequence as a consequence of different codon compositions and codon arrangements.

Genome-wide prediction of translation initiation rates

The translation initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e021 , i.e. the rate at which ribosomes start translating the ORF, depends on many factors, such as the rate at which ribosomes attempt to bind the mRNA, the concentration of initiation factors and the presence of secondary structures in the 5′UTR region [9], [11], [12]. Despite the key role of this parameter, direct experimental evaluations are intractable, both in vivo and in vitro, with no direct measurements having been carried out to date.

Previous works, such as [20], could only estimate the translation initiation rate as the value that maximised the predicted correlation of the ribosome current with experimental data. Furthermore,  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e022  has usually been considered as a unique, fixed value (the same for each of the mRNAs), but it is well known that the translation initiation rate depends on several mRNA-specific factors, such as the structural properties of the mRNA leader region. Knowledge of mRNA-specific values of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e023 , therefore, would provide important insight into control of gene expression at the level of translation. Siwiak and Zielenkiewicz [22] present specific initiation rates, however with a simple model that neglects ribosome kinetics and traffic (the comparison is discussed in the Supplementary Text S2). Here we present a novel approach to identify the initiation rate of each individual mRNA for the whole genome.

We first apply our translation model to all mRNA sequences of S. cerevisiae. The model predicts the translation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e024  and the ribosome density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e025  on each mRNA as a function of the translation initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e026 . Then, by utilising genome-wide experimental data of ribosome density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e027  from [23] for yeast grown under non-stressed conditions, we identify the physiological translation initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e028  as the one which, when used in our simulations, replicates the experimentally observed density: http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e029(1)This yields a value of the translation initiation rate for each mRNA  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e030  as shown by the genome-wide distribution in Figure 2. Using the genome-wide experimental data of ribosome density from Arava et al. [7] yields a very similar distribution of initiation rates (see Section 4 of Supplementary Text S1). The knowledge of this distribution reveals how translational regulation of gene expression works at the level of initiation by correlating the values of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e031  with the biological functions of the corresponding genes, encoded in their Gene Ontology (GO) annotations. In Figure 2 we split up this distribution in four parts, from small to high  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e032  (i)–(iv). Strikingly, significantly enriched GO annotations are identifiable in each of the regions. Messenger RNAs with an initiation rate below  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e033  (region (i) of Figure 2) contain a highly disproportionate number of regulatory proteins and proteins linked to transcription from Pol II promoters, mainly located in the nucleus, chromosome, membrane or protein complexes. In the range of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e034  from  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e035  to  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e036  (region (ii) of Figure 2) we find other significantly over-represented terms such as cytoplasmic translation, ribosome biogenesis or oxoacid metabolic process, while genes with  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e037  from  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e038  to  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e039  (region (iii) of Figure 2) are primarily constituents of ribosomes. Very large initiation rates (region (iv) of Figure 2) are characteristic of genes associated with the respiratory chain. However, most of genes falling in this region are not annotated (a complete list of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e040  can be found in the Supplementary Table S1 and the details of the GO analysis, with the annotations found in each region and their enrichments, can be found in the Supplementary Table S2).


Figure 2. Distribution of the estimated initiation rates in S. cerevisiae.

The mean initiation rate is  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e041  and the median is  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e042 . Most of the mRNAs have an estimated initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e043  and therefore we show only this range.

The assignment of initiation rates correlates with protein abundances typical of given GO categories: regulatory proteins are usually present at low levels. In contrast, proteins involved in translation, ribosome biogenesis and metabolic processes are abundant. This result is a signature of the divergent translational control that distinct genes exhibit at the level of initiation, suggesting that factors influencing  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e044 , such as secondary structure in the 5′ leader region, have been shaped by evolution to contribute to the delicate balance of cellular protein composition.

To show that the procedure introduced above can be applied under different conditions, we carry out a similar analysis under pheromone treatment by using the corresponding measurements of ribosome densities from [23], and estimate the initiation rates  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e045  under these conditions. The initiation rates do not substantially change, consistent with the finding by Mackay et al. [23] that only a small number of mRNAs exhibit altered densities after pheromone treatment. However, with our analysis we identify two mRNAs, SAG1 and HO, which exhibit a radical change in their initiation rate value under pheromone treatment. Importantly, these two mRNAs have been shown to present altered 5′UTR sequences that explain their significant ribosome density change [24].

Translation initiation rates correlate with the lengths of the transcripts

Now we analyse the influence of the physical properties of the mRNA on the translation initiation rate by analysing the correlations of the identified  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e046  rates with the presence of secondary structures in the 5′UTR and the length of the transcript. The physiological estimates of the initiation rate show a small but significant correlation with the free energy of the secondary structures in the 5′UTR (Spearman’s rank http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e047 , p-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e048 ) confirming that secondary structures may have an important regulatory role, as already suggested [9], [25], [26], see Supplementary Information Text S1.

Remarkably, we find a strong negative correlation between the initiation rate and the length of the ORF (Spearman’s rank http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e049 , p-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e050 ), see Figure 3. Of relevance to this observation, Arava and coworkers [7] found that ribosome density counter-intuitively and systematically decreases with increasing the ORF length. In a subsequent work [27], they reported that the explanation most consistent with their experimental investigation was that lower initiation rates predominate on longer mRNAs, exactly as we estimate in this work.


Figure 3. Scatter plot of the ORF lengths http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e051 against the estimated http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e052‘s.

The log-log scatter plot shows possible signatures of a power-law dependence.

mRNA-specific responses of ribosome traffic correlate with gene ontology

Smooth and abrupt sequences.

Our genome-wide simulation generates, for each transcript, curves describing how the ribosome density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e053  (polysome size) and the ribosomal current  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e054  depend on the initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e055 . According to these characteristic curves, mRNA sequences fall predominantly into either of two categories (Figure 4): the ribosome density of some mRNAs presents a steep increment with increasing  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e056  (we refer to these sequences as abrupt), while others present a gradual increase of the density with the initiation rate (smooth sequences). This classification coincides with the characteristic curve obtained for the ribosomal current  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e057  against  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e058 . In abrupt sequences the current presents a kink before reaching its saturation value, whereas in smooth sequences, the saturation value of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e059  is reached gradually [28]. Therefore, mRNA sequences can be classified into two different types depending on how their overall translation rate and polysome size vary upon changes in the initiation rate.


Figure 4. Outcomes for some mRNAs obtained by stochastic simulations of the model.

Panels (A) and (C) show a sketch of the two different behaviours one can obtain for the density of ribosomes  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e060  and the current  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e061 , respectively. The genes are divided in two categories, according to the shape of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e062 , as shown in (A): abrupt mRNAs (red, colour online) present a steep increase of the polysome size with increasing the initiation rate. On the other and, smooth sequences (blue, colour online) do not show this feature. The current (C) is also affected, with abrupt genes exhibiting a sudden change, or ‘kink’ in the current, while the current of smooth mRNAs does not suddenly saturate. Panels (B) and (D) show the outcome of numerical simulations of real sequences from S. cerevisiae. Genes YGL103W and YBL027W are ribosomal proteins while YHR030C and YBL105C are kinase regulatory proteins.  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e063  indicates the saturation value of the current (see text).

The origin of these two types of ribosome traffic lies in the codon arrangement: smoothbehaviour mRNAs contain either rare codons at the 5′ end, or no rare codons at all, whereas abrupt behaviour mRNAs contain rare codons or clusters of rare codons within the main body of the mRNA. These rare codons act as bottlenecks, causing queues to build up and consequently  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e064 -dependent abrupt phase transitions to occur. In contrast, if the bottleneck is right at the beginning of the mRNA, no queue can be formed and therefore the polysome size increases smoothly with the initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e065  [28].

The curves for the ribosome density versus  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e066  for the more than 6,000 S. cerevisiae mRNAs were analysed with an automated clustering algorithm [29] to classify them into abrupt or smooth sequences. The algorithm clearly classified 35% sequences to belong to the abruptcategory, and 38% to the smooth category. The remaining 27% sequences were marked as hybrid since they did not show pronounced features to justify a discrimination between the two categories (Supplementary Information Text S1).

Strikingly, these two categories, each with distinct initiation rate response criteria (smooth or abrupt) correlate with the biological function of the encoded proteins: GO annotations (process) related to translation are significantly over-represented in smooth sequences (cytoplasmic translation, P-value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e067translation, P-value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e068 ). Conversely, abruptsequences are connected to several processes, mainly involving regulation, e.g. biological regulation, P-value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e069metabolic process, P-value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e070  or cellular response to stimulus, P-value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e071 . More details about the enrichment in each category can be found in the Supplementary Table S3.

If one considers the abundance of the transcripts in the cell (data from [30]), then 68% of the total mRNA population belongs to the smooth type. This indicates that highly transcribed genes have preferentially slow codons at the 5′ end rather than in the main body of the mRNA. In this way, highly transcribed genes avoid having queues of ribosomes which might deplete a large amount of essential cell resources. Abrupt sequences, on the other hand, constitute only 14% of the transcribed mRNAs; this is consistent with the fact that abrupt sequences typically encode regulatory proteins, which are in general of low abundance.

Responsiveness of translation rate to changes in initiation rate.

A further characteristic we can conjecture from our genome-wide simulations, is how the translation rate changes upon variations in the initiation rate around the physiological value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e072 quantified above. As previously mentioned, the value of the translation initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e073  depends on several factors, such as the amount of free ribosomes, initiation factors, and folding features of the 5′ UTRs, all of which are strongly influenced by stress and nutrient conditions [31]. The knowledge of the responsiveness of translation rate to variations of the initiation rate, therefore, theoretically provides key insight into the mechanisms of translational regulation of gene expression.

In order to study translation rate change responsiveness, or ‘gearing’, we study the combined role of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e074  and the presumed gradient  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e075  of the translation rate, which quantifies the responsiveness of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e076  around the physiological value of the initiation rate (see ‘Materials and Methods’). We find that these two quantities are highly correlated (Spearman’s rank http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e077 , p-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e078 ), Figure 5. This indicates that, according to our model, genes characterised by a small initiation rate, such as regulatory genes, have in general a high translation rate gradient, suggesting that the corresponding proteins are produced at low levels under normal conditions but their synthesis can be rapidly increased upon changes in the initiation rate. Conversely, genes characterised by a high initiation rate, such as genes encoding proteins involved in translation and ribosome biogenesis, exhibit in general a medium-to-low value of the translation rate responsiveness, implying that their synthesis is tuned to be efficient, but stable against variations in initiating ribosomal subunit availability. Moreover, by dividing the distributions into quartiles, we identify sixteen different regions; a number of them exhibit significant enrichment in specific GO annotations determined principally on the basis of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e079  (Figure 5). However, by constraining the genes analysed to those with an  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e080  value lying within a specific range, the specific contribution of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e081 ) could be identified. This revealed that there is a further, separable enrichment of GO categories on the basis of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e082 . This in turn indicates that the gearing function, or responsiveness to ribosome availability, is also coupled to gene function. Thus regions 4, 7, 8, 12 and 16 from Figure 5 (regions are numbered starting from the top left one and proceeding left to right) show a significant enrichment in specific GO annotations with a P-value smaller than 0.01 (see Supplementary Table S4). The genes exhibiting a significant enrichment in region 16 were un-annotated. The results in Figure 5 (the estimated  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e083  plotted against the  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e084  values) are annotated with GO category enrichments influenced by the combination of the physiological initiation rate and the gearing factor.


Figure 5. Scatter plot of the estimated initiation rates http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e085 versus the slope of the protein production rate http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e086 evaluated at http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e087.

Both distributions of initiation rates and slopes have been subdivided in quartiles (dashed lines), defining 16 regions. Boxed annotations indicate those GO categories that are overrepresented in each quartile sector (P; GO process, F; function, C; component) with the P-value indicated as a power 10 exponent (E). The enrichments in each region are indicated in the square brackets as xx/yy, where ‘xx’ is the number of genes with that specific annotation and ‘yy’ the total number of genes in the region.

We would like to emphasise that, unlike all other results shown in this work, the gearing factor  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e088 , i.e. the responsiveness capacity of the mRNAs, remains a speculative and theoretical outcome of the model. Since genome-wide experimental setups changing the initiation rates of single transcripts and observing the variation of translation remain are nowadays a challenge, its biological relevance remains to be proven.

Maximal translation rate.

We also perform a genome-wide analysis of the maximal translation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e089  that a sequence can achieve, when an increase in the initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e090  does not yield any further change in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e091 (see ‘Materials and Methods’). We extract  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e092  for each mRNA sequence from our genome-wide simulations and analyse the mRNAs with largest and smallest  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e093  (first quartiles). We find that sequences with the largest maximal production rate are mRNAs involved in cytoplasmic translation (P-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e094 ), such as ribosomal and translational proteins. Conversely, proteins with regulatory functions (such as nucleic acid binding transcription factor activity, P-value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e095 ) are encoded by sequences with the smallest  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e096 . Supplementary Table S5 summarises this GO analysis.

In summary, our genome-wide analysis shows that the type of ribosome traffic on mRNA is significantly correlated with the biological function of the encoded protein: essential proteins that need to be constitutively produced, such as ribosomal proteins and proteins involved in translation, typically exhibit a smooth increase of polysome size upon increments of the initiation rate, a large physiological initiation rate and a high maximal overall translation rate. In contrast, proteins such as the ones involved in responses to stimuli typically exhibit an abrupt increase of the polysome size with the initiation rate, present a small physiological initiation rate and a low maximal overall translation efficiency. In the next section we discuss the fundamental role of codon arrangement in determining the translation efficiency.

Codon arrangement versus codon usage

In order to show that the genome-wide correlation between translational efficiency and biological function obtained above is not only the consequence of codon usage but is strongly influenced by the order in which codons are used in the mRNA, we simulate the translation of a library of randomised ORFs such that both amino acid sequence and codon composition remain identical. That means, two ORFs belonging to the library have exactly the same codon usage but the arrangement of these codons is different. Here we show that, even though all these randomised ORFs have exactly the same codon usage indices such as the CAI, codon adaptation index [32], and tAI, tRNA adaptation index [33], their predicted protein production rate can be very different. Figure 6 shows how the different values of predicted protein production rate (ribosomal current  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e097  for a fixed initiation rate) are distributed for 2,000 synonymous randomised codon sequences of a typical Saccharomyces cerevisiae gene (YPL106C).


Figure 6. Normalised histogram of the simulated protein production rates of the YPL106C randomised ensemble.

We constructed the randomised ensemble by shuffling the YPL106C codon choice at each sequence position, generating 2,000 different variants, each time keeping the amino acid sequence and overall codon composition constant. For example, for the chosen gene, CAI = 0.521. The value of the chosen initiation is  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e098 .

Figure 6 clearly shows that the relative positioning of codons has a crucial effect on the translation efficiency, suggesting that very different cellular production rates can be achieved through evolution of the codon arrangement. For instance, in the case shown in Figure 6 there is an increase of about  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e099  from the lowest to the highest value of the translation rate. The variation of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e100  for different codon arrangements is a general result and does not depend on the gene or the chosen initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e101  (for more information see Supplementary Information, Text S1). We thus show that by randomising codon arrangement (i.e. randomly exchanging the position of synonymous codons in a sequence), different protein production rates are obtained, even though codon usage remains fixed. This indicates that the codon arrangement has a highly significant role in determining the efficiency of translation.

Validation of genome-wide translation rate prediction by experimental data

While several models of protein synthesis have been developed over the last decades [34], the role of codon sequence and stochastic ribosomal movement has been investigated only recently. But even recent models typically treat the initiation rate as a fixed parameter, identical for all mRNAs, despite its key role in determining translational efficiency. In contrast, our model predicts the protein production rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e102  as a function of the initiation rate. By then integrating genome-wide simulations with datasets of polysome sizes, we have identified the physiological value of the initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e103  for each mRNA. This set of values  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e104  then leads to the prediction of the protein production rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e105  for each transcript. This allows us to validate our model predictions with experimental data.

Figure 7A is a scatter plot of the genome-wide simulations versus measured protein abundance from [30]. The model predictions for  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e106 , where  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e107  denotes mRNA abundance, correlate very well with the experimental protein abundances (Spearman’s rank = 0.64, p-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e108 ), compared to other attempts such as the tAIc (Spearman’s rank = 0.38, p-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e109 ). Our outcome is further improved when considering just transcripts loaded onto polysomes (Spearman’s rank = 0.66, p-value http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e110 ), see ‘Materials and Methods’ and Supplementary Information, Table S1. Moreover, as it can be appreciated from Figure 7A, the predictions from our model correlate very well with measured protein abundance for all ranges of gene expression, in contrast to other translation efficiency indices (panels B,C,D), which exhibit a poor correlation for lowly expressed genes.


Figure 7. Scatter plots of different estimators of protein production rates.

(Ahttp://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e111  versus abundance of proteins. The mRNA abundances are from [30] and the experimentally measured protein levels from [51]. The plot shows a clear correlation between the model prediction of the amount of proteins in the cell and the experimental values. (B) CAI from [30] versus protein abundance. (C) and (D) show different variants of the tRNA adaptation index, tAIc and tAIp from [30], vs protein abundance. Our approach yields a better correlation between the predicted and measured protein abundance.


The phenotype exhibited by any cell is dictated by its proteomic composition. How much of each type of protein is expressed is governed by a range of factors, including the level of transcription and stability of the encoding mRNA, the half-life of the protein, and how efficiently its mRNA is translated. A number of strategies have been employed to predict translational efficiency, many of which utilise the observation that not all codons are used with equal frequency, and that codon usage frequency is proportional, at some level, to the abundance of the corresponding decoding tRNA species [35], [36]. Initially, measures such as the codon adaptation (CAI) index were developed [32], which correlate high protein abundance with over-use of the sub-set of codons found in a group of very-highly expressed genes, normally those encoding the ribosomal proteins. However, such approaches frequently struggle to predict the expression level of less abundant proteins. More recently, dynamic TASEP (Totally Asymmetric Simple Exclusion Process) models have been employed to simulate the flow of ribosomal traffic, including queuing interactions between adjacent ribosomes on the polysome [20], [37]–[41]. Even though these models represent a big step towards a more complete description of the translation process, most of them miss one essential component, namely the mechano-chemical ribosome cycle. By including this mechanism into an exclusion process we showed that the mathematical description of translation becomes much more accurate [16]. Here we applied this model to simulate the translation of every mRNA in the transcriptome of S. cerevisiae leading to the estimates of the individual translation initiation rates unique to each of the 6,000 genes in yeast. We furthermore showed that mRNA sequences can be classified according to their ribosome traffic characteristics, and crucially, this classification maps to gene ontology assignments.

Even though the role of the translation initiation rate has been shown to play a central role in translational control of gene expression [9], to our knowledge no genome-wide estimations of these rates have been reported, considering ribosome traffic effects. The translation initiation rate, i.e. the rate at which ribosomes start translating the ORF, condenses many factors, such as cytoplasmic ribosome availability, initiation factors and secondary structures on the 5′UTRs, all of them strongly dependent on nutrients and stress conditions. Some approaches consider the translation initiation rate to be fixed for every transcript, thereby neglecting the key factors that make the initiation rate unique to each transcript. In contrast, by considering traffic dynamics, we determined the first genome-wide estimate of initiation rates  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e112  for each and every mRNA (Figure 2) by integrating our stochastic model of ribosome traffic with data of ribosome densities across all mRNAs [23]. Our analysis showed a wide range of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e113  values under these non-stress conditions. Importantly, the  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e114  values are strongly correlated with gene function, explaining for example why translation of ribosomal protein mRNAs, which typically have a very high  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e115  value, is very efficient. These values of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e116  are expected to be influenced by the degree of secondary structure of the 5′ leader sequence, and indeed we did find a significant correlation with the free energies of the secondary structures. The strongest connection involving  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e117  was however a negative correlation with mRNA length, mirroring the findings from experimental research that described lower ribosome densities on longer mRNAs [7], [15]. In contrast to the explanation that the effect could be caused by bottlenecks of slow codons [15] (see Supplementary Information Text S1) this negative correlation supports the idea that due to the circular structure of mRNAs, the ends of shorter mRNAs can interact more easily than longer mRNAs, thereby promoting ribosome recycling [7], [42], [43]. Indeed following detailed experimental analysis using ribosome density mapping, Arava and colleagues concluded that lower densities on longer mRNAs are best explained by lower rates of translation initiation [27], mirroring our findings in this work. To summarise, we interpret the correlation between the estimated initiation rates and the ORF lengths as a possible indication of a regulatory mechanism that allows circularised mRNAs to load ribosomes more efficiently onto their transcripts, leading to the observed ribosome-ORF length relationship.

Our analysis furthermore identified two main distinct classes of mRNAs regarding their responsiveness to changes in the initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e118 : some sequences exhibited an abrupt change in the polysome size upon a change in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e119 , whereas smooth sequences showed a gradual increase. Calculations with artificial sequences revealed that sequences with rare codons in the main body of the ORF belong to the abrupt class, whereas sequences with either no rare codons or rare codons at the 5′ end belong to the smooth class [28]. Crucially, we note that the classification of mRNAs into smooth and abrupt responders maps onto particular gene ontological classifications. Smooth responder mRNAs as a class are highly over-populated with ribosomal protein mRNAs and translation factors. Conversely, the abrupt class contains disproportionate numbers of regulatory proteins, including nucleic acid-binding transcription factors, and cell cycle proteins. One reason why ribosomal protein mRNAs are predominantly of the smooth response type might relate to the massive manufacturing scale of ribosome biosynthesis; in yeast, ribosomal protein mRNAs account for nearly 30 percent of all mRNAs [44], [45]. Smooth-type responses to  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e120  must be of selective advantage for a cell, since if ribosome queues were established on such a large proportion of the cell mRNA population they would sequester a large numbers of ribosomes, with deleterious consequences for cell fitness. On the other hand, it has been recently found that cell-cycle regulated genes predominantly adopt non-optimal codon usage (with no ramp of slow codons at the beginning, and therefore of the abrupt class) to achieve elongation-limited mRNA translation; this can generate cell cycle-dependent oscillations in protein abundance induced by changes in the tRNA pool [46]. Therefore, it is apparent that the cell coordinates codon usage and codon arrangement to achieve translational gene expression control.

Our results also showed important differences in the computationally deduced slope of the  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e121 production rate curve in response to increasing  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e122 . Some mRNAs are what we term highly geared, that is, small increases in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e123  produce relatively large increases in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e124 . This type of super-responsive mRNA was significantly enriched in regulatory proteins, which also have a relatively small initiation rate. We speculate that this might be a mechanism to facilitate rapid responses to changed environmental conditions, allowing, for example, rapid synthesis of transcriptional repressors that in physiological conditions are severely limited by the initiation (low  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e125 ). Conversely, low geared mRNAs, where increases in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e126  produce proportionately lower responses in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e127 , were enriched in ribosomal proteins. Since ribosomal proteins are used to manufacture ribosomes, lower gearing of the  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e128  responsiveness to  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e129  may help prevent undesirable positive feedback effects. We furthermore classified mRNA sequences according to the maximal translation rate that they can achieve, i.e. their saturation value, and our analysis revealed that abrupt sequences have predominantly a small  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e130 , whereas smooth sequences are characterised by a large  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e131 . This correlates with the levels of the corresponding proteins: regulatory proteins are typically present in low abundance, whereas ribosomal proteins are highly abundant. Moreover, this might prevent possibly deleterious consequences of over-producing regulatory proteins, including cell cycle factors, during occasional bursts of ribosomal availability that would lead to a very large increase in the value of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e132 . In S. cerevisiae for example, this occurs upon sudden glucose depletion: translation initiation is rapidly inhibited [47] but some mRNAs (including those involved in carbohydrate metabolism) continue to be translated [48], thereby being exposed to a spike in ribosome availability. Similar complex translational re-programming, coincident with a partial cell-wide shut down of translation initiation, occurs in response to oxidative stress [49]. Hence, by having a high responsiveness to  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e133  and a low  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e134abrupt sequences can have a very rapid gene expression upregulation, on one hand, but a controlled maximum translation rate, on the other hand. Figure 8 summarises our findings on the initiation rates and the consequences of different (mRNA-specific) dependencies of the protein production rate on the initiation step.


Figure 8. Initiation rate: summary of the findings.

(A) For a given ‘physiological’ number of ribosomes  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e135  we found mRNA-specific initiation rates, distributed over a broad range of values (Figure 2). Different regions of the distributions can be mapped to certain GO annotations. For example, mRNAs with small physiological initiation rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e136  are regulatory proteins while genes involved in translation have a larger initiation rate. (B) Changes in initiation (induced, for instance, by variations in the ribosomal pool, e.g. available ribosomes increase to a value of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e137 ) are estimated by our modelling and theoretically perceived by the transcript in different ways, according to their current-initiation relationship  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e138 . In particular, some mRNAs have a large gearing factor  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e139 , such as regulatory proteins, while other messengers, such as translation associated ones, are less sensitive to changes of the initiation rate. (C) For very large initiation rates the protein production rates reach a maximal elongation-limited value, i.e. only depending on the sequence of codons. We discover that translation associated genes have a larger maximal production rate when compared to other mRNAs, such as regulatory proteins, whose production might need to be capped. (D) In general we find two main groups of sequences classified according to their current-initiation relationship  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e140 . Abrupt sequences, usually regulatory proteins, present an abrupt ‘kink’ in  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e141 , meaning that the protein production rate can quickly saturate above specific values (sequence-dependent) of the initiation rate. Genes involved in translation like ribosomal proteins are instead classified as smooth sequences, since their sequences are such that this abrupt crossover does not exist.

Incidentally, this classification of proteins according to their translation dynamics, coincides with the classification according to protein stability. In [50], the S. cerevisiae proteome was analysed using a clustering approach to classify proteins according to half-life, and the stable protein cluster was enriched with proteins involved in protein production, including ribosomal proteins and enzymes involved in amino acid metabolism. Moreover, the unstable protein cluster was enriched with cell cycle proteins and proteins involved in transcriptional regulation. Therefore, our analysis indicates that stable proteins tend to have a low responsiveness in their production rate to external changes which change the initiation rate, whereas unstable proteins production responses very effectively to external changes. Hence, our analysis strongly suggests that the cell coordinates dynamics of protein degradation with the dynamics of protein production.

In summary, we have shown how our stochastic model representing the ribosome traffic flow on mRNAs is able to discern and describe the biological interplay between translation initiation and elongation, at a single-codon level. We have illustrated how the application of this model across the entire genome can be used to infer mRNA-specific translation initiation rates in vivo, and that selection of codon arrangement is likely to be an important mechanism to tune the translation system to meet the competing demands of ribosome biosynthesis and translation of all other mRNAs in the cell. With our approach, mRNA sequences can be classified according to their translation dynamics, mapping to key gene ontological classifications; codon arrangement plays a fundamental role in this classification, indicating that it is optimized through evolution to match the corresponding gene function. Moreover, gene-specific physiological values of initiation rate can be used to determine the translational efficiency for each mRNA; this allows the prediction of genome-wide protein abundances with a significant increase in correlation when compared with previous approaches (Figure 7). We foresee this type of analysis will be of great value to understand how the economics of translation are regulated on a cell-wide basis, and how codon arrangement is optimised to control gene expression in response to the translational remodelling that occurs in response to many environmental stresses.

Materials and Methods

Stochastic simulations

For each mRNA sequence of S. cerevisiae we performed a stochastic simulation of translation, one mRNA at a time, following the rules explained above and summarised in Figure 1. Our algorithm is a continuous time Monte-Carlo based on the Gillespie algorithm, and therefore it gives the real-time dynamics of the system.

In each simulation of individual mRNAs we let the system reach the steady-state. Then we measured, at constant interval times, two quantities: the current  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e142  of ribosomes along the mRNA, i.e., how many ribosomes per unit time finish translation, and the density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e143  of ribosomes on the mRNA, i.e., the total number of ribosomes  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e144  divided by the length  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e145  (in codons) of the mRNA. Therefore, the current  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e146  gives the translation rate, and the density  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e147 determines the polysome size. We then averaged these quantities over the entire time interval of the simulation. We ran the simulations for a broad range of initiation rates  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e148  between 0 and 5  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e149 , making sure that the plausible physiological regime for  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e150  variation was covered, and we fixed the other parameters as explained in the previous sections. The obtained curves  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e151  and  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e152  were then smoothed with a ten-points running average.

The gradient  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e153  of the translation rate at the physiological initiation rate is defined, for each mRNA, as the numerical derivative of the relation  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e154 , computed at  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e155 . It geometrically represents the slope of the curve  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e156  at the physiological value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e157 . Since both the distributed physiological values of the initiation rates and different codon sequences cause a different dependence of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e158  on  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e159 , the derivative  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e160  differs from mRNA to mRNA.

The maximal values  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e161  and  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e162  were defined as the mean of the last five simulation points of the current and the density, respectively, corresponding to the five largest values of  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e163 considered.

Translation rate prediction

The translation efficiency  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e164  of a transcript is defined as the protein production rate  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e165  computed at the physiological value  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e166http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e167 . Denoting by  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e168  the amount of a specific mRNA in the cell (data from [30]), for any protein the quantity  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e169  is an estimate of the protein abundance, see [20]. We also considered the effective amount of transcript involved in polysomes,  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e170 , where  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e171  can be found in [23]. The prediction  http://journals.plos.org/ploscompbiol/article/file?type=thumbnail&id=info:doi/10.1371/journal.pcbi.1002866.e172  slightly improves the correlation with measured protein abundance, as discussed in the ‘Results’ section.


  1. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9: 3273–3297.
  2. Shalon D, Smith SJ, Brown PO (1996) A DNA microarray system for analyzing complex DNA samples using two-color uorescent probe hybridization. Genome Research 6: 639–645.
  3. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, et al. (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292: 929–934.
  4. Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, et al. (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342.
  5. Ghazalpour A, Bennett B, Petyuk VA, Orozco L, Hagopian R, et al. (2011) Comparative analysis of proteome and transcriptome variation in mouse. PLoS Genet 7: e1001393.
  6. Ghaemmaghami S, Huh W, Bower K, Howson RW, Belle A, et al. (2003) Global analysis of protein expression in yeast. Nature 425: 737–741.
  7. Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, et al. (2003) Genome-wide analysis of mRNA translation profiles in saccharomyces cerevisiae. Proc Natl Acad Sci U S A 100: 3889–3894.
  8. Gingold H, Pilpel Y (2011) Determinants of translation efficiency and accuracy. Mol Syst Biol 7: 481.
  9. Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in escherichia coli. Science 324: 255–258.
  10. Livingstone M, Atas E, Meller A, Sonenberg N (2010) Mechanisms governing the control of mRNA translation. Physical Biology 7: 021001.
  11. Kozak M (1989) Circumstances and mechanisms of inhibition of translation by secondary structure in eucaryotic mRNAs. Mol Cell Biol 9: 5134–5142.
  12. Sagliocco FA, Vega Laso MR, Zhu D, Tuite MF, McCarthy JE, et al. (1993) The inuence of 5′-secondary structures upon ribosome binding to mRNA during translation in yeast. J Biol Chem 268: 26522–26530.
  13. Buchan JR, Stansfield I (2007) Halting a cellular production line: responses to ribosomal pausing during translation. Biology of the Cell 99: 475–487.
  14. Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, et al. (2010) An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141: 344–354.
  15. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324: 218–23.
  16. Ciandrini L, Stansfield I, Romano MC (2010) Role of the particle’s stepping cycle in an asymmetric exclusion process: A model of mRNA translation. Physical Review E 81: 051904.
  17. Percudani R, Pavesi A, Ottonello S (1997) Transfer RNA gene redundancy and translational selection in saccharomyces cerevisiae. Journal of Molecular Biology 268: 322–330.
  18. Savelsbergh A, Katunin VI, Mohr D, Peske F, Rodnina MV, et al. (2003) An elongation factor g-induced ribosome rearrangement precedes tRNA-mRNA translocation. Mol Cell 11: 1517–23.
  19. Chou T, Mallick K, Zia RKP (2011) Non-equilibrium statistical mechanics: from a paradigmatic model to biological transport. Reports on Progress in Physics 74: 116601.
  20. Reuveni S, Meilijson I, Kupiec M, Ruppin E, Tuller T (2011) Genome-Scale analysis of translation elongation with a ribosome ow model. PLoS Comput Biol 7: e1002127.
  21. Wen J, Lancaster L, Hodges C, Zeri A, Yoshimura SH, et al. (2008) Following translation by single ribosomes one codon at a time. Nature 452: 598–603.
  22. Siwiak M, Zielenkiewicz P (2010) A Comprehensive, Quantitative, and Genome-Wide Model of Translation. PLoS Comput Biol 6(7): e1000865.
  23. MacKay VL, Li X, Flory MR, Turcott E, Law GL, et al. (2004) Gene expression analyzed by high-resolution state array analysis and quantitative proteomics. Molecular & Cellular Proteomics 3: 478–489.
  24. Law GL, Bickel KS, MacKay VL, Morris DR (2005) The undertranslated transcriptome reveals widespread translational silencing by alternative 5′ transcript leaders. Genome Biology 6: R111.
  25. Ringnér M, Krogh M (2005) Folding free energies of 5′-UTRs impact Post-Transcriptional regulation on a genomic scale in yeast. PLoS Computational Biology 1: e72.
  26. Tuller T, Waldman YY, Kupiec M, Ruppin E (2010) Translation efficiency is determined by both codon bias and folding energy. Proceedings of the National Academy of Sciences 107: 3645–3650.
  27. Arava Y, Boas FE, Brown PO, Herschlag D (2005) Dissecting eukaryotic translation and its control by ribosome density mapping. Nucleic Acids Research 33: 2421–2432.
  28. Romano MC, Thiel M, Stansfield I, Grebogi C (2009) Queueing phase transition: Theory of translation. Physical Review Letters 102: 198104.
  29. Press W, Teukolsky S, Vetterling W, Flannery B (2007) Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge Univ Press.
  30. Beyer A, Hollunder J, Nasheuer H, Wilhelm T (2004) Post-transcriptional expression regulation in the yeast saccharomyces cerevisiae on a genomic scale. Molecular & Cellular Proteomics 3: 1083–1092.
  31. Mayer C, Grummt I (2006) Ribosome biogenesis and cell growth: mTOR coordinates transcription by all three classes of nuclear RNA polymerases. Oncogene 25: 6384–6391.
  32. Sharp PM, Li WH (1987) The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15: 1281–1295.
  33. Reis Md, Savva R, Wernisch L (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Research 32: 5036–5044.
  34. von der Haar T (2012) Mathematical and computational modelling of ribosomal movement and protein synthesis: an overview. Computational and Structural Biotechnology Journal 1.
  35. Ikemura T (1982) Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes: Differences in synonymous codon choice patterns of yeast and escherichia coli with reference to the abundance of isoaccepting transfer RNAs. Journal of Molecular Biology 158: 573–597.
  36. Dong H, Nilsson L, Kurland CG (1996) Co-variation of tRNA abundance and codon usage in escherichia coli at different growth rates. Journal of Molecular Biology 260: 649–663.
  37. Sharma AK, Chowdhury D (2011) Stochastic theory of protein synthesis and polysome: Ribosome profile on a single mRNA transcript. Journal of Theoretical Biology 289: 36–46.
  38. Mitarai N, Sneppen K, Pedersen S (2008) Ribosome collisions and translation efficiency: optimization by codon usage and mRNA destabilization. J Mol Biol 382: 236–245.
  39. Garai A, Chowdhury D, Chowdhury D, Ramakrishnan TV (2009) Stochastic kinetics of ribosomes: single motor properties and collective behavior. Phys Rev E 80: 011908.
  40. Dong JJ, Schmittmann B, Zia RKP (2007) Towards a model for protein production rates. Journal of Statistical Physics 128: 21–34.
  41. Brackley CA, Romano MC, Thiel M (2011) The dynamics of supply and demand in mRNA translation. PLoS Comput Biol 7: e1002203.
  42. Chou T (2003) Ribosome recycling, diffusion, and mRNA loop formation in translational regulation. Biophys J 85: 755–773.
  43. McCarthy JEG (1998) Posttranscriptional control of gene expression in yeast. Microbiology and Molecular Biology Reviews 62: 1492–1553.
  44. Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, et al. (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95: 717–728.
  45. Warner JR (1999) The economics of ribosome biosynthesis in yeast. Trends in Biochemical Sciences 24: 437–440.
  46. Frenkel-Morgenstern M, Danon T, Christian T, Igarashi T, Cohen L, et al. (2012) Genes adopt non-optimal codon usage to generate cell cycle-dependent oscillations in protein levels. Molecular Systems Biology 8: 572.
  47. Ashe MP, De Long SK, Sachs AB (2000) Glucose depletion rapidly inhibits translation initiation in yeast. Molecular Biology of the Cell 11: 833–848.
  48. Castelli LM, Lui J, Campbell SG, Rowe W, Zeef LAH, et al. (2011) Glucose depletion inhibits translation initiation via eIF4A loss and subsequent 48S preinitiation complex accumulation, while the pentose phosphate pathway is coordinately up-regulated. Molecular Biology of the Cell 22: 3379–3393.
  49. Shenton D, Smirnova JB, Selley JN, Carroll K, Hubbard SJ, et al. (2006) Global translational responses to oxidative stress impact upon multiple levels of protein synthesis. J Biol Chem 281: 29011–29021.
  50. Belle A, Tanay A, Bitincka L, Shamir R, O’Shea EK (2006) Quantification of protein half-lives in the budding yeast proteome. Proceedings of the National Academy of Sciences 103: 13004–13009.
  51. Brockmann R, Beyer A, Heinisch JJ, Wilhelm T (2007) Posttranscriptional expression regulation: what determines translation rates? PLoS Computational Biology 3: e57.

Leave a Reply

Ribosome Traffic on mRNAs Maps to Gene Ontology: Genome-wide Quantification of Translation Initiation Rates and Polysome Size Regulation

The expression of genes can be considered as a two-stage process, beginning with transcription and the production of an mRNA, followed by translation of that mRNA into protein by the cell’s ribosome population. Gene expression must be tightly regulated to control protein composition, enabling the cell to rapidly respond to a wide range of environmental conditions. For this reason, cells exert fine control over gene expression, both at the transcriptional, and post-transcriptional level.

Scroll to top