How does ORFribo work?

ORFribo is designed to analyse, based on ribosome profiling data, the translation activity of all the ORFs of a genome (coding and noncoding) that are annotated in a gff file. We strongly recommend using ORFtrack to generate this gff file: ORFtrack and ORFribo were developed jointly, so the output of ORFtrack meets the requirements of ORFribo. However, users can provide their own gff file. In this case, each ORF annotated in the input gff file must be associated with the name of its ORF category, which can differ from those defined by ORFtrack (e.g. intergenic, ALT-ORF, foldable, xyz...). The ORF category must be given in the 3rd column of the input gff file and passed to ORFribo with the --intergenic-features argument, so that ORFribo knows which ORF category to analyze (more details in the dedicated section; an example of the gff output generated by ORFtrack can be found in the ORFmine/examples/database directory and may be used as a guide to prepare your own gff file).
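To illustrate what the 3rd column is used for, the Python sketch below keeps only the gff records whose type column matches the categories of interest, which is the kind of selection that --intergenic-features drives. It is a minimal sketch, not ORFribo's own parser: the function name `select_orfs`, the records and the category names are hypothetical, and it assumes a standard 9-column, tab-separated GFF3 layout.

```python
# Hypothetical gff excerpt (9 tab-separated columns:
# seqid, source, type, start, end, score, strand, phase, attributes).
# The 3rd column ("type") carries the ORF category.
GFF_TEXT = (
    "##gff-version 3\n"
    "chrI\tORFtrack\tCDS\t335\t649\t.\t+\t0\tID=orf1\n"
    "chrI\tORFtrack\tintergenic\t1200\t1340\t.\t-\t0\tID=orf2\n"
    "chrI\tORFtrack\tintergenic\t2001\t2100\t.\t+\t0\tID=orf3\n"
)


def select_orfs(gff_lines, categories):
    """Return the records whose 3rd gff column is one of `categories` (hypothetical helper)."""
    selected = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and gff comments/directives
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}: {line!r}")
        seqid, _source, category, start, end, _score, strand, _phase, attrs = fields
        if category in categories:
            selected.append({
                "seqid": seqid,
                "category": category,
                "start": int(start),
                "end": int(end),
                "strand": strand,
                "attributes": attrs,
            })
    return selected
```

For example, `select_orfs(GFF_TEXT.splitlines(), {"intergenic"})` returns the two intergenic records (orf2 and orf3) and ignores the CDS line.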

The three main steps of ORFribo are:

Step 1: Determination of the P-site offset: For each read size (kmer), ORFribo uses riboWaltz [1] to find the distance in nucleotides from the 5'-end of the read (beginning of the Ribosome Protected Fragment, or RPF) to the first base of the ribosome's P-site. Locating the first base of the P-site identifies the codon being translated and, thus, the translated frame. This information is required for the phasing quality controls performed on coding regions.
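To make the notion of a per-kmer offset concrete, here is a simplified, start-codon-anchored heuristic in the spirit of riboWaltz: for each read length, it takes the most frequent distance between the read 5'-end and the first base of the annotated start codon, where the initiating ribosome's P-site sits. This is not riboWaltz's actual algorithm, which is more elaborate (see [1]). The function name `estimate_psite_offsets` is hypothetical, and the sketch assumes reads are already expressed in 0-based, sense-strand transcript coordinates.

```python
from collections import Counter, defaultdict


def estimate_psite_offsets(reads, start_codons):
    """Estimate one P-site offset per read length (simplified heuristic, not riboWaltz).

    reads: iterable of (transcript_id, five_prime_pos, length), 0-based sense-strand
        transcript coordinates.
    start_codons: dict transcript_id -> 0-based position of the first base of the
        annotated start codon.
    Returns dict length -> offset (nt from the read 5'-end to the P-site first base).
    """
    distances = defaultdict(Counter)
    for tx, five_p, length in reads:
        start = start_codons.get(tx)
        if start is None:
            continue  # transcript without an annotated start codon
        d = start - five_p
        # Only reads containing the whole start codon are informative.
        if 0 <= d <= length - 3:
            distances[length][d] += 1
    # The most frequent distance for each kmer is taken as its offset.
    return {length: counts.most_common(1)[0][0] for length, counts in distances.items()}
```

With a start codon at position 100, two 28-mers starting at 88 and one at 87 give an offset of 12 for 28-mers, and a read starting far upstream of the start codon is simply ignored.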

Step 2: Detection of kmers of good quality and filtering of low-quality ones: Read phasing is essential to determine which of the three frames of the RNA is translated. In coding regions, the frame expected to be translated (i.e. the coding one) is already known and is used to estimate the quality of the experiment and, more precisely, to identify read sizes (i.e. kmers) of good quality. To do so, ORFribo calculates, for each kmer, the fraction of its reads mapping on CDS coordinates whose P-site falls in the coding frame of the CDSs (named the P0 or F0 frame). ORFribo then retains for the analysis (Step 3) only the kmers for which more than 70% of the P-sites predicted by riboWaltz fall on codons of the P0/F0 coding frame of the CDSs. 70% is the default value, but it can be modified by the user through the config.yaml file.
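The phasing filter described above can be sketched as follows: each read's P-site is placed using its kmer offset, its frame relative to the CDS start is computed modulo 3, and a kmer is kept only if strictly more than 70% of its in-CDS P-sites fall in frame 0. This is a minimal illustration, not ORFribo's implementation; the function name `filter_kmers` and its arguments are hypothetical, and coordinates are assumed to be 0-based, sense-strand, with half-open CDS intervals.

```python
from collections import Counter, defaultdict


def filter_kmers(reads, offsets, cds, threshold=0.70):
    """Keep read lengths whose P-sites fall mostly in the CDS coding frame (F0/P0).

    reads: iterable of (transcript_id, five_prime_pos, length).
    offsets: dict length -> P-site offset (e.g. as estimated in Step 1).
    cds: dict transcript_id -> (cds_start, cds_end), half-open interval.
    Returns (sorted list of kept lengths, dict length -> fraction of F0 P-sites).
    """
    frames = defaultdict(Counter)
    for tx, five_p, length in reads:
        if length not in offsets or tx not in cds:
            continue  # no offset for this kmer, or not a CDS transcript
        start, end = cds[tx]
        psite = five_p + offsets[length]
        if not start <= psite < end:
            continue  # only P-sites inside the CDS are counted
        frames[length][(psite - start) % 3] += 1  # 0 = coding frame
    fractions = {length: c[0] / sum(c.values()) for length, c in frames.items()}
    kept = sorted(length for length, f in fractions.items() if f > threshold)
    return kept, fractions
```

In this sketch `threshold` plays the role of the 70% default that ORFribo exposes in config.yaml; the comparison is strict to match "more than 70%".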

Step 3: Analysis: alignment on all ORFs of interest: Reads of the kmers that passed the phasing filter on CDSs are aligned on the set of ORFs indicated by the user (e.g. intergenic ORFs, alternative ORFs...). The ORF categories to be analyzed by ORFribo must correspond to those identified by ORFtrack and provided in the 3rd column of its gff output (see an example in examples/database/mapping_orf_Scer.gff, and the ORFtrack documentation for more details on the ORF categories and the annotation process). If another gff file is provided, please follow the format of the ORFtrack gff output. The P-site offset determined for each kmer in Step 1 makes it possible to compute the proportion of reads in each frame of every tested ORF, thereby estimating its fraction of in-frame reads (F0 or P0) and the fractions of reads in its +1 and +2 frames (F1 and F2, respectively). This can help the user identify ORFs with high translation specificity (high fractions of F0 reads) or classify ORFs according to their level of translation specificity (i.e. their fraction of F0 reads).
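The F0/F1/F2 fractions of a single ORF can be sketched as below: each P-site inside the ORF is assigned to frame (position - ORF start) modulo 3, so 0 is the ORF's own frame (F0), 1 the +1 frame (F1) and 2 the +2 frame (F2). This is an illustrative sketch only; the function name `frame_fractions` is hypothetical, P-sites are assumed to have already been computed from the retained kmers, and minus-strand ORFs are assumed to have been converted to sense-strand, 0-based, half-open coordinates beforehand.

```python
def frame_fractions(psites, orf_start, orf_end):
    """Return (F0, F1, F2) fractions of P-sites within one ORF [orf_start, orf_end).

    psites: P-site positions in the same 0-based, sense-strand coordinates as the ORF.
    Returns (0.0, 0.0, 0.0) when no P-site falls inside the ORF.
    """
    counts = [0, 0, 0]
    for p in psites:
        if orf_start <= p < orf_end:
            counts[(p - orf_start) % 3] += 1  # 0 -> F0, 1 -> F1, 2 -> F2
    total = sum(counts)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / total for c in counts)
```

For instance, four P-sites in frame 0 and one in the +1 frame of an ORF give fractions of 0.8 (F0), 0.2 (F1) and 0.0 (F2); P-sites outside the ORF are ignored.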

More details on the complete ORFribo pipeline can be found in Figure S1 of Papadopoulos et al. [2].

References

[1] Lauria F, Tebaldi T, Bernabò P, Groen EJN, Gillingwater TH, Viero G. riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data. PLoS Comput Biol. 2018 Aug 13;14(8):e1006169.
[2] Papadopoulos C, Arbes H, Chevrollier N, Blanchet S, Cornu D, Roginski P, Rabier C, Atia S, Lespinet O, Namy O, Lopes A. (submitted).