Run ORFribo

ORFribo is run from a single command line. The sections below show a basic run and describe each parameter.

1. Basic run

You can configure ORFribo directly via the command line using the following example:

orfribo --fna examples/database/Scer.fna --gff examples/database/Scer.gff --gff-intergenic examples/database/mapping_orf_Scer.gff --fastq examples/fastq --out ORFribo --not-trimmed --singularity 

Make sure to indicate whether your data are trimmed. If they are, replace --not-trimmed with --trimmed. By default, ORFribo treats your data as not trimmed.
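
As a minimal sketch of switching between the two flags, the helper below picks the flag from a yes/no value and prints the resulting command instead of running it. The function name trim_flag and the variable ADAPTERS_REMOVED are hypothetical and not part of ORFribo; the paths are those of the basic run above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ORFribo): map a yes/no answer to the trimming flag.
trim_flag() {
    case "$1" in
        yes) echo "--trimmed" ;;
        *)   echo "--not-trimmed" ;;   # mirrors the documented default
    esac
}

ADAPTERS_REMOVED="yes"   # assumption: set to "no" for reads that still carry adapters

# Print the command rather than executing it, so it can be reviewed first.
echo orfribo --fna examples/database/Scer.fna \
    --gff examples/database/Scer.gff \
    --gff-intergenic examples/database/mapping_orf_Scer.gff \
    --fastq examples/fastq --out ORFribo \
    "$(trim_flag "$ADAPTERS_REMOVED")" --singularity
```

Printing the command first makes it easy to confirm which trimming flag was selected before launching a run.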


2. ORFribo parameters

For additional parameters, consult the help menu:

orfribo -h

Mandatory arguments

--fna Path to the genome/transcriptome fasta file (Required).

(Example: reference_genome_sequences.fa: A FASTA file containing the nucleotide sequence of the complete genome.)

--gff Path to the GFF annotation file (Required).

(Example: reference_genome.gff: A gff file containing genome annotation)

--gff-intergenic Path to the GFF annotation mapping file (Required).

(Example: mapping_orf_Scer.gff: ORFtrack output GFF file with ORF coordinates for translation activity analysis.)

--fastq Path to the directory containing .fastq.gz files (Required).

--project-name Name for the experiment, without spaces or special characters (Required).

--trimmed Flag indicating that the sequence adapters are already removed.

--not-trimmed Flag indicating that the sequence adapters are not removed.

--docker Flag used to run computations in a Docker container.

--singularity Flag used to run computations in a Singularity container.

--out Base directory location for orfribo outputs.
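
Since --fastq expects a directory of .fastq.gz files, a quick check before launching can catch a wrong path or uncompressed inputs. The sketch below is only an illustration: count_fastq_gz is a hypothetical helper, not an ORFribo command, and it is demonstrated on a temporary directory of empty placeholder files.

```shell
#!/bin/sh
# Hypothetical pre-flight check: count the .fastq.gz files directly inside a directory.
count_fastq_gz() {
    find "$1" -maxdepth 1 -type f -name '*.fastq.gz' | wc -l | tr -d ' '
}

# Demonstration with a temporary directory holding two empty placeholder files.
d=$(mktemp -d)
: > "$d/sample1.fastq.gz"
: > "$d/sample2.fastq.gz"
: > "$d/notes.txt"          # ignored: not a .fastq.gz file
echo "found $(count_fastq_gz "$d") .fastq.gz file(s)"
rm -rf "$d"
```

A count of zero usually means the path points at the wrong directory or the files are not gzip-compressed.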

Pipeline Option Selection

  • Are Sequencing Adapters Removed?

--rna-to-exclude Path to a fasta file with nucleotide sequences to exclude (Optional).

(Example: NA_sequences_to_remove.fa: File containing sequences to exclude, like rRNAs.)

--aligner Choose your alignment tool : star or hisat2. Default : hisat2

--adapter Adapter sequence (Example: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA for Illumina TruSeq™ adapters). If unknown, RiboDoc will try to find it for you, but this can sometimes lead to a wrong adapter sequence.

Read Length (Kmers) Configuration

Define the range of read sizes (kmers) to analyze. By default, ORFribo analyzes reads between 25 and 35 bases and selects high-quality ones (e.g., those mapping to coding regions with a median in-frame value of 70%). You can modify these parameters if needed.

--min-read-length Minimum read length for ribosome profiling. Default : 25

--max-read-length Maximum read length for ribosome profiling. Default : 35
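
When the range is changed, it can help to validate it before starting a long run. The following is a minimal sketch under the assumption that both values are whole numbers and that the minimum must not exceed the maximum; check_read_lengths is a hypothetical helper and not part of ORFribo.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of ORFribo): validate a read-length range
# and print the matching command-line arguments.
check_read_lengths() {
    min="$1"; max="$2"
    # Reject empty values and anything containing a non-digit character.
    for v in "$min" "$max"; do
        case "$v" in
            ''|*[!0-9]*) echo "read lengths must be whole numbers" >&2; return 1 ;;
        esac
    done
    if [ "$min" -gt "$max" ]; then
        echo "--min-read-length ($min) exceeds --max-read-length ($max)" >&2
        return 1
    fi
    echo "--min-read-length $min --max-read-length $max"
}

check_read_lengths 25 35   # the documented defaults
```

The printed arguments can be pasted into the orfribo command line once the check passes.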

Coding Features for Kmer Detection

You may also need to specify the names of the "CDS" feature and the "gene" attribute as they appear in the original gff file (not the one generated by ORFtrack), so that ORFribo can identify the features/attributes that correspond to CDSs and genes. That said, we strongly recommend using the "CDS" and "gene" keywords for CDSs and genes respectively, as ORFtrack uses them in its gff output. Different keywords might conflict with the detection of good-quality kmers (Step2) and with the read mapping step (Step3), which is performed on the ORFtrack output or equivalent. To avoid such conflicts, we recommend renaming the attributes/features used for genes and CDSs in the original gff file to "gene" and "CDS" respectively.

--gff-feature Feature element to select during counting. Default : CDS
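
The renaming recommended above can be sketched with awk. This is an illustration only: rename_feature is a hypothetical helper, "coding_sequence" is an assumed keyword standing in for whatever your annotation uses, and the sketch handles the feature type in column 3 only, not names stored in the attributes column.

```shell
#!/bin/sh
# Hypothetical helper: rename a feature type in column 3 of a GFF stream.
# Usage: rename_feature OLD NEW < input.gff > output.gff
rename_feature() {
    awk -F '\t' -v OFS='\t' -v old="$1" -v new="$2" '
        /^#/      { print; next }   # keep comment and directive lines untouched
        $3 == old { $3 = new }      # rewrite only exact matches in column 3
                  { print }
    '
}

# Demonstration on a toy record ("coding_sequence" is an assumed keyword).
printf '##gff-version 3\nchrI\ttoy\tcoding_sequence\t1\t9\t.\t+\t0\tID=x\n' |
    rename_feature coding_sequence CDS
```

Writing the result to a new file and keeping the original annotation untouched makes the change easy to review or revert.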

Statistical Thresholds

  • Thresholds for Selecting Specific Read Lengths:

The selection of specific kmer sizes depends on the thresholds provided by the user:

  • If only the mean_threshold is specified, the kmers are selected based solely on the mean in-frame reads.

  • If only the median_threshold is specified, the kmers are selected based solely on the median in-frame reads.

  • If both thresholds are specified, the kmers are selected based on both the mean and the median values.

(By default, only the median threshold is set, to 70.)

--mean-threshold Minimum mean of in-frame reads. Default : ''

--median-threshold Minimum median of in-frame reads. Default: 70
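
The selection rule described in this section can be restated as a small decision function. This is an illustrative sketch, not ORFribo's implementation: selection_basis is a hypothetical name, an empty string stands for a threshold that is not set, and the "none" branch is outside what this page describes. How an omitted --median-threshold interacts with its default of 70 is not stated here, so the sketch does not model it.

```shell
#!/bin/sh
# Illustrative only: restates the documented selection rule, not ORFribo's code.
# Arguments: mean threshold, median threshold (pass an empty string when unset).
selection_basis() {
    mean="$1"; median="$2"
    if [ -n "$mean" ] && [ -n "$median" ]; then
        echo "mean and median"
    elif [ -n "$mean" ]; then
        echo "mean"
    elif [ -n "$median" ]; then
        echo "median"
    else
        echo "none"   # case not covered by the documentation
    fi
}

selection_basis "" 70   # documented defaults: only the median threshold is set
```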

Intergenic ORF Analysis

List of ORF categories whose translation activity you want to investigate (Step3). The categories listed here must correspond to those written by ORFtrack in the 3rd column of its output gff file (the ORF categories identified in your input genome are also listed in the summary.log of ORFtrack). Examples of these two files can be found in the ORFmine/examples/database/ directory as mapping_orf_Scer.gff and summary.log. By default, ORFribo probes the translation activity of all intergenic ORFs, which are referred to as "nc_intergenic" (see here for more details on the ORF categories and annotation process). If you also want to probe, for example, noncoding ORFs lying in the alternative frames of CDSs on the same strand, add the "nc_ovp_same-CDS" category, separated by a space, as follows: "nc_intergenic nc_ovp_same-CDS".

--intergenic-features List of features in the intergenic GFF. Default : ['nc_intergenic']
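
To see which categories are available in a given ORFtrack output, the 3rd column of the gff file can be tallied. The sketch below is one possible way to do this with awk; list_orf_categories is a hypothetical helper, and the three records are toy data standing in for a real ORFtrack file.

```shell
#!/bin/sh
# Hypothetical helper: count the feature types found in column 3 of a GFF file.
list_orf_categories() {
    awk -F '\t' '
        !/^#/ && NF >= 3 { n[$3]++ }                 # skip comments, tally column 3
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort
}

# Demonstration on a small temporary file standing in for an ORFtrack output.
tmp=$(mktemp)
printf 'chrI\ttoy\tnc_intergenic\t1\t90\t.\t+\t.\tID=a\n'      >  "$tmp"
printf 'chrI\ttoy\tnc_ovp_same-CDS\t5\t95\t.\t+\t.\tID=b\n'    >> "$tmp"
printf 'chrI\ttoy\tnc_intergenic\t200\t290\t.\t-\t.\tID=c\n'   >> "$tmp"
list_orf_categories "$tmp"
rm -f "$tmp"
```

The names in the first output column are the values that can be listed for --intergenic-features.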

Alignment configuration

--multi_alignement The maximum number of allowed multiple alignments for each read. Default : 10

--introns_length Maximum intron length allowed during mapping; applies only when STAR is used as the aligner. Default : 3000. This corresponds to the --alignIntronMax option in STAR, which specifies the maximum intron size that STAR will consider during alignment.

Computational Resource Allocation#

--ram Maximum allowed RAM to use (Mb). Default : 2000

--cores Number of provided cores. Default : 3

--threads Maximum number of threads to use. Default : 3

--jobs, -j Use at most N CPU cluster/cloud jobs in parallel. For local execution this is an alias for --cores. (default: 1)

Snakemake Workflow Control

-P, --preview Only dry-run the workflow (default False)

--dag Generate a DAG image of the workflow ("dag.svg")

-F, --forceall Force all output files to be re-created (default False)

--debug Allows debugging rules with, e.g., PDB. This flag lets you set breakpoints in run blocks.

--dry-run, -D Flag used to show the docker command line. Must be used in conjunction with '--docker' or '--singularity'

--dev Enables development mode. Binds the local package directory to its corresponding path inside the container, allowing for real-time testing and changes without rebuilding the container. Should be used with either '--docker' or '--singularity'.

Example:

orfribo --fna reference.fa --gff annotation.gff --gff-intergenic intergenic.gff --fastq data/ --project-name project1 --not-trimmed --aligner star --out ORFribo --min-read-length 25 --max-read-length 35 --cores 4 --ram 8000 --singularity