Outputs

Output of ORFribo#

After a successful execution of the pipeline, ORFribo generates multiple output files stored in a user-defined directory or in the default outdir/ folder. The output directory contains three main subdirectories:

.
|-- DATA_PROCESSING
|-- RESULTS
`-- SUPPLEMENTARY_DATA

Overview of Output Directories#

DATA_PROCESSING: Contains step-by-step data generated during the pipeline execution, organized by method and sample.
RESULTS: Includes final analysis files essential for the user’s downstream analyses.
SUPPLEMENTARY_DATA: Contains benchmark files and logs for detailed tracking and debugging.

Main Output Table#

The main output table is located in the path:

RESULTS/Genome/all_samples_Genome.25-35.meanXX_medianYY_reads_concatenated.tab

(where XX and YY stand for the mean and median thresholds, respectively).

Description of the Main Table#

This table summarizes the results for each ORF, including the number and fraction of reads in the three frames of the ORF. If ORFribo is executed on multiple datasets, it also generates subdirectories named according to the dataset, each containing a table in the same format as all_samples_Genome.25-35.meanXX_medianYY_reads_concatenated.tab, but specific to the retained kmers of that dataset.

Example:

Genome/Sample1/Genome.25-35.meanXX_medianYY_reads_concatenated.tab

Columns in the Table#

Column	Description
Seq_ID	Identifier of the ORF
Num_reads	Total number of reads with a P-site aligned on the ORF
Num_p0	Number of reads with their P-site in phase 0 (frame 0) of the ORF
Num_p1	Number of reads with their P-site in phase 1 (frame +1) of the ORF
Num_p2	Number of reads with their P-site in phase 2 (frame +2) of the ORF
Perc_p0	Percentage of reads aligned in phase 0 (Num_p0 / total reads)
Perc_p1	Percentage of reads aligned in phase 1
Perc_p2	Percentage of reads aligned in phase 2
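
As a minimal sketch of how this table might be used downstream, the Python snippet below keeps the ORFs that have enough reads and whose reads are mostly in phase 0. It assumes the file is tab-separated with a header row carrying the column names listed above. The function name and the two thresholds (min_reads, min_frac_p0) are illustrative choices, not ORFribo defaults. The in-frame fraction is recomputed from Num_p0 and Num_reads so that the sketch does not depend on the scale used in the Perc_* columns.

```python
import csv

def in_frame_orfs(table_path, min_reads=10, min_frac_p0=0.8):
    """Return (Seq_ID, Num_reads, fraction in phase 0) for well-covered, mostly in-frame ORFs."""
    selected = []
    with open(table_path, newline="") as handle:
        # Assumption: tab-separated file with a header row (Seq_ID, Num_reads, Num_p0, ...)
        for row in csv.DictReader(handle, delimiter="\t"):
            num_reads = float(row["Num_reads"])
            if num_reads == 0 or num_reads < min_reads:
                continue  # skip ORFs without enough reads
            frac_p0 = float(row["Num_p0"]) / num_reads
            if frac_p0 >= min_frac_p0:
                selected.append((row["Seq_ID"], int(num_reads), frac_p0))
    return selected
```

For example, in_frame_orfs("RESULTS/Genome/all_samples_Genome.25-35.meanXX_medianYY_reads_concatenated.tab") would be called with XX and YY replaced by the thresholds of your run.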

Output Directory Details#

RESULTS#

RESULTS contains essential results for the user’s analysis and includes the following subdirectories:

BAM#

Exome: BAM files and their indexes generated from the alignment of ribosome profiling data to the exome sequence, grouped by sample.
Genome: BAM files and their indexes generated from the alignment of ribosome profiling data to the genome sequence, grouped by sample.

These BAM files can be used for visualization in tools like IGV.

RESULTS
|-- BAM
|   |-- Exome
|   |   |-- Sample1
|   |   |   |-- Sample1.bam
|   |   |   `-- Sample1.bam.bai
|   |   `-- Sample2
|   |       |-- Sample2.bam
|   |       `-- Sample2.bam.bai
|   `-- Genome
|       |-- Sample1
|       |   |-- Sample1.bam
|       |   `-- Sample1.bam.bai
|       `-- Sample2
|           |-- Sample2.bam
|           `-- Sample2.bam.bai

Genome#

Contains tables summarizing the results for each ORF, calculated from the retained kmers of each dataset. It also contains the merged table for all datasets (see the description above).

Genome
|-- Sample1
|   `-- Genome.25-35.mean_median70_reads_concatenated.tab
|-- Sample2
|   `-- Genome.25-35.mean_median70_reads_concatenated.tab
`-- all_samples_Genome.25-35.mean_median70_reads_concatenated.tab
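
To collect these tables programmatically, a small sketch along the following lines may help. It relies only on the file naming visible in the tree above (a merged table starting with all_samples_Genome. and one Genome.*_reads_concatenated.tab per sample folder). The function name find_orf_tables is hypothetical.

```python
from pathlib import Path

def find_orf_tables(genome_dir):
    """Return (merged_tables, {sample_name: table_path}) found under RESULTS/Genome."""
    genome_dir = Path(genome_dir)
    # Merged table(s) for all samples, located directly in the Genome folder
    merged = sorted(genome_dir.glob("all_samples_Genome.*_reads_concatenated.tab"))
    # One table per sample, located in a subfolder named after the sample
    per_sample = {
        path.parent.name: path
        for path in sorted(genome_dir.glob("*/Genome.*_reads_concatenated.tab"))
    }
    return merged, per_sample
```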

Psite#

Contains tables generated by RiboWaltz to determine the optimal P-site offset for each fragment length.

RESULTS
|-- Psite
|   |-- Sample1_psite_table.csv
|   `-- Sample2_psite_table.csv

Selected_Length_Exome#

Contains the exome results for each kmer length retained in each sample, in one subfolder per sample and kmer length.

RESULTS
|-- Selected_Length_Exome
|   |-- Sample1_28
|   |   |-- Exome_28_periodicity_all.tab
|   |   |-- Exome_28_periodicity_start.tab
|   |   |-- Exome_28_periodicity_stop.tab
|   |   |-- Exome_28_reads.stats
|   |   `-- Exome_28_reads.tab
|   `-- Sample2_28
|       |-- Exome_28_periodicity_all.tab
|       |-- Exome_28_periodicity_start.tab
|       |-- Exome_28_periodicity_stop.tab
|       |-- Exome_28_reads.stats
|       `-- Exome_28_reads.tab

report_analysis.txt#

Contains statistics for each dataset, including:
- Number of initial reads.
- Number of reads after trimming and/or filtering of unwanted sequences.
- Mapping statistics for both exome and genome.

DATA_PROCESSING#

DATA_PROCESSING
|-- Bam2Reads_Exome
|-- Bam2Reads_Genome
|-- Edited_Gff
|-- Exome
|-- Exome_Fastq
|-- Genome_Fastq
|-- Mapping
|-- Quality_control
|-- RiboWaltz
|-- Selected_length
`-- Trimming

Bam2Reads_Exome#

Contains one subfolder per sample, which itself contains one subfolder per kmer size (Sample_i, with i standing for the kmer size), each holding a table Exome_i_reads.tab. This table carries the same information as the all_samples_Genome.25-35.meanXX_medianYY_reads_concatenated.tab table described above, but only for the CDSs annotated in the original GFF file. Each CDS is associated with the fractions of reads that map to its coding frame (in-frame reads, also called P0 reads) or to its alternative frames.

These tables are used to detect good-quality kmers (Step 2), but they can also be used to probe the translation activity of CDSs. In that case, merge the tables of all retained kmers (the list of retained kmers can be found in the file DATA_PROCESSING/Selected_length/Sample1/Sample1.txt) into a global table. The resulting table contains, for each CDS, the number and fraction of reads mapping in frame or in the +1 and +2 frames.

DATA_PROCESSING/Bam2Reads_Exome/Sample1/
|-- Sample1_25
|   |-- Exome_25_periodicity_all.tab
|   |-- Exome_25_periodicity_start.tab
|   |-- Exome_25_periodicity_stop.tab
|   |-- Exome_25_reads.stats
|   `-- Exome_25_reads.tab
|-- Sample1_26
|   |-- Exome_26_periodicity_all.tab
|   |-- Exome_26_periodicity_start.tab
|   |-- Exome_26_periodicity_stop.tab
|   |-- Exome_26_reads.stats
|   `-- Exome_26_reads.tab
|-- Sample1_27
|   |-- Exome_27_periodicity_all.tab
|   |-- Exome_27_periodicity_start.tab
|   |-- Exome_27_periodicity_stop.tab
|   |-- Exome_27_reads.stats
|   `-- Exome_27_reads.tab
|-- Sample1_28
|   |-- Exome_28_periodicity_all.tab
|   |-- Exome_28_periodicity_start.tab
|   |-- Exome_28_periodicity_stop.tab
|   |-- Exome_28_reads.stats
|   `-- Exome_28_reads.tab
|-- Sample1_29
|   |-- Exome_29_periodicity_all.tab
|   |-- Exome_29_periodicity_start.tab
|   |-- Exome_29_periodicity_stop.tab
|   |-- Exome_29_reads.stats
|   `-- Exome_29_reads.tab
|-- Sample1_30
|   |-- Exome_30_periodicity_all.tab
|   |-- Exome_30_periodicity_start.tab
|   |-- Exome_30_periodicity_stop.tab
|   |-- Exome_30_reads.stats
|   `-- Exome_30_reads.tab
|-- Sample1_31
|   |-- Exome_31_periodicity_all.tab
|   |-- Exome_31_periodicity_start.tab
|   |-- Exome_31_periodicity_stop.tab
|   |-- Exome_31_reads.stats
|   `-- Exome_31_reads.tab
|-- Sample1_32
|   |-- Exome_32_periodicity_all.tab
|   |-- Exome_32_periodicity_start.tab
|   |-- Exome_32_periodicity_stop.tab
|   |-- Exome_32_reads.stats
|   `-- Exome_32_reads.tab
|-- Sample1_33
|   |-- Exome_33_periodicity_all.tab
|   |-- Exome_33_periodicity_start.tab
|   |-- Exome_33_periodicity_stop.tab
|   |-- Exome_33_reads.stats
|   `-- Exome_33_reads.tab
|-- Sample1_34
|   |-- Exome_34_periodicity_all.tab
|   |-- Exome_34_periodicity_start.tab
|   |-- Exome_34_periodicity_stop.tab
|   |-- Exome_34_reads.stats
|   `-- Exome_34_reads.tab
`-- Sample1_35
    |-- Exome_35_periodicity_all.tab
    |-- Exome_35_periodicity_start.tab
    |-- Exome_35_periodicity_stop.tab
    |-- Exome_35_reads.stats
    `-- Exome_35_reads.tab
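
The merge of the retained kmer tables described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's own code: it assumes each Exome_i_reads.tab is tab-separated with a header containing Seq_ID, Num_reads, Num_p0, Num_p1 and Num_p2, and that the folder layout matches the tree above. The retained kmer sizes are passed in explicitly, since the internal layout of the Selected_length file is not described here. The function name and the Frac_p* output keys are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path

COUNT_COLUMNS = ("Num_reads", "Num_p0", "Num_p1", "Num_p2")

def merge_kmer_tables(sample_dir, sample, kmers):
    """Sum per-CDS read counts over the retained kmer sizes of one sample."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_COLUMNS, 0.0))
    for k in kmers:
        # Layout taken from the tree above: <sample>_<k>/Exome_<k>_reads.tab
        table = Path(sample_dir) / f"{sample}_{k}" / f"Exome_{k}_reads.tab"
        with open(table, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                for col in COUNT_COLUMNS:
                    totals[row["Seq_ID"]][col] += float(row[col])
    merged = {}
    for seq_id, counts in totals.items():
        n = counts["Num_reads"]
        # Recompute the fraction of reads in each frame from the summed counts
        fracs = {f"Frac_p{i}": (counts[f"Num_p{i}"] / n if n else 0.0) for i in range(3)}
        merged[seq_id] = {**counts, **fracs}
    return merged
```

A call such as merge_kmer_tables("DATA_PROCESSING/Bam2Reads_Exome/Sample1", "Sample1", [28, 29]) would return one entry per CDS. Fractions are recomputed after summing rather than averaged across kmers, so kmers with more reads weigh proportionally more.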

Bam2Reads_Genome#

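Contains one subfolder per sample and per kmer size, each holding a Genome_i_reads.tab table with the final per-ORF outputs of the analysis (table format explained above).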

Bam2Reads_Genome/Sample2/
|-- Sample2_25
|   `-- Genome_25_reads.tab
|-- Sample2_26
|   `-- Genome_26_reads.tab
|-- Sample2_27
|   `-- Genome_27_reads.tab
|-- Sample2_28
|   |-- Genome_28_periodicity_all.tab
|   |-- Genome_28_periodicity_start.tab
|   |-- Genome_28_periodicity_stop.tab
|   `-- Genome_28_reads.tab
|-- Sample2_29
|   `-- Genome_29_reads.tab
|-- Sample2_30
|   `-- Genome_30_reads.tab
|-- Sample2_31
|   `-- Genome_31_reads.tab
|-- Sample2_32
|   `-- Genome_32_reads.tab
|-- Sample2_33
|   `-- Genome_33_reads.tab
|-- Sample2_34
|   `-- Genome_34_reads.tab
`-- Sample2_35
    `-- Genome_35_reads.tab

Edited_Gff#

ORFribo requires a GFF annotation file of the reference sequence in a specific format. The GFF file is standardized using the AGAT tool, and the edited file is stored in this directory.

Edited_Gff/
`-- Named.CDS_Scer.gff

Exome#

Contains the results of the gff2prot script, which generates the exome as well as the .gtf and .gff files needed for subsequent steps.

Exome
|-- Exome_elongated.exons_Scer.fna
|-- Exome_elongated.exons_Scer.gtf
|-- Exome_elongated.gff
|-- Exome_elongated.nfasta
|-- Exome_elongated.nfasta.fai
`-- Exome_elongated_with_gene_features.gff

Exome_Fastq#

Contains the RiboSeq reads that remained unmapped after alignment to the exome (using STAR or Hisat2), following the trimming and/or filtering step.

Exome_Fastq/
|-- Sample1
|   `-- Sample1_Unmapped.fastq.gz
`-- Sample2
    `-- Sample2_Unmapped.fastq.gz

Genome_Fastq#

Contains the RiboSeq reads that remained unmapped after alignment to the genome (using STAR or Hisat2), following the trimming and/or filtering step.

Genome_Fastq/
|-- Sample1
|   `-- Sample1_Unmapped.fastq.gz
`-- Sample2
    `-- Sample2_Unmapped.fastq.gz

Mapping#

Mapping/Exome: Contains the results of the alignment of Riboseq data to the exome. This directory includes one subdirectory per alignment tool used (for example, if you choose STAR, you will have a STAR folder instead of Hisat2), each of which contains two subdirectories:

Index: Contains the exome index.
Results: Contains the alignment SAM file stored in a directory specific to each dataset.

Mapping/Genome: Contains the results of the alignment of Riboseq data to the genome. This directory includes one subdirectory per alignment tool used, each of which contains two subdirectories:

Index: Contains the genome index.
Results: Contains the alignment SAM file stored in a directory specific to each dataset.

Mapping/Mapping_Unwanted_Sequence_And_Filtering: Contains the alignment results obtained when filtering out unwanted sequences (generated by Bowtie2).

Mapping/
|-- Exome
|   |-- Bowtie2
|   |   |-- Index
|   |   `-- Results
|   `-- Hisat2
|       |-- Index
|       `-- Results
|-- Genome
|   |-- Bowtie2
|   |   |-- Index
|   |   `-- Results
|   `-- Hisat2
|       |-- Index
|       `-- Results
`-- Mapping_Unwanted_Sequence_And_Filtering
    |-- Index
    `-- Results

Quality_control#

This directory contains the results of quality control using MultiQC. It includes a summary report of all datasets in a single HTML file, before and after trimming.

Quality_control/
|-- After_Trimming
|   `-- multiqc_results
|       |-- multiqc_data
|       `-- multiqc_report.html
`-- Before_Trimming
    `-- multiqc_results
        |-- multiqc_data
        `-- multiqc_report.html

RiboWaltz#

RiboWaltz/
|-- Sample1
|   |-- Sample1.bam
|   |-- best_offset.txt
|   `-- psite_offset.csv
`-- Sample2
    |-- Sample2.bam
    |-- best_offset.txt
    `-- psite_offset.csv

Sample1.bam (named after the dataset, e.g. SRRXXXXXX.bam): A directory containing image files (TIFF format) representing quality control plots for P-site offsets and periodicity for specific fragment lengths (e.g., 25, 26 nucleotides).
best_offset.txt: Contains the best offset value for the P-site, optimized for each fragment length. This file is used for downstream normalization.
psite_offset.csv: A CSV file summarizing the calculated offsets for different fragment lengths. It includes the following columns:
Length: Fragment length (e.g., 25, 26, 27 nucleotides).
Offset: Calculated optimal P-site offset.
Reads: Number of reads used to calculate the offset.
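
As an illustration, psite_offset.csv could be loaded into a lookup keyed by fragment length as sketched below. The sketch assumes a comma-separated file whose header row uses exactly the three column names listed above and whose values are numeric. The function name load_psite_offsets is hypothetical.

```python
import csv

def load_psite_offsets(csv_path):
    """Map fragment length -> (P-site offset, number of reads used)."""
    offsets = {}
    with open(csv_path, newline="") as handle:
        # Assumption: header row with the columns Length, Offset and Reads
        for row in csv.DictReader(handle):
            length = int(float(row["Length"]))
            offsets[length] = (int(float(row["Offset"])), int(float(row["Reads"])))
    return offsets
```

With such a lookup, offsets[28] would give the offset and read count reported for 28-nucleotide fragments, if that length is present in the file.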

Selected_length#

This directory contains subdirectories for each dataset. Each subdirectory includes a Selected_length.txt file listing the selected kmer sizes.

Selected_length/
|-- Sample1
|   `-- Selected_length.txt
`-- Sample2
    `-- Selected_length.txt

Trimming#

The Trimming/ directory contains the results of the trimming step performed on raw FASTQ files to remove adapter sequences and retain fragments of interest within a specific length range.

Trimming/Adapters: Contains a .txt file for each sample, listing the detected adapter sequences used during the trimming process.

Trimming/Trimmed_fastq: Contains the resulting FASTQ files after trimming, compressed in .gz format. The sequences are filtered to retain only those with lengths between 25 and 35 nucleotides (as specified during the trimming process).

Trimming/
|-- Adapters
|   |-- Sample1
|   |   `-- Sample1.txt
|   `-- Sample2
|       `-- Sample2.txt
`-- Trimmed_fastq
    |-- Sample1
    |   `-- Sample1.cutadapt.25-35.fastq.gz
    `-- Sample2
        `-- Sample2.cutadapt.25-35.fastq.gz
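
To check that a trimmed file only holds reads in the expected length range, a quick sketch such as the one below can tabulate read lengths. It assumes standard four-line FASTQ records (sequences not wrapped over several lines). The function name read_length_counts is hypothetical.

```python
import gzip
from collections import Counter

def read_length_counts(fastq_gz_path):
    """Count reads per length in a gzip-compressed FASTQ file (4 lines per record)."""
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence is the second line of each record
                counts[len(line.strip())] += 1
    return counts
```

For a file such as Trimming/Trimmed_fastq/Sample1/Sample1.cutadapt.25-35.fastq.gz, min(counts) and max(counts) are then expected to fall between 25 and 35.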

SUPPLEMENTARY_DATA#

This directory contains additional files to assist with debugging, benchmarking, and logging. Key contents include:
- Benchmarks: Performance statistics for various steps of the pipeline.
- Logs: Detailed logs for each step of the analysis.